Reflect table schema changes in Dataiku flow

Options
jlbellier
jlbellier Registered Posts: 22

Hello everybody,

I face to an important problem that occured in production environment t last night.

I have a table <my_table> with a column (let's call it <my_column>) whose type was defined in PostGreSQL as character varying(250).

We have a process collecting data from an API and store them in &lt;my_table&gt;. Some users put information that is supposed to be stored in <my_colum> with a size exceedind 250 characters (it is a path to a file).

So what I did is the following :

1) I updated the column size using alter table &lt;my_table&gt; alter column &lt;my_column&gt; ...

2) In my Dataiku flow, I propagated the schema change from the dataset defined by &lt;my_table&gt; (let's call it DS1) , and I restarted the first recipe with input DS1 and with output dataset DS2.

After this run I got the following message :

ERROR: value too long for type character varying(250) Where: DS2, line 546, column &lt;my_column&gt;: "bla bla bla..."

That lets me think that the size of &lt;my_column&gt; has not been propagated properly.
I also edit the column schema by selecting it in the view and choosing and validated the schema without any change (type string, Auto-detect) and propagated the schema again.
I get the same result.

Is there a way to update such changes or do I need to create a new dataset DS1_1 based on &lt;my_table&gt; and plug DS1_1 instead of DS1 ?

Thank you in advance for your help.

Best regards,
Jean-Luc.


Operating system used: Windows


Operating system used: Windows

Answers

  • jlbellier
    jlbellier Registered Posts: 22
    Options

    Hello again,

    I come back to you with new information, because this problem is urgent (the production data cannot be updated).


    By looking at the job results, the target table of the recipe is created with the old column size (i.e. character varying(250)), whereas the size of the same column in the source tablehas been increased.

    What I tested is to create a new source dataset (with the new size) to DS1 and I copied the recipe R to R1 and plugged R1 to DS1, with a new target dataset for R1.
    I still have the same error when I execute R1 : Caused by: org.postgresql.util.PSQLException: ERROR: value too long for type character varying(250)

    Is there something like a cache (the source table has the same name, but schema was updated) or are there limitations on the size of columns when we use 'string' and 'Auto-detect' ?

    Best regards,

    Jean-Luc.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @jlbellier
    ,

    For an urgent production issue I strongly suggest that you submit a support ticket.

    https://support.dataiku.com/support/home

    There are really great folks working in the support team at Dataiku. They tend to get back to me promptly and help me sort the issue.

    With postgresql back in the day I have seen some limitations with the length of the query that dataiku tries to uses to get my results. Recently I have also run into problems with the width of the column names I was trying to create with dataiku that present in a similar way.

    That said I’d encourage you to get a support ticket going if you have not already done so.

Setup Info
    Tags
      Help me…