How can I overwrite using new list?

Options
kishiken
kishiken Partner, Registered Posts: 3 Partner

Hi,

I want to overwrite one columns using new list but I don't know how to do that with Dataiku. I tried a lot cases with "prepare" but it was impossible.

Here is example.

a18
b9
c10
d7
e12

And new list is like

b15
e4

I want to make new list like

a18
b15
c10
d7
e4

It's really simple but ... How can I overwrite with Dataiku?

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17
    Options

    Hi @kishiken

    In your case, there are two datasets, one with the original data (A) and another with the new values (B).
    One way of achieving your desired results would be to create a join between the two.

    The join should be left (A - B) or right (B-A) join.

    In the computed columns you should define a formula such as this one:

    if(isBlank(B_value), A_value, B_value)

    As you can see, it will use the new value from B if one is available, or the existing one.

    Good luck!

  • kishiken
    kishiken Partner, Registered Posts: 3 Partner
    edited July 17
    Options

    Hi Liev,
    Thank you very much for your quick answering! It's really helpful.

    I didn't know "computed columns"...! I tried it but it was impossible because my formula was invalid. For me, it's impossible to understand formula in Dataiku....

    if(isBlank(B_value), A_value, B_value)

    I named two files as A and B, and I typed same formula in the box however it was invalid.


    What is the precisely formula text in case I name two dataset as A and B?

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17
    Options

    if your datasets have two columns each:

    • name
    • value

    in the join recipe, under selected columns section, give the datasets prefix, let's say a and b. This would make the results of the join fields to be called

    • a_name
    • a_value
    • b_name
    • b_value

    In this case the formula under post-join computed columns would be

    if(isBlank(b_value), a_value, b_value)
    

  • kishiken
    kishiken Partner, Registered Posts: 3 Partner
    Options

    Hi Liev,

    Thank you very much for your considerate answer.

    I tried it but something went wrong...

    Error message is "ERROR Invalid computed column: Operator cannot be translated to SQL: isBlank"

    So, I changed "mode" from DSS formula to SQL expression.

    The former error disappeared but anther error occurred.

    "Failed to compute execution plan".

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17
    Options

    The reason for this is that this is indeed invalid SQL syntax.

    You have two options:

    - Under Mode leave it as DSS formula, but change the execution engine to DSS. This can be done under the RUN button of the recipe, by clicking on the little cogs.

    - Under Mode use SQL, in which case your formula should change to something SQL compatible like:

    COALESCE(b_value, a_value)
    

Setup Info
    Tags
      Help me…