How can I overwrite using new list?
Hi,
I want to overwrite one columns using new list but I don't know how to do that with Dataiku. I tried a lot cases with "prepare" but it was impossible.
Here is example.
a | 18 |
b | 9 |
c | 10 |
d | 7 |
e | 12 |
And new list is like
b | 15 |
e | 4 |
I want to make new list like
a | 18 |
b | 15 |
c | 10 |
d | 7 |
e | 4 |
It's really simple but ... How can I overwrite with Dataiku?
Answers
-
Hi @kishiken
In your case, there are two datasets, one with the original data (A) and another with the new values (B).
One way of achieving your desired results would be to create a join between the two.The join should be left (A - or right (B-A) join.
In the computed columns you should define a formula such as this one:
if(isBlank(B_value), A_value, B_value)
As you can see, it will use the new value from B if one is available, or the existing one.
Good luck!
-
Hi Liev,
Thank you very much for your quick answering! It's really helpful.I didn't know "computed columns"...! I tried it but it was impossible because my formula was invalid. For me, it's impossible to understand formula in Dataiku....
if(isBlank(B_value), A_value, B_value)
I named two files as A and B, and I typed same formula in the box however it was invalid.
What is the precisely formula text in case I name two dataset as A and B? -
if your datasets have two columns each:
- name
- value
in the join recipe, under selected columns section, give the datasets prefix, let's say a and b. This would make the results of the join fields to be called
- a_name
- a_value
- b_name
- b_value
In this case the formula under post-join computed columns would be
if(isBlank(b_value), a_value, b_value)
-
Hi Liev,
Thank you very much for your considerate answer.
I tried it but something went wrong...
Error message is "ERROR Invalid computed column: Operator cannot be translated to SQL: isBlank"
So, I changed "mode" from DSS formula to SQL expression.
The former error disappeared but anther error occurred.
"Failed to compute execution plan".
-
The reason for this is that this is indeed invalid SQL syntax.
You have two options:
- Under Mode leave it as DSS formula, but change the execution engine to DSS. This can be done under the RUN button of the recipe, by clicking on the little cogs.
- Under Mode use SQL, in which case your formula should change to something SQL compatible like:
COALESCE(b_value, a_value)