Business Analyst Quick Start: ERROR while merging datasets
Hello,
I am following the Business Analyst course in the Academy, and while following the prompts from the tutorial, I ran into an error merging the datasets and am unable to create the recipe:
Select the web_last_year dataset from the Flow.
Choose Join with from the “Visual recipes” section of the Actions sidebar near the top right of the screen.
Choose crm_last_year as the second input dataset.
Name the output dataset training_data instead of the default name.
Leave the default options for “Store into” and “Format”.
If you are using Dataiku Online, these options will be “dataiku-managed-storage” and “Parquet”, as opposed to “filesystem_managed” and “CSV” if you are using a local instance.
Create the recipe.
I have started this project twice because I thought I did something wrong, but I don't know what could possibly be wrong and how to fix it. Any help would be great!
Answers
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @willow25 and welcome to the Dataiku Community. Sorry for the trouble you are having, but let me see how we can help. To confirm, I completed this same step in the Quick Start and did not get the same error as you.
Below is the video which walks through all of the corresponding steps for the Join recipe. If you haven't already, can you watch the video and confirm whether any of these steps differ from how you executed them?
//play.vidyard.com/47Vtt3painYKXWYMzJsNy3.html?
Sometimes it may just help to visualize the steps (I know it helps me.) I hope this helps!
-
@CoreyS
I appreciate the video. I think I should probably mention that I have the online version. Nonetheless, I followed the recommended settings and got the error every time:
Leave the default options for “Store into” and “Format”.
If you are using Dataiku Online, these options will be “dataiku-managed-storage” and “Parquet”, as opposed to “filesystem_managed” and “CSV” if you are using a local instance
So I tried your setting and still got the same error message.
I do not have any option other than "dataiku-managed-storage".
-
Hi @willow25, Nancy from the Dataiku Academy team here. Your error has to do with a recent small UI change in Dataiku Online that our team was also not aware of until now, so thank you for flagging this!
You're getting the error because the Dataiku Online UI no longer auto-detects and fills in the join condition. Previously, it would automatically set the recipe to use the "customer_id" column in web_last_year and the "customerid" column in crm_last_year as the join keys. Now it no longer detects them, so the join condition is empty by default.
This means that in order to complete this step of the tutorial, you will need to set up the join condition yourself. To do so:
1) While on the "Join" step of the recipe, click the blue "Add a Condition" button.
2) Click the "+" button to add a condition.
3) Normally, at this point Dataiku should auto-detect and fill the join condition with "customer_id = customerid", but if it doesn't, or if it detects the wrong columns, you can click on the "=" sign and select these columns as the join keys from the dropdown menus.
Once you've done this, the error should no longer appear and you should be able to move on with the tutorial.
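For anyone who finds it easier to see the join condition as code, here is a minimal plain-Python sketch of what "customer_id = customerid" means. It is not how Dataiku runs the recipe internally. The sample rows, the pages_visited and revenue columns, and the left_join helper are all made up for illustration, and the choice of a left join is an assumption; only the dataset names and the two key column names come from this thread.

```python
# Hypothetical sample rows; only the key column names come from the tutorial.
web_last_year = [
    {"customer_id": "c1", "pages_visited": 5},
    {"customer_id": "c2", "pages_visited": 2},
    {"customer_id": "c3", "pages_visited": 9},
]
crm_last_year = [
    {"customerid": "c1", "revenue": 120.0},
    {"customerid": "c3", "revenue": 40.0},
]


def left_join(left, right, left_key, right_key):
    """Left join two lists of dicts where left[left_key] == right[right_key]."""
    # Index the right-hand rows by their key so lookups are fast.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    # Columns brought over from the right side (the duplicate key is dropped).
    right_columns = [c for c in (right[0] if right else {}) if c != right_key]

    joined = []
    for row in left:
        matches = index.get(row[left_key])
        if matches:
            for match in matches:
                merged = dict(row)
                merged.update({c: match[c] for c in right_columns})
                joined.append(merged)
        else:
            # No match: keep the left row and leave the right columns empty.
            merged = dict(row)
            merged.update({c: None for c in right_columns})
            joined.append(merged)
    return joined


# The join condition from the tutorial: customer_id = customerid.
training_data = left_join(web_last_year, crm_last_year, "customer_id", "customerid")
for row in training_data:
    print(row)
```

The point is that the two key columns have different names, so the match has to be stated explicitly. With an empty join condition there is nothing to match rows on, which is consistent with the recipe refusing to be created.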
Hope this helps, let me know if it works out or if you have any more questions!
Best,
Nancy