Core Designer certification
Hello everyone,
I am taking the Core Designer certification, and I have a problem with step 3: Merge the information from the three datasets into a single dataset. We recommend using the CO2_and Oil.csv dataset as the base (left dataset) for merging other datasets.
When I do the left join, I don't get all the data from the datasets, so I can't get the 2008 to 2012 data ...
My question is: what are the steps to finish step 3 and get step 4 right?
Answers
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @emma123,
Please make sure that you have inferred the storage types on each input dataset. Note that you may have to select "check now" or "check again" for the infer types button to become clickable.
I've just gone through the steps without any issue, so please read through the directions again.
Note that you may need two post-filter conditions on the join recipe, so that it keeps only rows where the year is >= 2008 and <= 2012.
Hope this helps!
Thanks!
Jordan
-
Hey, thanks for your response, I already did that ...
It's just that when I get here there is a problem ... it only offers me the country 'Afghanistan', and I don't know what to choose ...
Also, I checked that all the source datasets run from 1800 to 2012. But when I left join, I only have 1800 to 1848, which corresponds to the Afghanistan dates ...
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker
Hi @emma123,
Can you check your dates? You mentioned that your data runs from 1800 to 2012 and that after the left join you only see 1800 to 1848, which corresponds to the Afghanistan dates. Note that the dates are 2008 to 2012.
Do not worry that you can only see Afghanistan: it is only showing a sample of your data here, and the sample is currently sorted. You should not need to touch anything in the entities.
All you need to do is:
1. Add the 3rd dataset to join all three datasets.
2. Add the post-filter (>= 2008, <= 2012).
Please then run the join recipe and you should see the correct results in the output dataset (985 rows, 11 columns).
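For anyone who wants to check the logic outside DSS, the two steps above can be sketched in plain Python. This is only an illustration under assumptions: the tiny tables, the value columns (oil, meat, Population) and the helper left_join are made up for the example, and the join keys (Entity, Code, Year) are assumed to be the ones used in the recipe. It is not how the join recipe is implemented.

```python
def left_join(left, right, keys):
    """Left-join two lists of dict rows on the given key columns (hypothetical helper)."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    right_cols = [c for c in (right[0] if right else {}) if c not in keys]
    out = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys))
        if matches:
            for m in matches:
                out.append({**row, **{c: m[c] for c in right_cols}})
        else:
            # No match: keep the left row and fill the right columns with None.
            out.append({**row, **{c: None for c in right_cols}})
    return out


# Made-up miniature datasets. Year is stored as an integer, so numeric comparisons work.
co2_oil = [
    {"Entity": "A", "Code": "AAA", "Year": 2007, "oil": 1.0},
    {"Entity": "A", "Code": "AAA", "Year": 2008, "oil": 2.0},
    {"Entity": "A", "Code": "AAA", "Year": 2012, "oil": 3.0},
    {"Entity": "B", "Code": "BBB", "Year": 2010, "oil": 4.0},
    {"Entity": "B", "Code": "BBB", "Year": 2013, "oil": 5.0},
]
meat_eggs = [
    {"Entity": "A", "Code": "AAA", "Year": 2008, "meat": 10.0},
    {"Entity": "B", "Code": "BBB", "Year": 2010, "meat": 20.0},
]
urban_pop = [
    {"Entity": "A", "Code": "AAA", "Year": 2008, "Population": 100},
    {"Entity": "A", "Code": "AAA", "Year": 2012, "Population": 110},
    {"Entity": "B", "Code": "BBB", "Year": 2010, "Population": 200},
]

keys = ["Entity", "Code", "Year"]

# Step 1: the base (left) dataset is joined with the other two.
joined = left_join(left_join(co2_oil, meat_eggs, keys), urban_pop, keys)

# Step 2: the post-filter keeps only rows with 2008 <= Year <= 2012.
filtered = [row for row in joined if 2008 <= row["Year"] <= 2012]

for row in filtered:
    print(row)
```

In this sketch the left join keeps every row of the base dataset (missing matches become None), and the row count only drops once the year filter is applied.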
Thanks,
Jordan
-
Thank you so much! It works now! Just one question about step 5: when I compute the per-capita values for Oil production (Etemad & Luciana) (terawatt-hours), meat_prod_tonnes, and Food Balance Sheets: Eggs - Production (FAO (2017)) (tonnes), I don't get any data in the new columns. For example, this is the formula I used for one of the created columns:
[["Oil production (Etemad & Luciana) (terawatt-hours)"]] / [[Population]]
But I don't get any numbers in the new column, and it's the same for the two other new columns.
Do you know why?
-
I entered it as you told me: numval([["Oil production (Etemad & Luciana) (terawatt-hours)"]]) / numval([["Population"]]). But I still don't get any results in my columns ...
They are still empty ...
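For reference, the calculation being attempted here (a column divided by Population, with both values read as numbers) can be sketched in plain Python. This is not DSS formula syntax, and the helper names to_number and per_capita and the sample rows are made up for illustration. It only shows the intended arithmetic and when such a division ends up with no value.

```python
def to_number(value):
    """Return value as a float, or None if it is empty or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def per_capita(row, column, population_column="Population"):
    """Divide row[column] by row[population_column]; None if either is unusable."""
    amount = to_number(row.get(column))
    population = to_number(row.get(population_column))
    if amount is None or population is None or population == 0:
        return None
    return amount / population


oil_col = "Oil production (Etemad & Luciana) (terawatt-hours)"

# Made-up rows: values stored as text, one of them empty.
rows = [
    {oil_col: "120.5", "Population": "1000"},
    {oil_col: "", "Population": "2000"},
]

values = [per_capita(row, oil_col) for row in rows]
print(values)
```

In this sketch a result is empty whenever the column name does not match a key in the row, or when either input cannot be read as a number.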
-
Hello Dataiku community,
I need your help: I don't get the 985 rows mentioned in the note. I changed the storage type of year from string to int so that I could add the post-filter on year in my join recipe. Then I created a left join between the CO2_and Oil dataset and the meat_and_eggs dataset with the join keys code, entity and year, and added a second left join between the result of the first join and the URBANISATION AND POPULATION dataset. I tried the post-filter in several ways (year >= 2008 and year <= 2012, and year is between 2008 and 2012), but they all give me 208235 rows. Do you have any idea where my error is? Thank you.