Core Designer certification

emma123 Dataiku DSS Core Designer, Registered Posts: 5

Hello everyone,

I am taking the Core Designer certification, and I have a problem with step 3: "Merge the information from the three datasets into a single dataset. We recommend using the CO2_and Oil.csv dataset as the base (left dataset) for merging other datasets."

When I do the left join, I don't get all the data from the datasets, so I can't get the 2008 to 2012 data.

My question is: what are the steps to finish step 3 so that step 4 comes out correctly?

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker

    Hi @emma123,

    Please make sure that you have inferred the storage types on each input dataset. Note that you may have to select "check now" or "check again" before the infer types button becomes clickable.

    Screenshot 2023-12-12 at 9.47.26 AM.png

    I've just gone through the steps without an issue, so please read through the directions again.

    Note that you may need two post-filter conditions on the join recipe, so that it keeps only the rows whose year is >= 2008 and <= 2012.

    Hope this helps!

    Thanks!

    Jordan
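For readers who want to see why the storage type matters before filtering, here is a minimal pandas sketch. It is not the DSS recipe itself, only an illustration of the same idea outside DSS; the column names `Entity` and `Year` and the toy values are assumptions.

```python
import pandas as pd

# Toy data (assumed): the year column arrives as text,
# as it can before storage types are inferred.
raw = pd.DataFrame({
    "Entity": ["A", "A", "A", "A"],
    "Year": ["2007", "2008", "2012", "2013"],
})

# Convert the text column to integers so numeric comparisons behave as expected.
typed = raw.assign(Year=pd.to_numeric(raw["Year"]).astype("int64"))

# Two conditions combined with "and": keep rows with 2008 <= Year <= 2012.
filtered = typed[(typed["Year"] >= 2008) & (typed["Year"] <= 2012)]
print(filtered["Year"].tolist())
```

Both conditions have to hold at once, which is why they are combined with `&` rather than applied as alternatives.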

  • emma123

    Hey, thanks for your response. I already did that ...

    The problem only shows up at this point: it only offers me the country 'Afghanistan', and I don't know what to choose ...

    Capture d'écran 2023-12-12 161158.png

    Capture d'écran 2023-12-12 161252.png

    Also, I checked that all the source datasets run from 1800 to 2012. But when I left join, I only get 1800 to 1848, which corresponds to the Afghanistan dates ...

  • JordanB

    Hi @emma123,

    Can you check your dates? You mentioned that the sources run from 1800 to 2012 and that the left join only returns 1800 to 1848. Note that the dates you need are 2008 to 2012.

    Do not worry that you can only see Afghanistan: the view shows only a sample of your data, and that sample is currently sorted. You should not need to touch anything in the entities.

    All you need to do is:

    1. Add the 3rd dataset to join all three datasets:

    Screenshot 2023-12-12 at 11.35.37 AM.png

    2. Add the post-filter (>= 2008, <= 2012)

    Please then run the join recipe and you should see the correct results in the output dataset (985 rows, 11 columns).

    Thanks,
    Jordan
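The two steps above (join all three datasets, then apply the post-filter) can be sketched in pandas as follows. This is only an illustration outside DSS, not the join recipe itself: the join keys `Entity`, `Code`, `Year`, the value columns, and the toy rows are all assumptions, so the resulting shape has nothing to do with the 985 rows expected in the exercise.

```python
import pandas as pd

keys = ["Entity", "Code", "Year"]  # assumed join keys

# Toy stand-ins for the three input datasets (values are made up).
co2_oil = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Code": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2007, 2008, 2012, 2013],
    "oil": [1.0, 2.0, 3.0, 4.0],
})
meat_eggs = pd.DataFrame({
    "Entity": ["A", "B"],
    "Code": ["AAA", "BBB"],
    "Year": [2008, 2012],
    "meat": [10.0, 20.0],
})
population = pd.DataFrame({
    "Entity": ["A", "B", "B"],
    "Code": ["AAA", "BBB", "BBB"],
    "Year": [2008, 2012, 2013],
    "Population": [100, 200, 210],
})

# Left-join the second and third datasets onto the base (left) dataset.
joined = (
    co2_oil
    .merge(meat_eggs, on=keys, how="left")
    .merge(population, on=keys, how="left")
)

# Post-filter: keep only years 2008 through 2012 (inclusive on both ends).
result = joined[joined["Year"].between(2008, 2012)]
print(result.shape)
```

A left join keeps every row of the base dataset, so the year range only shrinks once the filter is applied.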

  • emma123

    Thank you so much! It works now! Just one question about step 5: when I compute the new columns from Oil production (Etemad & Luciana) (terawatt-hours), meat_prod_tonnes, and Food Balance Sheets: Eggs - Production (FAO (2017)) (tonnes), I don't get any data in the columns. For example, I used this formula for one of the new columns:
    [["Oil production (Etemad & Luciana) (terawatt-hours)"]] / [[Population]]

    But I don't get any numbers in the new column, and it's the same for the two other new columns.

    Do you know why?

  • JordanB

    @emma123, you need to look at "Hint 2" and follow the link provided there to find the syntax for using formulas on columns whose names contain spaces. Use the syntax shown at that link to create your formula.

  • emma123

    I entered this, as you suggested: numval([["Oil production (Etemad & Luciana) (terawatt-hours)"]]) / numval([["Population"]]). But I still don't get results in my columns ...

    It's still empty...

  • JordanB

    Hi @emma123,

    You only need to use numval() on the column whose name has spaces, for example: numval("Oil production (Etemad & Luciana) (terawatt-hours)")/Population

    Thanks,

    Jordan
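As a rough cross-check of what the formula is meant to compute, the same per-capita division can be sketched in pandas. This is not DSS formula syntax; bracket indexing here simply plays the role of reaching a column whose name contains spaces. The output column name `oil_per_capita` and the toy values are assumptions.

```python
import pandas as pd

# Column name copied from the thread; it contains spaces and punctuation.
oil_col = "Oil production (Etemad & Luciana) (terawatt-hours)"

# Toy values (made up) for two rows.
df = pd.DataFrame({
    oil_col: [100.0, 50.0],
    "Population": [10, 25],
})

# Per-capita value: oil production divided by population, row by row.
# "oil_per_capita" is a hypothetical name for the new column.
df["oil_per_capita"] = df[oil_col] / df["Population"]
print(df["oil_per_capita"].tolist())
```

If the new column comes out empty in either tool, a first thing to check is whether the column name is being read as a column reference or as something else.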

  • salimaazi
    salimaazi Dataiku DSS Core Designer, Registered Posts: 2

    Hello Dataiku community,

    I need your help: I don't get the 985 rows mentioned in the note. I changed the storage type of year from string to int so that I could add the post-filter on year in my join recipe. Then I created a left join between the CO2_and Oil dataset and the meat_and_eggs dataset with the join keys code, entity, and year. Next I added a left join between the result of the first join and the URBANISATION AND POPULATION dataset. I tried the post-filter in several forms (year >= 2008 and year <= 2012, or year is between 2008 and 2012), and they all give me 208235 rows. Do you have any idea where my error is? Thank you.
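The thread does not say what caused this row count, so the following is only a generic way to narrow it down, sketched in pandas rather than DSS: count rows before the join, after the join, and after the filter, and check the right-hand dataset for duplicated join keys. The key names `Entity`, `Code`, `Year` and the toy rows are assumptions.

```python
import pandas as pd

keys = ["Entity", "Code", "Year"]  # assumed join keys

left = pd.DataFrame({
    "Entity": ["A", "A", "A"],
    "Code": ["AAA", "AAA", "AAA"],
    "Year": [2007, 2008, 2009],
})
right = pd.DataFrame({
    "Entity": ["A", "A"],
    "Code": ["AAA", "AAA"],
    "Year": [2008, 2008],  # duplicated key, on purpose, to show its effect
    "value": [1, 2],
})

# Duplicated keys on the right side multiply matching rows in a left join.
dup_keys = int(right.duplicated(subset=keys).sum())

joined = left.merge(right, on=keys, how="left")
in_range = joined[joined["Year"].between(2008, 2012)]

# Row counts at each stage: input, after join, after filter.
print(len(left), len(joined), len(in_range), dup_keys)
```

If the count after the filter equals the count after the join, the filter is probably not being applied; if the count after the join is larger than the base dataset, duplicated keys are one possible reason.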
