Core Practitioner

Shivoy
Shivoy Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1 ✭✭✭

While joining the CO2, Meat Production, and Urbanization datasets, I am not getting an output. I keep hitting the error "CSV quoting style Unix is not supported for a hive table". How do I get past this error?

Answers

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    Hi @Shivoy, a few things to check:

    - Have you uploaded all of the datasets correctly?

    - Do the data types of the columns you are joining match? You might need to infer types from schema before joining.

    Can you send a screenshot, a job diagnosis, and tell us what kind of instance you are using?
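
The type check suggested above can be tried outside DSS as well. The following is a minimal sketch in plain Python, not Dataiku's own API: the rows, column values, and the helper names `key_types` and `mismatched_keys` are all hypothetical and only illustrate how a key column stored as text on one side and as a number on the other would show up.

```python
# Hypothetical miniature stand-ins for two of the course datasets;
# the values are invented for illustration only.
co2_rows = [
    {"Country": "France", "Year": 2000, "co2": 1.0},
    {"Country": "France", "Year": 2001, "co2": 1.1},
]
meat_rows = [
    {"Country": "France", "Year": "2000", "meat": 5.0},  # Year stored as text
    {"Country": "France", "Year": "2001", "meat": 5.2},
]


def key_types(rows, keys):
    """Map each key column to the set of Python type names found in it."""
    return {k: {type(row[k]).__name__ for row in rows} for k in keys}


def mismatched_keys(left, right, keys):
    """Return the key columns whose value types differ between the two sides."""
    lt, rt = key_types(left, keys), key_types(right, keys)
    return {k: (sorted(lt[k]), sorted(rt[k])) for k in keys if lt[k] != rt[k]}


mismatches = mismatched_keys(co2_rows, meat_rows, ["Country", "Year"])
print(mismatches)  # only "Year" is reported: int on one side, str on the other
```

If a key column is reported here, the two sides would need the same type before an equality join can match them.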

  • S18
    S18 Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 Partner

    Hi,

    I am working through the Core Designer certificate and facing a similar issue.

    I am trying to join 3 datasets using Country, Year and Code as the join key columns. There should be 5 rows per country after joining.

    However, Dataiku is creating a single row for each year and assigning duplicate values. I am including a snapshot below. I tried the same recipe on a cropped dataset and it works fine.

    S18_0-1662604877593.png

    What am I doing wrong? I would appreciate any suggestion.

    Thanks!

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    Hi @S18, if you look at your screenshot, you can see that you have multiple rows for a single year of a single country, when you know the output should have one row per country per year.

    This suggests something has gone wrong when joining the datasets. I'd suggest returning to the Join recipe and examining what happens, thinking about the data you have. You might want to join two datasets first to make sure you understand the output, and then return to add the third dataset in a similar way.
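
One way to examine what a join does, step by step, is to count how often each join key occurs in a dataset before joining it. The sketch below is plain Python rather than a Dataiku recipe, and the rows are invented. It assumes, purely for illustration, that the extra rows come from a repeated (Country, Year) pair in the dataset being added; it is not a diagnosis of the course data. The helpers `duplicate_keys` and `left_join` are hypothetical names.

```python
from collections import Counter


def duplicate_keys(rows, keys):
    """Return join-key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def left_join(left, right, keys):
    """Naive left join: one output row per matching (left, right) pair."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    out = []
    for row in left:
        # Unmatched left rows are kept once, with no right-hand columns.
        matches = index.get(tuple(row[k] for k in keys), [{}])
        out.extend({**row, **m} for m in matches)
    return out


keys = ["Country", "Year"]
base = [
    {"Country": "France", "Year": 2000, "co2": 1.0},
    {"Country": "France", "Year": 2001, "co2": 1.1},
]
# Hypothetical dataset with a repeated (Country, Year) pair.
third = [
    {"Country": "France", "Year": 2000, "urban": 75.0},
    {"Country": "France", "Year": 2000, "urban": 75.9},
    {"Country": "France", "Year": 2001, "urban": 76.0},
]

dupes = duplicate_keys(third, keys)
joined = left_join(base, third, keys)
print(dupes)                   # keys that appear more than once in `third`
print(len(base), len(joined))  # the join returns more rows than the base has
```

When the joined row count exceeds the base row count, a repeated key on the right-hand side is one place to look; running the same check after each dataset is added shows which join introduces the extra rows.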

  • S18
    S18 Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 Partner

    @SeanA

    Thanks for your reply. I already tried what you suggested, and the Join recipe works fine for the first two datasets. But when I join the third dataset, it fails. I tried an inner join as well as a left join when adding the third dataset. Nothing works.

    I singled out the first country in each of the three datasets and tried the join on these cropped datasets, and it worked great. But it does not work on the main datasets.

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    Hi @S18, I can understand the temptation to try different combinations or settings when the output is not giving the expected result. Ultimately though, you'll need to think through why you're getting the wrong output based on the data and settings you are supplying. I'm afraid there's not much help I can offer other than to encourage you to stick with it and think carefully about how to join these three datasets. Also use the hints provided, for example: "We recommend using the CO2_and Oil.csv dataset as the base (left dataset) for merging other datasets."
