Empty dataset after running prepare, filter and Python recipe.

vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

Hello Team,

I have loaded data from Snowflake into Dataiku and now want to filter it for a specific country. When I use a Prepare, Filter, or Python recipe, the output dataset is empty, even though in the Prepare recipe we can see some rows for that specific country.

This is the first time I am facing this issue. Can you please tell me what the issue could be and why the output dataset has 0 rows?

Thanks in advance.

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @vinayk,

    I think it would be easiest if we could take a look at a job diagnostic for the job(s) that are resulting in the empty output datasets. I would also suggest doing just a simple sync recipe or a prepare recipe with no steps just to verify if the sync/empty prepare recipe results in expected output data. It would also be helpful to see some screenshots of the input and output datasets showing the data that should be included in the output dataset. In order to pass along the job diagnostic, you can either chat with us via the "chat" icon on our website, or open a support ticket with us.

    Thank you!
    Sarina

  • vinayk
    vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

    Thank you @SarinaS, it was because of a schema issue; the Sync recipe helped.

  • vinayk
    vinayk Partner, Dataiku DSS Core Designer, Registered Posts: 12 Partner

    @SarinaS, after fixing the schema issue, filtering the data for a specific country is taking more time. Is there any way to reduce the filtering time? The dataset has 80 million records.

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @vinayk,

    Thank you for the update, I'm glad that it seems like you were able to resolve the schema issue! For the performance, I do think we need to look at a job diagnostic for the job to see if any performance improvements can be made. If you want, you can open a chat on our website with us or a support ticket to send us the diagnostic.

    That will allow us to review all of your settings. For a quick first look, though: I understand that your input dataset is a Snowflake dataset. What type of output dataset are you using, and what engine is your recipe running on?

    Thanks,
    Sarina
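
The questions about the output dataset type and the recipe engine matter because a filter that runs inside the database returns only the matching rows, whereas a filter that runs outside it typically has to pull every row out first. With 80 million records that difference can dominate the run time. The following is only a minimal sketch of that idea, not Dataiku's or Snowflake's API: Python's built-in sqlite3 stands in for Snowflake, and the table `sales`, its columns, and the helper function names are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table standing in for the Snowflake source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, country TEXT, amount REAL)")
    rows = [(1, "IN", 10.0), (2, "US", 20.0), (3, "IN", 30.0), (4, "FR", 40.0)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def filter_in_database(conn, country):
    """Push the filter down: the WHERE clause runs in the database,
    so only matching rows are returned to the client."""
    cur = conn.execute(
        "SELECT id, country, amount FROM sales WHERE country = ? ORDER BY id",
        (country,),
    )
    return cur.fetchall()


def filter_in_client(conn, country):
    """Fetch every row first, then filter in Python.
    Same result, but the whole table is transferred before filtering."""
    all_rows = conn.execute(
        "SELECT id, country, amount FROM sales ORDER BY id"
    ).fetchall()
    return [row for row in all_rows if row[1] == country]


if __name__ == "__main__":
    conn = make_demo_db()
    print(filter_in_database(conn, "IN"))
    print(filter_in_client(conn, "IN"))
```

Both functions return the same rows; the difference is where the work happens. Whether a given Dataiku recipe can run in-database depends on its input and output datasets and the selected engine, which is why the job diagnostic is still the reliable way to confirm what is happening in this case.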
