Joins in Spark engine

Mahesh_M

When I use a left join in the Join recipe with the Spark engine and Parquet datasets, I do not get the expected results.

I have 2M records in the left table and 1M in the right table, but the result contains only 2 records.

Answers

  • Sarina

    Hi @Mahesh_M,

    Can you please open a support ticket and attach a job diagnostic for the job that produces the unexpected number of output rows? In addition, please attach screenshots of both input datasets, highlighting several rows that you expected to join based on the join condition but that do not appear in the output dataset. That should help with troubleshooting!

    Thanks,
    Sarina

  • tgb417

    @Mahesh_M,

    Definitely follow up with support.

  • vinayshankar

    Hi,

    I am facing the same issue. I am using Dataiku 9.0.3.

    If your issue is resolved, can you please post the resolution here?

  • Sarina

    Hi @vinayshankar,

    The issue is likely specific to your data and join setup. If you would like us to review it, please also open a support ticket and attach a job diagnostic for the job that produces the unexpected number of output rows. In addition, please attach screenshots of both input datasets, highlighting several rows that you expected to join based on the join condition but that do not appear in the output dataset. That should help with troubleshooting!

    Thanks,
    Sarina
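
For anyone checking their own data before opening a ticket: a left join keeps every row of the left dataset, so, absent a post-join filter, the output should have at least as many rows as the left input (2M in the original question). The sketch below illustrates that invariant in plain Python on a tiny sample. The `left_join` helper and the sample rows are hypothetical and are not Dataiku or Spark API; it only shows what to expect, not what caused the issue in this thread.

```python
from collections import defaultdict


def left_join(left_rows, right_rows, key):
    """Left outer join of two lists of dicts on a single key column (illustrative helper)."""
    # Index the right side by key so each left row can look up its matches.
    right_index = defaultdict(list)
    for row in right_rows:
        right_index[row[key]].append(row)

    # Right-side columns other than the key, used to fill unmatched rows.
    right_columns = {c for row in right_rows for c in row if c != key}

    joined = []
    for row in left_rows:
        matches = right_index.get(row[key])
        if matches:
            # One output row per matching right row.
            for match in matches:
                joined.append({**row, **match})
        else:
            # Unmatched left rows are kept, with right-side columns set to None.
            joined.append({**row, **{c: None for c in right_columns}})
    return joined


# Hypothetical sample rows standing in for the two input datasets.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 1, "score": 10}, {"id": 1, "score": 11}]

result = left_join(left, right, "id")
unmatched = [r for r in result if r["score"] is None]

# The left-join invariant: never fewer output rows than left input rows.
assert len(result) >= len(left)
```

If a sample of your own data satisfies this invariant but the recipe output does not, the rows in `unmatched` (or their equivalents in your data) are good candidates to highlight in the screenshots requested above.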
