Joins in Spark engine
When I use a left join in the Join recipe with the Spark engine and Parquet datasets, it does not give the expected results.
I have 2M records in the left table and 1M in the right table, but the result contains only 2 records.
Answers
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @Mahesh_M,
Can you please open a support ticket, and attach a job diagnostic for the job that results in the unexpected number of output rows? In addition, can you make sure to attach screenshots for both input datasets highlighting several rows of data that you expected to join together based on the join condition that then do not appear in the output dataset? That should help with troubleshooting!
Thanks,
Sarina
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Definitely follow up with support.
-
Hi,
I am facing the same issue. I am using Dataiku version 9.0.3.
If your issue is resolved, can you please post the resolution here?
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @vinayshankar,
The issue is likely specific to your data and join setup. If you would like us to review it, please also open a support ticket and attach a job diagnostic for the job that results in the unexpected number of output rows. In addition, can you make sure to attach screenshots of both input datasets, highlighting several rows of data that you expected to join together based on the join condition but that do not appear in the output dataset? That should help with troubleshooting!
Thanks,
Sarina
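
For anyone collecting the rows requested above, here is a minimal sketch of one way to find left-side keys that have no match on the right, using a small sample exported from each dataset. It is plain Python on made-up data, not the Join recipe or the Spark engine. The column names (`id`, `amount`, `region`) and the helper names are hypothetical, and it does not model SQL null handling. It also shows the property being relied on in this thread: a left join keeps every left row, so its output should never be smaller than the left input.

```python
def left_join(left_rows, right_rows, key):
    """Left join two lists of dicts on `key`, keeping every left row."""
    # Index the right side by key so each left row can be looked up directly.
    index = {}
    for right in right_rows:
        index.setdefault(right[key], []).append(right)

    joined = []
    for left in left_rows:
        matches = index.get(left[key])
        if matches:
            # One output row per matching right row.
            for right in matches:
                joined.append({**right, **left})
        else:
            # No match: the left row is still kept, without right-side columns.
            joined.append(dict(left))
    return joined


def unmatched_left_keys(left_rows, right_rows, key):
    """Return the left key values that have no exact match on the right."""
    right_keys = {right[key] for right in right_rows}
    return [left[key] for left in left_rows if left[key] not in right_keys]


# Hypothetical sample rows; replace with a sample from your own datasets.
left_sample = [
    {"id": "A1", "amount": 10},
    {"id": "a2 ", "amount": 20},  # differs from "A2" by case and a trailing space
    {"id": "A3", "amount": 30},
]
right_sample = [
    {"id": "A1", "region": "EU"},
    {"id": "A2", "region": "US"},
]

joined = left_join(left_sample, right_sample, "id")
unmatched = unmatched_left_keys(left_sample, right_sample, "id")

print(len(joined), "joined rows from", len(left_sample), "left rows")
print("left keys with no match:", unmatched)
```

Keys that look identical on screen but differ in case, whitespace, or type show up in the unmatched list, which makes them good candidates to highlight in the screenshots. An output with fewer rows than the left input, as reported here, is not something a plain left join can produce, so that is worth stating explicitly in the ticket.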