Prepare Recipe Custom Step & Spark

lina

Hi,

We are working on a custom prepare recipe step that appends a user-input row to the dataset. It works correctly with the local DSS engine. However, when the recipe runs on Spark, the step adds one row per file the dataset is partitioned into.

For example, if the dataset is stored as 10 HDFS files, the recipe step adds 10 duplicate rows instead of 1. Is there a way to avoid this other than converting the code into a visual recipe?
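A likely explanation (an assumption, not confirmed from the thread): on the Spark engine, the prepare step is applied to each partition independently, so any logic of the form "append one row to the data I see" runs once per partition. The names below (`add_row_step`, the toy dataset) are hypothetical, purely to illustrate the effect in plain Python with no Spark required:

```python
def add_row_step(rows, new_row):
    # Hypothetical custom step: appends the user-input row to whatever
    # chunk of data it is handed.
    return rows + [new_row]

dataset = [{"id": i} for i in range(100)]           # toy dataset of 100 rows
partitions = [dataset[i::10] for i in range(10)]    # simulate 10 HDFS files

# Local engine: the step sees the whole dataset once -> 1 extra row.
local_result = add_row_step(dataset, {"id": "NEW"})

# Spark-like execution: the step is applied to each partition
# independently -> one extra row per partition, i.e. 10 duplicates.
spark_result = [row for part in partitions
                for row in add_row_step(part, {"id": "NEW"})]

print(len(local_result) - len(dataset))   # 1
print(len(spark_result) - len(dataset))   # 10
```

Under that assumption, appending the row in a step that runs once over the whole dataset (e.g. a downstream Python recipe) rather than inside the per-partition prepare step would avoid the duplication.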

Thanks

SarinaS
Dataiker

Hi @lina,

If you are still curious about this, it would be easiest for us to help if you could open a support ticket (or a chat via the chat box) and attach a job diagnostic of both the local execution and the Spark execution of the job, so that we can take a look.

Thank you,
Sarina
