Partitioning in Dataiku DSS - Extended Q&A

CoreyS · May 2020

During the latest Online Event on Partitioning in Dataiku DSS, @Malick-K answered a number of great questions from attendees, but he was not able to get to every one. In case you could not join us, and to leave no stone unturned, here are some of the questions that could not be answered live due to time constraints:

1. Is Malick running jobs on his local computer with the DSS engine or remotely with some other engine?

→ The demo ran locally on my computer. Some of the recipes used the SQL engine of a local PostgreSQL database, whereas others used the DSS engine.

2. If I have multiple txt files as input, with different names, is partitioning also possible?

→ What matters most when partitioning files is that they are stored in a way that exposes a pattern identifying which partition each file belongs to. The file name itself matters less, as long as the files are not all stored in exactly the same folder. If they are (all your files sit in the same folder and their names do not distinguish one partition from another), a workaround could be a first Python script that reads the files, finds the information identifying each file's partition, and stores the files in other folders that are, this time, structured by partition. Here I’m talking about “DSS folders”: it is important to keep the original files intact and in their original location if you are not the person who put them there.
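As a rough illustration of that workaround, here is a minimal sketch in plain Python. It assumes, purely for the example, that each file mentions its date (YYYY-MM-DD) in its first line and that this date is the partition identifier; `infer_partition` and `sort_into_partitions` are hypothetical helper names, and local directories stand in for the DSS folders. Files are copied rather than moved, so the originals stay untouched.

```python
import re
import shutil
from pathlib import Path

# Assumption for this sketch: the partition is a date (YYYY-MM-DD)
# that appears somewhere in the first line of each file.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def infer_partition(path):
    """Return the date found in the first line of the file, or None."""
    with open(path, "r", encoding="utf-8") as handle:
        first_line = handle.readline()
    match = DATE_PATTERN.search(first_line)
    return match.group(0) if match else None


def sort_into_partitions(source_dir, target_dir):
    """Copy each .txt file into target_dir/<partition>/ and report what was copied."""
    copied = {}
    for source_file in sorted(Path(source_dir).glob("*.txt")):
        partition = infer_partition(source_file)
        if partition is None:
            continue  # no partition information found; leave this file out
        destination = Path(target_dir) / partition
        destination.mkdir(parents=True, exist_ok=True)
        # Copy instead of move so the original files remain intact.
        shutil.copy2(source_file, destination / source_file.name)
        copied[source_file.name] = partition
    return copied
```

Once the files sit in one sub-folder per date, the folder structure itself carries the partition information, so a files-based dataset could in principle be partitioned with a pattern along the lines of `/%Y-%M-%D/.*`. The exact pattern depends on how you choose to name the sub-folders.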

3. Is it possible to factorize Recipes (same Recipe with multiple [input,output] pairs) ?

→ By default, it is possible to “copy” a given recipe, and the copy lets you change its output(s). If you then want to change the inputs as well, you will need to edit the input(s)/output(s) section of the recipe.
