
Let users create and publish custom prepare recipe processors

The prepare recipe in Dataiku is very powerful. It's by far the fastest way to script complex data transformations. But sometimes the built-in processors are not quite enough. Whether I use Dataiku Formula Language, Python, or SQL, I often need to create custom steps in my recipes. While I can always create a separate code recipe to do the processing I'm looking for, it's usually more convenient to add code steps in a prepare recipe between other data preparation steps. This works well enough if the transformation is specific to one dataset, but sometimes I need to run the same transformation against many datasets. For this, it would be very helpful to be able to create custom processors.

I'd like to be able to write some code that hooks into the script step UI elements (for example, the column selector and configuration radio buttons), then publish it so it's listed in the processors library in the prepare recipe and available to everyone on my team. That way, I can use it everywhere. It would also be great if these custom processors could be added via plugin or otherwise installed from the internet, so useful processors could be contributed back to the larger community and made accessible to everyone. This would make the prepare script far more powerful in enterprises, where processors specific to a company's needs could be built and distributed throughout the organization. Among the many use cases, I could imagine processors hooking into internal APIs (one of my main uses of Python processors today) and simplifying otherwise complex tasks.
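To make the idea concrete, here is a minimal sketch in plain Python of what a configurable, reusable step could look like. It is not Dataiku's plugin API: the factory `make_mask_processor`, its parameters (`column`, `keep_last`, `mask_char`), and the sample column names are all hypothetical, standing in for the values a column selector and configuration widgets would supply.

```python
from typing import Callable, Dict, Optional

# A row is modeled as a plain dict of column name -> cell value (assumption).
Row = Dict[str, Optional[str]]


def make_mask_processor(
    column: str, keep_last: int = 4, mask_char: str = "*"
) -> Callable[[Row], Row]:
    """Build a row-level step from a small configuration (hypothetical helper)."""

    def process(row: Row) -> Row:
        value = row.get(column)
        if value is None:
            return row  # leave rows with a missing cell untouched
        # Keep the last `keep_last` characters and mask everything before them.
        visible = value[-keep_last:] if keep_last > 0 else ""
        masked = mask_char * (len(value) - len(visible)) + visible
        out = dict(row)  # copy so the input row is not mutated
        out[column] = masked
        return out

    return process


# One configuration, reusable against any dataset that has the column.
mask_account = make_mask_processor("account_id", keep_last=4)
result = mask_account({"account_id": "1234567890", "name": "Ada"})
```

The point of the factory shape is that the logic is written once and only the configuration changes per dataset, which is what a published processor with its own UI would give me.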

I'd also like to be able to save specific configurations of existing processors for reuse. For example, if I write regexes that extract patterns specific to my company's data, I'd love to be able to save those patterns into a library and recall them later. I'd love the same feature in find and replace. For the tokenize text processor, I'd like to be able to define and publish a custom tokenizer. And so on.
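As a rough illustration of the regex library idea, here is a small plain-Python sketch of named, precompiled patterns that can be recalled by name. The pattern names (`ticket_id`, `cost_center`), the regexes themselves, and the `extract` helper are made up for the example and are not part of Dataiku.

```python
import re
from typing import Dict, Optional

# Hypothetical shared library of company-specific patterns, keyed by name.
PATTERNS: Dict[str, re.Pattern] = {
    "ticket_id": re.compile(r"\b[A-Z]{2,5}-\d+\b"),
    "cost_center": re.compile(r"\bCC\d{4}\b"),
}


def extract(pattern_name: str, text: Optional[str]) -> Optional[str]:
    """Return the first match of a named pattern, or None if there is none."""
    if text is None:
        return None
    match = PATTERNS[pattern_name].search(text)
    return match.group(0) if match else None


ticket = extract("ticket_id", "See OPS-1423 for details")
center = extract("cost_center", "billed to CC0042")
```

Today I can approximate this by copying patterns between recipes, but a named library would keep every recipe in sync when a pattern changes.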

I think these features would be a really powerful addition to the prepare recipe. Combined with the idea "Enable full Dataiku API in shaker script Python processors", they could compound to make a single step orders of magnitude more productive.

5 Comments
tgb417
Neuron

Here is a related conversation about flow reuse.

https://community.dataiku.com/t5/Using-Dataiku/Flow-Zone-Reuse-Can-one-flow-zone-be-reused-from-mult...

Dataiku staff recommended the use of application recipes. This turned out to be complicated to understand and implement, and I still haven't succeeded in setting up my use case with this approach.

Turribeach
Level 6

This is a very interesting idea, but I wonder why it hasn't been acknowledged by Dataiku yet.

ktgross15
Dataiker

Hi @natejgardner !

You can actually already create plugins for custom prepare recipe processors: https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html

Let me know if this solves your use case.

Best,

Katie

CoreyS
Community Manager
Status changed to: Needs Info
 
natejgardner
Neuron

This looks promising. I'll give it a try and write back.