Let users create and publish custom prepare recipe processors

The prepare recipe in Dataiku is very powerful. It's the fastest way to script up complex data transformations by far. But sometimes, the processors are not quite enough. Whether I use Dataiku Formula Language, Python, or SQL, I often need to create custom steps in my recipes. While I can always create a totally separate code recipe to do the processing I'm looking for, it's usually more convenient to create code steps in a prepare recipe between other data preparation steps. This works well enough if the transformation is specific to my dataset, but sometimes, I need to execute the same transformation against many datasets. For this, it'd be very helpful to be able to create custom processors.

I'd like to be able to write some code that can hook into the script step UI elements (for example, the column selector and configuration radio buttons), then publish it so it's available for everyone on my team to use, listed in the processors library in the prepare recipe. That way, I can use it everywhere. It'd also be great if these custom processors could be added via plugin or otherwise installed from the internet, so useful processors could be contributed back to the larger community and made accessible to everyone. This would make the prepare script immensely more powerful in enterprises, where processors specific to a company's needs can be built and distributed throughout the organization. Among the many use cases, I could imagine the processors hooking into internal APIs (one of my main uses of Python processors today) and simplifying otherwise complex tasks.

I'd also like to be able to save specific configurations of existing processors for reuse. For example, if I write regex that can extract patterns specific to my company's data, I'd love to be able to save those patterns into a library and recall them later. I'd love the same feature in find and replace. For the tokenize text processor, I'd like to be able to define and publish a custom tokenizer, and so on.
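To illustrate the pattern-library idea, here is a minimal sketch in plain Python. Everything in it is hypothetical: the `PATTERN_LIBRARY` name, the two example patterns, and the `notes` column are made up for illustration, and none of it is an existing Dataiku feature or API. The `process(row)` function only mimics the row-as-dictionary shape of a per-row Python step.

```python
import re

# Hypothetical shared library of named patterns. The names and regexes are
# illustrative only and are not part of any Dataiku API.
PATTERN_LIBRARY = {
    "employee_id": re.compile(r"\bEMP-(\d{6})\b"),
    "cost_center": re.compile(r"\bCC(\d{4})\b"),
}


def extract(pattern_name, text):
    """Return the first captured group for the named pattern, or None."""
    match = PATTERN_LIBRARY[pattern_name].search(text or "")
    return match.group(1) if match else None


def process(row):
    # Assumed shape: the row is a dict of column name -> value. The
    # extracted value is written to a new column and the row is returned.
    row["employee_id"] = extract("employee_id", row.get("notes"))
    return row
```

With a library like this, the same named pattern could be recalled from any recipe instead of being retyped in each regex step.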

I think these features could be a really powerful addition to the prepare recipe. Combined with the related idea "Enable full Dataiku API in shaker script Python processors" (also on Dataiku Community), they would allow orders of magnitude more productivity to be embedded in a single step.

7 Comments

Here is a related conversation about flow reuse.

https://community.dataiku.com/t5/Using-Dataiku/Flow-Zone-Reuse-Can-one-flow-zone-be-reused-from-mult...

Dataiku staff recommended the use of application recipes.  This turned out to be complicated to understand and implement. I’ve still not succeeded in setting up my use case using this approach.  

--Tom

This is a very interesting idea, but I wonder why it hasn't been acknowledged by Dataiku yet.

ktgross15
Dataiker

Hi @natejgardner !

You actually already can create plugins for custom prepare recipe processors: https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html
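For readers who want a feel for what the Python side of such a processor could look like, here is a minimal, hedged sketch. It assumes the processor exposes a `process(row)` function that receives the row as a dictionary and returns the value for an output column, and that the settings chosen in the step's UI are available in a `params` dictionary. The `params` stand-in below, the parameter names `input_column` and `mask_char`, and the `mask_local_part` helper are all hypothetical; the linked documentation is the authority on the exact file layout and function contract.

```python
# Stand-in for the settings a user would fill in through the step's UI.
# Assumption: in a real plugin these values would be supplied by Dataiku,
# not hard-coded. The parameter names are hypothetical.
params = {"input_column": "email", "mask_char": "*"}


def mask_local_part(value, mask_char="*"):
    """Keep the first character and the domain, mask the rest of the local part."""
    local, _, domain = value.partition("@")
    if not local or not domain:
        # Not shaped like an email address: leave the value untouched.
        return value
    return local[0] + mask_char * (len(local) - 1) + "@" + domain


def process(row):
    # Assumed contract: `row` maps column names to values, and the returned
    # value becomes the content of the processor's output column.
    value = row.get(params["input_column"])
    if not value:
        return value
    return mask_local_part(value, params["mask_char"])
```

Keeping the transformation in a small helper like `mask_local_part` makes it easy to test outside Dataiku before packaging it in a plugin.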

Let me know if this solves your use case.

Best,

Katie

CoreyS
Dataiker Alumni
 
Status changed to: Gathering Input
 

This looks promising, I'll give it a try and write back 

MichaelG
Community Manager
Status changed to: Gathering Input
 

Today, I do a certain amount of copying and pasting of visual recipe steps from one recipe to another. This works fairly well. However, you have to remember which other recipe has the best copy of these steps, and it's hard to maintain a “golden set”. I’m wondering if there could be an option to paste saved recipe steps, like we have saved Python and R code snippets. This might leverage those existing underpinnings and help in this area.

--Tom
