Let users create and publish custom prepare recipe processors

natejgardner
natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron

The prepare recipe in Dataiku is very powerful. It's the fastest way to script up complex data transformations by far. But sometimes, the processors are not quite enough. Whether I use Dataiku Formula Language, Python, or SQL, I often need to create custom steps in my recipes. While I can always create a totally separate code recipe to do the processing I'm looking for, it's usually more convenient to create code steps in a prepare recipe between other data preparation steps. This works well enough if the transformation is specific to my dataset, but sometimes, I need to execute the same transformation against many datasets. For this, it'd be very helpful to be able to create custom processors.

I'd like to be able to write code that hooks into the script step's UI elements (for example, the column selector and configuration radio buttons), then publish it so it's available to everyone on my team in the processors library of the prepare recipe. That way, I can use it everywhere. It would also be great if these custom processors could be added via plugins or otherwise installed from the internet, so useful processors could be contributed back to the wider community and made accessible to everyone. This would make the prepare recipe immensely more powerful in enterprises, where processors tailored to a company's needs could be built and distributed across the organization. Among the many use cases, I can imagine processors that call internal APIs (one of my main uses of Python processors today) and simplify otherwise complex tasks.
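To illustrate the idea, here is a minimal plain-Python sketch of a parameterized, reusable row processor. It assumes a row arrives as a dict mapping column names to values (treated as strings here), similar in spirit to the row mode of the existing Python step. `make_processor`, `column`, and `case` are hypothetical names standing in for a column-selector value and a radio-button option; this is not a Dataiku API.

```python
from typing import Callable, Dict, Optional

# A row is modeled as a dict of column name -> value (assumption for this sketch).
Row = Dict[str, Optional[str]]


def make_processor(params: Dict[str, str]) -> Callable[[Row], Row]:
    """Build a row processor from UI-style parameters.

    Hypothetical parameters: 'column' stands in for the column selector,
    'case' for a radio-button choice ("upper" or "lower").
    """
    column = params["column"]
    case = params.get("case", "upper")
    if case not in ("upper", "lower"):
        raise ValueError(f"unsupported case option: {case!r}")

    def process(row: Row) -> Row:
        # Trim and re-case the selected column; leave other columns untouched.
        value = row.get(column)
        if value is not None:
            cleaned = value.strip()
            row[column] = cleaned.upper() if case == "upper" else cleaned.lower()
        return row

    return process


# The same configured processor can be reused against rows from any dataset.
process = make_processor({"column": "dept_code", "case": "upper"})
print(process({"dept_code": "  fin-01 ", "amount": "12"}))
# {'dept_code': 'FIN-01', 'amount': '12'}
```

Separating configuration (the parameters chosen in the UI) from the per-row function is what would let one published processor be configured differently in each recipe.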

I'd also like to be able to save specific configurations of existing processors for reuse. For example, if I write a regex that extracts patterns specific to my company's data, I'd love to save it to a library and recall it later. I'd love the same feature for find and replace. For the tokenize text processor, I'd like to be able to define and publish a custom tokenizer. And so on.
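As a rough illustration of such a library, here is a plain-Python sketch, assuming saved patterns are stored by name and looked up when a step runs. `PATTERN_LIBRARY`, `extract`, and the example patterns are hypothetical; this is not how Dataiku stores processor configurations.

```python
import re
from typing import Dict, List

# Hypothetical shared library of saved regex patterns. The names and
# patterns are illustrative examples, not real company formats.
PATTERN_LIBRARY: Dict[str, re.Pattern] = {
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
    "cost_center": re.compile(r"\bCC\d{4}\b"),
}


def extract(pattern_name: str, text: str) -> List[str]:
    """Return all matches of a saved pattern, looked up by name."""
    pattern = PATTERN_LIBRARY.get(pattern_name)
    if pattern is None:
        raise KeyError(f"no saved pattern named {pattern_name!r}")
    # The patterns have no capture groups, so findall returns whole matches.
    return pattern.findall(text)


print(extract("employee_id", "Approved by EMP-004512 for CC1234"))  # ['EMP-004512']
print(extract("cost_center", "Approved by EMP-004512 for CC1234"))  # ['CC1234']
```

Referring to patterns by name means a fix to a shared regex would propagate to every recipe that uses it, instead of living in scattered copies.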

I think these features would be a really powerful addition to the prepare recipe. Combined with the idea "Enable full Dataiku API in shaker script Python processors" on the Dataiku Community, they would let a single prepare step deliver compounding, potentially order-of-magnitude productivity gains.

5 votes

Status: Considered

Comments

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    Here is a related conversation about flow reuse.

    https://community.dataiku.com/t5/Using-Dataiku/Flow-Zone-Reuse-Can-one-flow-zone-be-reused-from-multiple/m-p/22049/highlight/true

    Dataiku staff recommended using application recipes. This turned out to be complicated to understand and implement, and I still haven't managed to set up my use case with this approach.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,913 Neuron

    This is a very interesting idea, but I wonder why it hasn't been acknowledged by Dataiku yet.

  • Katie
    Katie Dataiker, Registered, Product Ideas Manager Posts: 106 Dataiker

    Hi @natejgardner!

    You actually already can create plugins for custom prepare recipe processors: https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html

    Let me know if this solves your use case.

    Best,

    Katie

  • natejgardner
    natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron

    This looks promising. I'll give it a try and report back.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    Today, I copy and paste a certain number of visual recipe steps from one recipe to another. This works fairly well. However, you have to remember which recipe has the best copy of those steps, and it's hard to maintain a “golden set”. I wonder if there could be an option to paste saved recipe steps, like the saved Python and R code snippets we already have. This might build on that existing infrastructure and help in this area.
