
Auto-generated visual recipe concept


Problem Statement: DSS currently offers good no-code visual recipes for a selection of common machine learning algorithms and data wrangling tools. Everything else requires a code recipe, where users repeatedly have to code the algorithm and add the same wrapper code to map it to the train, test, output, and other datasets. In between sits a set of algorithms that various users re-create in code again and again, and that would be ripe for modularization into recipes that do not exist yet. For example, many important algorithms in packages like Python's sklearn or R's tidymodels/caret do not have visual recipes in DSS. However, there are so many of them that maintaining a recipe for each would be challenging for Dataiku.

Solution: What if Dataiku could parse the requirements for each algorithm in popular packages and auto-generate a parameter entry form within a single visual recipe module? The form fields would be auto-generated by mapping and exposing the module's syntax and variables within a single recipe template. It is a simple trick we already use in custom applications. This would provide a many-to-one mapping of algorithms to one recipe and avoid a large software maintenance effort.

Scenarios/Use Cases: See attached. Taking a module that currently requires code (sklearn AdaBoost) as an example, a user might select a new option for auto-generated recipes in the model algorithm selector screen and type in "sklearn.ensemble.AdaBoostRegressor". Dataiku would then import the algorithm, assemble the proper input and output datasets, read the documentation, and present a screen with default parameters (see attached).
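To make the idea concrete, here is a minimal sketch of the introspection step in plain Python. It is not Dataiku code and does not use any DSS API; the function name `describe_estimator` and the shape of the returned dictionary are hypothetical, chosen only for illustration. It imports a class from a dotted path, reads the constructor parameters and their defaults, and picks up the first line of the docstring, which is roughly the information a generated form would need.

```python
import importlib
import inspect


def describe_estimator(dotted_path):
    """Import a class by dotted path and list its constructor parameters.

    Hypothetical helper for illustration; not part of DSS or scikit-learn.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    fields = []
    for name, param in inspect.signature(cls).parameters.items():
        # Skip *args / **kwargs, which cannot be shown as a single form field.
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        required = param.default is inspect.Parameter.empty
        fields.append({
            "name": name,
            "default": None if required else param.default,
            "required": required,
        })

    # Use the first docstring line as a short description for the form header.
    doc = inspect.getdoc(cls) or ""
    summary = doc.splitlines()[0] if doc else ""
    return {"class": cls, "summary": summary, "fields": fields}


if __name__ == "__main__":
    # A standard-library class is used here so the sketch runs without extra
    # packages. With scikit-learn installed, the path from the post,
    # "sklearn.ensemble.AdaBoostRegressor", could be passed instead.
    spec = describe_estimator("json.JSONEncoder")
    print(spec["summary"])
    for field in spec["fields"]:
        print(field["name"], "=", repr(field["default"]))
```

A real recipe would still have to map each default to a suitable form widget and wire up the input and output datasets, which this sketch leaves out.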

 

2 Comments
tgb417
Neuron

@vcapodanno ,

To take this maybe one step further: what about using something like GitHub Copilot?

vcapodanno
Level 2

Interesting idea, @tgb417. That is certainly a very similar concept, though it would need to work within the visual recipe framework. Let's see what Dataiku says.