Extract underlying code of any recipe on Dataiku
I have a similar question to the one posted a few years ago
I have a flow with tons of recipes. I want to convert it into code in a single language: Python, SQL, PySpark... I do not care which.
The solution in the link works only for recipes that run "in-database SQL". For those, you can see a "VIEW QUERY" button somewhere in "advanced" or on the front page, depending on the recipe.
What if my recipe does not run as "in-database SQL", or cannot run that way? What can I do to still get the underlying code of that recipe?
I would very much appreciate your help.
P.S. I also tried the "get_code" suggested by our AI friends, but that did not work either.
Operating system used: windows
Answers
-
Hi Diaa,
Unfortunately, this is currently not possible in DSS for all cases. As stated, for recipes that can be translated to SQL, DSS does the job of converting the recipe into SQL.
For Python, that's another story: for this to work, DSS would need to generate Python code that is "equivalent" to the recipe. This is far from a trivial task, as your recipe might use formulas to compute pre/post columns (which requires a formula parser/interpreter), might geo-locate points (which requires geocoding data), or might do some date conversions (which depend on the type of the input dataset).
I'm not saying there is no chance it might happen, just that it's a complex task.
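To make the difficulty concrete, here is a minimal hand-written sketch of what an "equivalent" translation of two Prepare-style steps could look like. It is not something DSS generates. The column names, the formula `price * quantity`, and the date format are all hypothetical. Both the formula and the date format have to be re-expressed by hand, which is the part that is hard to automate.

```python
from datetime import datetime

# Hypothetical rows standing in for the input dataset of a visual Prepare recipe.
rows = [
    {"order_date": "2023-01-15", "price": "19.99", "quantity": "3"},
    {"order_date": "2023-02-01", "price": "5.00", "quantity": "10"},
]


def prepare(row, date_format="%Y-%m-%d"):
    """Hand-written stand-in for two hypothetical Prepare steps."""
    out = dict(row)
    # Step 1: a formula such as `price * quantity`, rewritten by hand in Python.
    out["total"] = round(float(row["price"]) * float(row["quantity"]), 2)
    # Step 2: date parsing. The format is an assumption about the input data
    # and must be supplied explicitly; nothing here infers it.
    out["order_date"] = datetime.strptime(row["order_date"], date_format).date()
    return out


result = [prepare(r) for r in rows]
```

Every additional step type (geocoding, joins, custom formulas) would need its own hand-written equivalent like this.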
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Why do you want this? What are you trying to achieve?
-
Thanks for confirming what I was afraid of.
I understand the complexity of the task, but I thought that I could at least get the code of the whole flow in one go, in at least one programming language like SQL. Probably Copilot will have to help me out now somehow.
@neuron, I want to migrate a project to Databricks. So I have to translate the flow into code, in SQL or Python.
-
Hi Diaa,
You can also run your Flow in Databricks using DSS. DSS can execute recipes directly as SQL for Databricks and/or using Spark for Databricks.
Do you have any external constraint that would rule out this solution?
-
Yes, the constraint is to not use Dataiku at all. That is why I need to decode my recipes.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Well, it's good to finally get your real requirement after some back and forth. In the future, it would be best to start your post with your real requirement, as it will lead to a better solution faster.
There is no way to convert a Dataiku flow and its recipes into code. The best you are going to get is to extract the SQL of those Visual and Code recipes which execute in-database rather than in the DSS engine. And even that SQL won't be 100% compatible with Databricks, since Databricks has its own quirks (all object names in lower case, specific function names, no implicit conversions, and many more we found when we moved from MS-SQL to Databricks), so you will need to convert that SQL too. So depending on how many non-SQL recipes you have, you are looking at a full rewrite of those.
Furthermore, the nature of Databricks is such that a direct conversion is not only impossible but also not recommended. You should rewrite your project in Databricks, whether you are using a SQL Warehouse or a Compute Cluster, taking advantage of the platform benefits (i.e. Spark) and avoiding the pitfalls. So whoever came up with the idea that you could just "lift and shift" the SQL from Dataiku into Databricks doesn't know what they are talking about.
As a test, I suggest you multi-select all the datasets in your flow using Ctrl, then click on Change Connection in the right pane and select a Databricks SQL Warehouse connection you have added to Dataiku. Then try to run your flow using this Databricks SQL Warehouse connection and see how it goes.
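Since the size of the rewrite depends on how many non-SQL recipes the flow contains, a rough inventory can help before starting. The sketch below is only an illustration. It assumes you already have a list of recipes as dictionaries with `name` and `type` keys (with the `dataikuapi` package, `project.list_recipes()` is expected to return items of that shape, but treat that as an assumption to verify against your DSS version). The type names in `CODE_RECIPE_TYPES` and the sample data are hypothetical.

```python
from collections import Counter

# Assumption: these type names identify code recipes. Adjust them to the
# values your own instance actually reports.
CODE_RECIPE_TYPES = {"python", "sql_query", "sql_script", "pyspark", "r"}


def summarize_recipes(recipes):
    """Count recipes per type and split the total into code vs. visual."""
    by_type = Counter(r["type"] for r in recipes)
    code = sum(n for t, n in by_type.items() if t in CODE_RECIPE_TYPES)
    visual = sum(by_type.values()) - code
    return {"by_type": dict(by_type), "code": code, "visual": visual}


# Hypothetical sample standing in for a real project's recipe list.
sample = [
    {"name": "compute_orders_prepared", "type": "shaker"},
    {"name": "compute_orders_joined", "type": "join"},
    {"name": "compute_scores", "type": "python"},
    {"name": "compute_report", "type": "sql_query"},
]

summary = summarize_recipes(sample)
print(summary)
```

Code recipes can be copied out as they are, while each visual recipe in the count is one that needs either its in-database SQL extracted and adapted or a manual rewrite.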