SQL "Formula" step in visual recipes
User Story:
As a user who consistently uses multi-step visual recipes and stores data in a SQL database, it would be great to have SQL Formulas as a step type in visual recipes to create a new column. The hope is that these could be easily created in such a way that more visual recipes could run in the database directly and thus improve performance & scalability.
Notes:
Here is an example of a visual formula that might be able to work as a SQL formula step. The following is an example of code in a formula recipe step:
substring(toLowercase(FirstName), 0, 1) + "," +
substring(toLowercase(LastName), 0, 20) + "," +
substring(toLowercase(MailingStreet), 0, 4) + "," +
substring(toLowercase(MailingPostalCode), 0, 3)
This might look something like the below in a SQL visual recipe step.
substring(lower(FirstName), 0,1) || "," ||
substring(lower(LastName), 0,20) || "," ||
substring(lower(MailingStreet), 0,4) || "," ||
substring(lower(MailingPostalCode), 0,3)
Comments
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
Short of writing this in a separate SQL recipe. It would be interesting to hear if there is a performant workaround for a scenario like this.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,139 Neuron
What is the benefit of this idea? You can already create new columns in say a Prepare recipe using a formula. And if your input and output are in SQL the recipe will execute on the SQL database. And like yo said if you prefer to write the SQL yourself you can already do that with a SQL recipe. The other problem with your idea I think is that many recipes will end up with complex SQL that is not so easy to alter. Where exactly the new column changes should go?
Personally I think that since your are good with SQL you should use more SQL recipes and try to reduce the number of steps you perform on the data.
-
While it's a very nice suggestion, it's not easily possible in the way the Prepare recipe is currently working.
The prepare recipe relies on DSS being able to apply the recipe steps to the input dataset so that a preview of the output is displayed interactively.
Having a SQL step would force either DSS to actually run the SQL you would have written on your SQL database to get the output of the step, or parse the SQL to interpret it and run it on the in-memory sample.
So, while the idea is interesting, it would not be easy for Dataiku to implement it