SQL "Formula" step in visual recipes

User Story:

As a user who consistently uses multi-step visual recipes and stores data in a SQL database, it would be great to have SQL Formulas as a step type in visual recipes to create a new column.  The hope is that these could be easily created in such a way that more visual recipes could run in the database directly and thus improve performance & scalability.  

Notes:

Here is an example of a visual formula that might be able to work as a SQL formula step. The following is an example of code in a formula recipe step:

substring(toLowercase(FirstName), 0, 1) + "," +
substring(toLowercase(LastName), 0, 20) + "," +
substring(toLowercase(MailingStreet), 0, 4) + "," +
substring(toLowercase(MailingPostalCode), 0, 3)

 This might look something like the below in a SQL visual recipe step.

substring(lower(FirstName), 0,1) || "," ||
substring(lower(LastName), 0,20) || "," ||
substring(lower(MailingStreet), 0,4) || "," ||
substring(lower(MailingPostalCode), 0,3) 

 

--Tom
4 Comments

Short of writing this in a separate SQL recipe.  It would be interesting to hear if there is a performant workaround for a scenario like this.

--Tom

Short of writing this in a separate SQL recipe.  It would be interesting to hear if there is a performant workaround for a scenario like this.

What is the benefit of this idea? You can already create new columns in say a Prepare recipe using a formula. And if your input and output are in SQL the recipe will execute on the SQL database. And like yo said if you prefer to write the SQL yourself you can already do that with a SQL recipe. The other problem with your idea I think is that many recipes will end up with complex SQL that is not so easy to alter. Where exactly the new column changes should go?

Personally I think that since your are good with SQL you should use more SQL recipes and try to reduce the number of steps you perform on the data.

What is the benefit of this idea? You can already create new columns in say a Prepare recipe using a formula. And if your input and output are in SQL the recipe will execute on the SQL database. And like yo said if you prefer to write the SQL yourself you can already do that with a SQL recipe. The other problem with your idea I think is that many recipes will end up with complex SQL that is not so easy to alter. Where exactly the new column changes should go?

Personally I think that since your are good with SQL you should use more SQL recipes and try to reduce the number of steps you perform on the data.

apichery
Dataiker

While it's a very nice suggestion, it's not easily possible in the way the Prepare recipe is currently working.

The prepare recipe relies on DSS being able to apply the recipe steps to the input dataset so that a preview of the output is displayed interactively.

Having a SQL step would force either DSS to actually run the SQL you would have written on your SQL database to get the output of the step, or parse the SQL to interpret it and run it on the in-memory sample.

So, while the idea is interesting, it would not be easy for Dataiku to implement it ๐Ÿ˜‰

 

While it's a very nice suggestion, it's not easily possible in the way the Prepare recipe is currently working.

The prepare recipe relies on DSS being able to apply the recipe steps to the input dataset so that a preview of the output is displayed interactively.

Having a SQL step would force either DSS to actually run the SQL you would have written on your SQL database to get the output of the step, or parse the SQL to interpret it and run it on the in-memory sample.

So, while the idea is interesting, it would not be easy for Dataiku to implement it ๐Ÿ˜‰

 

CoreyS
Dataiker Alumni
 
Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as โ€˜Accepted Solutionโ€™ to help others like you!
Status changed to: Rejected