SQL is a great language for querying and managing data but when it comes to good development practices like DRY (Don’t Repeat Yourself), it really falls down. The same values or logic may be repeated many times in any decently long SQL script making maintenance a much more risky endeavor than it should be.
Yet SQL is used extensively by data scientists and analysts as it offers unparalleled speed and power for working with large datasets.
Wouldn’t it be great if modern features available in languages like Python such as variables and functions could be added to SQL? This would transform the experience of writing SQL. People could then DRY in SQL!
My idea is for DSS to provide this transformative experience for people who write SQL.
How? By building into SQL recipes the capability to use a templating language like Jinja within the recipe.
Jinja offers a syntax for defining variables and functions. So SQL coders could use this syntax to set and use variables and to create and call functions within their recipes. DSS would then use Jinja to translate the recipe into pure SQL before sending it to the database for execution.
Additionally, Jinja would enable writing SQL that was created from other SQL tables. For example, instead of typing out all of the columns in a table by hand, one could let Jinja fill in those column names automatically (by pulling the column names from that table).
Furthermore, functions could be written to shared across a project and an instance. This would be a great way of sharing common SQL logic that you want everyone on your team or in your company to use.
Another positive aspect of this idea is that it doesn’t seem like it would require a major development effort to accomplish. DSS would be leveraging the open source Jinja package for all the hard work. I’m basing this on my experience of developing a working prototype of using Jinja in DSS in a few hours. Obviously, it’s a lot more involved to add and test functionality across DSS but still seems like the investment would be relatively low especially compared with the return on that investment.
I have added some example use cases in a comment below. I can also share implementation ideas if that would be helpful.
By doing this, Dataiku would be helping to bring basic software engineering principles to the practices of data science and analytics. Besides the benefit to us current DSS users, it seems like that is a message that could resonate out in the marketplace.
I have built an entire internal Python library that uses jinja or mako for SQL templating. I abstract product specific configuration data in dataiku variables and then use them as arguments to the python library. The library functions as a DSL for querying our specific type of data from Snowflake. I use the sqlexecutor2 and the write_to_schema, depending on the user case. I agree with the topic author's concept completely, just not sure if it should be a dataiku feature.
Only members of the Community can comment.