An alternative way to get around the lack of Dataiku support for certain data types

Hi,

Dataiku doesn't currently support the ARRAY, VARIANT, and NTZ timestamp (TIMESTAMP_NTZ) data types natively, so I wanted to share how we get around this. We use an internal Python library to submit templatized queries and custom SQL to Snowflake and keep the data remote: the templatized queries extract the raw data, and the custom SQL calls transform it. Unless we need to manipulate the data in pandas (i.e., the transform isn't possible in SQL), we always keep the data remote.
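To give a feel for what a templatized extract query can look like, here is a minimal sketch using only the Python standard library. Our internal library is not shown here, so everything in this block is an assumption for illustration: the template text, the function name `render_extract`, and the table and column names are all hypothetical.

```python
from string import Template

# Hypothetical extract template. The table and column names are placeholders,
# not part of any real schema.
EXTRACT_TEMPLATE = Template(
    "CREATE OR REPLACE TABLE $target AS\n"
    "SELECT id, payload, created_at\n"
    "FROM $source\n"
    "WHERE created_at >= '$start_date'"
)


def render_extract(source, target, start_date):
    """Fill the template and return the SQL text to submit to Snowflake.

    Values are substituted as plain text, so only pass trusted identifiers
    and literals (or validate them first).
    """
    return EXTRACT_TEMPLATE.substitute(
        source=source, target=target, start_date=start_date
    )


sql = render_extract("RAW.EVENTS", "STAGE.EVENTS_EXTRACT", "2024-01-01")
print(sql)
```

Because the rendered statement runs entirely in Snowflake, a column like `payload` can be a VARIANT or ARRAY without Dataiku ever having to read it.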

To write the resulting table, we use SQLExecutor2.exec_recipe_fragment and pass it a simple SELECT query. This way we never pay the penalty of bringing large datasets into memory, we can manipulate unsupported data types, and we keep recipe execution time as low as possible.
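The final write step could be sketched roughly like this inside a Python recipe. The helper names `build_final_select` and `write_output`, and the table and dataset names in the usage comment, are hypothetical. It also assumes, as I understand the Dataiku requirement, that the output dataset is a SQL dataset on the same Snowflake connection as the table being selected from.

```python
def build_final_select(table, columns=None):
    """Build the simple SELECT that is handed to exec_recipe_fragment."""
    cols = ", ".join(columns) if columns else "*"
    return f"SELECT {cols} FROM {table}"


def write_output(output_dataset_name, table, columns=None):
    """Let the database fill the recipe's output dataset from `table`.

    No rows are pulled into Python memory; the SELECT runs remotely.
    """
    # The dataiku package is only importable inside DSS, so import it lazily.
    import dataiku
    from dataiku import SQLExecutor2

    output = dataiku.Dataset(output_dataset_name)
    SQLExecutor2.exec_recipe_fragment(output, build_final_select(table, columns))


# Inside a recipe (hypothetical names):
# write_output("events_clean", "STAGE.EVENTS_CLEAN")
```

If the output should avoid unsupported types altogether, the column list is a natural place to cast them in SQL before Dataiku sees the schema.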

I highly recommend this pattern if you can swing it.

Regards

Operating system used: Windows 10
