An alternative way to get around Dataiku's lack of support for certain data types

info-rchitect

Hi,

Dataiku doesn't currently support the Snowflake ARRAY, VARIANT, and TIMESTAMP_NTZ data types natively, so I wanted to share how we get around this. We use an internal Python library to submit templatized queries and custom SQL to Snowflake while keeping the data remote: the templatized queries extract the raw data, and the custom SQL transforms it. Unless we need to manipulate the data in pandas (i.e., the transform isn't possible in SQL), we always keep the data remote.
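Our internal library isn't shown here, but a minimal sketch of the "templatized query" idea might look like the following. The helper `render_query` and the template `FLATTEN_TEMPLATE` are hypothetical names for illustration only. The template uses Snowflake's LATERAL FLATTEN to unpack an ARRAY column inside the database, so the rows never touch pandas. Identifiers are validated before substitution to keep arbitrary SQL out of the template.

```python
import re
from string import Template

# Hypothetical validator: unquoted Snowflake identifiers, optionally
# qualified as schema.table or db.schema.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")

# Hypothetical template: explode an ARRAY column into one row per element,
# entirely inside Snowflake.
FLATTEN_TEMPLATE = (
    "SELECT t.$key_col, f.value::STRING AS item "
    "FROM $table t, LATERAL FLATTEN(input => t.$array_col) f"
)


def render_query(template: str, **identifiers: str) -> str:
    """Fill a SQL template with identifiers, rejecting anything that
    doesn't look like a plain (optionally qualified) identifier."""
    for name, value in identifiers.items():
        if not _IDENT.match(value):
            raise ValueError(f"unsafe identifier for {name!r}: {value!r}")
    # substitute() raises KeyError if a placeholder is left unfilled.
    return Template(template).substitute(identifiers)
```

For example, `render_query(FLATTEN_TEMPLATE, key_col="ID", table="RAW.EVENTS", array_col="TAGS")` produces a query string that can be handed straight to the database; nothing is fetched at this point.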

To write the resulting table, we call SQLExecutor2.exec_recipe_fragment and pass it a simple SELECT query, so the write happens inside Snowflake. As a result, we never pay the penalty of bringing large datasets into memory, we can manipulate the unsupported data types, and we keep recipe execution time as low as possible.
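A minimal sketch of the write step, assuming it runs inside a Dataiku Python recipe where `dataiku.SQLExecutor2` is importable. The helpers `build_select` and `write_remote` are hypothetical; only `SQLExecutor2` and its `exec_recipe_fragment(output_dataset, query)` call come from Dataiku's API. The `executor` parameter exists only so the logic can be exercised outside DSS with a stand-in object.

```python
def build_select(sql: str) -> str:
    """Normalize custom SQL into the single SELECT statement passed to
    exec_recipe_fragment: trim whitespace and a trailing semicolon."""
    query = sql.strip().rstrip(";").strip()
    # Deliberately strict for this sketch: only a plain SELECT is accepted.
    if not query.lower().startswith("select"):
        raise ValueError("expected a SELECT query")
    return query


def write_remote(output_dataset, sql, executor=None):
    """Materialize the result of `sql` into `output_dataset` in the
    database, without loading any rows into pandas."""
    query = build_select(sql)
    if executor is None:
        # Only importable inside Dataiku DSS.
        from dataiku import SQLExecutor2
        executor = SQLExecutor2(dataset=output_dataset)
    executor.exec_recipe_fragment(output_dataset, query)
    return query


# Inside a recipe (dataset name is hypothetical):
#   import dataiku
#   out = dataiku.Dataset("events_flattened")
#   write_remote(out, "SELECT t.ID, f.value::STRING AS item "
#                     "FROM RAW.EVENTS t, LATERAL FLATTEN(input => t.TAGS) f")
```

The SELECT can reference VARIANT or ARRAY columns freely, because Snowflake evaluates it directly; Dataiku only sees the output dataset that results from the query.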

I highly recommend this pattern if you can swing it.

Regards


Operating system used: Windows 10
