
Sync-recipe to Snowflake

risto

I have a flow that reads data from two Snowflake sources. A Python recipe then compares the max of the date column in each and extracts the rows that are missing from the other dataset. I first tried syncing that result back to Snowflake (i.e., updating the other source dataset by appending the missing rows) but ran into a time format error, so I added another Python recipe to fix the time format.

Finally, this data is synced to Snowflake with the "Append instead of overwrite" box checked. For some reason, the Sync does not append after all but overwrites the whole table. How can I fix this?
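For reference, the extraction step described in the question can be sketched in pandas as a watermark comparison: keep the source rows whose date is later than the latest date already in the target. This is a minimal sketch, not the poster's actual recipe. The function `rows_missing_from_target` and the column name `event_date` are hypothetical, and it assumes that every row newer than the target's max date is missing from the target. Parsing both columns with `pd.to_datetime` keeps the comparison like-for-like, but it will not by itself fix whatever time format mismatch Snowflake reported.

```python
import pandas as pd


def rows_missing_from_target(source: pd.DataFrame, target: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Return the rows of `source` newer than the latest date in `target`.

    Hypothetical helper. Assumes both frames share `date_col` and that any
    row newer than max(target[date_col]) is absent from `target`.
    """
    # Parse both date columns the same way so they compare as datetimes.
    src_dates = pd.to_datetime(source[date_col])
    tgt_dates = pd.to_datetime(target[date_col])

    watermark = tgt_dates.max()
    if pd.isna(watermark):
        # Empty target: every source row is missing from it.
        return source.copy()
    return source.loc[src_dates > watermark].copy()


# Usage with toy data (hypothetical column names).
source = pd.DataFrame({"event_date": ["2024-01-01", "2024-01-02", "2024-01-03"], "value": [1, 2, 3]})
target = pd.DataFrame({"event_date": ["2024-01-01"], "value": [1]})
missing = rows_missing_from_target(source, target, "event_date")
print(missing["value"].tolist())  # → [2, 3]
```

The watermark approach is simple but only catches rows newer than the target's latest date; late-arriving rows with older dates would need a key-based comparison instead.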

Turribeach

If you need to maintain a historical dataset, I strongly advise you NOT to use append mode. While append mode does work in some cases, it is not safe to use: any schema change or difference will cause Dataiku to drop and recreate the table, and you will lose all your historical data. Even with write_from_dataframe(), which should not change the schema, I have seen append-mode dataset tables get dropped. Dataiku simply does not handle this ETL concept well. There are different solutions, but the safest one is to handle the inserts in code (i.e., Python), which "hides" the output table from Dataiku and prevents accidental table dropping.
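A minimal sketch of handling the inserts in code, assuming you have a DB-API 2.0 (PEP 249) connection to the target table. For Snowflake that connection would come from the `snowflake-connector-python` package; `sqlite3` is used here only so the example runs on its own. The helper `append_rows`, the table `history`, and its columns are hypothetical. Because the helper only ever issues `INSERT` statements, it never drops or recreates the table.

```python
import sqlite3

import pandas as pd


def append_rows(conn, table: str, df: pd.DataFrame, placeholder: str = "?") -> int:
    """Append `df` to an existing `table` using plain INSERT statements.

    Hypothetical helper for any DB-API 2.0 connection. `table` and the
    column names are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    if df.empty:
        return 0
    cols = ", ".join(df.columns)
    marks = ", ".join([placeholder] * len(df.columns))
    sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    # Cast to object so the bound values are plain Python objects.
    rows = list(df.astype(object).itertuples(index=False, name=None))
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)
        conn.commit()
    finally:
        cur.close()
    return len(rows)


# Demo: an in-memory SQLite table stands in for the warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (event_date TEXT, value INTEGER)")
conn.execute("INSERT INTO history VALUES ('2024-01-01', 1)")
conn.commit()

new_rows = pd.DataFrame({"event_date": ["2024-01-02", "2024-01-03"], "value": [2, 3]})
append_rows(conn, "history", new_rows)
print(conn.execute("SELECT COUNT(*) FROM history").fetchone()[0])  # → 3
```

Pass the placeholder that matches your driver's `paramstyle` (sqlite3 uses `?`). For large batches, a bulk-load path offered by the driver is likely faster than row-by-row `executemany`.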
