We processed data from a SQL Server table in Dataiku (which uses Snowflake as the backend), and when I go to send it back to SQL Server, Dataiku insists on dropping the table (bad) and recreating it. So I wrote SQL to truncate the table and reload it from the modified data in the Snowflake Dataiku table, but it keeps failing: the SQL only runs against SQL Server, and since the Dataiku table lives in Snowflake, SQL Server doesn't know it exists (I get an "Invalid object name" error). Thoughts? The table has 266 records, so it's a bit much if I have to hand-code some sort of upsert mechanism.
Generally speaking, you can't cross database technologies like that. In fact, Dataiku won't even let you cross different connections on the same database, which is silly when you are using cloud serverless databases like BigQuery where there is no database server. One solution is to pivot via a File System or Cloud Storage dataset.
With regards to the second issue, you can change the recipe to "Append instead of overwrite" in the Input/Output tab. Dataiku will then insert data every time rather than drop and recreate the table. However, this won't fully prevent Dataiku from dropping the table, as any schema change will do exactly that. This happens because you are using managed datasets. Basically, you can't have it both ways: if Dataiku manages the dataset, then it is responsible for managing the table schema, which means it will drop the table on any schema change. If you don't want that, use your table as an external dataset. In that case, however, you will be responsible for managing the table schema yourself and making sure it matches what Dataiku is trying to write.
Unfortunately, even with "Append instead of overwrite" it still errors and tries to do a CREATE TABLE. 😕 I'm quickly coming to the realization that Dataiku isn't really designed to (easily) do what I'm trying to do. I scanned through numerous posts, and pretty much everyone ended up having to write UPSERT SQL, since Dataiku doesn't really have a recipe or plugin to handle it the way, say, Alteryx does. Thankfully this is just one table I need to do this with, so while tedious, I can write out the SQL by hand. Thanks for the info, though; it guided me to where I figured I was probably going to end up.
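For anyone landing here later: hand-written upsert SQL in SQL Server is typically a single MERGE statement. A minimal sketch, where the table, key, and column names (MyTable, MyTable_staging, Id, ColA, ColB) are hypothetical placeholders for your own schema:

```sql
-- Hedged sketch: all object names below are placeholders.
-- Assumes the Snowflake output has been landed in dbo.MyTable_staging.
MERGE dbo.MyTable AS tgt
USING dbo.MyTable_staging AS src
    ON tgt.Id = src.Id                  -- match on the business key
WHEN MATCHED THEN
    UPDATE SET tgt.ColA = src.ColA,     -- refresh existing rows
               tgt.ColB = src.ColB
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, ColA, ColB)             -- add brand-new rows
    VALUES (src.Id, src.ColA, src.ColB);
```

With only 266 rows, the column lists are the only tedious part; the statement itself stays this short regardless of row count.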
Dataiku can insert data into an existing table, but as I said, you can't traverse two database technologies without first dumping the data into a storage layer compatible with both, such as a File System dataset. To insert without creating the table first, you need to add the table as a new dataset rather than letting Dataiku define the output, as that would make it a managed dataset. Then you can select the output table in the Input/Output tab via the "Use Existing" option. If you do all of the above, you will be able to insert data without writing any SQL. But certainly not upsert.
OK, thanks, will try it. I wish we had a simpler approach to this; looking through the myriad posts on the topic, everyone seems to be expecting something straightforward. I have had discussions with Dataiku itself in the past, and they recognize the hoops you have to jump through to do a load. 😕
Well, Dataiku has its way of doing things, which you may or may not like. The fact is that you are going against that way, hence why you are finding so many issues. If you ignore the drop-table issue and work with it, you won't have to write any SQL or worry about anything else inside Dataiku. This can be done by having an intermediate "staging" schema where Dataiku "owns" the tables, and a separate final destination schema where you own the tables. Once Dataiku finishes inserting into the "staging" schema, you perform your upsert into the final destination schema using native database SQL. In my opinion this is a better design than trying to force Dataiku to work in a way it is not meant to. It also keeps the delineation between the two systems much clearer: the "staging" schema is owned and managed by Dataiku, and the final destination schema is owned and managed by the database.
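To make the staging-schema design concrete, here is a sketch in T-SQL. All names (staging, final, Customers, CustomerId, etc.) are hypothetical; Dataiku is free to drop and recreate anything under staging, while final.Customers is never touched by Dataiku:

```sql
-- One-time setup (hypothetical names): separate ownership by schema.
CREATE SCHEMA staging;   -- Dataiku-managed: tables may be dropped/recreated
CREATE SCHEMA final;     -- database-managed: you control the schema here

-- Post-build step, run after Dataiku finishes writing staging.Customers:
MERGE final.Customers AS tgt
USING staging.Customers AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.Name  = src.Name,
               tgt.Email = src.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name, Email)
    VALUES (src.CustomerId, src.Name, src.Email);
```

The MERGE can be run as a scheduled job or a post-flow SQL step; even if Dataiku drops and recreates staging.Customers on a schema change, the final schema is unaffected until you update the MERGE yourself.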