Update Records In Dataiku
Hi Folks,
I'm trying to find a way in Dataiku to update specific records in a table based off of specified criteria.
Some context: I process data on a daily basis that needs to be transformed to a workable state for a multitude of different reports. There are user-generated date values inside of the report that are invalid for our purposes and need to be adjusted.
We had previously used an Access database and in that database we just had a simple UPDATE query that specified what to look for, and what to change the date values to. However the UPDATE SQL function does not seem to work in Dataiku and I cannot find a way to update the records through the Prepare recipe.
When working in the code notebook I receive a message saying my query works but it does not return results, which is fine because I only want to update some values. When I try and use the SQL Script recipe, after some finicking, it indicates that it ran successfully, however when I go and check my output table it is clear that nothing was changed.
Furthermore, to even get the SQL Script to not kick an error immediately I had to write my input dataset to my export data set so that the schema matched. The output data set would not update it's schema automatically because the query is only meant to update records, not actually pull any information.
My guess is that the issue is caused by both system limitations and a lack of understanding on my part. But any help is much appreciated!
Best Answer
-
*SOLVED*
So I was actually able to insert my UPDATE statement into the "post-write statements" section in the Advanced tab in Settings. It appears to be working as intended and it will update the records every time I rebuild that data set, which was the desired outcome.
Answers
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 320 Neuron
Hi @ScottSparks
,Yes, it's not really DSS' forte to update records. That said, it certainly is doable and we have used a couple of approaches.
The most straightforward is to use an Execute SQL step in a Scenario. The Execute SQL script is simply your SQL update statement. In this approach you don't need to create any datasets or recipes so if this the only functionality in your project, then the flow would be empty. You'd be using scenarios to automatically run the update schedule on a particular schedule (or when some data changes).
If you want to update records as part of the flow then you can certainly write update SQL statements in a SQL Script recipe but as you discovered you'll need an output dataset. What we typically do is create what is essentially a dummy dataset for this purpose. For example, in a recent project, the output dataset was simply a summary of the number of rows updated. So as far as DSS knows you have a typical recipe that produces an output dataset (where the underlying SQL table is dropped and recreated each time) but of course this recipe also includes your update statement.
To see changes made by an update script in the dataset explore view, you may need to click on SAVE AND REFRESH SAMPLE after clicking an the gray "handle" at the left side of the screen.
Hope this is helpful.
Marlan
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Thank you for sharing your solution @ScottSparks
! Please feel free to mark it as an ‘Accepted Solution’, if you wish, to help others like you!