Added on February 3, 2020 2:28PM
Hi
I'd like to clarify the best way to update an existing recipe so that it adds a new column to an existing output dataset. The output dataset contains historical data, and I want to add the new column while retaining the existing data.
Example:
The usual "Schema change" message prompts for the output dataset to "Drop and recreate" but this obviously drops the data. Can this be deselected or would this cause issues?
I appreciate that a slightly longer process would be to create a new Join with the 5 columns and then Stack the old and new datasets together. Is this the preferred technique?
Thanks
Why do you want to retain the data? Would it take too long to re-compute the values, or are the original values no longer available in the input datasets? Where does the data reside?
Assuming you have a SQL table, you can process the change manually in the database, using ALTER TABLE and UPDATE commands. You can use SQL notebooks to interface directly with the database. Afterwards, you can load the updated schema in the Dataset.
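As a rough sketch of the ALTER TABLE plus UPDATE approach, here is a small example that runs against an in-memory SQLite database from Python. The table and column names (`sales_history`, `order_regions`, `region`) are made up for illustration, and the exact SQL syntax may differ on your database, so treat it as a starting point rather than the exact commands to run.

```python
import sqlite3

# Hypothetical tables and columns, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the existing output table that holds historical data.
cur.execute("CREATE TABLE sales_history (order_id INTEGER PRIMARY KEY, amount REAL)")
cur.executemany(
    "INSERT INTO sales_history (order_id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 7.25)],
)

# Stand-in for the source of the new column's values.
cur.execute("CREATE TABLE order_regions (order_id INTEGER PRIMARY KEY, region TEXT)")
cur.executemany(
    "INSERT INTO order_regions (order_id, region) VALUES (?, ?)",
    [(1, "EMEA"), (3, "APAC")],
)

# Step 1: add the new column. Existing rows are kept; the new column is NULL.
cur.execute("ALTER TABLE sales_history ADD COLUMN region TEXT")

# Step 2: backfill the new column where a matching value exists.
cur.execute(
    """
    UPDATE sales_history
    SET region = (
        SELECT r.region
        FROM order_regions AS r
        WHERE r.order_id = sales_history.order_id
    )
    """
)
conn.commit()

rows = cur.execute(
    "SELECT order_id, amount, region FROM sales_history ORDER BY order_id"
).fetchall()
print(rows)
```

Rows with no matching value keep NULL in the new column, which is usually what you want for historical rows where the value was never captured.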
If that's not an option, you can do some Flow wizardry, for example: