Dynamic rename and addition/removal of columns
I am connecting to QuickBase using a URL path and looking for a way to dynamically add or remove columns from the initial schema as well as rename the columns as "Simplify (and lower case)". I do not own the source data, and cannot influence the naming convention or columns at the source. I am pushing this data to GCP/Big Query and overwriting/replacing the table daily.
Initially, there was no issue, but since implementing the flow, additional columns have been added, and each job that runs, I receive a warning that the schema does not match.
How can I go about performing the task of checking and updating the schema and renaming the fields without warning? There are no issues on the output side if columns are added or removed. My only expectation is that the job runs successfully and all columns are added to the output without warning or error.
5/1 - 125 columns
5/8 - 126 columns ("Phone #" was added to the schema)
Thanks for your support.
Curtis
Operating system used: Windows 11
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron
Go to the Dataset => Settings => Schema and press in the Check Now button. If the check detects any changes it will show a "The schema of the dataset does not match the table" message at which point you can press in Reload Schema from Database, save the changes and your dataset will be up-to-date.
-
Thanks @Turribeach
for the quick response. This is how I am addressing this today, but I'm trying to avoid having to do this handful of steps each time the owners make a change. I've have had to do this 4 times in the last 2 weeks, which is not ideal for an automated solution.Curtis
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron
What recipe type are you using to deal with this data? Can you put an image of the flow?
-
I reset and reloaded the schema to show the initial input of a couple of columns. The Prepare step and data are not being used for anything currently.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron
Wasn't aware of the HTTP Dataset so that's the one learnt today. Visual recipes can't have dynamic outputs so you will need to convert the HTTP Dataset and prepare recipe to a Python recipe with no inputs. You can use the requests Python package to pull the data from the external URL. Python recipes can have dynamic outputs so the columns will adjust automatically as per the data but you obviously need to parse the URL output accordingly. This shouldn't be a problem assuming it's a REST API returning a JSON object. The output of the Python recipe can be a BigQuery dataset so provided you don't have any other visual recipes in the mini flow you will not need to update any schemas for this to work autonomously.
-
OK, this may be over my head a little. Kind of new to the platform, but I shared this with our internal team and see what we can do. If we get stuck, I'll post it here. Appreciate your help!
Curtis
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron
Alternatively if you are running your refresh in a scenario you can use the "Reload schema" and the "Schema propagation" steps to push the schema changes in the flow.
-
I separated this piece of the overall flow into it's own step, using the "Before each recipe runs, update the schema of the output dataset(s)" and it ran successfully, but unsure how this will interpret a new column with special characters or spaces.
-
Trying this now. Initial test of removing column 'Phone #' generated the expected output of 'phone_number' but I'm not entirely sure how this works. Requesting for a new 'Test #' column to be added and see what happens. I don't know how it's going to inherit the renaming convention for a new field, but we shall see! Thanks again!
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron
@CurtisC
wrote:but unsure how this will interpret a new column with special characters or spaces.
It won't work as Bigquery won't accept those as valid columns names. You need to rename those columns hence why I suggested the Python recipe as the Prepare recipe can't rename columns dynamically whereas you can do that in Python. There is no other alternative.