Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku! CHECK IT OUT

Dynamic rename and addition/removal of columns

CurtisC
Level 2
Dynamic rename and addition/removal of columns

I am connecting to QuickBase using a URL path and looking for a way to dynamically add or remove columns from the initial schema as well as rename the columns as "Simplify (and lower case)". I do not own the source data, and cannot influence the naming convention or columns at the source. I am pushing this data to GCP/Big Query and overwriting/replacing the table daily.

Initially, there was no issue, but since implementing the flow, additional columns have been added, and each job that runs, I receive a warning that the schema does not match. 

How can I go about performing the task of checking and updating the schema and renaming the fields without warning? There are no issues on the output side if columns are added or removed. My only expectation is that the job runs successfully and all columns are added to the output without warning or error.

5/1 - 125 columns

5/8 - 126 columns ("Phone #" was added to the schema)

Thanks for your support.

Curtis


Operating system used: Windows 11

0 Kudos
10 Replies
Turribeach

Go to the Dataset => Settings => Schema and press in the Check Now button. If the check detects any changes it will show a "The schema of the dataset does not match the table" message at which point you can press in Reload Schema from Database, save the changes and your dataset will be up-to-date. 

0 Kudos
CurtisC
Level 2
Author

Thanks @Turribeach for the quick response. This is how I am addressing this today, but I'm trying to avoid having to do this handful of steps each time the owners make a change. I've have had to do this 4 times in the last 2 weeks, which is not ideal for an automated solution.

Curtis

0 Kudos

What recipe type are you using to deal with this data? Can you put an image of the flow?

0 Kudos
CurtisC
Level 2
Author

I reset and reloaded the schema to show the initial input of a couple of columns. The Prepare step and data are not being used for anything currently.

0 Kudos

Wasn't aware of the HTTP Dataset so that's the one learnt today. Visual recipes can't have dynamic outputs so you will need to convert the HTTP Dataset and prepare recipe to a Python recipe with no inputs. You can use the requests Python package to pull the data from the external URL. Python recipes can have dynamic outputs so the columns will adjust automatically as per the data but you obviously need to parse the URL output accordingly. This shouldn't be a problem assuming it's a REST API returning a JSON object. The output of the Python recipe can be a BigQuery dataset so provided you don't have any other visual recipes in the mini flow you will not need to update any schemas for this to work autonomously.

0 Kudos
CurtisC
Level 2
Author

OK, this may be over my head a little. Kind of new to the platform, but I shared this with our internal team and see what we can do. If we get stuck, I'll post it here. Appreciate your help!

Curtis

0 Kudos

Alternatively if you are running your refresh in a scenario you can use the "Reload schema" and the "Schema propagation" steps to push the schema changes in the flow.

0 Kudos
CurtisC
Level 2
Author

Trying this now. Initial test of removing column 'Phone #' generated the expected output of 'phone_number' but I'm not entirely sure how this works. Requesting for a new 'Test #' column to be added and see what happens. I don't know how it's going to inherit the renaming convention for a new field, but we shall see! Thanks again!

0 Kudos
CurtisC
Level 2
Author

I separated this piece of the overall flow into it's own step, using the "Before each recipe runs, update the schema of the output dataset(s)" and it ran successfully, but unsure how this will interpret a new column with special characters or spaces. 

0 Kudos

@CurtisC wrote:

but unsure how this will interpret a new column with special characters or spaces. 


It won't work as Bigquery won't accept those as valid columns names. You need to rename those columns hence why I suggested the Python recipe as the Prepare recipe can't rename columns dynamically whereas you can do that in Python. There is no other alternative. 

0 Kudos

Labels

?
Labels (2)
A banner prompting to get Dataiku