Dynamic Rename of Headers
Hello All,
First post here, and I'm very new to Dataiku.
I am attempting to consume data from an API call to QuickBase, and I am looking for a solution that accepts a dynamic number of columns and replaces any special characters (e.g. ( | , #) so that I can output a file to BigQuery without issue. The team that owns the data is not willing to make this change for me on the QB side.
I see that I can make these changes for the columns in the schema, but I'm not seeing a way to handle new columns when they are added later. I'm thinking of some regex replace function, but I don't currently know where or how to apply it.
The goal is to just ETL the data with the only modification being to address invalid header naming conventions.
Appreciate your assistance.
Curtis
Operating system used: Windows 11
Best Answer
Turribeach
You will need to use a Python recipe for this: Dataiku can rename columns, but that renaming is never going to be dynamic. The write_with_schema() method already handles changing the output schema as needed to match your pandas DataFrame:
https://doc.dataiku.com/dss/latest/code_recipes/python.html#writing-a-pandas-dataframe-in-a-dataset
You can set the recipe's output to be a GCP bucket, so in the next step a simple Sync recipe should be able to upload the data to BigQuery.
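As a rough illustration of the Python recipe described above, here is a minimal sketch. It assumes the target rule is "letters, digits, and underscores only, not starting with a digit", and that names which collide after cleaning (compared case-insensitively) should get a numeric suffix. The dataset names quickbase_raw and quickbase_clean and both function names are hypothetical placeholders, so adjust them to your own flow.

```python
import re


def sanitize_column_names(names):
    """Return cleaned column names, in the same order as the input.

    Assumed rule: only letters, digits, and underscores are allowed,
    and a name must not be empty or start with a digit.
    """
    seen = set()
    cleaned = []
    for name in names:
        # Replace each run of disallowed characters with a single underscore.
        new = re.sub(r"[^0-9A-Za-z_]+", "_", str(name))
        # Prefix an underscore if the result is empty or starts with a digit.
        if not new or new[0].isdigit():
            new = "_" + new
        # Two source names can clean to the same result (e.g. "A|B" and "A,B"),
        # so append a counter until the name is unique (case-insensitive).
        candidate, n = new, 1
        while candidate.lower() in seen:
            n += 1
            candidate = f"{new}_{n}"
        seen.add(candidate.lower())
        cleaned.append(candidate)
    return cleaned


def run_recipe(input_name="quickbase_raw", output_name="quickbase_clean"):
    """Body of the Python recipe; dataset names are hypothetical."""
    import dataiku  # only importable inside a DSS environment

    df = dataiku.Dataset(input_name).get_dataframe()
    # Whatever columns arrive from the API, rename them all in one pass.
    df.columns = sanitize_column_names(df.columns)
    # write_with_schema() updates the output schema to match the DataFrame.
    dataiku.Dataset(output_name).write_with_schema(df)
```

Inside the recipe you would call run_recipe() with your real input and output dataset names. Because the renaming works on whatever columns are present at run time, columns added later on the QuickBase side should be cleaned without any schema edits.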
Answers
Thanks for sharing! I'll check this out.