Set column names in Python recipe

Thomas_K · February 2018

tl;dr: I need to rename the columns of a huge dataset programmatically, preferably in Python code. The names are extracted from a different dataset; I have them as a Python list, with the corresponding data types in another list of the same length.

Long version: The Dataiku GUI only lets me change the column names manually. However, I have lots of CSV files containing data without column names, and one CSV file that lists the column names for all the other CSV files. (I can't do anything about that layout, since it's not my data and I only have read access.) This should be a one-liner that doesn't require touching the actual data. What would be the best way to do this? I managed to extract the names (and the corresponding data type per column) as Python lists in a code recipe, but I'm not sure what to do with them. My lists might look like this:

col_names = ["ID", "Names", "Count"]

col_types = ["int", "string", "int"]

Clément_Stenac · February 2018

Hi,

You would generally not do this in a recipe, since a recipe is part of the flow, is rerunnable, and is more or less meant to process data. Instead, do it first in a Python notebook. You can then automate it as a "Macro" as part of a DSS plugin.

This would use the DSS public API (https://doc.dataiku.com/dss/latest/api/public/client-python/index.html)

Something like ("pseudo-code")


import dataiku

# Column names and types extracted from the other dataset (example values from the question)
col_names = ["ID", "Names", "Count"]
col_types = ["int", "string", "int"]

client = dataiku.api_client()
project = client.get_project("PROJECT_NAME")
dataset = project.get_dataset("dataset_name")

current_schema = dataset.get_schema()
# current_schema is a dict; its "columns" entry is a list of dicts, each with a "name" and a "type"

# Build the new columns list by pairing each name with its type
new_cols = [{"name": name, "type": col_type} for name, col_type in zip(col_names, col_types)]

# Update the schema (dict access, not attribute access) and save it
current_schema["columns"] = new_cols
dataset.set_schema(current_schema)
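
The column-building step can also be pulled out into a small function that can be checked without a DSS instance. This is only a sketch: the name build_schema_columns is hypothetical (not part of the DSS API), and the type strings are simply carried over from the question without being checked against the types DSS accepts. It adds two sanity checks that may be worth having before overwriting a schema: the lists must have equal length, and the column names must be unique.

```python
def build_schema_columns(col_names, col_types):
    """Pair column names with types as a list of {"name": ..., "type": ...} dicts.

    Hypothetical helper, not part of the DSS API.
    """
    if len(col_names) != len(col_types):
        raise ValueError(
            "got %d names but %d types" % (len(col_names), len(col_types))
        )
    if len(set(col_names)) != len(col_names):
        raise ValueError("column names must be unique")
    # zip() walks both lists in step, so each name gets its own type
    return [{"name": n, "type": t} for n, t in zip(col_names, col_types)]


# Example values taken from the question
columns = build_schema_columns(["ID", "Names", "Count"], ["int", "string", "int"])
print(columns)
```

The resulting list could then be assigned to the "columns" entry of the schema dict before saving it.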

Thomas_K · February 2018

Thanks, I'll try that and give an update later on whether it worked.

Thomas_K · February 2018

I couldn't quite figure out how to do it as a recipe, so I did it as a script as described by you and that works for now. Thanks.
