What recipe should I use to merge a header that is currently split across two rows:
Business 1 | Business 2 |
ABC | XYZ |
into a single header row, like this?
Business 1 ABC | Business 2 XYZ |
Hi @GSung,
For most simple data transformations like this, you could use a Prepare recipe, which has many built-in transformations. This is normally where you would go to rename individual columns and perform similar steps.
I don't think your particular transformation is handled automatically there, so if you want to automate combining the dataset header row with the first row of data, I think you can create a Python recipe like this:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()

for col in merge_column_headers_df.columns:
    # get first row value for this specific column
    first_row = merge_column_headers_df.iloc[0][col]
    # cast to str so that non-text values (numbers, dates) can be concatenated
    new_column_name = col + " " + str(first_row)
    # rename the column with the existing column header plus the first row of that column's data
    merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)

# delete first row of data, now that it's merged into the header
merge_column_headers_df = merge_column_headers_df.iloc[1:]

# Write recipe outputs back to dataset
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)
This modifies a single dataset, in this example called "merge_column_headers": it appends the values from the first row of data to the header row, and then removes that first row from the dataset.
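If you'd like to try the logic outside of DSS first, here is a minimal pandas-only sketch of the same idea. The function name merge_header_with_first_row, the sep parameter, and the sample values "10" and "20" are my own illustrative assumptions, not part of any Dataiku API; only the column names and the ABC / XYZ row come from your example.

```python
import pandas as pd


def merge_header_with_first_row(df: pd.DataFrame, sep: str = " ") -> pd.DataFrame:
    """Hypothetical helper: fold the first data row into the column names.

    Returns a new DataFrame; the input is left unchanged.
    """
    if df.empty:
        # Nothing to merge when there is no first row
        return df.copy()
    # Pair each header with the first-row value in the same position;
    # the f-string converts non-text values to text
    new_columns = [f"{col}{sep}{val}" for col, val in zip(df.columns, df.iloc[0])]
    # Drop the first row, now that it lives in the header
    result = df.iloc[1:].reset_index(drop=True)
    result.columns = new_columns
    return result


# Sample data mirroring the question; the second row is made up for illustration
before = pd.DataFrame({"Business 1": ["ABC", "10"], "Business 2": ["XYZ", "20"]})
after = merge_header_with_first_row(before)
print(list(after.columns))  # ['Business 1 ABC', 'Business 2 XYZ']
```

Building all the new names in one pass and pairing by position avoids renaming columns one at a time, and it should also behave sensibly if two columns happen to share the same name.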
Here's the before and after: [screenshots not preserved]
Thanks,
Sarina
Hi, @GSung! Can you provide any further details on the thread, such as your DSS version, to assist users in helping you find a solution? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.
Thank you all!