What recipe should I use to change the header from (it's now in two different rows)
Business 1 | Business 2 |
ABC | XYZ |
to just one row as a header
Business 1 ABC | Business 2 XYZ |
Hi @GSung,
For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations. This is normally where you would go to rename individual columns and perform similar steps.
I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()
for col in merge_column_headers_df.columns:
# get first row value for this specific column
first_row = merge_column_headers_df.iloc[0][col]
new_column_name = col + " " + first_row
# rename the column with the existing column header plus the first row of that column's data
merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)
# delete first row of data, now that it's merged into the header
merge_column_headers_df = merge_column_headers_df.iloc[1:]
# Write recipe outputs back to dataset
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)
This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.
Here's the before an after:
Thanks,
Sarina
Hi, @GSung! Can you provide any further details on the thread to assist users in helping you find a solution (insert examples like DSS version etc.) Also, can you let us know if you’ve tried any fixes already?This should lead to a quicker response from the community.
Hi @GSung,
For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations. This is normally where you would go to rename individual columns and perform similar steps.
I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()
for col in merge_column_headers_df.columns:
# get first row value for this specific column
first_row = merge_column_headers_df.iloc[0][col]
new_column_name = col + " " + first_row
# rename the column with the existing column header plus the first row of that column's data
merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)
# delete first row of data, now that it's merged into the header
merge_column_headers_df = merge_column_headers_df.iloc[1:]
# Write recipe outputs back to dataset
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)
This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.
Here's the before an after:
Thanks,
Sarina
thank you all !