Discover this year's submissions to the Dataiku Frontrunner Awards and give kudos to your favorite use cases and success stories!READ MORE

How to combine two rows of Header into one

Solved!
GSung
Level 3
How to combine two rows of Header into one

What recipe should I use to change the header from (it's now in two different rows)

Business 1Business 2 
ABCXYZ

 

to just one row as a header

Business 1 ABC

Business 2 XYZ

 

0 Kudos
1 Solution
SarinaS
Dataiker
Dataiker

Hi @GSung,

For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations.  This is normally where you would go to rename individual columns and perform similar steps.   

I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:  

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()

for col in merge_column_headers_df.columns:
    # get first row value for this specific column
    first_row = merge_column_headers_df.iloc[0][col]
    new_column_name = col + " " + first_row 
    # rename the column with the existing column header plus the first row of that column's data 
    merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)

# delete first row of data, now that it's merged into the header 
merge_column_headers_df = merge_column_headers_df.iloc[1:]

# Write recipe outputs back to dataset 
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)

 

This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.  

Here's the before an after:

Screen Shot 2021-02-16 at 2.10.41 PM.png

Screen Shot 2021-02-16 at 2.09.50 PM.png

Thanks,

Sarina 

View solution in original post

3 Replies
CoreyS
Community Manager
Community Manager

Hi, @GSung! Can you provide any further details on the thread to assist users in helping you find a solution (insert examples like DSS version etc.) Also, can you let us know if you’ve tried any fixes already?This should lead to a quicker response from the community.

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
0 Kudos
SarinaS
Dataiker
Dataiker

Hi @GSung,

For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations.  This is normally where you would go to rename individual columns and perform similar steps.   

I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:  

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()

for col in merge_column_headers_df.columns:
    # get first row value for this specific column
    first_row = merge_column_headers_df.iloc[0][col]
    new_column_name = col + " " + first_row 
    # rename the column with the existing column header plus the first row of that column's data 
    merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)

# delete first row of data, now that it's merged into the header 
merge_column_headers_df = merge_column_headers_df.iloc[1:]

# Write recipe outputs back to dataset 
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)

 

This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.  

Here's the before an after:

Screen Shot 2021-02-16 at 2.10.41 PM.png

Screen Shot 2021-02-16 at 2.09.50 PM.png

Thanks,

Sarina 

GSung
Level 3
Author

thank you all !

0 Kudos