We're excited to announce that we're launching the second installment of Dataiku Product Days Register Now

How to combine two rows of Header into one

Solved!
GSung
Level 2
How to combine two rows of Header into one

What recipe should I use to change the header from (it's now in two different rows)

Business 1Business 2 
ABCXYZ

 

to just one row as a header

Business 1 ABC

Business 2 XYZ

 

0 Kudos
1 Solution
SarinaS
Dataiker
Dataiker

Hi @GSung,

For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations.  This is normally where you would go to rename individual columns and perform similar steps.   

I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:  

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()

for col in merge_column_headers_df.columns:
    # get first row value for this specific column
    first_row = merge_column_headers_df.iloc[0][col]
    new_column_name = col + " " + first_row 
    # rename the column with the existing column header plus the first row of that column's data 
    merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)

# delete first row of data, now that it's merged into the header 
merge_column_headers_df = merge_column_headers_df.iloc[1:]

# Write recipe outputs back to dataset 
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)

 

This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.  

Here's the before an after:

Screen Shot 2021-02-16 at 2.10.41 PM.png

Screen Shot 2021-02-16 at 2.09.50 PM.png

Thanks,

Sarina 

View solution in original post

3 Replies
CoreyS
Community Manager
Community Manager

Hi, @GSung! Can you provide any further details on the thread to assist users in helping you find a solution (insert examples like DSS version etc.) Also, can you let us know if you’ve tried any fixes already?This should lead to a quicker response from the community.

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
0 Kudos
SarinaS
Dataiker
Dataiker

Hi @GSung,

For most operations like this to perform a simple data transformation you could use a prepare recipe, which has many built in data transformations.  This is normally where you would go to rename individual columns and perform similar steps.   

I don't think your particular transformation is handled automatically here, so if you wanted to automate combing the dataset header row with the first row of data, I think you can create a Python recipe like this:  

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers_df = merge_column_headers.get_dataframe()

for col in merge_column_headers_df.columns:
    # get first row value for this specific column
    first_row = merge_column_headers_df.iloc[0][col]
    new_column_name = col + " " + first_row 
    # rename the column with the existing column header plus the first row of that column's data 
    merge_column_headers_df.rename(columns={col: new_column_name}, inplace=True)

# delete first row of data, now that it's merged into the header 
merge_column_headers_df = merge_column_headers_df.iloc[1:]

# Write recipe outputs back to dataset 
merge_column_headers = dataiku.Dataset("merge_column_headers")
merge_column_headers.write_with_schema(merge_column_headers_df)

 

This modifies a single dataset, in this example called "merge_column_header" and appends the values from the first row of data to the header row, and then removes the first row of data from the dataset.  

Here's the before an after:

Screen Shot 2021-02-16 at 2.10.41 PM.png

Screen Shot 2021-02-16 at 2.09.50 PM.png

Thanks,

Sarina 

View solution in original post

GSung
Level 2
Author

thank you all !

0 Kudos
A banner prompting to get Dataiku DSS