The Dataiku Frontrunner Awards have launched to recognize your achievements! SUBMIT YOUR ENTRY

Convert columns to rows

pnaik1
Level 2
Convert columns to rows

I want to convert the table:

Cat2020_Prev2020_Curr2021_Prev2021_Curr
A592119
B7131711

 

to something like:

CatVal20202021
APrev521
ACurr919
BPrev717
BCurr1311

 

Can you please let me know which recipe to use and how?

Thanks!!!

0 Kudos
7 Replies
CoreyS
Community Manager
Community Manager

Hi @pnaik1, you would be able to achieve this with a Prepare Recipe. In this case you would utilize a Reshaping Processor and select Fold an array.

This processor takes a column containing arrays (in JSON) and transforms the array into several rows.

Each generated row contains a single value from the input array. All other columns are copied in each generated row.

Fun fact, if you wanted to do the opposite you would utilize the Transpose rows to columns reshaping processor.

I hope this helps!

 

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
pnaik1
Level 2
Author

Unfortunately it is not working. Fold an array recipe is directly giving empty rows when I am filling in parameters and also doesn't make sense as I don't have an column containing array. 

Closest I can reach to my result table is through "Fold Multiple Columns" recipe but that's also giving me 8 rows rather than 4 rows. If possible, can you please provide workflow for achieving the result table.

0 Kudos
VitaliyD
Dataiker
Dataiker

Hi, when you look at the requirement, it is a pretty simple table transformation. However, to make this transformation is complex enough. The one way to do it is to use multiple prepare recipes to create multiple datasets and then join them, but there is a quicker way. Prepare recipe has very powerful "Python function" step so with a bit of Python coding skills and "Python function" step in "rows" mode we can achieve this in one prepare recipe and two steps. In the first step, we will process each row's data and create new rows with new columns and values based on the requirement. In the second step, we just simply remove unwanted original rows.


Below is the Python code I created for this and the screenshots of the output and python code step:

 

def process(row):
ret = []
prev_row = {"Cat": row["Cat"], "Val": "Prev"}
curr_row = {"Cat": row["Cat"], "Val": "Curr"}
for col in row:
if "_" in col:
valarr = col.split("_")
if valarr[1] == "Prev":
prev_row[valarr[0]] = row[col]
if valarr[1] == "Curr":
curr_row[valarr[0]] = row[col]
ret.append(prev_row)
ret.append(curr_row)
return ret

 

Screenshot 2021-05-11 at 22.29.53.png

And then use list view and delete action to remove original unwanted columns.

Screenshot 2021-05-11 at 22.35.16.png

Screenshot 2021-05-11 at 22.32.44.pngI hope the above is clear.

Best regards,
Vitaliy

Ignacio_Toledo

Great solution!

I was able to find a solution using the prepare recipe plus the pivot recipe, but this one is much simpler. 

0 Kudos
tgb417
Neuron
Neuron

@CoreyS@VitaliyD , @Ignacio_Toledo

I have used the Python Function recipes steps from time to time. I find them very powerful as well.  However, I find them a bit confusing to correctly setup.  I've only worked out some of the options to use this.  Are there any good training material or videos on this tool?

 

--Tom
CoreyS
Community Manager
Community Manager

Great question @tgb417! Custom Python Function in the Prepare Recipe is actually covered in the Advanced Prepare Recipe Usage course in the Dataiku Academy.

A readout of that can be found in the Knowledge Base as well.

 

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
tgb417
Neuron
Neuron

@CoreyS ,

Thanks that is helpful.  However, I suspect that this is even more powerful than I understand and this explanation helps with it.  I would love to go a bit deeper.  In terms of use cases to create the Python Functions. 

--Tom
0 Kudos
A banner prompting to get Dataiku DSS
Public