Convert columns to rows
I want to convert the table:
Cat | 2020_Prev | 2020_Curr | 2021_Prev | 2021_Curr |
A | 5 | 9 | 21 | 19 |
B | 7 | 13 | 17 | 11 |
to something like:
Cat | Val | 2020 | 2021 |
A | Prev | 5 | 21 |
A | Curr | 9 | 19 |
B | Prev | 7 | 17 |
B | Curr | 13 | 11 |
Can you please let me know which recipe to use and how?
Thanks!!!
Answers
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Hi @pnaik1
, you would be able to achieve this with a Prepare Recipe. In this case you would utilize a Reshaping Processor and select Fold an array.This processor takes a column containing arrays (in JSON) and transforms the array into several rows.
Each generated row contains a single value from the input array. All other columns are copied in each generated row.
Fun fact, if you wanted to do the opposite you would utilize the Transpose rows to columns reshaping processor.
I hope this helps!
-
Unfortunately it is not working. Fold an array recipe is directly giving empty rows when I am filling in parameters and also doesn't make sense as I don't have an column containing array.
Closest I can reach to my result table is through "Fold Multiple Columns" recipe but that's also giving me 8 rows rather than 4 rows. If possible, can you please provide workflow for achieving the result table.
-
Hi, when you look at the requirement, it is a pretty simple table transformation. However, to make this transformation is complex enough. The one way to do it is to use multiple prepare recipes to create multiple datasets and then join them, but there is a quicker way. Prepare recipe has very powerful "Python function" step so with a bit of Python coding skills and "Python function" step in "rows" mode we can achieve this in one prepare recipe and two steps. In the first step, we will process each row's data and create new rows with new columns and values based on the requirement. In the second step, we just simply remove unwanted original rows.
Below is the Python code I created for this and the screenshots of the output and python code step:def process(row): ret = [] prev_row = {"Cat": row["Cat"], "Val": "Prev"} curr_row = {"Cat": row["Cat"], "Val": "Curr"} for col in row: if "_" in col: valarr = col.split("_") if valarr[1] == "Prev": prev_row[valarr[0]] = row[col] if valarr[1] == "Curr": curr_row[valarr[0]] = row[col] ret.append(prev_row) ret.append(curr_row) return ret
And then use list view and delete action to remove original unwanted columns.
I hope the above is clear.
Best regards,
Vitaliy -
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 412 Neuron
Great solution!
I was able to find a solution using the prepare recipe plus the pivot recipe, but this one is much simpler.
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@CoreyS
, @VitaliyD
, @Ignacio_ToledoI have used the Python Function recipes steps from time to time. I find them very powerful as well. However, I find them a bit confusing to correctly setup. I've only worked out some of the options to use this. Are there any good training material or videos on this tool?
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Great question @tgb417
! Custom Python Function in the Prepare Recipe is actually covered in the Advanced Prepare Recipe Usage course in the Dataiku Academy.A readout of that can be found in the Knowledge Base as well.
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@CoreyS
,Thanks that is helpful. However, I suspect that this is even more powerful than I understand and this explanation helps with it. I would love to go a bit deeper. In terms of use cases to create the Python Functions.