Convert columns to rows

pnaik1
pnaik1 Registered Posts: 23 ✭✭✭✭

I want to convert the table:

Cat2020_Prev2020_Curr2021_Prev2021_Curr
A592119
B7131711

to something like:

CatVal20202021
APrev521
ACurr919
BPrev717
BCurr1311

Can you please let me know which recipe to use and how?

Thanks!!!

Answers

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @pnaik1
    , you would be able to achieve this with a Prepare Recipe. In this case you would utilize a Reshaping Processor and select Fold an array.

    This processor takes a column containing arrays (in JSON) and transforms the array into several rows.

    Each generated row contains a single value from the input array. All other columns are copied in each generated row.

    Fun fact, if you wanted to do the opposite you would utilize the Transpose rows to columns reshaping processor.

    I hope this helps!

  • pnaik1
    pnaik1 Registered Posts: 23 ✭✭✭✭

    Unfortunately it is not working. Fold an array recipe is directly giving empty rows when I am filling in parameters and also doesn't make sense as I don't have an column containing array.

    Closest I can reach to my result table is through "Fold Multiple Columns" recipe but that's also giving me 8 rows rather than 4 rows. If possible, can you please provide workflow for achieving the result table.

  • Vitaliy
    Vitaliy Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 102 Dataiker
    edited July 17

    Hi, when you look at the requirement, it is a pretty simple table transformation. However, to make this transformation is complex enough. The one way to do it is to use multiple prepare recipes to create multiple datasets and then join them, but there is a quicker way. Prepare recipe has very powerful "Python function" step so with a bit of Python coding skills and "Python function" step in "rows" mode we can achieve this in one prepare recipe and two steps. In the first step, we will process each row's data and create new rows with new columns and values based on the requirement. In the second step, we just simply remove unwanted original rows.


    Below is the Python code I created for this and the screenshots of the output and python code step:

    def process(row):
    ret = []
    prev_row = {"Cat": row["Cat"], "Val": "Prev"}
    curr_row = {"Cat": row["Cat"], "Val": "Curr"}
    for col in row:
    if "_" in col:
    valarr = col.split("_")
    if valarr[1] == "Prev":
    prev_row[valarr[0]] = row[col]
    if valarr[1] == "Curr":
    curr_row[valarr[0]] = row[col]
    ret.append(prev_row)
    ret.append(curr_row)
    return ret

    Screenshot 2021-05-11 at 22.29.53.png

    And then use list view and delete action to remove original unwanted columns.

    Screenshot 2021-05-11 at 22.35.16.png

    Screenshot 2021-05-11 at 22.32.44.pngI hope the above is clear.

    Best regards,
    Vitaliy

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Great solution!

    I was able to find a solution using the prepare recipe plus the pivot recipe, but this one is much simpler.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @CoreyS
    , @VitaliyD
    , @Ignacio_Toledo

    I have used the Python Function recipes steps from time to time. I find them very powerful as well. However, I find them a bit confusing to correctly setup. I've only worked out some of the options to use this. Are there any good training material or videos on this tool?

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Great question @tgb417
    ! Custom Python Function in the Prepare Recipe is actually covered in the Advanced Prepare Recipe Usage course in the Dataiku Academy.

    A readout of that can be found in the Knowledge Base as well.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    @CoreyS
    ,

    Thanks that is helpful. However, I suspect that this is even more powerful than I understand and this explanation helps with it. I would love to go a bit deeper. In terms of use cases to create the Python Functions.

Setup Info
    Tags
      Help me…