json data manipulation

cbimou
cbimou Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 12 Partner

Hello,

I'm trying to fold array multiple json columns look like this :

latitude

longitude

time

["0.2564","-0.5698","1.3256"]

["3.0254","0.3214","2.0326"]

["10h30","10h45","10h50"]

I want to have this table:

latitude

longitude

time

0.2564

3.0254

10h30

-0.5698

0.3214

10h45

1.3256

2.0326

10h50

Can someone help me please? I have triyed "fold array" method but the results are not what i want.

Thanks for your assistance

Lottie

Answers

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi, @cbimou
    ! Can you provide any further details on the thread to assist users in helping you find a solution (insert examples like DSS version etc.) Also, can you let us know if you’ve tried any fixes already?This should lead to a quicker response from the community.

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 17

    Hi Lottie,

    I think the key here is that this manipulation is more straightforward based on a row-by-row JSON object that contains the latitude, longitude and time for each “record”.

    For example, if your data is in the format with rows that look like this:

    {“latitude”:”0.2564”,”time”:”10h30”,”longitude”:”3.0254”}
    {“latitude”:”-0.5698”,”time”:”10h45”,”longitude”:”0.3214”}
    {“latitude”:”1.3256”,”time”:”10h50”,”longitude”:”2.0326”}
    

    Then you can apply the unnest processor to split out individual columns for latitude, longitude and time. Here’s an example of what this would look like: 

    Screen Shot 2021-01-22 at 6.12.21 PM.png

    If possible, I would suggest trying to transform your incoming data to match the format of the first column shown, so that you can easily add an “unnest” processor step and convert the data accordingly.

    While I think this is most cleanly handled upon data ingest or a brief Python recipe, here is an example of how I transformed data in the original format into the above format with processor steps:

    1. Add a “Zip JSON arrays” step and add your three columns as “Input columns”: latitude, longitude and time. I call the column zipped in this example. This gives you a new column in the following format:
      [{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}, {"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}, {"latitude":"1.3256","time":"10h50","longitude":"2.0326"}]
    2. Add a “Split and fold” step to the new column zipped with the Separator set to ", ". Note the space after the comma, which allows you to split the records, and not the individual elements within each record. After this step I have 3 rows, representing the three records in this test dataset.
    3. I add a step at this point to remove the original columns latitude, longitude and time
    4. I added two formula steps to my zipped column to remove the leading and trailing [ and ] characters in the first and final rows of the dataset. These are the two formula steps that I added:

      if(zipped[0] == '[', substring(zipped, 1), zipped)

      if(zipped[length(zipped) -1] == ']', substring(zipped, 0, length(zipped) -1 ), zipped)

    5. Now my zipped column matches the format of the column in my initial screenshot, and you can go ahead and apply the unnest step to create the final three latitude, longitude and time columns.

    Thanks,

    Sarina

Setup Info
    Tags
      Help me…