json data manipulation

cbimou
Level 2
json data manipulation

Hello,

I'm trying to fold array multiple json columns look like this :

latitude

longitude

time

["0.2564","-0.5698","1.3256"]

["3.0254","0.3214","2.0326"]

["10h30","10h45","10h50"]

 

I want to have this table:

latitude

longitude

time

0.2564

3.0254

10h30

-0.5698

0.3214

10h45

1.3256

2.0326

10h50

 

Can someone help me please? I have triyed "fold array" method but the results are not what i want.

Thanks for your assistance

Lottie

0 Kudos
2 Replies
CoreyS
Dataiker Alumni

Hi, @cbimou! Can you provide any further details on the thread to assist users in helping you find a solution (insert examples like DSS version etc.) Also, can you let us know if youโ€™ve tried any fixes already?This should lead to a quicker response from the community.

Looking for more resources to help you use Dataiku effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as โ€˜Accepted Solutionโ€™ to help others like you!
0 Kudos
SarinaS
Dataiker

Hi Lottie,

I think the key here is that this manipulation is more straightforward based on a row-by-row JSON object that contains the latitude, longitude and time for each โ€œrecordโ€.

For example, if your data is in the format with rows that look like this:

{โ€œlatitudeโ€:โ€0.2564โ€,โ€timeโ€:โ€10h30โ€,โ€longitudeโ€:โ€3.0254โ€}
{โ€œlatitudeโ€:โ€-0.5698โ€,โ€timeโ€:โ€10h45โ€,โ€longitudeโ€:โ€0.3214โ€}
{โ€œlatitudeโ€:โ€1.3256โ€,โ€timeโ€:โ€10h50โ€,โ€longitudeโ€:โ€2.0326โ€}

Then you can apply the unnest processor to split out individual columns for latitude, longitude and time. Hereโ€™s an example of what this would look like:โ€ƒ

Screen Shot 2021-01-22 at 6.12.21 PM.png

If possible, I would suggest trying to transform your incoming data to match the format of the first column shown, so that you can easily add an โ€œunnestโ€ processor step and convert the data accordingly.

While I think this is most cleanly handled upon data ingest or a brief Python recipe, here is an example of how I transformed data in the original format into the above format with processor steps:

  1. Add a โ€œZip JSON arraysโ€ step and add your three columns as โ€œInput columnsโ€: latitude, longitude and time.  I call the column zipped in this example. This gives you a new column in the following format: 
    [{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}, {"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}, {"latitude":"1.3256","time":"10h50","longitude":"2.0326"}]
  2. Add a โ€œSplit and foldโ€ step to the new column zipped with the Separator set to ", ". Note the space after the comma, which allows you to split the records, and not the individual elements within each record.  After this step I have 3 rows, representing the three records in this test dataset. 
  3. I add a step at this point to remove the original columns latitude, longitude and time
  4. I added two formula steps to my zipped column to remove the leading and trailing [ and ] characters in the first and final rows of the dataset. These are the two formula steps that I added:

    if(zipped[0] == '[', substring(zipped, 1), zipped)

    if(zipped[length(zipped) -1] == ']', substring(zipped, 0, length(zipped) -1 ), zipped)

  5. Now my zipped column matches the format of the column in my initial screenshot, and you can go ahead and apply the unnest step to create the final three latitude, longitude and time columns.   

 

Thanks,

Sarina 

 

0 Kudos