json data manipulation

cbimou
Level 1

Hello,

I'm trying to fold multiple columns of JSON arrays that look like this:

latitude                        longitude                      time
["0.2564","-0.5698","1.3256"]   ["3.0254","0.3214","2.0326"]   ["10h30","10h45","10h50"]

 

I want to have this table:

latitude   longitude   time
0.2564     3.0254      10h30
-0.5698    0.3214      10h45
1.3256     2.0326      10h50

 

Can someone help me, please? I have tried the "fold array" method, but the results are not what I want.

Thanks for your assistance

Lottie

CoreyS
Community Manager

Hi, @cbimou! Can you provide any further details on the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you’ve tried any fixes already? This should lead to a quicker response from the community.

SarinaS
Dataiker

Hi Lottie,

I think the key here is that this manipulation is more straightforward when each row holds a single JSON object that contains the latitude, longitude and time for one “record”.

For example, if your data is in the format with rows that look like this:

{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}
{"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}
{"latitude":"1.3256","time":"10h50","longitude":"2.0326"}

Then you can apply the unnest processor to split out individual columns for latitude, longitude and time. Here’s an example of what this would look like: 

[Screenshot: Screen Shot 2021-01-22 at 6.12.21 PM.png]
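
To make the effect of the unnest step concrete, here is a rough plain-Python illustration. It is not the Dataiku processor itself, only a sketch of the same idea: each row is one JSON object, and parsing it yields one value per key, which become the separate columns. The `unnest` helper name is made up for this example.

```python
import json

# Each row holds one JSON object as text, as in the example rows above.
rows = [
    '{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}',
    '{"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}',
    '{"latitude":"1.3256","time":"10h50","longitude":"2.0326"}',
]


def unnest(json_text):
    """Parse one JSON object; its keys play the role of the new columns.

    Hypothetical helper for illustration, not a Dataiku API.
    """
    return json.loads(json_text)


records = [unnest(r) for r in rows]

# Print one line per record, in latitude / longitude / time order.
for rec in records:
    print(rec["latitude"], rec["longitude"], rec["time"])
```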

If possible, I would suggest trying to transform your incoming data to match the format of the first column shown, so that you can easily add an “unnest” processor step and convert the data accordingly.

While I think this is most cleanly handled at data ingest or in a short Python recipe, here is an example of how I transformed data in the original format into the above format with processor steps:

  1. Add a “Zip JSON arrays” step and add your three columns as “Input columns”: latitude, longitude and time.  I call the column zipped in this example. This gives you a new column in the following format: 
    [{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}, {"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}, {"latitude":"1.3256","time":"10h50","longitude":"2.0326"}]
  2. Add a “Split and fold” step on the new zipped column with the Separator set to ", ". Note the space after the comma: it makes the split happen between records and not between the individual elements within each record. After this step I have 3 rows, one for each of the three records in this test dataset.
  3. Add a step at this point to remove the original columns latitude, longitude and time.
  4. Add two formula steps on the zipped column to remove the leading [ from the first row and the trailing ] from the final row of the dataset. These are the two formula steps that I added:

    if(zipped[0] == '[', substring(zipped, 1), zipped)

    if(zipped[length(zipped) -1] == ']', substring(zipped, 0, length(zipped) -1 ), zipped)

  5. Now the zipped column matches the format of the column in my initial screenshot, and you can go ahead and apply the unnest step to create the final three latitude, longitude and time columns.
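
For the Python recipe route mentioned earlier, the same result can be sketched without the string splitting. This is only a minimal illustration in plain Python, assuming each cell contains a valid JSON array and that the three arrays in a row have the same length. The function name `fold_parallel_arrays` is made up for this example, and reading from and writing to DSS datasets is left out.

```python
import json


def fold_parallel_arrays(row, columns=("latitude", "longitude", "time")):
    """Turn one row of parallel JSON arrays into one dict per array position.

    Hypothetical helper for illustration, not a Dataiku API.
    """
    arrays = [json.loads(row[c]) for c in columns]
    # zip() pairs the i-th element of every array. It stops at the shortest
    # array, so arrays of unequal length would silently lose trailing values.
    return [dict(zip(columns, values)) for values in zip(*arrays)]


# One input row in the original format from the question.
row = {
    "latitude": '["0.2564","-0.5698","1.3256"]',
    "longitude": '["3.0254","0.3214","2.0326"]',
    "time": '["10h30","10h45","10h50"]',
}

folded = fold_parallel_arrays(row)
for record in folded:
    print(record)
```

Each dict in `folded` corresponds to one output row of the table you want. Applying the function to every input row and concatenating the results gives the full folded dataset.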

 

Thanks,

Sarina 

 
