
json data manipulation



I'm trying to fold multiple JSON array columns that look like this:








I want to have this table:














Can someone help me please? I have tried the "fold array" method, but the results are not what I want.

Thanks for your assistance


2 Replies
Dataiker Alumni

Hi, @cbimou! Can you provide further details on the thread, such as your DSS version, to help users find a solution? Also, can you let us know if you’ve tried any fixes already? This should lead to a quicker response from the community.


Hi Lottie,

I think the key here is that this manipulation is more straightforward when each row holds a single JSON object containing the latitude, longitude and time for one “record”.

For example, if your data is in the format with rows that look like this:


Then you can apply the unnest processor to split out individual columns for latitude, longitude and time. Here’s an example of what this would look like: 

Screen Shot 2021-01-22 at 6.12.21 PM.png

If possible, I would suggest trying to transform your incoming data to match the format of the first column shown, so that you can easily add an “unnest” processor step and convert the data accordingly.
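For readers who want to see the idea outside the Prepare recipe, here is a minimal plain-Python sketch of what unnesting does to one-object-per-row data. The sample rows and the `unnest` helper name are assumptions made for illustration; this is not the Dataiku processor itself.

```python
import json

# Hypothetical sample rows: one JSON object per row, as described above.
rows = [
    '{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}',
    '{"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}',
]


def unnest(json_rows):
    """Parse each JSON string into a dict; each key plays the role of a column."""
    return [json.loads(r) for r in json_rows]


records = unnest(rows)
for rec in records:
    # Each record now exposes latitude, longitude and time as separate fields.
    print(rec["latitude"], rec["longitude"], rec["time"])
```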

While I think this is most cleanly handled at data ingest or in a brief Python recipe, here is an example of how I transformed data in the original format into the above format with processor steps:

  1. Add a “Zip JSON arrays” step with your three columns as “Input columns”: latitude, longitude and time. I call the output column zipped in this example. This gives you a new column in the following format:
    [{"latitude":"0.2564","time":"10h30","longitude":"3.0254"}, {"latitude":"-0.5698","time":"10h45","longitude":"0.3214"}, {"latitude":"1.3256","time":"10h50","longitude":"2.0326"}]
  2. Add a “Split and fold” step on the new zipped column with the Separator set to ", ". Note the space after the comma: it makes the step split between records and not between the individual elements within each record. After this step I have 3 rows, representing the three records in this test dataset.
  3. Add a step to remove the original columns latitude, longitude and time.
  4. Add two formula steps on the zipped column to remove the leading [ and trailing ] characters left on the first and final rows of the dataset. These are the two formula steps that I added:

    if(zipped[0] == '[', substring(zipped, 1), zipped)

    if(zipped[length(zipped) -1] == ']', substring(zipped, 0, length(zipped) -1 ), zipped)

  5. Now the zipped column matches the format of the column in my initial screenshot, and you can apply the unnest step to create the final three latitude, longitude and time columns.
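As a rough illustration of the Python route mentioned earlier, the zip, fold and unnest steps can be sketched in plain Python. This assumes each of the three columns holds a JSON array string for a given row (the original input was only shown in a screenshot), and the `zip_and_fold` name is hypothetical. Reading and writing Dataiku datasets is left out.

```python
import json


def zip_and_fold(row, columns=("latitude", "longitude", "time")):
    """Turn one row of parallel JSON arrays into one dict per record.

    Assumption: row[c] is a JSON array string for every column c.
    zip() stops at the shortest array, so arrays of unequal length
    silently lose their trailing elements.
    """
    arrays = [json.loads(row[c]) for c in columns]
    return [dict(zip(columns, values)) for values in zip(*arrays)]


# Sample values taken from the zipped output shown in step 1.
row = {
    "latitude": '["0.2564", "-0.5698", "1.3256"]',
    "longitude": '["3.0254", "0.3214", "2.0326"]',
    "time": '["10h30", "10h45", "10h50"]',
}

folded = zip_and_fold(row)
for record in folded:
    print(record)
```

Parsing the arrays with `json.loads` avoids both the string split on ", " and the bracket-stripping formulas, since the records are never handled as raw text.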




