
Handling a JSON list of dictionaries

Solved!
Maurip00
Level 1

Hello everyone, I have a JSON file with this structure:

{
  "Outputs": {
    "Active": [
      true,
      false,
      false,
      true,
      true,
      true
    ],
    "DefaultRevision": [
      "aa",
      "ee",
      "ee",
      "cc",
      "ff",
      "ww"
    ],
    "EffectiveDateStart": [
      "1930-07-01T00:00:00",
      "1940-06-01T00:00:00",
      "1900-11-01T00:00:00",
      "1910-05-01T00:00:00",
      "1960-01-01T00:00:00",
      "1960-03-01T00:00:00"
    ],...

How could I handle the import so that I end up with a dataset that has one value per row and the keys (Active, DefaultRevision, EffectiveDateStart) as columns?

Do I have to import it into a single row and then process it with a recipe?

Thanks

1 Solution
AlexT
Dataiker

Indeed, the idea is to import the file as a single row and then fold the array. In most cases, the Fold array processor (https://doc.dataiku.com/dss/9.0/preparation/processors/array-fold.html) would be enough, and it would look like this:

Screenshot 2021-10-12 at 18.51.48.png
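For reference, here is a minimal pandas sketch of what this fold does on a single column (an illustration using the example data, not the processor's actual implementation):

import pandas as pd

# One row whose "Active" cell holds the whole array
df = pd.DataFrame({"Active": [[True, False, False, True, True, True]]})

# explode() is the pandas counterpart of the Fold array processor:
# each element of the list becomes its own row
folded = df.explode("Active").reset_index(drop=True)
print(folded)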

However, in your example, if I understand it correctly, you have three arrays you want to fold. If you apply the Fold array processor to the other columns as well, it will produce every combination of the values (a Cartesian product) rather than aligning them row by row.
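To see the problem concretely, here is a small sketch with hypothetical two-element arrays: exploding two list columns one after the other in pandas yields every pairing of values, the same combinatorial blow-up the processor produces:

import pandas as pd

df = pd.DataFrame({
    "Active": [[True, False]],
    "DefaultRevision": [["aa", "ee"]],
})

# Folding each column in turn pairs every value of one array with
# every value of the other: 2 x 2 = 4 rows instead of the 2 we want
combos = df.explode("Active").explode("DefaultRevision")
print(len(combos))  # 4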

One way to solve this while still using visual recipes for the folding is to split the dataset into three separate datasets (one per array), fold each one, and then concatenate them with pandas in a Python recipe.

Here is what my flow looks like:

Screenshot 2021-10-12 at 20.15.58.png

Prepare recipe for each split dataset:

Screenshot 2021-10-12 at 20.11.56.png

 

Screenshot 2021-10-12 at 20.16.04.png

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: one folded, single-column dataset per array
dataset1 = dataiku.Dataset("json_1")
df_1 = dataset1.get_dataframe()
dataset2 = dataiku.Dataset("json_2")
df_2 = dataset2.get_dataframe()
dataset3 = dataiku.Dataset("json_3")
df_3 = dataset3.get_dataframe()

# Concatenate column-wise (axis=1) so the three folded columns
# line up row by row instead of being appended vertically
merged_output_df = pd.concat([df_1, df_2, df_3], axis=1)

# Write recipe outputs
merged_output = dataiku.Dataset("merged_output")
merged_output.write_with_schema(merged_output_df)
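As a side note, since the arrays under "Outputs" are parallel (same length, same row order), you could also skip the folding entirely and build the dataset directly in a Python recipe. A minimal sketch, assuming the raw file sits in a managed folder (the folder, file, and dataset names here are placeholders to adjust for your project):

import json
import dataiku
import pandas as pd

# Read the raw JSON file from a managed folder
# ("raw_json" and "sample.json" are placeholder names)
folder = dataiku.Folder("raw_json")
with folder.get_download_stream("sample.json") as f:
    data = json.load(f)

# "Outputs" maps each key to a list of values in the same row order,
# so pandas can build the table directly: keys become columns,
# list positions become rows
df = pd.DataFrame(data["Outputs"])

output = dataiku.Dataset("parsed_output")
output.write_with_schema(df)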

 

Let me know if this helps or if you have any questions.

Maurip00
Level 1
Author

Good idea splitting the dataset. Otherwise, if I gain network access (from my Dataiku instance) to the RESTful web service that produces this response, I could try to remap it with a Python recipe... even if I don't yet know how to achieve that 🙂 But thank you for pointing me in the right direction.
