
Handling a JSON list of dictionaries

Solved!
Maurip00
Level 1

Hello everyone, I have a JSON file with this structure:

{
  "Outputs": {
    "Active": [
      true,
      false,
      false,
      true,
      true,
      true
    ],
    "DefaultRevision": [
      "aa",
      "ee",
      "ee",
      "cc",
      "ff",
      "ww"
    ],
    "EffectiveDateStart": [
      "1930-07-01T00:00:00",
      "1940-06-01T00:00:00",
      "1900-11-01T00:00:00",
      "1910-05-01T00:00:00",
      "1960-01-01T00:00:00",
      "1960-03-01T00:00:00"
    ],...

How could I handle the import so that I end up with a dataset that has one value per row and the keys (Active, DefaultRevision, EffectiveDateStart) as columns?

Do I have to import it into a single row and then process it with a recipe?

Thanks

1 Solution
AlexT
Dataiker

Indeed, the idea is to import the file as a single row and then fold the array. In most cases, the Fold array processor (https://doc.dataiku.com/dss/9.0/preparation/processors/array-fold.html) would be enough, and it would look like this:

Screenshot 2021-10-12 at 18.51.48.png
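For reference, here is a minimal pandas sketch of what this fold does on a single column (an illustration using the example data, not the processor's actual implementation):

import pandas as pd

# One row whose "Active" cell holds the whole array
df = pd.DataFrame({"Active": [[True, False, False, True, True, True]]})

# explode() is the pandas counterpart of the Fold array processor:
# each element of the list becomes its own row
folded = df.explode("Active").reset_index(drop=True)
print(folded)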

However, in your example, if I understand it correctly, you have three arrays you want to fold. If you apply the Fold array processor to the other columns as well, it will produce every combination of the values (a Cartesian product) rather than aligning them row by row.
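To see the problem concretely, here is a small sketch with hypothetical two-element arrays: exploding two list columns one after the other in pandas yields every pairing of values, the same combinatorial blow-up the processor produces:

import pandas as pd

df = pd.DataFrame({
    "Active": [[True, False]],
    "DefaultRevision": [["aa", "ee"]],
})

# Folding each column in turn pairs every value of one array with
# every value of the other: 2 x 2 = 4 rows instead of the 2 we want
combos = df.explode("Active").explode("DefaultRevision")
print(len(combos))  # 4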

One way to solve this while still using visual recipes for the folding is to split the dataset into three separate datasets (one per array), fold each one, and then concatenate them with pandas in a Python recipe.

Here is what my flow looks like:

Screenshot 2021-10-12 at 20.15.58.png

Prepare recipe for each split dataset:

Screenshot 2021-10-12 at 20.11.56.png

 

Screenshot 2021-10-12 at 20.16.04.png

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: one folded, single-column dataset per array
dataset1 = dataiku.Dataset("json_1")
df_1 = dataset1.get_dataframe()
dataset2 = dataiku.Dataset("json_2")
df_2 = dataset2.get_dataframe()
dataset3 = dataiku.Dataset("json_3")
df_3 = dataset3.get_dataframe()

# Concatenate column-wise (axis=1) so the three folded columns
# line up row by row instead of being appended vertically
merged_output_df = pd.concat([df_1, df_2, df_3], axis=1)

# Write recipe outputs
merged_output = dataiku.Dataset("merged_output")
merged_output.write_with_schema(merged_output_df)
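As a side note, since the arrays under "Outputs" are parallel (same length, same row order), you could also skip the folding entirely and build the dataset directly in a Python recipe. A minimal sketch, assuming the raw file sits in a managed folder (the folder, file, and dataset names here are placeholders to adjust for your project):

import json
import dataiku
import pandas as pd

# Read the raw JSON file from a managed folder
# ("raw_json" and "sample.json" are placeholder names)
folder = dataiku.Folder("raw_json")
with folder.get_download_stream("sample.json") as f:
    data = json.load(f)

# "Outputs" maps each key to a list of values in the same row order,
# so pandas can build the table directly: keys become columns,
# list positions become rows
df = pd.DataFrame(data["Outputs"])

output = dataiku.Dataset("parsed_output")
output.write_with_schema(df)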

 

Let me know if this helps or if you have any questions.

Maurip00
Level 1
Author

Good idea splitting the dataset. Otherwise, if I gain network access (from my Dataiku instance) to the RESTful web service that produces this response, I could try to remap it with a Python recipe... even if I don't yet know how to achieve that 🙂 But thank you for pointing me in the right direction.
