Handling a JSON list of dictionaries
Hello everyone, I have a JSON file with this structure:
{
  "Outputs": {
    "Active": [
      true,
      false,
      false,
      true,
      true,
      true
    ],
    "DefaultRevision": [
      "aa",
      "ee",
      "ee",
      "cc",
      "ff",
      "ww"
    ],
    "EffectiveDateStart": [
      "1930-07-01T00:00:00",
      "1940-06-01T00:00:00",
      "1900-11-01T00:00:00",
      "1910-05-01T00:00:00",
      "1960-01-01T00:00:00",
      "1960-03-01T00:00:00"
    ],
    ...
How could I handle the import so that it ends up in a dataset with one value per row and the keys (Active, DefaultRevision, EffectiveDateStart) as columns?
Do I have to import it as one single row and then process it with a recipe?
Thanks
Best Answer
Alexandru (Dataiker):
Indeed, the idea is to import the file as a single row and then fold the arrays. In most cases, using the Fold an array processor (https://doc.dataiku.com/dss/9.0/preparation/processors/array-fold.html) would be enough.
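For intuition, here is a rough pandas equivalent of folding a single array column (a sketch of the idea, not the processor itself, assuming the array was parsed into a list cell):

import pandas as pd

# One row whose cell holds the whole array, as after a single-row JSON import
df = pd.DataFrame({"Active": [[True, False, False, True, True, True]]})

# Folding turns each array element into its own row
folded = df.explode("Active").reset_index(drop=True)
print(folded)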
However, in your example, if I understand it correctly, you have three arrays you want to fold. If you apply the array fold processor to each column in turn, it will produce every combination of the values instead of keeping them aligned.
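To see that combination problem concretely, here is a small sketch on hypothetical toy data: folding the columns one after another multiplies the rows, while pandas can also fold them in lockstep via DataFrame.explode with a list of columns (pandas >= 1.3):

import pandas as pd

# One row, two array cells of equal length
df = pd.DataFrame({"Active": [[True, False, False]],
                   "DefaultRevision": [["aa", "ee", "ee"]]})

# Folding one column after the other combines every value with every other:
# 3 x 3 = 9 rows instead of the 3 aligned rows we want
combos = df.explode("Active").explode("DefaultRevision")
print(len(combos))  # 9

# Folding both columns together keeps the elements aligned (pandas >= 1.3)
aligned = df.explode(["Active", "DefaultRevision"])
print(len(aligned))  # 3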
One way to solve this while still using visual recipes for the folding is to split the dataset into three separate datasets, fold each one, and then concatenate them with pandas in a Python recipe.
Here is what my flow looks like: the single-row dataset is split into three datasets, each with its own prepare recipe that folds one array, and a Python recipe merges the results:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: the three folded, single-column datasets
dataset1 = dataiku.Dataset("json_1")
df_1 = dataset1.get_dataframe()
dataset2 = dataiku.Dataset("json_2")
df_2 = dataset2.get_dataframe()
dataset3 = dataiku.Dataset("json_3")
df_3 = dataset3.get_dataframe()

# Concatenate column-wise: each dataframe contributes one column,
# and rows are aligned by position
merged_output_df = pd.concat([df_1, df_2, df_3], axis=1)

# Write recipe outputs
merged_output = dataiku.Dataset("merged_output")
merged_output.write_with_schema(merged_output_df)
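As an aside, if the raw JSON can be read directly in a Python recipe, pandas builds the target table in one call, because the Outputs object is a dict of equal-length lists. A minimal sketch, with a hypothetical file name:

import json
import pandas as pd

# Hypothetical path to the raw JSON file
with open("response.json") as f:
    data = json.load(f)

# A dict of equal-length lists maps directly onto a table:
# each key becomes a column, each list element a row
df = pd.DataFrame(data["Outputs"])
print(df.head())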
Let me know if this helps or if you have any questions.
Answers
Good idea splitting the dataset. Otherwise, if I get network access (from my Dataiku instance) to the RESTful web service that produces this response, I could try to remap it with a Python recipe... even if for now I don't know how to achieve that. But thank you for pointing me in the right direction.
Wouldn't a stack recipe work just as well as the Python code?