Handling json list of dictionaries

Maurip00
Maurip00 Registered Posts: 5 ✭✭✭

Hello everyone, I have a JSON file with this structure:

{
"Outputs": {
"Active": [
true,
false,
false,
true,
true,
true
],
"DefaultRevision": [
aa,
ee,
ee,
cc,
ff,
ww
],
"EffectiveDateStart": [
"1930-07-01T00:00:00",
"1940-06-01T00:00:00",
"1900-11-01T00:00:00",
"1910-05-01T00:00:00",
"1960-01-01T00:00:00",
"1960-03-01T00:00:00"
],...

How can I handle the import so that the dataset has one value per row, with the keys (Active, DefaultRevision, EffectiveDateStart) as columns?

Do I have to import it as one single row and then process it with a recipe?

Thanks

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
    edited July 17 Answer ✓

    Indeed, the idea is to import the file as a single row and fold the array. In most cases the array fold processor (https://doc.dataiku.com/dss/9.0/preparation/processors/array-fold.html) would be enough, and it would look like this:

    Screenshot 2021-10-12 at 18.51.48.png

    However, in your example, if I understand it correctly, you have 3 arrays you want to fold. If you apply the array fold processor to the other columns as well, it will produce every combination of the values (a Cartesian product) instead of keeping them aligned by position.
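
To illustrate the combination effect outside DSS, here is a minimal pandas sketch. It is not the array fold processor itself: `explode` only stands in for folding, and the single-row input is a hypothetical, shortened version of the sample from the question (with the DefaultRevision values assumed to be strings).

```python
import pandas as pd

# Hypothetical single-row input, shortened to two values per array.
row = pd.DataFrame({
    "Active": [[True, False]],
    "DefaultRevision": [["aa", "ee"]],
})

# Folding one array after the other pairs every value of the first
# array with every value of the second (2 x 2 = 4 rows).
one_by_one = (
    row.explode("Active")
       .reset_index(drop=True)
       .explode("DefaultRevision")
       .reset_index(drop=True)
)

# Building the frame from the lists directly keeps values aligned
# by position (2 rows).
together = pd.DataFrame({col: row.at[0, col] for col in row.columns})

print(len(one_by_one))  # 4
print(len(together))    # 2
```

With the six-value arrays from the question, folding all three columns one after the other would give 6 x 6 x 6 = 216 rows rather than 6.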

    One way to solve this while still using visual recipes for the folding is to split the dataset into 3 separate datasets, fold each one, and concatenate them with pandas in a Python recipe.

    Here is what my flow looks like :

    Screenshot 2021-10-12 at 20.15.58.png

    Prepare recipe for each row:

    Screenshot 2021-10-12 at 20.11.56.png

    Screenshot 2021-10-12 at 20.16.04.png

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    # Read recipe inputs (one folded dataset per array)
    dataset1 = dataiku.Dataset("json_1")
    df_1 = dataset1.get_dataframe()
    dataset2 = dataiku.Dataset("json_2")
    df_2 = dataset2.get_dataframe()
    # Assumed name for the third input dataset; adjust to match your flow
    dataset3 = dataiku.Dataset("json_3")
    df_3 = dataset3.get_dataframe()
    
    
    
    # Compute recipe outputs
    # Concatenate column-wise; rows are matched by index, so each input
    # must have the same number of rows in the same order
    merged_output_df = pd.concat([df_1, df_2, df_3], axis=1)
    
    
    # Write recipe outputs
    merged_output = dataiku.Dataset("merged_output")
    merged_output.write_with_schema(merged_output_df)
    

    Let me know if this helps or if you have any questions.

Answers

  • Maurip00
    Maurip00 Registered Posts: 5 ✭✭✭

    Good idea splitting the dataset. Otherwise, if I gain network access (from my Dataiku instance) to the RESTful web service that produces this response, I could try to remap it with a Python recipe. I don't know yet how to achieve that, but thank you for pointing me in the right direction.
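
For the remapping idea, a minimal sketch in plain pandas might look like the following. It assumes the response has the shape shown in the question, that all arrays under "Outputs" have the same length, and that the DefaultRevision values are strings. The `raw` payload and the `outputs_to_frame` helper are hypothetical names used only for illustration; fetching the response from the web service is left out.

```python
import json
import pandas as pd

# Hypothetical payload, shortened from the sample in the question.
raw = """
{"Outputs": {
  "Active": [true, false],
  "DefaultRevision": ["aa", "ee"],
  "EffectiveDateStart": ["1930-07-01T00:00:00", "1940-06-01T00:00:00"]
}}
"""

def outputs_to_frame(payload):
    """Turn {"Outputs": {key: [values, ...]}} into one row per array position."""
    outputs = payload["Outputs"]
    lengths = {key: len(values) for key, values in outputs.items()}
    # Positional alignment only makes sense if every array has the same length.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"arrays differ in length: {lengths}")
    # A dict of equal-length lists maps directly to columns.
    return pd.DataFrame(outputs)

df = outputs_to_frame(json.loads(raw))
print(df)
```

In a Python recipe, the resulting frame could then be written to the output dataset with `write_with_schema`, as in the recipe in the best answer.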

  • chi_wong
    chi_wong Registered Posts: 5 ✭✭✭✭

    Wouldn't a Stack recipe work just as well as the Python code?
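
One point to check when comparing the two: as far as I understand it, a Stack recipe appends the rows of its inputs, whereas the Python recipe in the best answer places the datasets side by side with `axis=1`. The pandas sketch below shows the difference using `pd.concat` only; the two small frames are hypothetical stand-ins for the folded datasets, and `axis=0` is used as an approximation of row-wise stacking, not as the Stack recipe's actual implementation.

```python
import pandas as pd

# Hypothetical folded datasets, one column each, shortened to two rows.
df_1 = pd.DataFrame({"Active": [True, False]})
df_2 = pd.DataFrame({"DefaultRevision": ["aa", "ee"]})

# Row-wise stacking: rows are appended, and columns missing from an
# input are filled with NaN.
stacked = pd.concat([df_1, df_2], axis=0, ignore_index=True)

# Column-wise concatenation, as in the Python recipe: rows are matched
# by index, so the values stay aligned by position.
side_by_side = pd.concat([df_1, df_2], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 2)
```

Under these assumptions, stacking gives one block of rows per input with empty cells in the other columns, which is not the one-value-per-row layout asked for in the question.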
