Hello everyone, I have a JSON file with this structure:
{
"Outputs": {
"Active": [
true,
false,
false,
true,
true,
true
],
"DefaultRevision": [
"aa",
"ee",
"ee",
"cc",
"ff",
"ww"
],
"EffectiveDateStart": [
"1930-07-01T00:00:00",
"1940-06-01T00:00:00",
"1900-11-01T00:00:00",
"1910-05-01T00:00:00",
"1960-01-01T00:00:00",
"1960-03-01T00:00:00"
],...
How can I handle the import so that the data ends up in a dataset with one value per row and the keys (Active, DefaultRevision, EffectiveDateStart) as columns?
Do I have to import it into one single row and then process it with a recipe?
Thanks
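(For reference, outside of DSS the target shape can be sketched with plain pandas, assuming every array under "Outputs" has the same length; the file name here is hypothetical:)

import json
import pandas as pd

# Hypothetical file name; adjust to your actual source
with open("sample.json") as f:
    outputs = json.load(f)["Outputs"]

# A dict of equal-length lists maps directly to the desired table:
# one array element per row, one column per key
df = pd.DataFrame(outputs)
print(df.head())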
Indeed, the idea is to import it as a single row and then fold the arrays. In most cases, using the Array fold processor (https://doc.dataiku.com/dss/9.0/preparation/processors/array-fold.html) would be enough.
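For intuition, here is a rough pandas equivalent of what a single Array fold step produces on one column (a sketch only; the actual processor runs visually inside a Prepare recipe):

import json
import pandas as pd

# Single-row dataset where the column holds a JSON-encoded array
df = pd.DataFrame({"Active": ['[true, false, false, true, true, true]']})

# Parse the string into a Python list, then emit one row per element
df["Active"] = df["Active"].map(json.loads)
folded = df.explode("Active").reset_index(drop=True)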
However, in your example, if I understand it correctly, you have three arrays you want to fold. If you apply the Array fold processor to the other columns as well, it will produce a combination (Cartesian product) of all the values.
One way to solve this while still using visual recipes for the folding is to split the dataset into three separate single-column datasets, fold each one, and then concatenate them with pandas in a Python recipe.
Here is what my flow looks like: each single-column dataset goes through a Prepare recipe with an Array fold step, and the following Python recipe merges the three results:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: one single-column folded dataset per array
dataset1 = dataiku.Dataset("json_1")
df_1 = dataset1.get_dataframe()
dataset2 = dataiku.Dataset("json_2")
df_2 = dataset2.get_dataframe()
dataset3 = dataiku.Dataset("json_3")
df_3 = dataset3.get_dataframe()

# Concatenate column-wise: with default indexes, rows align by
# position, so this relies on the folds preserving the array order
merged_output_df = pd.concat([df_1, df_2, df_3], axis=1)

# Write recipe outputs
merged_output = dataiku.Dataset("merged_output")
merged_output.write_with_schema(merged_output_df)
Let me know if this helps or if you have any questions.
Good idea splitting the dataset. Otherwise, if I gain network access (from my Dataiku instance) to the RESTful web service that produces this response, I could try to remap it with a Python recipe... even if I don't know yet how to achieve that 🙂 but thank you for pointing me in the right direction.
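(If you do get access to the web service, a Python recipe could skip the folding entirely. A minimal sketch, assuming equal-length arrays; the endpoint URL and output dataset name below are hypothetical:)

import dataiku
import pandas as pd
import requests

# Hypothetical endpoint; replace with the real service URL
resp = requests.get("https://example.com/api/outputs")
resp.raise_for_status()
outputs = resp.json()["Outputs"]

# Equal-length arrays map directly to one column per key,
# one array element per row
df = pd.DataFrame(outputs)

# Hypothetical output dataset name
out = dataiku.Dataset("outputs_remapped")
out.write_with_schema(df)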
Wouldn't a Stack recipe work just as well as the Python code?