Added on August 2, 2019 2:12AM
I am trying to write multiple outputs from a single Python recipe, like this:
import dataiku  # already imported at the top of the recipe

# Write recipe outputs
user_exclusions_df = dataiku.Dataset("user_exclusions")
user_exclusions_df.write_schema_from_dataframe(excluded)
user_exclusions_df.write_from_dataframe(excluded)
#user_exclusions_df.write_with_schema(excluded)
manual_touches_df = dataiku.Dataset("manual_touches")
manual_touches_df.write_schema_from_dataframe(man_touches)
manual_touches_df.write_from_dataframe(man_touches)
#manual_touches_df.write_with_schema(man_touches)
employee_assignment_df = dataiku.Dataset("employee_assignment")
employee_assignment_df.write_schema_from_dataframe(z)
employee_assignment_df.write_from_dataframe(z)
The dataset references already exist, and I have confirmed the dataframes exist and contain the desired data. I have also tried the following output format:
manual_touches_df = dataiku.Dataset("manual_touches")
manual_touches_df.write_with_schema(man_touches)
In either case, the first dataset written this way appears correctly, with the correct schema and data. However, subsequent datasets, while having the correct columns, contain the data from the first dataframe instead of the data from the dataframe actually referenced in the code.
This is occurring in both recipes and notebooks.
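To illustrate the symptom, this is roughly the check I ran after the recipe finished (a sketch: reading each output back with get_dataframe and comparing it to the dataframe that was supposed to be written):
import dataiku

# Before writing: the three dataframes are distinct and contain the desired data
print(excluded.shape, man_touches.shape, z.shape)

# After the recipe runs: each output has its own columns, but the rows
# come from the first dataframe that was written (excluded)
for name in ["user_exclusions", "manual_touches", "employee_assignment"]:
    read_back = dataiku.Dataset(name).get_dataframe()
    print(name, read_back.shape)
    print(read_back.head())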
Hi rtaylor,
Have you added all the outputs to the recipe?
This is what I usually do (and it works):
# output_dataset_name, project_name, input_dataset and schema are defined elsewhere
output_dataset = dataiku.Dataset(output_dataset_name, project_key=project_name, ignore_flow=True)
output_dataset_df = input_dataset.get_dataframe()
output_dataset.write_with_schema(output_dataset_df)

# If you want to modify the schema
output_dataset.write_schema(schema)
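Applied to your three outputs, that would look roughly like this (a sketch, assuming all three datasets are declared as outputs of the recipe, in which case project_key and ignore_flow should not be needed, and that excluded, man_touches and z are the dataframes built earlier in the recipe):
import dataiku

# Write each dataframe to its own declared output dataset
for name, df in [("user_exclusions", excluded),
                 ("manual_touches", man_touches),
                 ("employee_assignment", z)]:
    output_dataset = dataiku.Dataset(name)
    output_dataset.write_with_schema(df)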
I tried Alan's answer and ran into an error; please see my follow-up comments.
I also tested splitting the recipe into three copies and running them in parallel. This did not work when the recipes were all run concurrently. However, if I run each recipe individually, in isolation, it appears to work and outputs the proper schema and data.
The caveat is that I am not sure how this will interact with building the entire Flow, since the recipes only work when run in isolation. The image below shows the final layout of this portion of the Flow. Each of the final three recipes must be run and allowed to complete before running another.