Multiple outputs in a single Python recipe are only writing data from the first dataset

Solved!
rtaylor
Level 3
I am trying to write multiple outputs from a single Python recipe, like this:




import dataiku

# Write recipe outputs
# (excluded, man_touches and z are pandas dataframes built earlier in the recipe)
user_exclusions_df = dataiku.Dataset("user_exclusions")
user_exclusions_df.write_schema_from_dataframe(excluded)
user_exclusions_df.write_from_dataframe(excluded)
#user_exclusions_df.write_with_schema(excluded)

manual_touches_df = dataiku.Dataset("manual_touches")
manual_touches_df.write_schema_from_dataframe(man_touches)
manual_touches_df.write_from_dataframe(man_touches)
#manual_touches_df.write_with_schema(man_touches)

employee_assignment_df = dataiku.Dataset("employee_assignment")
employee_assignment_df.write_schema_from_dataframe(z)
employee_assignment_df.write_from_dataframe(z)


The referenced datasets already exist, and I have confirmed that the dataframes exist and contain the desired data. I have also tried the following output format:




manual_touches_df = dataiku.Dataset("manual_touches")
manual_touches_df.write_with_schema(man_touches)


In either case, the first dataset written this way appears correctly, with the correct schema and data. However, subsequent datasets, while having the correct columns, contain data from the first dataframe instead of the data from the dataframe actually referenced in the code.



This is occurring in both recipes and notebooks.



 

1 Solution
Alan_Fusté
Level 3
Hello again rtaylor,

Maybe you can try one thing: if you have just one recipe, go to Advanced and set Concurrent activities to 1.
It seems that DSS may spawn one thread per output dataset, and that these threads access the postgresql dataset at the same time, which breaks the writes... I'm not sure whether DSS actually works like that (maybe Clément Stenac knows), and I've never used a postgresql dataset in DSS.
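
To rule out any mix-up inside the recipe code itself, here is a minimal sketch that writes the outputs strictly one at a time, pairing each dataset name explicitly with its dataframe. It is only an illustration under assumptions: write_outputs and its dataset_factory parameter are hypothetical names of mine (the parameter exists so the helper can be tried outside DSS), and the only Dataiku calls used are dataiku.Dataset and write_with_schema, as in the question. It does not change any DSS concurrency setting.

```python
def write_outputs(outputs, dataset_factory=None):
    """Write each (dataset name -> dataframe) pair one after the other.

    dataset_factory is a hypothetical hook: by default it is dataiku.Dataset,
    imported lazily because the dataiku package only exists inside DSS.
    Returns a dict mapping each dataset name to the number of rows passed in.
    """
    if dataset_factory is None:
        import dataiku  # assumption: running inside a DSS recipe or notebook
        dataset_factory = dataiku.Dataset

    written = {}
    for name, df in outputs.items():
        dataset = dataset_factory(name)  # fresh handle for every output
        dataset.write_with_schema(df)    # one write finishes before the next starts
        written[name] = len(df)
        print(f"{name}: passed {len(df)} rows to write_with_schema")
    return written


# Usage inside the recipe from the question (names taken from the post):
# write_outputs({
#     "user_exclusions": excluded,
#     "manual_touches": man_touches,
#     "employee_assignment": z,
# })
```

If the row counts printed here differ per dataset but the stored data is still identical, the problem is more likely on the DSS or connection side than in the recipe code.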

Another solution, which isolates each build, is to create a scenario that builds all these datasets (place them in separate steps, and don't run a step if the previous one failed). With this configuration you avoid the problem.

I usually use scenarios to run whole projects; once you get used to working with them, they save time 🙂

