Issue with Python script appending data in Dataiku project

Hello,
I have an issue with my Dataiku project. I wrote a Python script that appends new data from the input dataset to the output dataset.
I think the problem may be related to recursion in Dataiku. Could you please suggest a solution?
Thank you in advance!
Best Answer
-
Turribeach (Neuron)
So there are two ways of doing what you want. The first is to use the "Append instead of overwrite" checkbox in your recipe. Depending on the recipe type, this option appears in either the Inputs/Outputs tab or the Advanced tab. As you are using a Python recipe, it should appear in the Inputs/Outputs tab.
In your case it is not being shown, either because your recipe uses inputs from different connections or because the connection type doesn't support append-mode inserts. So try using a Sync recipe to move your "Lost_Path_Batiment…" dataset to the same connection as "Listed_Capteurs_Batiment…" and see if that enables the option. If you still don't see the append option, you need to move your datasets to a connection / data technology that supports appending data (most SQL databases do). You could then use another Sync recipe to bring the data back to HDFS.
Do keep in mind that the "Append instead of overwrite" checkbox does not guarantee the output dataset will never be dropped. In previous versions of Dataiku, a schema change would result in append output datasets being dropped and recreated, which means your historical data was lost. Only in the recent v13.1 release does Dataiku default to failing the recipe if the output schema needs to be updated, leaving you with the task of fixing it manually. So even when append output datasets don't get dropped, they make project maintenance harder, as you can't propagate schema changes automatically through the flow on them.
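For reference, here is a minimal sketch of what the Python recipe body can look like once append mode is available and ticked; the dataset names are placeholders, and the appending itself comes from the recipe setting rather than from the code:

# -*- coding: utf-8 -*-
import dataiku

# Read the new rows from the input dataset
# ("input_dataset" is a placeholder name)
input_df = dataiku.Dataset("input_dataset").get_dataframe()

# With "Append instead of overwrite" ticked on the recipe,
# this write adds the rows to the existing output data
# instead of replacing it ("output_dataset" is a placeholder)
output = dataiku.Dataset("output_dataset")
output.write_with_schema(input_df)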
The second way is going to surprise @tgb417. In v13 Dataiku added support for recursive flows, which means it is now possible to have a recipe that reads from and writes to its own output, among other designs. How to build it:
- Create a simple Python recipe that takes an input_dataset and writes it to an output_dataset
- Run the recipe so that the output_dataset is populated with the input_dataset contents
- Now go to the Python recipe Inputs/Outputs tab and add the output dataset as an input
- Modify the recipe code to use the output_dataset as required
Below is a sample Python recipe that reads the output dataset, concatenates it with the input dataset, and writes the result back to the output dataset:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
input_dataset = dataiku.Dataset("input_dataset")
input_dataset_df = input_dataset.get_dataframe()
output_dataset = dataiku.Dataset("output_dataset")
output_dataset_df = output_dataset.get_dataframe()

# Concat both input and output datasets
df_merged = pd.concat([input_dataset_df, output_dataset_df], ignore_index=True, sort=False)

# Write recipe outputs
output_dataset.write_with_schema(df_merged)
This second way is not very efficient, since you end up rewriting all the data in the output dataset every time the recipe runs, rather than just inserting the new rows. However, it is safer than the first option, since it can handle schema changes gracefully. So choose the option that best fits your use case.
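If rewriting the whole output every run becomes too costly, one variation on this recursive pattern is to append only the rows that are not already present. A minimal sketch, assuming your data has a unique key column (hypothetically named "id" here; dataset names are placeholders):

import dataiku
import pandas as pd

# Read both datasets (names are placeholders)
input_df = dataiku.Dataset("input_dataset").get_dataframe()
output_ds = dataiku.Dataset("output_dataset")
output_df = output_ds.get_dataframe()

# Keep only input rows whose key is not already in the output
# ("id" is a hypothetical unique key column)
new_rows = input_df[~input_df["id"].isin(output_df["id"])]

# Append the new rows and rewrite the output
df_merged = pd.concat([output_df, new_rows], ignore_index=True, sort=False)
output_ds.write_with_schema(df_merged)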
Answers
-
tgb417 (Neuron)
@ELACHAR ,
Welcome to the dataiku community. We are glad to have you join us.
It has been my experience that Dataiku does not want a dataset to be both an input and an output of a recipe. The system seems to want you to create a new dataset with the appended data.
It is interesting to me that you can even save what you have shown. So this may be something you can do. I’ve just never done this.