Data not appending to dataset through Python (outside DSS)
Hi,
I am trying to append data to my existing dataset.
I have tried all of the write methods (write_row_dict, write_tuple, write_dataframe).
On execution it reports "1 rows successfully written", but when I check the dataset in the Dataiku application, only one row is available.
Can anyone guide me on how to resolve this?
Thanks,
Answers
-
-
Hello,
If you're using a Python Recipe in the flow you may want to try setting the output dataset to 'Append instead of overwrite' under 'Inputs/Outputs'. This will add new rows to the output dataset every time the recipe is run instead of overwriting the data.
You can also use the pandas DataFrame append method in your Python code to append data: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html (DataFrame.append was removed in pandas 2.0; on newer versions pd.concat does the same job). Since we can write a dataframe directly to a dataset in DSS, we can append the new rows to the dataframe in the Python code and then write that dataframe to the dataset. I've included a small sample below that illustrates this. It takes a dataset (test_csv_df) and appends it to itself in a separate dataframe. The combined dataframe is then written to another dataset in DSS.
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs
test_csv = dataiku.Dataset("Test-csv")
test_csv_df = test_csv.get_dataframe()

# Append the dataframe to itself. pd.concat is used here because
# DataFrame.append was removed in pandas 2.0.
collapsed_data_df = pd.concat([test_csv_df, test_csv_df])

# Write recipe outputs
collapsed_data = dataiku.Dataset("collapsed-data")
collapsed_data.write_with_schema(collapsed_data_df)
Hope this helps!
Andrew M
-
I am trying to append data to the input dataset. Can we append to an input dataset?
import dataiku
import pandas

data = {'xx': 'xxx', 'key': '1212'}
df = pandas.DataFrame(pandas.json_normalize([data]))
writer = dataiku.core.Dataset(dataSetName, projectKey)
vp = dataiku.core.dataset_write.DatasetWriter(writer)
vp.write_dataframe(df)
No error. It shows "1 rows successfully written".
Thanks
-
Hello,
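You could read your output dataset as a dataframe, append the new data to it, and write the combined dataframe back:
import dataiku
import pandas as pd

output_dataset = dataiku.Dataset("out")
out_df = output_dataset.get_dataframe()

data = [
    {'xx': 'row 1', 'key': '1212'},
    {'xx': 'row 2', 'key': '1212'},
]
df_to_append = pd.json_normalize(data)

# The writer comes from the dataset object, not from the dataframe.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
with output_dataset.get_writer() as writer:
    writer.write_dataframe(pd.concat([out_df, df_to_append]))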
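To check the append step on its own, without a DSS connection, here is a minimal pandas-only sketch. The helper name append_rows and the sample rows are made up for illustration, and existing_df merely stands in for what output_dataset.get_dataframe() would return.

```python
import pandas as pd


def append_rows(existing_df, new_rows):
    """Return a new dataframe with new_rows (a list of dicts) added after existing_df."""
    df_to_append = pd.json_normalize(new_rows)
    # ignore_index=True renumbers the rows so index labels are not duplicated
    return pd.concat([existing_df, df_to_append], ignore_index=True)


# Hypothetical stand-in for the rows already stored in the dataset
existing_df = pd.DataFrame([{"xx": "row 0", "key": "1212"}])

new_rows = [
    {"xx": "row 1", "key": "1212"},
    {"xx": "row 2", "key": "1212"},
]

combined_df = append_rows(existing_df, new_rows)
print(len(combined_df))  # existing row plus the two new rows
```

If the combined dataframe has the expected number of rows here, the remaining step is only the write back to the dataset.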
Hope this solves your issue
Alex