
How to add data to an existing dataset with Python?

Dataiker
I have a dataset named weather_data, and I want to add data to it every day.

How can I do this with Python?
Dataiker
You may want to use one of the following:

- Append mode on the output dataset. This setting is available in the Input/Output tab of the recipe. With it enabled, the data is appended to the output dataset instead of replacing it. Note that this mode is only available when the recipe's output dataset is stored on an infrastructure that allows appending (e.g. it is not possible with HDFS).

- Partitioning on the output dataset. Each day, you write that day's weather data into that day's partition of the dataset. This approach works whatever the connection. See https://doc.dataiku.com/dss/latest/partitions/index.html for more details.
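
To illustrate why daily partitions solve the "add data every day" problem, here is a minimal plain-Python sketch. It does not use the Dataiku API: the PartitionedStore class, its method names, and the sample weather values are all hypothetical and only model the idea that writing one day's partition leaves the other days untouched.

```python
from datetime import date


class PartitionedStore:
    """Hypothetical in-memory stand-in for a day-partitioned dataset."""

    def __init__(self):
        # Maps a partition identifier such as "2024-01-15" to that day's rows
        self.partitions = {}

    def write_partition(self, day, rows):
        # Writing a partition replaces only the rows stored for that day
        self.partitions[day.isoformat()] = list(rows)

    def all_rows(self):
        # Reading the whole dataset concatenates the partitions in date order
        return [row for key in sorted(self.partitions) for row in self.partitions[key]]


store = PartitionedStore()
store.write_partition(date(2024, 1, 15), [{"city": "Paris", "temp_c": 4}])
store.write_partition(date(2024, 1, 16), [{"city": "Paris", "temp_c": 6}])
# Re-running the job for the 16th overwrites that day only
store.write_partition(date(2024, 1, 16), [{"city": "Paris", "temp_c": 7}])
print(len(store.all_rows()))  # 2
```

A side effect of this design is that a daily job can be re-run safely: the same day is rewritten rather than duplicated, which plain appending does not guarantee.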
Level 1

Hi,

 

This is one of the only topics I found, and I have the same problem as @UserBird.

I would like to add one row to an existing dataset with a Python recipe.

I'm looking for examples on the Internet and I can't find any... This is what I would like to do:

 

input_dataset = dataiku.Dataset("inter")
output_dataset = dataiku.Dataset("inter_temp")
foobar="foobar"

output_dataset.iter_rows(columns='my_column', values=foobar)
##Or something else but it should be very easy and I can't find a way...

 

 

If anyone has an answer, it would be greatly appreciated!

Have a good day.

Dataiker

Hi,

I would suggest reading the input dataset in as a Pandas dataframe, handling the append in the dataframe itself, and then writing the resulting dataframe (in overwrite mode) into your output dataset. 

For example, something like:

 

import dataiku
import pandas as pd

# Read recipe inputs
inter = dataiku.Dataset("inter")
input_df = inter.get_dataframe()

# Create a dataframe containing the row you want to append
append_row = {'my_column': ['foobar']}
append_df = pd.DataFrame(data=append_row)

# Append the row to the input dataframe.
# DataFrame.append was removed in pandas 2.0; pd.concat works on old and new versions.
output_df = pd.concat([input_df, append_df], ignore_index=True)

# Write recipe outputs
inter_temp = dataiku.Dataset("inter_temp")
inter_temp.write_with_schema(output_df)

 

I hope that this helps! I would also suggest checking out the following Pandas documentation, which provides more examples and details about how to use DataFrame.append (note that pandas deprecated this method in 1.4 and removed it in 2.0, where pandas.concat is the replacement):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Best,

Andrew

Level 1

Hi ATsao,

 

Thanks for your reply; it could help me in the future. I found a way without a dataframe. I'm posting it here for anyone who wants an example of write_row_dict, as I did yesterday.

 

# -*- coding: utf-8 -*-
import dataiku

# Read the schema of the input dataset and copy it to the temporary dataset
input_dataset = dataiku.Dataset("dataset")
schema = input_dataset.read_schema()
output_dataset = dataiku.Dataset("dataset_temp")
output_dataset.write_schema(schema)

# Then open a writer to add rows
writer = output_dataset.get_writer()
try:
    foobar = "foobar"
    values = {
        "colonne1": foobar,
        "colonne2": foobar,
        "colonne3": 1,
        "colonne4": foobar
    }
    # write_row_dict only accepts a dictionary
    writer.write_row_dict(values)
finally:
    # Always close the writer, even if writing fails
    writer.close()
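
For several rows, the same writer pattern can be wrapped in a small helper. This is only a sketch: append_rows is a hypothetical name, and it assumes nothing beyond the get_writer, write_row_dict and close calls used above. As far as I understand, whether the rows are added to the existing content or replace it still depends on the append mode setting mentioned earlier in this thread.

```python
def append_rows(dataset, rows):
    """Write each dict in `rows` through the dataset's writer and return the count."""
    writer = dataset.get_writer()
    written = 0
    try:
        for row in rows:
            # Each row is a dict mapping column names to values
            writer.write_row_dict(row)
            written += 1
    finally:
        # Close the writer even if one of the rows fails
        writer.close()
    return written


# Hypothetical usage inside a recipe (requires the dataiku package):
# import dataiku
# append_rows(dataiku.Dataset("dataset_temp"), [{"colonne1": "foobar", "colonne3": 1}])
```

Because the helper only relies on those three calls, it can be tried outside DSS with any stand-in object that provides them.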
Level 3
Of course, now that I read it after a good sleep I see what you mean, thanks dev. It would depend on how you have your new data; I guess the easiest way would be with pandas' append function. I believe this page will help you: https://stackoverflow.com/questions/14988480/pandas-version-of-rbind