Writing a pandas dataframe to Snowflake table using writer

Megha Registered Posts: 3 ✭✭✭
edited July 16 in General Discussion

Hi,

I am trying to write a pandas DataFrame containing 1.58 million records to a Snowflake table.

To make the process faster, I wanted to use the chunked writer functionality described on the "Datasets (reading and writing data)" page of the Dataiku DSS 11 documentation.

I have a DataFrame named "Out_df" containing 1.5M records, which I want to write to the Snowflake table "RM20_DATA".

When I execute the code below, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'iter_dataframes'

I understand that the method iter_dataframes() cannot work on a dataframe but works on a dataset.

How can I convert Out_df to a dataset that can be iterated with iter_dataframes() so that I can write the data?


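import dataiku

rm20_data = dataiku.Dataset("RM20_DATA")
with rm20_data.get_writer() as writer:
    # This line raises the AttributeError: Out_df is a pandas DataFrame
    for df in Out_df.iter_dataframes():
        # Process the df dataframe ...

        # Write the processed dataframe
        writer.write_dataframe(df)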

Operating system used: Windows



  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    edited July 17

    Hi @Megha

    To increase the write speed to Snowflake from a Python recipe, you should try to leverage fast-path. This is typically much faster.


    You will need to enable fast-path on the connection and have cloud storage (S3, GCS, or Azure Blob) in the same region, along with the other prerequisites detailed in the fast-path documentation.

    Chunked reading/writing helps with the memory usage of the Python recipe but will not really help with the writing speed. If you can comfortably fit the dataset into memory, you don't need chunked reading/writing. Your code is failing because Out_df is a pandas DataFrame, and iter_dataframes() is a method of a Dataiku Dataset, not of a DataFrame.

    Please refer to the documentation and the syntax below:


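    from dataiku import Dataset

    inp = Dataset("input")    # dataset to read from, chunk by chunk
    out = Dataset("output")   # dataset to write to
    with out.get_writer() as writer:
        for df in inp.iter_dataframes():
            # Process the df dataframe ...

            # Write the processed dataframe
            writer.write_dataframe(df)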
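
    If you still want to write an in-memory DataFrame such as Out_df in pieces, one option is to slice it yourself and pass each slice to the writer. The following is only a minimal sketch: iter_chunks and write_in_chunks are hypothetical helper names (they are not part of the Dataiku API), the default chunk size is arbitrary, and the output schema may need to be set beforehand. When the data fits in memory, a single call to rm20_data.write_with_schema(Out_df) is likely the simpler choice.

```python
from typing import Any, Iterator


def iter_chunks(df: Any, chunk_size: int) -> Iterator[Any]:
    """Yield consecutive row slices of a pandas DataFrame (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(df), chunk_size):
        # iloc slices by position; the last chunk may be shorter than chunk_size.
        yield df.iloc[start:start + chunk_size]


def write_in_chunks(df: Any, writer: Any, chunk_size: int = 100_000) -> int:
    """Pass each chunk to writer.write_dataframe() and return the chunk count."""
    n_chunks = 0
    for chunk in iter_chunks(df, chunk_size):
        writer.write_dataframe(chunk)
        n_chunks += 1
    return n_chunks


# Assumed usage inside a DSS Python recipe (not runnable outside DSS):
#
#   import dataiku
#   rm20_data = dataiku.Dataset("RM20_DATA")
#   with rm20_data.get_writer() as writer:
#       write_in_chunks(Out_df, writer)
```

    As noted above, this mainly limits how much data is handed to the writer at once; it should not be expected to speed up the write to Snowflake.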

