Writing a pandas DataFrame to a Snowflake table using a writer

Megha
edited July 16 in General Discussion

Hi,

I am trying to write a pandas DataFrame containing 1.58 million records to a Snowflake table.

To make the process faster, I wanted to use the chunked writer functionality described under: Datasets (reading and writing data) — Dataiku DSS 11 documentation.

I have the DataFrame named "Out_df" containing the 1.58 million records, which I want to write to the Snowflake table "RM20_DATA".

When I execute the code below, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'iter_dataframes'

I understand that the method iter_dataframes() does not work on a DataFrame, only on a Dataset.

How can I convert my Out_df to a dataset that can be iterated with iter_dataframes() to write the data?

----Code---

rm20_data = dataiku.Dataset("RM20_DATA")
rm20_data.write_schema_from_dataframe(Out_df)
with rm20_data.get_writer() as writer:

    for df in Out_df.iter_dataframes():
        # Process the df dataframe ...

        # Write the processed dataframe
        writer.write_dataframe(df)


Operating system used: Windows


Answers

  • Alexandru (Dataiker)
    edited July 17

    Hi @Megha,

    To increase the write speed to Snowflake from a Python recipe, you should leverage the fast-path. This will be dramatically faster.

    https://doc.dataiku.com/dss/latest/connecting/sql/snowflake.html#writing-data-into-snowflake

    You will need to enable the fast path on the Snowflake connection and have cloud storage (S3, GCS, or Azure Blob) in the same region, along with the other prerequisites detailed in the doc above.


    Chunked reading/writing helps with the memory usage of the Python recipe but will not really help with the writing speed. If you can comfortably fit the dataset into memory, you don't need chunked reading/writing. Your code is failing because iter_dataframes() is a method of dataiku.Dataset, not of a pandas DataFrame, so you cannot call it on Out_df.
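    Since the dataset fits in memory, the simplest fix is to write it in a single call. A minimal sketch, assuming it runs inside a DSS Python recipe where Out_df is already defined:

```python
import dataiku

# Write the whole in-memory DataFrame in one call:
# write_with_schema() sets the output dataset's schema from the
# DataFrame and then writes all of its rows.
rm20_data = dataiku.Dataset("RM20_DATA")
rm20_data.write_with_schema(Out_df)
```

    This replaces both the write_schema_from_dataframe() call and the writer loop from the original code.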

    Please refer to the doc and syntax below:

    https://doc.dataiku.com/dss/latest/python-api/datasets-data.html#chunked-reading-and-writing-with-pandas

    from dataiku import Dataset

    inp = Dataset("input")
    out = Dataset("output")

    with out.get_writer() as writer:
        for df in inp.iter_dataframes():
            # Process the df dataframe ...

            # Write the processed dataframe
            writer.write_dataframe(df)
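    Note that iter_dataframes() iterates over a Dataset stored on disk. If you really do want to write an in-memory DataFrame in chunks, you can slice it yourself and pass each slice to writer.write_dataframe(). A minimal sketch of the slicing; the helper name and chunk size here are illustrative, not part of the Dataiku API:

```python
import pandas as pd

def iter_df_chunks(df, chunk_size=100_000):
    """Yield successive chunk_size-row slices of an in-memory DataFrame."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

# Inside a DSS recipe you would then write each slice:
# with rm20_data.get_writer() as writer:
#     for chunk in iter_df_chunks(Out_df):
#         writer.write_dataframe(chunk)
```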

    Thanks,
