Pandas Iter dataframes

Sajid_Khan Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 12 Partner


I am trying to load a Snowflake table as a pandas dataframe. Since the data is huge, the kernel stops and shows a memory error. What can I do to avoid this? Is there a better way to load huge datasets?

I tried using the iter_dataframes functions.

There are two types:

1)iter_dataframes(chunksize=10000, infer_with_pandas=True, sampling='head', sampling_column=None, parse_dates=True, limit=None, ratio=None, columns=None, bool_as_str=False, float_precision=None)

This still gives the memory error. A possible reason is that Pandas tries to detect column data types from the values in those columns. Since my columns hold different types of data and the table is huge, the kernel stops and shows a memory error. So I tried the function below.

2)iter_dataframes_forced_types(names, dtypes, parse_date_columns, chunksize=10000, sampling='head', sampling_column=None, limit=None, ratio=None, float_precision=None)

In this function, I passed column names and their respective data types as a dictionary "{column_name:str}". But information on 4 arguments (names, dtypes, parse_date_columns, chunksize) is required, so I passed the column names as a list for the "names" argument and the data types as a list for the "dtypes" argument (both lists ordered to match each other). I am not sure what value has to be passed in "parse_date_columns". This is where I am stuck. I tried passing boolean values (True, False), date formats (MM/dd/yyyy HH:mm:ss), and None. Nothing worked.

Can anyone direct me towards a better solution or approach?

Thank You,


Operating system used: Windows


  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron

    Hi @Sajid_Khan

    Have you tried iter_dataframes with infer_with_pandas set to False? If not, I'd try that. With that setting, the column data types derived from the Snowflake table should be used rather than Pandas trying to detect data types.
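
Something along these lines might be a starting point. It is only a sketch, not tested against your table: the helper names count_rows_in_chunks and count_rows_in_dataset and the dataset name "my_snowflake_table" are made up for illustration, and it assumes the code runs inside a DSS notebook or recipe where the dataiku package is available. The row count stands in for whatever per-chunk work you actually need.

```python
def count_rows_in_chunks(chunks):
    """Walk an iterable of chunks, keeping only a running total in memory."""
    total = 0
    for chunk in chunks:
        # Filter, aggregate, or write out each chunk here; only the
        # small running result is carried over to the next iteration.
        total += len(chunk)
    return total


def count_rows_in_dataset(dataset_name, chunksize=10000):
    """Stream a DSS dataset chunk by chunk, using the dataset's schema types."""
    import dataiku  # assumption: only importable inside DSS

    dataset = dataiku.Dataset(dataset_name)
    # infer_with_pandas=False asks for the dataset's own column types
    # instead of letting pandas guess them from the values.
    chunks = dataset.iter_dataframes(chunksize=chunksize, infer_with_pandas=False)
    return count_rows_in_chunks(chunks)
```

You would call it as count_rows_in_dataset("my_snowflake_table"). Note that collecting every chunk in a list and concatenating them at the end rebuilds the full table in memory, so the per-chunk work should reduce the data (aggregate, filter, or write it to an output) before moving on.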

