Pandas Iter dataframes

Sajid_Khan · 10-11-2022

Hello,

I am trying to load a Snowflake table as a pandas DataFrame. Because the data is huge, the kernel stops and shows a memory error. What can I do to avoid this? Is there a better way to load huge datasets?

I tried using the iter_dataframes functions.

There are two types:

1) iter_dataframes(chunksize=10000, infer_with_pandas=True, sampling='head', sampling_column=None, parse_dates=True, limit=None, ratio=None, columns=None, bool_as_str=False, float_precision=None)

This still shows the memory error. A possible reason is that pandas tries to detect each column's data type from the values in that column. Since my columns hold different types of data and the table is huge, the kernel stops and shows a memory error. So I tried the function below.

2) iter_dataframes_forced_types(names, dtypes, parse_date_columns, chunksize=10000, sampling='head', sampling_column=None, limit=None, ratio=None, float_precision=None)

In this function, I passed the column names and their respective data types as a dictionary "{column_name:str}". But values are required for four arguments (names, dtypes, parse_date_columns, chunksize), so I passed the column names as a list for the "names" argument and the data types as a list for the "dtypes" argument, with both lists ordered to match each other. I am not sure what value has to be passed for "parse_date_columns". This is where I am stuck. I tried passing boolean values (True, False), date formats (MM/dd/yyyy HH:mm:ss), and None. Nothing worked.

Can anyone direct me towards a better solution or approach?

Thank you,

Sajid

Operating system used: Windows

Marlan · 10-11-2022

Hi @Sajid_Khan,

Have you tried iter_dataframes with infer_with_pandas set to False? If not, I'd try that. With that setting, the column data types derived from the Snowflake table should be used, rather than pandas trying to detect the data types.
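
Something like the following is a minimal sketch of that idea, not a tested solution. It assumes the code runs in a Dataiku notebook or recipe where the dataiku package is available. The dataset name "my_snowflake_table" and the helper names open_chunks and count_rows are hypothetical. Each chunk is processed and then discarded, so only a running total stays in memory instead of the whole table:

```python
def open_chunks(dataset_name, chunksize=10000):
    """Return an iterator of DataFrame chunks read from a Dataiku dataset."""
    # Imported inside the function because the dataiku package
    # is only available when running inside Dataiku DSS.
    import dataiku

    dataset = dataiku.Dataset(dataset_name)
    # infer_with_pandas=False asks for the dataset's own schema types
    # instead of letting pandas guess types from the values.
    return dataset.iter_dataframes(chunksize=chunksize, infer_with_pandas=False)


def count_rows(chunks):
    """Consume chunks one at a time, keeping only a running row count."""
    total = 0
    for chunk in chunks:
        # Replace this line with the real per-chunk work
        # (filtering, aggregating, writing out, ...).
        total += len(chunk)
    return total


# Usage inside DSS (hypothetical dataset name):
# n_rows = count_rows(open_chunks("my_snowflake_table"))
```

The important part is to avoid collecting the chunks in a list or concatenating them, since that would rebuild the full table in memory.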

Marlan
