Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hello,
I am trying to load a snowflake table as pandas dataframe. Since the data size is huge, kernel stops and show memory error. What can I do to avoid this? Is there any better way to load huge datasets.
I tried using iter_dataframe functions.
There are two types:
1)iter_dataframes(chunksize=10000, infer_with_pandas=True, sampling='head', sampling_column=None, parse_dates=True, limit=None, ratio=None, columns=None, bool_as_str=False, float_precision=None)
This still shows the memory error, possible reason is, Pandas tries to detect column data types by the values of those columns. Since my columns have different types of data and its huge, kernel stops and shows memory error. So I tried using the below function.
2)iter_dataframes_forced_types(names, dtypes, parse_date_columns, chunksize=10000, sampling='head', sampling_column=None, limit=None, ratio=None, float_precision=None)
In this function, I passes column names and their respective data types as dictionary "{column_name:str}". But information on 4 arguments - names, dtypes, parse_date_columns, chunksize is required so I passed column names as a list for the "names" argument, data types as a list for the "dtypes" argument. (Both lists sorted in a way to match each other). I am not sure what value has to be passed in "parse_date_columns". This is where I am stuck. I tried passing boolean values (True,False), date formats (MM/dd/yyyy HH:mm:ss), None. Nothing worked.
Can anyone direct me towards a better solution or approach?
Thank You,
Sajid
Operating system used: Windows
Hi @Sajid_Khan,
Have you tried iter_dataframes with the infer_with_pandas set False? If not, I'd try that. With that setting, the Snowflake table derived column data types should be used rather than Pandas trying to detect data types.
Marlan