Hi,
I assume you use the get_dataframe() method and then work with a pandas dataframe. (Let me know if you do something different).
Here is what you can do:
1) Get only a sample of a dataset with my_dataset.get_dataframe(sampling='head', limit=10000)
2) Load the dataset in chunks with my_dataset.iter_dataframes(chunksize=10000):
    import dataiku

    my_dataset = dataiku.Dataset("name_dataset")
    # Each iteration yields a pandas dataframe of at most 10000 rows.
    for partial_dataframe in my_dataset.iter_dataframes(chunksize=10000):
        # Insert here applicative logic on each partial dataframe.
        pass
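To give an idea of what the logic inside the loop could look like, here is a minimal sketch that computes the mean of one column without ever holding the full dataset in memory. The function name column_mean_by_chunks and the column name "amount" are made up for illustration, and the small in-memory dataframe only stands in for a real dataset so the snippet runs outside DSS.

```python
import pandas as pd


def column_mean_by_chunks(chunks, column):
    """Mean of `column` over an iterable of pandas dataframes, one chunk at a time.

    Hypothetical helper for illustration; it is not part of the dataiku API.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        values = chunk[column].dropna()  # ignore missing values
        total += values.sum()
        count += len(values)
    return float(total / count) if count else float("nan")


# Stand-in for my_dataset.iter_dataframes(chunksize=2): slice a small
# dataframe into pieces of two rows each.
full = pd.DataFrame({"amount": [10.0, 20.0, 30.0, 40.0, 50.0]})
fake_chunks = (full.iloc[i:i + 2] for i in range(0, len(full), 2))
print(column_mean_by_chunks(fake_chunks, "amount"))
```

In a recipe or notebook you would pass my_dataset.iter_dataframes(chunksize=10000) as the first argument instead of fake_chunks. Only running totals are kept between iterations, so memory use depends on the chunk size rather than on the size of the dataset.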
You can read more in the documentation.