Hi,
I assume you are loading the dataset with the get_dataframe() method and then working with a pandas DataFrame. (Let me know if you are doing something different.)
Here is what you can do:
1) Get only a sample of the dataset, for example the first 10,000 rows, with my_dataset.get_dataframe(sampling='head', limit=10000)
2) Load the dataset in chunks with my_dataset.iter_dataframes(chunksize=10000):
import dataiku

my_dataset = dataiku.Dataset("name_dataset")
# Each iteration yields a pandas DataFrame of up to 10,000 rows.
for partial_dataframe in my_dataset.iter_dataframes(chunksize=10000):
    # Insert your application logic for each partial dataframe here.
    pass
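To give an idea of what the logic inside the loop could look like, here is a minimal sketch that accumulates a row count and a column total across chunks, so the full dataset never has to sit in memory at once. The names summarize_chunks, fake_chunks and the "amount" column are hypothetical and only there for illustration; fake_chunks stands in for iter_dataframes so the snippet runs outside DSS.

```python
import pandas as pd


def summarize_chunks(chunks, column):
    """Accumulate a row count and a column total across DataFrame chunks."""
    row_count = 0
    total = 0.0
    for chunk in chunks:
        # Only the current chunk is needed here; earlier ones can be freed.
        row_count += len(chunk)
        total += chunk[column].sum()
    # The mean assumes the column has no missing values, since sum() skips
    # NaN while len() still counts those rows.
    mean = total / row_count if row_count else float("nan")
    return {"rows": row_count, "sum": total, "mean": mean}


def fake_chunks():
    """Hypothetical stand-in for my_dataset.iter_dataframes(chunksize=...)."""
    yield pd.DataFrame({"amount": [1.0, 2.0, 3.0]})
    yield pd.DataFrame({"amount": [4.0, 5.0]})


summary = summarize_chunks(fake_chunks(), "amount")
```

Inside DSS you would pass my_dataset.iter_dataframes(chunksize=10000) instead of fake_chunks(). This works for anything that can be computed incrementally (counts, sums, min/max); operations that need all rows at once, such as a global sort, would need a different approach.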
You can read more in the documentation.