How to perform analysis on part of a dataset in a Python notebook without loading the whole dataset

UserBird (Dataiker, Alpha Tester)

Hi,
I have a dataset containing the transactions for all products.
I would like to loop over the products in a Python notebook: for each product, load its transactions, perform an analysis, and write the results to a dataset.
How can I load only a partition of my dataset from a Python notebook?
Thanks in advance
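A minimal sketch of the per-product loading described above, assuming the transactions arrive as a stream of pandas DataFrame chunks (such as those yielded by the Dataiku Dataset.iter_dataframes() method) and that each row carries a product identifier column. The function name load_product and the column names product and amount are hypothetical. Note that each call scans every chunk, so calling it once per product rereads the whole dataset for every product.

```python
from typing import Iterable

import pandas as pd


def load_product(chunks: Iterable[pd.DataFrame], product_col: str, product: str) -> pd.DataFrame:
    """Return only the rows of one product from a stream of DataFrame chunks."""
    # Filter each chunk as soon as it is read, so only matching rows stay in memory.
    parts = [chunk[chunk[product_col] == product] for chunk in chunks]
    if not parts:
        # No chunks at all: return an empty frame instead of letting pd.concat fail.
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)


# Two small in-memory chunks stand in for my_dataset.iter_dataframes(chunksize=10000);
# the column names "product" and "amount" are hypothetical.
chunks = [
    pd.DataFrame({"product": ["A", "B"], "amount": [10.0, 5.0]}),
    pd.DataFrame({"product": ["A", "C"], "amount": [2.5, 7.0]}),
]
product_a = load_product(chunks, "product", "A")
print(product_a)
```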
Best Answer
Hi, I assume you use the get_dataframe() method and then work with a pandas DataFrame (let me know if you do something different). Here is what you can do:

1) Get only a sample of the dataset with my_dataset.get_dataframe(sampling='head', limit=10000), which reads only the first 10,000 rows.
2) Load the dataset in chunks with my_dataset.iter_dataframes(chunksize=10000), which yields one pandas DataFrame of up to 10,000 rows at a time:

    import dataiku

    my_dataset = dataiku.Dataset("name_dataset")
    for partial_dataframe in my_dataset.iter_dataframes(chunksize=10000):
        # Insert the analysis logic for each partial DataFrame here.
        pass

You can read more in the documentation.
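To go from chunked reading to per-product results, one option is to aggregate each chunk as it arrives and then combine the partial results, so the dataset is read only once. This is a minimal sketch, not the Dataiku-recommended method: it assumes hypothetical column names product and amount, and uses count and sum as stand-in analyses because both can be merged across chunks by summing them.

```python
from typing import Iterable

import pandas as pd


def summarize_by_product(chunks: Iterable[pd.DataFrame], product_col: str, value_col: str) -> pd.DataFrame:
    """Compute per-product transaction count and total in a single pass over the chunks."""
    partials = []
    for chunk in chunks:
        # Aggregate each chunk right away; its raw rows are not kept afterwards.
        partials.append(chunk.groupby(product_col)[value_col].agg(["count", "sum"]))
    if not partials:
        return pd.DataFrame(columns=[product_col, "count", "sum"])
    # A product can appear in several chunks, so merge the partial counts and sums by adding them.
    combined = pd.concat(partials).groupby(level=0).sum()
    return combined.reset_index()


# In-memory chunks stand in for my_dataset.iter_dataframes(chunksize=10000);
# the column names "product" and "amount" are hypothetical.
chunks = [
    pd.DataFrame({"product": ["A", "B", "A"], "amount": [10.0, 5.0, 1.0]}),
    pd.DataFrame({"product": ["A", "C"], "amount": [2.5, 7.0]}),
]
summary = summarize_by_product(chunks, "product", "amount")
print(summary)
```

In a Dataiku notebook, chunks would presumably be my_dataset.iter_dataframes(chunksize=10000), and the result could be written with dataiku.Dataset("output_name").write_with_schema(summary), where output_name is a hypothetical output dataset. A per-product mean can be derived afterwards as sum / count; statistics that cannot be merged this way, such as the median, still need all of a product's rows at once.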
