We're working on a project using an R notebook with some very large datasets, and we're wondering what the recommended approach is for working with a dataset that does not fit into memory.
We are big fans of the streaming API for Python - is there any equivalent for R?
Thanks!
Hello,
The dataiku R library allows you to read your data in chunks with dkuReadDataset. More information can be found in the docs here: https://doc.dataiku.com/dss/api/8.0/R/dataiku/reference/dkuReadDataset.html
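A minimal sketch of what that looks like, using the samplingMethod and nbRows parameters described in the linked reference (the dataset name here is just a placeholder):

```r
library(dataiku)

# Read only the first 10,000 rows of the dataset
# ("head" sampling keeps memory usage bounded)
df_head <- dkuReadDataset("my_dataset", samplingMethod = "head", nbRows = 10000)

# Or pull a random sample of 10,000 rows instead
df_sample <- dkuReadDataset("my_dataset", samplingMethod = "random", nbRows = 10000)
```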
Thanks @Triveni. The ability to read a random sample from the dataset is terrific, but can you tell me if it's possible to read records from a specific offset? For example, I want to iterate through the entire set but only take 10,000 records at a time.
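Something like this hypothetical loop is what we're after. To be clear, the offset argument below does not exist in dkuReadDataset today as far as I can tell; it's just to illustrate the pattern:

```r
library(dataiku)

chunk_size <- 10000
offset <- 0
repeat {
  # Hypothetical: dkuReadDataset does not currently accept an offset argument
  chunk <- dkuReadDataset("my_dataset", nbRows = chunk_size, offset = offset)
  if (nrow(chunk) == 0) break
  process(chunk)  # placeholder for whatever we do with each batch
  offset <- offset + chunk_size
}
```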
+1 for this feature in R