Very big dataset

TheMLEngineer Registered Posts: 25

I have a very large dataset: 16.8 billion records, about 8 TB. Any operation on the data takes days, and the project owner wants to use all of the data rather than a subset. Dataiku and S3 run into memory errors after running for several hours. I am looking for general guidelines on how to handle this situation.
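
For a sense of scale, here is a rough back-of-envelope sketch of the figures above. It assumes decimal terabytes, and the 256 MB partition size is a hypothetical target chosen only for illustration; it is not a Dataiku or S3 setting.

```python
# Back-of-envelope sizing for the dataset described in the post.
RECORDS = 16_800_000_000   # 16.8 billion records (from the post)
TOTAL_BYTES = 8 * 10**12   # about 8 TB, assuming decimal terabytes

# Assumption: hypothetical target partition size, for illustration only.
TARGET_PARTITION_BYTES = 256 * 10**6


def sizing(records, total_bytes, partition_bytes):
    """Return (average bytes per record, partition count, records per partition)."""
    bytes_per_record = total_bytes / records
    partitions = -(-total_bytes // partition_bytes)  # ceiling division
    records_per_partition = records / partitions
    return bytes_per_record, partitions, records_per_partition


bytes_per_record, partitions, records_per_partition = sizing(
    RECORDS, TOTAL_BYTES, TARGET_PARTITION_BYTES
)
print(f"average record size: {bytes_per_record:.0f} bytes")
print(f"partitions at 256 MB each: {partitions}")
print(f"records per partition: {records_per_partition:.0f}")
```

Under those assumptions the average record is a little under 500 bytes, and the data would split into tens of thousands of partitions of roughly half a million records each.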

Thank you.
