Partitioning in Dataiku DSS - Watch on Demand

Community Manager

Watch @Malick-K explain how to create partitioned datasets to improve performance and reduce computation when dealing with large volumes of data.

Presentation abstract: 

As datasets grow over time, it takes longer and longer to update the flow with fresh incoming data, run preparation steps, and retrain models. Partitioning helps solve this issue. By splitting a dataset into subsets along meaningful dimensions (time or discrete dimensions), it lets you build the flow for the incremental data only, while keeping the historical data as it is.
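To make the idea concrete, here is a minimal sketch in plain Python. It is not the Dataiku DSS API: the function names (`partition_by`, `build_partition`, `incremental_build`), the sample `sales` rows, and the "sum of amounts" processing step are all assumptions made up for illustration. It only shows the general principle of building new partitions while leaving already-built ones untouched.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def build_partition(rows):
    """Hypothetical processing step: total sales amount for one partition."""
    return sum(row["amount"] for row in rows)


def incremental_build(partitions, built):
    """Build only the partitions missing from `built`.

    Returns the list of partition ids that were actually processed.
    """
    processed = []
    for pid in sorted(partitions):
        if pid not in built:
            built[pid] = build_partition(partitions[pid])
            processed.append(pid)
    return processed


# Illustrative data, partitioned along a time dimension (one partition per day).
sales = [
    {"day": "2020-01-01", "country": "FR", "amount": 10.0},
    {"day": "2020-01-01", "country": "US", "amount": 5.0},
    {"day": "2020-01-02", "country": "FR", "amount": 7.5},
]

built = {}  # stands in for the already-computed historical output
first_run = incremental_build(partition_by(sales, "day"), built)

# A new day of data arrives: only that partition is built on the next run.
sales.append({"day": "2020-01-03", "country": "US", "amount": 2.0})
second_run = incremental_build(partition_by(sales, "day"), built)
```

In this sketch the first run processes both existing days, and the second run processes only `2020-01-03`; the results for the earlier days are reused as they are.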

Malick Konate (Data Scientist, Dataiku) will explain in detail what partitioning is and how DSS users can use it to improve computation performance when dealing with large volumes of data. Using the example of a retail company, he will walk us through how partitioning can be used to build historical data, target data processing on new data, and train a partitioned machine learning model for each country. This will also be an opportunity to share best practices and common pitfalls of managing dependencies.
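As a rough illustration of what "one model per country" means, the sketch below trains a separate, deliberately trivial model for each partition. It is plain Python, not how DSS implements partitioned models: the `train_per_partition` and `predict` helpers, the `basket` column, and the mean-value "model" are all hypothetical stand-ins.

```python
from collections import defaultdict
from statistics import mean


def train_per_partition(rows, partition_key, target):
    """Train one baseline model per partition.

    The "model" here is simply the mean of the target within the partition.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[partition_key]].append(row[target])
    return {pid: mean(values) for pid, values in grouped.items()}


def predict(models, partition_id):
    """Predict with the model that belongs to the given partition."""
    return models[partition_id]


# Illustrative retail data, partitioned along a discrete dimension (country).
history = [
    {"country": "FR", "basket": 20.0},
    {"country": "FR", "basket": 30.0},
    {"country": "US", "basket": 50.0},
]

models = train_per_partition(history, "country", "basket")
```

Each country gets its own entry in `models`, so retraining one country after new data arrives does not require touching the others.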

Note: Partitioning is not available in the Community edition of Dataiku DSS.

Malick started in the data ecosystem with business intelligence projects in data engineering and data visualization. He is now a Data Scientist at Dataiku in Paris, where he supports our customers in building efficient data science projects and deploying them into production.

4 Comments
Level 3

Will this be a Zoom event? Unfortunately, my company doesn't allow the use of Zoom, but I would still love to join.

Community Manager

Hi Anton,

Unfortunately, the event is held on Zoom. Apologies for this, it's a bummer you can't join live. But we'll be posting the recording here the day after, and I'll make a note to send it to you!

Level 3

Ah, okay. I'll see whether I can join via my personal computer, that should also work. I'd also love to receive the recording!

Community Manager

More info on Partitioning:

We hope this helps and please let us know if you have any questions or feedback!