
An Introduction to Partitioning

CoreyS
Community Manager

On May 20th, we'll dive into partitions with @Malick-K and look at how they can improve performance and make better use of computation when dealing with large volumes of data.

As datasets grow over time, so does the time needed to update the flow with fresh incoming data, run preparation steps, and retrain models. Partitioning helps address this. It refers to splitting a dataset along meaningful dimensions, so that each partition contains a subset of the dataset.

A dataset can be partitioned along time dimensions (e.g. year, month, day, or hour) or discrete dimensions (e.g. country or business unit). The flow can then be built for the incremental data only, while the historical data is kept as it is.
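To make the idea concrete, here is a minimal sketch in plain Python. It does not use the Dataiku DSS API; the function names (`partition_by`, `build`), the sample rows, and the "total amount per day" computation are all hypothetical, chosen only to illustrate rebuilding a single new partition while leaving historical results untouched.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key` (hypothetical helper)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def build(partitions, results, to_build):
    """Recompute results only for the requested partitions.

    Results for all other partitions are carried over unchanged.
    """
    updated = dict(results)
    for partition_id in to_build:
        updated[partition_id] = sum(row["amount"] for row in partitions[partition_id])
    return updated


# Made-up historical data, partitioned along a time dimension (day).
history = [
    {"day": "2020-05-18", "country": "FR", "amount": 10},
    {"day": "2020-05-18", "country": "US", "amount": 5},
    {"day": "2020-05-19", "country": "FR", "amount": 7},
]
partitions = partition_by(history, "day")

# Initial build: every partition is computed once.
results = build(partitions, {}, list(partitions))

# Fresh data arrives for a new day: add its partition and build only that one.
new_rows = [{"day": "2020-05-20", "country": "US", "amount": 3}]
partitions.update(partition_by(new_rows, "day"))
results = build(partitions, results, ["2020-05-20"])

print(results)
# {'2020-05-18': 15, '2020-05-19': 7, '2020-05-20': 3}
```

The same sketch works for a discrete dimension by calling `partition_by(history, "country")` instead. The point is only that the second `build` call touches one partition, so its cost depends on the size of the new data rather than on the full history.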

Note: Partitioning is not available in the Community edition of Dataiku DSS.

If you’re interested in learning more about Partitioning, please join us next week!

For more resources about Partitioning:

