On May 20th, we'll dive into partitions with @Malick-K and look at how they improve performance and make better use of compute when dealing with large volumes of data.
As datasets grow over time, it takes longer to update the flow with fresh incoming data, run preparation steps, and retrain models. Partitioning helps solve this issue. Partitioning means splitting a dataset along a meaningful dimension, so that each partition contains a subset of the dataset.
When a dataset is split into subsets along a meaningful dimension, either time (e.g., year, month, day, or hour) or a discrete value (e.g., country or business unit), the flow can be built for the incremental data only, while the historical data is kept as it is.
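To make the idea of building only the incremental data concrete, here is a minimal sketch in plain Python. It does not use the Dataiku API. The records, the `day` dimension, and the helper names (`partition_by`, `build_missing`, `total_amount`) are all hypothetical and only illustrate the principle of skipping partitions that have already been built.

```python
from collections import defaultdict

# Hypothetical records; "day" is the (time) partitioning dimension.
records = [
    {"day": "2021-05-18", "country": "FR", "amount": 10.0},
    {"day": "2021-05-18", "country": "US", "amount": 20.0},
    {"day": "2021-05-19", "country": "FR", "amount": 5.0},
]


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def build_missing(partitions, built, compute):
    """Compute results only for partitions that are not in `built` yet.

    Returns the list of partition ids that were computed in this run.
    """
    rebuilt = []
    for part_id in sorted(partitions):
        if part_id not in built:
            built[part_id] = compute(partitions[part_id])
            rebuilt.append(part_id)
    return rebuilt


def total_amount(rows):
    """Stand-in for an expensive preparation or training step."""
    return sum(row["amount"] for row in rows)


built = {}

# First run: nothing is built yet, so every partition is computed.
first_run = build_missing(partition_by(records, "day"), built, total_amount)

# A new day of data arrives; only that partition is computed.
records.append({"day": "2021-05-20", "country": "FR", "amount": 7.5})
second_run = build_missing(partition_by(records, "day"), built, total_amount)

print(first_run)
print(second_run)
```

In this sketch the second run touches only the newly arrived day, and the results for earlier days are left as they are. It deliberately ignores late-arriving rows for a partition that has already been built, which a real setup would need to handle.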
Note: Partitioning is not available in the Community edition of Dataiku DSS.
If you’re interested in learning more about Partitioning, please join us next week!
For more resources about Partitioning: