Split - Keep same proportions

Hi all!

I need to split my dataset into a training dataset and a testing dataset. I would like to ensure that the class proportions in my original dataset are kept in my training dataset. For example, if my original dataset has 55% women and 45% men, the training dataset should show the same proportions, and the same should hold for several other classes.

Which type of splitting should I use to ensure this? Is it the default behaviour with a full random split, or should I add some filters?

Many thanks!

Dataiker

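Hi,

Random sampling is fine in this case: on a reasonably large dataset, a random split keeps the class proportions approximately the same in the training and testing parts.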
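
To illustrate why a random split is usually enough, here is a minimal sketch in plain Python. It is not Dataiku's implementation or API: the function names (`proportions`, `random_split`, `stratified_split`), the `gender` column, and the 5,500 / 4,500 dataset are all hypothetical and only serve the example. It compares a plain random split, which matches the original proportions approximately, with a per-class (stratified) split, which matches them up to rounding.

```python
import random
from collections import Counter, defaultdict


def proportions(rows, key):
    """Return the share of each value of `key` among `rows`."""
    counts = Counter(row[key] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}


def random_split(rows, train_ratio, seed=0):
    """Shuffle the rows and cut them once: a plain random split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def stratified_split(rows, key, train_ratio, seed=0):
    """Split each class separately so proportions match up to rounding."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(len(members) * train_ratio)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical dataset: 5,500 women ("F") and 4,500 men ("M").
rows = [{"gender": "F"}] * 5500 + [{"gender": "M"}] * 4500

random_train, random_test = random_split(rows, train_ratio=0.8)
strat_train, strat_test = stratified_split(rows, "gender", train_ratio=0.8)

print("original  ", proportions(rows, "gender"))
print("random    ", proportions(random_train, "gender"))
print("stratified", proportions(strat_train, "gender"))
```

With a dataset of this size, the random split should land very close to 55% / 45%. On small datasets or with rare classes the deviation can be larger, and a per-class split like `stratified_split` may be the safer choice.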