TRAIN/TEST PARTITIONING

didierbarlogis

Is it possible to choose the data for the test and training sets other than randomly?

tgb417

@didierbarlogis ,

Welcome to the Dataiku Community.  It is great to have you joining us.

From your question, I'm going to guess that you are trying to use "Visual ML" rather than a coded ML model.

If that is correct, there are several ways to split the data that are not strictly random.

1. For Time Series Data you can try the Time Series Ordering option:
[Screenshot: Time Series Ordering.jpg]

2. You can also use the Visual Split Recipe before your modeling pipeline to build the training and validation datasets with a different kind of sampling.

[Screenshot: Showing the use of a visual split recipe]

Here is some introductory material on the Split Recipe.
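
If you ever need to do the same thing in code, the two ideas above (ordering by time, and splitting on a rule instead of at random) can be sketched in plain Python. This is only an illustration under my own assumptions, not what DSS runs internally: the function names, the row layout (a list of dicts), and the sample data are all hypothetical.

```python
from datetime import date


def time_ordered_split(rows, time_key, train_ratio=0.8):
    """Train on the earliest rows and test on the most recent ones."""
    # Sort by time so the test set never contains rows older than the train set.
    ordered = sorted(rows, key=lambda row: row[time_key])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


def split_on_values(rows, key, test_values):
    """Send rows whose `key` value is in `test_values` to the test set."""
    test_values = set(test_values)
    train = [row for row in rows if row[key] not in test_values]
    test = [row for row in rows if row[key] in test_values]
    return train, test


# Hypothetical sample data, made up for this example.
sales = [
    {"date": date(2021, 1, 3), "store": "A", "amount": 10},
    {"date": date(2021, 1, 1), "store": "B", "amount": 20},
    {"date": date(2021, 1, 5), "store": "A", "amount": 30},
    {"date": date(2021, 1, 2), "store": "C", "amount": 40},
    {"date": date(2021, 1, 4), "store": "B", "amount": 50},
]

# Time-based: the first 80% of rows by date go to train, the rest to test.
train_time, test_time = time_ordered_split(sales, "date", train_ratio=0.8)

# Rule-based: hold out every row from store "C" as the test set.
train_store, test_store = split_on_values(sales, "store", ["C"])
```

Both functions are deterministic, so running them again on the same data gives the same train and test sets, which a random split only does when you fix the seed.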

If you have a paid license, you can also do "partitioning" based on how your data is stored or on the values in a column of your data. Models can be built based on partitions.

That said, your one-sentence question does not give many details about what you are trying to achieve. If you can share more, I, another community member, or even Dataiku staff might be able to provide further insights.

There are some other conversations here in the Dataiku Community that are on this topic of splitting, for example:

https://community.dataiku.com/t5/Using-Dataiku-DSS/Splitting-dataset/m-p/2709

I used this Google search to find that discussion:

site:dataiku.com spliting data

Again, welcome to the Dataiku Community. Let us know how we can be of further assistance.

--Tom