TRAIN/TEST PARTITIONING
Is it possible to choose the data for the test and training sets other than randomly?
Answers
tgb417
Welcome to the Dataiku Community. It is great to have you joining us.
From your question, I'm going to guess that you are using "Visual ML" rather than a coded ML model.
If that is correct, there are several ways to split the data that are not strictly random.
1. For time series data, you can try the Time Series Ordering option.
2. You can also use the Visual Split Recipe before your modeling pipeline to pull together a different kind of sample for the train and validation datasets.
Here is some introductory material on the Split Recipe.
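If you ever move from Visual ML to code, the idea behind a time-ordered split can be sketched in plain Python. This is only an illustration of the concept, not Dataiku's implementation: the function name `time_ordered_split`, the `day` and `value` fields, and the 80/20 ratio are all assumptions made up for the example.

```python
from datetime import date


def time_ordered_split(rows, time_key, train_ratio=0.8):
    """Sort rows by time_key and cut once: oldest rows train, newest rows test."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


# Hypothetical data: ten daily observations, deliberately out of order.
rows = [{"day": date(2023, 1, d), "value": d * 10} for d in range(10, 0, -1)]

train, test = time_ordered_split(rows, "day")
```

Because the cut is made after sorting, every test row is later than every training row, which is the property a random split cannot give you.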
If you have a paid license you can also do "partitioning" based on the storage or values in a column of your data. Models can be built based on partitions.
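To make the idea of splitting on the values in a column concrete, here is a minimal sketch in plain Python. It does not use any Dataiku API; the helper `partition_by` and the `region` and `sales` fields are hypothetical names chosen for the example.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Group rows by the value found in `column`, one list of rows per value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


# Hypothetical data: sales records tagged with a region.
rows = [
    {"region": "north", "sales": 12},
    {"region": "south", "sales": 7},
    {"region": "north", "sales": 15},
    {"region": "south", "sales": 9},
]

partitions = partition_by(rows, "region")
```

Each group could then be used to train and evaluate its own model, or one group could be held out entirely as the test set.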
That said, your one-sentence question does not give many details about what you are trying to achieve. If you could share some more details, I, another community member, or even Dataiku staff might be able to provide some further insights.
There are some other conversations here in the Dataiku Community that are on this topic of splitting, for example:
https://community.dataiku.com/t5/Using-Dataiku-DSS/Splitting-dataset/m-p/2709
I used this Google search to find that discussion:
site:dataiku.com spliting data
Welcome to the Dataiku community. Let us know how we can be of further assistance.