How to train a model without a test set?
Hello,
Is there a way to build a model without using a test set? Training a model requires a train/test split of the dataset, which seems to be mandatory. As highlighted in the screenshot below, the Policy settings offer no option to turn off the testing stage or discard the split. I tried to "trick"
Therefore, I am asking whether there is a way to skip this step and just fit the given data. For example, imagine that I want to build a simple linear regression model with 100 samples and I don't want to validate or test it. I just want DSS to fit a line based on my 100 samples, much like calling a .fit() function in a programming language.
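For illustration, the kind of fit described above could be sketched in plain Python as follows. This is only a sketch outside DSS: the synthetic data, the variable names, and the use of NumPy's least-squares solver are assumptions made for the example, and it says nothing about how DSS trains models internally.

```python
import numpy as np

# Hypothetical data: 100 samples from a noisy line y = 3x + 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3.0 * X + 2.0 + rng.normal(0, 0.5, size=100)

# Design matrix with a column of ones for the intercept.
A = np.column_stack([X, np.ones_like(X)])

# Fit on all 100 samples: no train/test split, no validation.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because every sample is used for fitting, nothing is left over to estimate how well the line generalizes, which is the trade-off of skipping the test set.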
Answers
-
CoreyS
Hi @shahriyar! Can you provide further details in the thread to help users find a solution (for example, your DSS version)? Also, can you let us know whether you have already tried any fixes? This should lead to a quicker response from the community.
-
Hi @CoreyS. Thanks for the clarification, I updated the question accordingly.
-
Tanguy
+1 for the OP question.
This feature is very much needed: (re-)training the model on the entire dataset, without holding any data out of the training process.
It currently seems impossible to do that using the "train" object in the flow (which imposes a strategy to split the data into train/test sets).
-
Hi @tanguy,
This feature is available in the training recipe:
Once the model is deployed in the flow, you can choose to retrain it using 100% of the training data!