How to train a model without test set?

shahriyar
Level 2
Hello,

Is there a way to build a model without using a test set? Training a model requires splitting the dataset, and this step appears to be mandatory. As highlighted in the screenshot below, the Policy setting has no option to turn off the testing stage or skip the split. I tried to "trick" 🙂 the split process by setting the Train ratio to 1.0, but it failed.

Capture.PNG

So, is there a way to skip this step and simply fit the given data? For example, imagine I want to build a simple linear regression model with 100 samples, and I don't want to validate or test it. I just want DSS to fit a line based on my 100 samples, much like calling a .fit() function in a programming language.

4 Replies
CoreyS
Community Manager

Hi, @shahriyar! Can you provide further details in the thread (for example, your DSS version) to help users find a solution? Also, can you let us know if you've already tried any fixes? This should lead to a quicker response from the community.

shahriyar
Level 2
Author

Hi, @CoreyS. Thanks for the clarification. I have updated the question accordingly.

tanguy
Level 3

+1 for the OP question.

This feature is very much needed: (re-)training the model on the entire dataset, without holding any data out of the training process.

It currently seems impossible to do that using the "train" object in the flow (which imposes a strategy for splitting the data into train and test sets).

Mattsco
Dataiker

Hi @tanguy,

This feature is available in the training recipe:
Once the model is deployed in the flow, you can choose to retrain it using 100% of the training data!
Capture d’écran 2022-04-24 à 08.47.58.png

Mattsco