How to train a model without a test set?

shahriyar Registered Posts: 4 ✭✭✭✭

Hello,

Is there a way to build a model without using a test set? To train a model, DSS asks for a split of the dataset, which seems to be mandatory. As highlighted in the screenshot below, the Policy setting has no option to turn off the testing stage or discard the split. I tried to "trick" the split process by setting the Train ratio to 1.0, but it failed.

[Screenshot: Capture.PNG]

So, is there a way to skip this step and just fit the given data? For example, imagine that I want to build a simple linear regression model with 100 samples, and I don't want to validate or test it. I just want DSS to fit a line based on my 100 samples, much like calling a .fit() function in a programming language.
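To make the request concrete, here is a minimal sketch of the kind of "just fit" behaviour meant above. It is plain Python, not the DSS API: the helper name fit_line and the synthetic 100-sample dataset are made up for illustration, and the fit is ordinary least squares computed on every sample with no train/test split.

```python
import random


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration: it uses all samples it is
    given and holds nothing out for validation or testing.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (an assumption): 100 samples around y = 2x + 1.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted coefficients should land close to the 2 and 1 used to generate the data, but nothing here reports how well the line would generalize, which is exactly the trade-off of skipping the test set.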

Answers

  • CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Hi @shahriyar! Could you share some further details in this thread to help other users find a solution (for example, your DSS version)? Also, could you let us know whether you've already tried any fixes? This should lead to a quicker response from the community.

  • shahriyar Registered Posts: 4 ✭✭✭✭

    Hi @CoreyS. Thanks for the clarification, I updated the question accordingly.

  • Tanguy Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2023 Posts: 112 Neuron

    +1 for the OP question.

    This feature is very much needed: (re-)training the model on the entire dataset, without holding any data out of the training process.

    It currently seems impossible to do that using the "train" object in the flow, which imposes a strategy for splitting the data into train and test sets.

  • Mattsco Administrator, Dataiker Posts: 125 Administrator

    Hi @tanguy,

    This feature is available in the training recipe: once the model is deployed in the flow, you can choose to retrain it using 100% of the training data!
    [Screenshot: Capture d’écran 2022-04-24 à 08.47.58.png]
