Fields in Scoring Dataset that weren't in Training Dataset

ccecil Registered Posts: 17 ✭✭✭✭
Hi there,

Two questions:

1) I'm receiving the error message below, and I'm wondering whether the case of the field names is causing it. I have 'Number of Rooms' in my training set and 'NUMBER_OF_ROOMS' in my scoring set.

- Will I need to go into my scoring set and match the case of the field names with the training set?

An invalid argument has been encountered : in act.score_Model_Score_NP: Cannot apply the model with the output of preparation on this input (Missing column: Number_of_Rooms)

2) I have some extra fields in my scoring set that were not in my training set. Is there a way for my model to ignore those additional fields when scoring?

Thank you in advance, I really appreciate it.

Operating system used: Windows


Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    Answer ✓

    Hi @ccecil
    You either need to match the schema of the scoring dataset to that of the training dataset, or you can use a preparation script in the visual analysis for the model you've trained.
    For example, simply add a rename-column step where the column name is "Number of Rooms" and change it to "Number_of_Rooms". You may get a warning if the column doesn't exist in the training dataset, but the same script steps are applied when the scoring recipe runs, which should avoid the error you see.
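If you'd rather check the two schemas before scoring, here is a rough sketch in plain Python (not a Dataiku API) that compares column names while ignoring case and treating spaces and underscores as the same. The helper names `normalize` and `build_rename_map` and the example column lists are made up for illustration; it assumes the model expects "Number_of_Rooms", as the error message suggests.

```python
import re


def normalize(name):
    """Lowercase and collapse spaces/underscores, so that
    'Number of Rooms' and 'NUMBER_OF_ROOMS' compare equal."""
    return re.sub(r"[\s_]+", "_", name.strip()).lower()


def build_rename_map(training_columns, scoring_columns):
    """Compare two lists of column names (hypothetical helper).

    Returns a tuple (rename_map, missing, extra):
      rename_map: scoring column name -> matching training column name
      missing:    training columns with no match in the scoring set
      extra:      scoring columns with no match in the training set
    """
    by_key = {normalize(col): col for col in training_columns}
    rename_map = {}
    extra = []
    for col in scoring_columns:
        target = by_key.get(normalize(col))
        if target is None:
            extra.append(col)
        else:
            rename_map[col] = target
    matched = set(rename_map.values())
    missing = [col for col in training_columns if col not in matched]
    return rename_map, missing, extra


# Example column lists, invented for illustration only.
training_columns = ["Number_of_Rooms", "Price"]
scoring_columns = ["NUMBER_OF_ROOMS", "PRICE", "LISTING_ID"]

rename_map, missing, extra = build_rename_map(training_columns, scoring_columns)
print("rename:", rename_map)
print("missing from scoring set:", missing)
print("extra in scoring set:", extra)
```

The `rename` mapping tells you which rename steps to add, and `extra` lists the fields from question 2 that were not in the training set. If you work with pandas DataFrames, the same mapping can be passed to `df.rename(columns=rename_map)`, and the extra fields can be dropped with `df.drop(columns=extra)` if you want them out of the way before scoring.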


