Fields in Scoring Dataset that weren't in Training Dataset

ccecil · March 2023

Hi there,

Two questions:

1) I'm receiving the error message below and I'm wondering whether the case of the field names is causing it. I have 'Number of Rooms' in my training set and 'NUMBER_OF_ROOMS' in my scoring set.

- Will I need to go into my scoring set and match the case of the field names with the training set?

An invalid argument has been encountered : in act.score_Model_Score_NP: Cannot apply the model with the output of preparation on this input (Missing column: Number_of_Rooms)

2) I have some extra fields in my scoring set that were not in my training set. Is there a way for my model to ignore those additional fields when scoring?

Thank you in advance, I really appreciate it.

Operating system used: Windows

Alexandru · March 2023

Hi @ccecil,
You need the schema of the scored dataset to match that of the training dataset.
To do that, you can use a preparation script in the visual analysis for the model you've trained.
E.g. simply add a rename column step where the column name is "Number of Rooms" and change it to "Number_of_Rooms". You may get a warning if the column doesn't exist in the training dataset.

(Screenshot attached: Screen Shot 2023-03-31 at 4.44.55 PM.png)

But the same script steps would be applied when running the scoring recipe, which should avoid the error you see.
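To see which rename steps would be needed before adding them to the script, the name matching can be sketched outside DSS in plain Python. This is only an illustration, not Dataiku's API: `normalize` and `build_rename_map` are hypothetical helpers, and the `Price` and `Listing_Agent` columns are made up for the example. It assumes two names refer to the same field when they differ only by case or by spaces versus underscores.

```python
def normalize(name):
    """Lowercase a column name and treat spaces and underscores as equivalent."""
    return name.strip().lower().replace(" ", "_")


def build_rename_map(training_columns, scoring_columns):
    """Map each scoring column to the training column with the same normalized name.

    Columns that already match exactly, or that have no counterpart in the
    training set, are left out of the map.
    """
    by_normalized = {normalize(col): col for col in training_columns}
    rename_map = {}
    for col in scoring_columns:
        target = by_normalized.get(normalize(col))
        if target is not None and target != col:
            rename_map[col] = target
    return rename_map


# Hypothetical column lists; only the names matter here.
training_columns = ["Number_of_Rooms", "Price"]
scoring_columns = ["NUMBER_OF_ROOMS", "PRICE", "Listing_Agent"]

rename_map = build_rename_map(training_columns, scoring_columns)
renamed_columns = [rename_map.get(col, col) for col in scoring_columns]

print(rename_map)
print(renamed_columns)
```

Each entry in `rename_map` corresponds to one rename column step in the preparation script. If the data is loaded in pandas, the same dictionary can be passed to `DataFrame.rename(columns=rename_map)`.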

Thanks

ccecil · March 2023

Hi @AlexT,

I made a small typo in my original question: the field in my training dataset is 'Number_of_Bedrooms' and in my scoring dataset it is 'NUMBER_OF_BEDROOMS'.

Does the same solution still apply?

Alexandru · March 2023

Hi,

That would still apply. Rename any columns you expect to have in your scoring dataset.
Thanks
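As a quick check before scoring, the two column lists can be compared to find names the model expects but the scoring set lacks, and names that appear only in the scoring set. The sketch below is plain Python, not Dataiku's API; `compare_columns` and `keep_expected` are hypothetical helpers and the `Price` and `Listing_Agent` columns are made up. Whether the scoring recipe itself ignores extra columns is not stated in this thread, so the second helper simply drops them up front.

```python
def compare_columns(training_columns, scoring_columns):
    """Return (missing, extra): training columns absent from the scoring set,
    and scoring columns the training set never had. The comparison is exact
    and case-sensitive, like the 'Missing column' error in this thread."""
    training = list(training_columns)
    scoring = list(scoring_columns)
    missing = [col for col in training if col not in scoring]
    extra = [col for col in scoring if col not in training]
    return missing, extra


def keep_expected(rows, training_columns):
    """Keep only the training columns in each row (rows are dicts)."""
    return [{col: row[col] for col in training_columns} for row in rows]


# Hypothetical data for illustration.
training_columns = ["Number_of_Bedrooms", "Price"]

# Before renaming: the upper-case name does not match.
missing, extra = compare_columns(
    training_columns, ["NUMBER_OF_BEDROOMS", "Price", "Listing_Agent"]
)
print(missing, extra)

# After renaming: nothing is missing, so the extra column can be dropped.
rows = [{"Number_of_Bedrooms": 4, "Price": 310000, "Listing_Agent": "A. Smith"}]
missing_after, extra_after = compare_columns(training_columns, rows[0].keys())
if not missing_after:
    trimmed = keep_expected(rows, training_columns)
    print(trimmed)
```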

ccecil · March 2023

Okay.

I did that and then hit deploy script, which produces a new dataset with my renamed columns. If I use that new dataset to train my model, it throws an error saying that one of the columns I renamed is now empty. Did I go wrong somewhere?

@AlexT

Alexandru · March 2023

If you deploy the script, it creates a prepare recipe. You would need to change the input of the newly created recipe to the dataset you are scoring.

https://knowledge.dataiku.com/latest/data-preparation/lab-visual-analyses/tutorial-lab.html
