Fields in Scoring Dataset that weren't in Training Dataset
Hi there,
Two questions:
1) I'm receiving the error message below and I'm wondering whether the case of the field names is causing it. I have 'Number of Rooms' in my training set and 'NUMBER_OF_ROOMS' in my scoring set.
- Will I need to go into my scoring set and match the case of the field names with the training set?
An invalid argument has been encountered : in act.score_Model_Score_NP: Cannot apply the model with the output of preparation on this input (Missing column: Number_of_Rooms)
2) I have some extra fields in my scoring set that were not in my training set. Is there a way for my model to ignore those additional fields when scoring?
Thank you in advance, I really appreciate it.
Operating system used: Windows
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
Hi @ccecil,
You either need to match the schema of the scoring dataset to that of the training dataset, or you can use a preparation script in the visual analysis for the model you've trained.
E.g. simply add a rename-column step where the column name is "Number of Rooms" and change it to "Number_of_Rooms". You may get a warning if the column doesn't exist in the training dataset, but the same script steps are applied when the scoring recipe runs, which should avoid the error you see.
Thanks
Answers
-
Hi @AlexT,
I made a small typo in my original question: the field in my training dataset is 'Number_of_Bedrooms' and in my scoring dataset it is 'NUMBER_OF_BEDROOMS'.
Does the same solution still apply?
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
Hi,
That would still apply. Rename any columns you expect to have in your scoring dataset.
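To work out which rename steps are needed, it can help to compare the two column lists outside DSS first. The following is only a minimal sketch in plain Python, not a Dataiku API: the helper names `normalize` and `build_rename_map` are made up for illustration, and the `Price` and `Listing_ID` columns are hypothetical. It assumes column names should match once case and the space/underscore difference are ignored.

```python
def normalize(name):
    """Lowercase a column name and treat spaces and underscores alike."""
    return name.strip().lower().replace(" ", "_")


def build_rename_map(training_columns, scoring_columns):
    """Compare scoring columns against training columns.

    Returns a tuple (rename_map, extra, missing):
      rename_map: scoring name -> training name, for names that differ
      extra:      scoring columns with no counterpart in training
      missing:    training columns with no counterpart in scoring
    """
    by_key = {normalize(c): c for c in training_columns}
    rename_map = {}
    extra = []
    for col in scoring_columns:
        target = by_key.get(normalize(col))
        if target is None:
            extra.append(col)          # not seen during training
        elif target != col:
            rename_map[col] = target   # same column, different spelling
    matched = {normalize(c) for c in scoring_columns}
    missing = [c for c in training_columns if normalize(c) not in matched]
    return rename_map, extra, missing


# Hypothetical column lists; only the bedroom column comes from this thread.
training = ["Number_of_Bedrooms", "Price"]
scoring = ["NUMBER_OF_BEDROOMS", "Price", "Listing_ID"]

rename_map, extra, missing = build_rename_map(training, scoring)
print(rename_map)
print(extra)
print(missing)
```

Each entry in `rename_map` corresponds to one rename step to add to the script. Anything listed in `missing` would trigger the "Missing column" error, while `extra` lists the scoring-only fields from question 2. If you prepare the data with pandas instead, `df.rename(columns=rename_map)` applies the same mapping.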
Thanks
-
Okay.
I did that and then hit Deploy Script, which produced a new dataset with my renamed columns. If I use that new dataset to train my model, it throws an error saying that one of the columns I renamed is now empty. Did I go wrong somewhere?
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,215 Dataiker
If you deploy the script, it creates a Prepare recipe. You would need to change the input of the newly created recipe to the dataset you are scoring.
https://knowledge.dataiku.com/latest/data-preparation/lab-visual-analyses/tutorial-lab.html