Fields in Scoring Dataset that weren't in Training Dataset
Hi there,
Two questions:
1) I'm receiving the error message below and I'm wondering whether the letter case of the field names is causing it. I have 'Number of Rooms' in my training set and 'NUMBER_OF_ROOMS' in my scoring set.
- Will I need to go into my scoring set and match the case of the field names with the training set?
An invalid argument has been encountered : in act.score_Model_Score_NP: Cannot apply the model with the output of preparation on this input (Missing column: Number_of_Rooms)
2) I have some extra fields in my scoring set that were not in my training set. Is there a way for my model to ignore those additional fields when scoring?
Thank you in advance, I really appreciate it.
Operating system used: Windows
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
Hi @ccecil
You need the schema of the scoring dataset to match the schema of the training dataset.
One way to do that is to use a preparation script in the visual analysis for the model you've trained.
E.g., simply add a rename column step where the column name is "Number of Rooms" and change it to "Number_of_Rooms". You may get a warning if the column doesn't exist in the training dataset, but the same script steps are applied when the scoring recipe runs, which should avoid the error you see.
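If there are many columns, it may help to work out the renames up front before adding the steps. Below is a minimal plain-Python sketch, not a Dataiku API: it assumes you have copied the two column lists out of your datasets, and the names `normalize`, `build_rename_map`, "Price" and "Listing_ID" are made up for illustration. It matches names while ignoring case and treating spaces like underscores.

```python
def normalize(name):
    """Lowercase the name and treat spaces like underscores for matching."""
    return name.strip().lower().replace(" ", "_")


def build_rename_map(training_cols, scoring_cols):
    """Map each scoring column to the training column it matches.

    Only columns whose spelling actually differs are included.
    Scoring columns with no match in the training set are left out.
    """
    by_key = {normalize(c): c for c in training_cols}
    rename = {}
    for col in scoring_cols:
        target = by_key.get(normalize(col))
        if target is not None and target != col:
            rename[col] = target
    return rename


# Hypothetical column lists, typed in by hand for this example.
training_cols = ["Number_of_Bedrooms", "Price"]
scoring_cols = ["NUMBER_OF_BEDROOMS", "Price", "Listing_ID"]

rename_map = build_rename_map(training_cols, scoring_cols)
print(rename_map)  # {'NUMBER_OF_BEDROOMS': 'Number_of_Bedrooms'}
```

Each entry in the resulting map corresponds to one rename column step you would add by hand.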
Hi @AlexT,
I made a small typo in my original question: the field in my training dataset is 'Number_of_Bedrooms' and in my scoring dataset it is 'NUMBER_OF_BEDROOMS'.
Does the same solution still apply?
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
That would still apply. Rename any columns you expect to have in your scoring dataset.
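To see which columns still need a rename, a quick comparison of the two column lists can help. This is only a rough sketch in plain Python, not anything built into DSS: `compare_schemas` is a hypothetical helper, and the column lists (including "Price" and "Listing_ID") are made-up examples. The comparison is case-sensitive on purpose, since the error in this thread is raised on an exact name.

```python
def compare_schemas(expected_cols, scoring_cols):
    """Return (missing, extra) column names using an exact, case-sensitive match."""
    expected, actual = set(expected_cols), set(scoring_cols)
    missing = sorted(expected - actual)  # expected by the model, absent from scoring data
    extra = sorted(actual - expected)    # present in scoring data, unknown to the model
    return missing, extra


# Hypothetical column lists for illustration.
missing, extra = compare_schemas(
    ["Number_of_Bedrooms", "Price"],
    ["NUMBER_OF_BEDROOMS", "Price", "Listing_ID"],
)
print(missing)  # ['Number_of_Bedrooms']
print(extra)    # ['Listing_ID', 'NUMBER_OF_BEDROOMS']
```

Anything in `missing` is a candidate for a rename step; anything in `extra` is either a differently spelled version of a missing column or a genuinely additional field.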
Thanks.
I do that and then hit deploy script, which produces a new dataset with my renamed columns. If I use that new dataset to train my model, it throws an error saying that one of the columns I renamed is now empty. Did I go wrong somewhere?
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
If you deploy the script, it creates a prepare recipe. You would need to change the input of the newly created recipe to the dataset you are scoring.