Unexpected prediction behavior in API Designer Test Queries with missing feature values
Hello,
I am currently testing an API endpoint created with Dataiku API Designer and noticed some prediction behaviors that I do not fully understand.
Environment
- Dataiku DSS
- API Designer
- LightGBM prediction model
- Approximately 48 input features
Observation
I expected the API to return an error when required feature values were missing from the input payload. However, the API continues to return predictions even when some features are omitted or when an almost empty query is submitted.
To better understand the behavior, I performed the following tests using the Test Queries feature in API Designer.
Test Cases
- Submit a complete input payload (ground truth)
- Modify some feature values
- Remove one or more feature values from the payload
- Modify features with high importance according to the Feature Importance analysis
- Submit an empty query
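For reference, the five test payloads can be built along these lines. This is only a sketch: the feature names and values are placeholders (the real model has about 48 features), and the `{"features": {...}}` wrapper is the request shape I am assuming for the prediction endpoint.

```python
import copy
import json

# Hypothetical complete record; names and values are placeholders.
BASELINE = {"feature_a": 1.0, "feature_b": "x", "feature_c": 3}


def as_query(features):
    """Wrap a feature dict in the assumed request shape."""
    return {"features": features}


def with_changes(base, changes):
    """Return a copy of base with some feature values overwritten."""
    out = copy.deepcopy(base)
    out.update(changes)
    return out


def without(base, names):
    """Return a copy of base with the given feature keys removed."""
    return {k: v for k, v in base.items() if k not in names}


test_queries = {
    "complete": as_query(copy.deepcopy(BASELINE)),
    "modified": as_query(with_changes(BASELINE, {"feature_b": "y"})),
    "missing": as_query(without(BASELINE, {"feature_c"})),
    # feature_a stands in for a feature ranked high in Feature Importance.
    "high_importance": as_query(with_changes(BASELINE, {"feature_a": 99.0})),
    "empty": as_query({}),
}

if __name__ == "__main__":
    for name, query in test_queries.items():
        print(name, json.dumps(query))
```

Each printed JSON document can be pasted into a test query, which keeps the five cases identical except for the one change under test.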
Results
| Test | Result |
|---|---|
| Complete input payload | Prediction returned |
| Modified feature values | Same prediction as Test #1 |
| Missing feature values | Same prediction as Test #1 |
| Modified high-importance features | Different prediction |
| Empty query | Different prediction, but prediction is still returned |
Questions
Based on these results, I would like to better understand how Dataiku handles missing features during API inference.
- Does Dataiku automatically fill missing feature values using defaults, training statistics, or preprocessing logic?
- Is the behavior different between API Designer test queries and actual API calls?
- Is there a way to enforce strict validation so that predictions fail when required features are missing?
- How should an empty query be interpreted by the prediction endpoint?
The fact that predictions are still generated even when features are removed or when an empty query is submitted makes it difficult to understand exactly what input data is being used by the model.
Any explanation of the inference-time handling of missing inputs would be greatly appreciated.
Thank you.
Dataiku version used: 14.4.3
Answers
Turribeach
This is a known issue which I raised years ago and which was never fixed. The fact that you can't force the API to return an error if a required parameter is missing is obviously a massive issue for ML models, which can return unpredictable results when the data is not valid. The only way we know to deal with this is to use a custom Python function API, which means you end up doing all the coding to handle the parameter validation yourself. Contact Dataiku Support and ask for your company to be added to the enhancement request, which is collecting dust somewhere…
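As a rough illustration of that workaround, the validation part of a custom Python function could look something like the sketch below. It is plain Python and does not use any Dataiku API: `REQUIRED_FEATURES`, `MissingFeaturesError`, `strict_predict` and the `predict_fn` callable are all hypothetical names, and `predict_fn` stands in for however you actually score the record with your model.

```python
import math
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical feature list: replace with the model's real input columns.
REQUIRED_FEATURES = ["age", "income", "tenure_months"]


class MissingFeaturesError(ValueError):
    """Raised when a request omits required features or sends them empty."""

    def __init__(self, missing: List[str]):
        self.missing = missing
        super().__init__("Missing required features: " + ", ".join(missing))


def is_blank(value: Any) -> bool:
    """Treat None, NaN and whitespace-only strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def find_missing(features: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent or blank, in order."""
    return [name for name in required if is_blank(features.get(name))]


def strict_predict(
    features: Dict[str, Any],
    predict_fn: Callable[[Dict[str, Any]], Any],
    required: Iterable[str] = REQUIRED_FEATURES,
) -> Any:
    """Fail fast instead of scoring an incomplete record."""
    missing = find_missing(features or {}, required)
    if missing:
        raise MissingFeaturesError(missing)
    # predict_fn is a placeholder for the real model call (an assumption).
    return predict_fn(features)
```

With this in place, an empty query or a payload with a removed feature raises an error that names the missing fields, rather than silently returning a prediction. How the raised exception is surfaced to the API caller depends on the endpoint, so that part is left out here.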