Problems with "Add queries" function due to the difference between Python boolean and json's boolean

yonghyun · February 3

I'll explain using the "Dataiku TShirts" sample project.

I was testing building an API from a dataset in the project.

Using the Add queries function in API Designer, I created a test query from five columns of revue_prediction.

The "campain" column is boolean, with True/False values.

But when the row was converted to JSON, the value became a JSON true/false, and the predicted value changed.

I had to edit the query to match the dataset value to get the prediction I expected.

Of course, the model is built so that a different input gives a different result, and I understand that part. But I only created a sample query from a Dataiku dataset using Dataiku's own function, so it is hard to accept that the result changes just because a False value is written out as false.
I'm curious about your opinion on this.

ÁngelÁlvarez · February 4

The issue you have identified stems from a subtle but significant "translation error" between how Dataiku stores data for training and how it generates JSON for API testing.

Here is the explanation for why the "Add queries" function causes a change in the predicted value:

1. Dataiku's "Hidden" String Storage

While the dataset column is labeled as Boolean (with the blue icon), Dataiku often stores these values as the capitalized text strings "True" and "False" under the hood. This is confirmed by the "Copy row as JSON" option from the dataset, which shows "campain": "False" (with quotes) rather than a raw JSON boolean (false).

{
  "customer_id": "066afc964e",
  "ip": "41.189.149.136",
  "ip_country": "Ghana",
  "ip_geopoint": "POINT(-2 8)",
  "pages_visited": 7,
  "campain": "False",
  "prediction": 181.97621671433708
}
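
The difference between the two representations can be reproduced outside Dataiku with Python's standard json module. This is only a minimal sketch: the two dictionaries are illustrative and say nothing about how Dataiku serializes rows internally.

```python
import json

# Value as "Copy row as JSON" shows it: a quoted, capitalized string.
as_string = json.dumps({"campain": "False"})
# Value as a native Python boolean: serialized as a bare, lowercase JSON boolean.
as_boolean = json.dumps({"campain": False})

print(as_string)   # {"campain": "False"}
print(as_boolean)  # {"campain": false}

# After parsing, the two values have different types and do not compare equal.
parsed_string = json.loads(as_string)["campain"]
parsed_boolean = json.loads(as_boolean)["campain"]
print(type(parsed_string).__name__, type(parsed_boolean).__name__)  # str bool
print(parsed_string == parsed_boolean)  # False
```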

2. The Categorical Training Mismatch

When you created the model using AutoML, Dataiku detected those "True"/"False" strings and defaulted the Variable Type to Categorical instead of Numerical/Boolean.

The Model's Perspective: The model learned that the text label "False" (a string) is a specific input that correlates to a prediction of 181.97.
The "Add Queries" Behavior: When the API Designer's "Add Queries" function generates a test, it often translates that internal data into a standard JSON boolean: "campain": false (lowercase, no quotes).
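
One way to spot this kind of drift before testing an endpoint is to compare the value types in a row copied from the dataset with those in the generated query. The sketch below assumes both payloads are available as JSON text; find_type_mismatches is a hypothetical helper written for this post, not a Dataiku function.

```python
import json

# Row as copied from the dataset, and the query as generated (shortened to two fields).
dataset_row = json.loads('{"pages_visited": 7, "campain": "False"}')
generated_query = json.loads('{"pages_visited": 7, "campain": false}')

def find_type_mismatches(reference, candidate):
    """Return {key: (reference_type, candidate_type)} for keys whose value types differ."""
    mismatches = {}
    for key, ref_value in reference.items():
        if key in candidate and type(candidate[key]) is not type(ref_value):
            mismatches[key] = (type(ref_value).__name__, type(candidate[key]).__name__)
    return mismatches

print(find_type_mismatches(dataset_row, generated_query))
# {'campain': ('str', 'bool')}
```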

3. Why the Prediction Changes

To a Machine Learning model, a string and a boolean are not the same thing:

Case A (Correct): Input is "False" (String). The model recognizes this categorical label and gives the expected result.
Case B (The Problem): Input is false (JSON Boolean). Because the model was trained on strings, it doesn't "see" a match for the label "False". It treats the input as an Unknown Category or a Missing Value.
Result: The model applies its "unknown value" logic, which shifts the prediction from 181.97 to a different value (like your 149.20 result).
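
The two cases can be illustrated with a toy category lookup, followed by the workaround of converting booleans back to the labels used in training. This is a sketch under stated assumptions, not how Dataiku's scoring engine is implemented: LEARNED_CATEGORIES, UNKNOWN_CATEGORY and both functions are hypothetical, and the actual handling of unseen values depends on the model's settings.

```python
# Hypothetical codes for the labels seen during training (strings, not booleans).
LEARNED_CATEGORIES = {"True": 1, "False": 0}
UNKNOWN_CATEGORY = -1  # hypothetical code for any value not seen in training

def encode_category(value):
    """Look up a value among the learned string labels; fall back to 'unknown'."""
    return LEARNED_CATEGORIES.get(value, UNKNOWN_CATEGORY)

def booleans_to_training_labels(features):
    """Return a copy of the query where bool values become "True"/"False" strings."""
    return {
        key: str(value) if isinstance(value, bool) else value
        for key, value in features.items()
    }

# Case B: the generated query carries a real boolean, which matches no label.
query = {"pages_visited": 7, "campain": False}
print(encode_category(query["campain"]))  # -1

# Case A: after conversion the value matches the trained label again.
fixed_query = booleans_to_training_labels(query)
print(fixed_query)                              # {'pages_visited': 7, 'campain': 'False'}
print(encode_category(fixed_query["campain"]))  # 0
```

Editing the test query by hand so that "campain" is the quoted string "False" achieves the same thing as the conversion helper.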
