Dataiku API generates the same output for different inputs in Test Queries.
I have created a Dataiku API from a model trained on vectorized text. When I test the API by giving input in "Test queries", I get the same output irrespective of the input provided in the "Test Queries" parameters.
One more question: before the input reaches the prediction model, I have a Prepare recipe that cleanses and filters the input data. So if I provide raw input to the API, will it go through this Prepare recipe before it hits the prediction model?
Answers
-
Emma (Dataiker)
Hey @vaishnavi,
It is hard to comment on what your model is doing, and whether or not those two records should have the same response, without more context.
But to answer your second question: no, it will not prepare/cleanse the data if you used a separate Prepare recipe.
However, if the data preparation steps were in the "Script" tab of the visual ML, then yes, these steps will be included in the endpoint that is part of the API service deployed to the API node. As a result, records sent as API calls are first preprocessed according to the steps in the Prepare script of the visual analysis before being passed to the model for scoring. See screenshot.
Hope that helps,
Emma
-
Thanks for your response @Emma.
I can provide a few details about the model I am using. It is a "Logistic Regression" model for multi-class classification, selected from the "Quick Prototype" models in the Dataiku Lab section. I pass the text input through a TF-IDF vectorizer in the "Design" section of the Lab during model creation and use it as the input to the model. The model can predict any class based on the input provided, but irrespective of the input, I am getting the same output class, "Number Management", as shown in the screenshots attached earlier.
Please find a few queries below:
1. If the "Test Queries" is given with empty input, then we should be getting the empty output right ? But still I can see the prediction ouput as "Number Management".
2. In the "Test Queries", the "features" JSON should be expecting the column name which is same as the model input column name right ? But even if I give some random name like "jdhfjkdhkf" which is not available in my dataset, I am getting the prediction as one of my target classes "Number Management".
3. Since I apply the TF-IDF vectorizer in the "Design" section of the Lab during model creation, if I give a raw text value as input in the "Test Queries" "features" JSON, will that raw text be passed through the TF-IDF vectorizer before hitting the prediction model?
4. Basically, I want to pass my raw text input to two different models and then write some custom logic to decide the final prediction. If I want a single API in Dataiku with two different models giving two outputs, how can I achieve this?
5. I want to train/retrain my model in one project and then load the trained model in another project. Can a model created in one Dataiku project be used in another Dataiku project? If yes, how can I achieve this?
6. Can we create an API endpoint with a Python recipe?
I understand that I have flooded this post with lots of queries, but I am really stuck on all these points and need your help finding solutions so that I can move forward. I have searched the discussion forum on the Dataiku Community page but was not able to find solutions to any of these queries.
Please help me find a way to resolve these issues/blockers.
If possible, can we connect in any way (Google Meet, etc.) to discuss these queries in detail?
-
Emma (Dataiker)
Hey @vaishnavi,
I've asked your account team to reach out directly to work through your use case! In the meantime, and for anyone following this thread:
1. If the "Test Queries" is given with empty input, then we should be getting the empty output right ? But still I can see the prediction ouput as "Number Management".
> No, there is always an output. Without any features in your input it will not be a very meaningful or accurate output, but there will be one, based on the historical data the model was trained on.
2. In the "Test Queries", the "features" JSON should be expecting the column name which is same as the model input column name right ? But even if I give some random name like "jdhfjkdhkf" which is not available in my dataset, I am getting the prediction as one of my target classes "Number Management".
> Right, it is expecting the same columns and will ignore columns that were not present during training.
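For anyone reproducing this outside the "Test queries" screen, here is a minimal sketch of the same call made over HTTP. The host, service id, endpoint id and column name are placeholder assumptions; check the sample queries shown for your deployed API service for the exact URL and feature names.
```python
# Hypothetical example: calling a deployed prediction endpoint directly.
# "text-classifier", "predict-category" and "ticket_text" are placeholders.
import requests

API_NODE_URL = "http://my-api-node:12000"  # placeholder host and port

payload = {
    "features": {
        # The key must match a column the model was trained on. Keys that were
        # not present during training (e.g. "jdhfjkdhkf") are simply ignored,
        # and the real feature is then treated as missing/empty.
        "ticket_text": "Unable to port my number to the new provider"
    }
}

resp = requests.post(
    f"{API_NODE_URL}/public/api/v1/text-classifier/predict-category/predict",
    json=payload,
)
print(resp.json())  # predicted class and, for classification, probabilities
```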
3. Since I apply the TF-IDF vectorizer in the "Design" section of the Lab during model creation, if I give a raw text value as input in the "Test Queries" "features" JSON, will that raw text be passed through the TF-IDF vectorizer before hitting the prediction model?
> Yes.
Read more: https://doc.dataiku.com/dss/latest/machine-learning/scoring-engines.html#preprocessing
4. Basically, I want to pass my raw text input to two different models and then write some custom logic to decide the final prediction. If I want a single API in Dataiku with two different models giving two outputs, how can I achieve this?
> Take a look at our documentation around custom Python function endpoints - they let you write and combine multiple models and custom logic into a single endpoint.
https://doc.dataiku.com/dss/latest/apinode/endpoints.html
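To make the idea concrete, here is a rough sketch of such a custom Python function endpoint. This is not Dataiku's template verbatim: the function name follows the default code sample, while the pickled model files, the feature handling and the combination rule are purely illustrative assumptions.
```python
# Illustrative sketch of a Python function endpoint that scores one raw text
# input with two models and applies custom decision logic.
import pickle

# Assumption: both trained models were exported as pickle files and packaged
# with the endpoint (e.g. as code resources or in a managed folder).
with open("model_a.pkl", "rb") as f:
    model_a = pickle.load(f)
with open("model_b.pkl", "rb") as f:
    model_b = pickle.load(f)

def api_py_function(raw_text):
    """Score the raw text with both models and decide on a final prediction."""
    pred_a = model_a.predict([raw_text])[0]
    pred_b = model_b.predict([raw_text])[0]
    # Example of custom logic: keep the class only when both models agree,
    # otherwise flag the record for review.
    final = pred_a if pred_a == pred_b else "NEEDS_REVIEW"
    return {"model_a": pred_a, "model_b": pred_b, "final_prediction": final}
```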
5. I want to train/retrain my model in one project and then load the trained model in another project. Can a model created in one Dataiku project be used in another Dataiku project? If yes, how can I achieve this?
> Yes, you can share code and/or AutoML modelling elements amongst projects.
Sharing: https://doc.dataiku.com/dss/latest/security/shared-objects.html
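Once the saved model has been exposed (shared) to the consuming project, it can be used there in a scoring recipe or read from Python code. Below is a rough sketch of the latter; the model id, source project key and dataset name are placeholders, and the exact lookup arguments should be checked against the dataiku Python API reference.
```python
# Rough sketch: scoring a dataset in the current project with a saved model
# shared from another project. All ids below are placeholders.
import dataiku

# Reference the shared saved model by its id and its source project key
model = dataiku.Model("my_saved_model_id", project_key="PROJECT_A")
predictor = model.get_predictor()

# Score a dataframe from a dataset in the current project with that model
df = dataiku.Dataset("tickets_prepared").get_dataframe()
scored_df = predictor.predict(df)
```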
6. Can we create an API endpoint with a Python recipe?
> Yes, in one of two ways: expose a custom Python function endpoint or a custom Python prediction endpoint.
https://doc.dataiku.com/dss/latest/apinode/endpoints.html
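For the custom Python prediction option, the code is organized as a predictor class rather than a plain function. The sketch below only illustrates the shape of such a class; the base class import, method signature and return format should be double-checked against the custom prediction endpoint documentation for your DSS version, and the pickled model and column name are again assumptions.
```python
# Heavily simplified sketch of a custom Python prediction (classification)
# endpoint; verify class/method names against your DSS version's docs.
import os
import pickle
import pandas as pd
from dataiku.apinode.predict.predictor import ClassificationPredictor

class MyPredictor(ClassificationPredictor):
    def __init__(self, data_folder=None):
        # data_folder holds resources packaged with the endpoint; here it is
        # assumed to contain the pickled model.
        with open(os.path.join(data_folder, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)

    def predict(self, features_df):
        # features_df is a pandas DataFrame built from the incoming query's
        # "features"; return the predicted class for each row.
        decisions = pd.Series(self.model.predict(features_df["ticket_text"]))
        # Returning class probabilities is skipped in this sketch
        return (decisions, None)
```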
-
Thanks for your response @Emma.
It helped me understand the concept of "Shared Objects".