
Predictions made using Dataiku Snowpark API Yield Single Class for Multi-Class Classification model

Suhail

Hello community,

I am facing an issue when using Dataiku's Snowpark API to make predictions from a multi-class classification model.

The model was trained in Dataiku's Visual ML on a Snowflake table.

When the data is read through Snowpark, every prediction comes back as the same class. The same table loaded as a pandas DataFrame through the Dataiku dataset API returns correct predictions with multiple classes.

Even when the Snowpark DataFrame is converted to a pandas DataFrame before predicting, the predictions are still incorrect, with only one class.

 

Steps to Reproduce:

  1. Train a K-means multi-class classification model in Dataiku on a Snowflake table via Visual ML.
  2. Deploy the model with the API Deployer.
  3. Open a Python Jupyter notebook and read the data with the Dataiku Snowpark API.
  4. Request predictions from the deployed model on that data.
  5. Observe that all predictions are of a single class.
  6. Read the same Snowflake table as a pandas DataFrame with the Dataiku get_dataframe() method.
  7. Run predictions on the pandas DataFrame.
  8. Observe that predictions are as expected, with multiple classes.
  9. Convert the Snowpark DataFrame to a pandas DataFrame and predict again.
  10. Observe that the predictions are still incorrect, all of the same class.
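The "observe" steps (5, 8 and 10) can be made concrete by tallying the predicted classes in each run. This is only a sketch: it assumes each item in the 'results' list is a dict with a "prediction" key, and the helper name class_counts and the sample values are made up for illustration.

```python
from collections import Counter

def class_counts(results, key="prediction"):
    """Count how often each predicted class appears in a list of result dicts.

    Assumption: every result carries its predicted class under `key`.
    """
    return Counter(item.get(key) for item in results)

# Made-up sample standing in for the 'results' list of a prediction call
sample_results = [
    {"prediction": "cluster_0"},
    {"prediction": "cluster_1"},
    {"prediction": "cluster_0"},
]
counts = class_counts(sample_results)
single_class = len(counts) == 1  # True would reproduce the issue described
```

Running this on the results of both variants gives a quick side-by-side of the class distribution.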

 

Code to read data using the Snowpark API:

 

import dataiku
from dataiku.snowpark import DkuSnowpark

# Read the dataset as a Snowpark DataFrame over the Snowflake connection
input_dataset = dataiku.Dataset("inference_data")
dku_snowpark = DkuSnowpark()
snowpark_session = dku_snowpark.create_session(
    connection_name="SNOWFLAKE_CONNECTION",
    project_key=dataiku.default_project_key()
)
dataset_dataframe = dku_snowpark.get_dataframe(dataset=input_dataset, session=snowpark_session)

 

 

Code to read data as a pandas df:

 

import dataiku
# Read the same dataset as a pandas DataFrame
input_dataset = dataiku.Dataset("inference_data")
dataset_dataframe = input_dataset.get_dataframe()
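One difference worth ruling out between the two read paths is the schema the endpoint ends up receiving. Snowflake stores unquoted identifiers in upper case, so column names can differ in case from the names the model was trained on. Whether that applies here is not established by this post. A minimal sketch of the check, assuming the column names of each DataFrame have been collected into plain lists; compare_columns is a hypothetical helper and the sample names are invented.

```python
def compare_columns(reference, candidate):
    """Report how candidate column names differ from the reference names."""
    ref_by_lower = {name.lower(): name for name in reference}
    cand_by_lower = {name.lower(): name for name in candidate}
    shared = ref_by_lower.keys() & cand_by_lower.keys()
    return {
        # Same name apart from letter case
        "case_mismatch": sorted(
            (ref_by_lower[k], cand_by_lower[k])
            for k in shared
            if ref_by_lower[k] != cand_by_lower[k]
        ),
        # In the reference but absent from the candidate
        "missing": sorted(
            ref_by_lower[k] for k in ref_by_lower.keys() - cand_by_lower.keys()
        ),
        # In the candidate but absent from the reference
        "extra": sorted(
            cand_by_lower[k] for k in cand_by_lower.keys() - ref_by_lower.keys()
        ),
    }

# Invented sample names; in practice use list(df.columns) for each DataFrame
pandas_columns = ["age", "income", "region"]
snowpark_columns = ["AGE", "INCOME", "SEGMENT"]
report = compare_columns(pandas_columns, snowpark_columns)
```

An empty report for all three keys would suggest looking elsewhere, for example at value types.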

 

 

Code to run predictions:

 

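import dataikuapi
# apinode_endpoint is the API node URL; "Model_AutoMl" is the deployed service ID
client = dataikuapi.APINodeClient(apinode_endpoint, "Model_AutoMl")
prediction = client.predict_records("Clustering_Model", dataset_dataframe)['results']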
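For reference, the dataikuapi documentation describes predict_records as taking an iterable of records, each one a dict containing a "features" dict, rather than a DataFrame. A minimal sketch of that conversion, assuming the rows are available as plain dicts; to_api_records is a hypothetical helper (not part of dataikuapi) and the sample column names are invented.

```python
import math

def to_api_records(rows):
    """Wrap each row dict as {"features": {...}} for an API node request.

    Hypothetical helper, not part of dataikuapi. NaN values are replaced
    with None so the payload serializes to JSON nulls.
    """
    records = []
    for row in rows:
        features = {}
        for name, value in row.items():
            if isinstance(value, float) and math.isnan(value):
                value = None
            features[str(name)] = value
        records.append({"features": features})
    return records

# Invented sample rows; with pandas, rows = df.to_dict(orient="records")
rows = [
    {"age": 41, "income": 52000.0},
    {"age": 29, "income": float("nan")},
]
records = to_api_records(rows)
```

The resulting list could then be passed as client.predict_records("Clustering_Model", records). Building the records the same way for both variants also makes the two requests directly comparable.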

 

 

This is the data and the results when using the Snowpark API:

Screenshot 2024-05-27 at 5.00.43 PM.png

Screenshot 2024-05-27 at 5.00.59 PM.png

 

This is the data and the results when using a pandas DataFrame:

Screenshot 2024-05-27 at 5.01.37 PM.png

Screenshot 2024-05-27 at 5.01.43 PM.png

  

Any insights or solutions to resolve this inconsistency would be greatly appreciated. Please let me know if additional information is required.

Thank you for your support.


Operating system used: Mac OS

1 Reply
AlexT
Dataiker

Hi @Suhail ,


Could you please open a support ticket for this and share job diagnostics from both variants?
Thanks
