Predictions Made Using Dataiku Snowpark API Yield a Single Class for a Multi-Class Classification Model
Hello community,
I am facing an issue when using Dataiku's Snowpark API to make predictions from a multi-class classification model.
The model is trained with Dataiku and a Snowflake table via Visual ML.
The predictions contain only one class, whereas the same table loaded as a pandas DataFrame returns correct predictions spanning multiple classes.
Even when the data read through Snowpark is converted to a pandas DataFrame before prediction, the predictions are still incorrect, with only one class.
Steps to Reproduce:
- Train a K-means multi-class classification model using Dataiku and a Snowflake table via Visual ML.
- Deploy the model on the API Deployer.
- Open a Python Jupyter notebook and use the Dataiku Snowpark API to read the data.
- Create a prediction on the trained model using the read data.
- Observe that all predictions are of a single class.
- Read the same Snowflake table with the Dataiku get_dataframe method, which loads it as a pandas DataFrame.
- Run predictions on the pandas DataFrame.
- Observe that predictions are as expected, with multiple classes.
- Convert the data read through Snowpark to a pandas DataFrame and then predict.
- Observe that predictions are still incorrect, all of the same class.
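Since the steps above hinge on the two read paths producing different inputs for the same table, one check that may help is to compare the schemas of the two frames once both are in pandas form. The following is a minimal sketch only: `compare_frames` is a hypothetical helper, and the two small DataFrames are made-up stand-ins for the real `inference_data` table, not its actual contents.

```python
import pandas as pd


def compare_frames(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Summarise column-name and dtype differences between two DataFrames."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    shared = sorted(left_cols & right_cols)
    return {
        # Columns present in one frame but not the other (names are case-sensitive)
        "only_in_left": sorted(left_cols - right_cols),
        "only_in_right": sorted(right_cols - left_cols),
        # Shared columns whose dtypes differ between the two frames
        "dtype_mismatches": {
            col: (str(left[col].dtype), str(right[col].dtype))
            for col in shared
            if left[col].dtype != right[col].dtype
        },
    }


# Made-up stand-ins: one frame as read directly into pandas, one as it might
# look after going through another read path (assumption for illustration).
pandas_df = pd.DataFrame({"age": [31, 42], "income": [1000.0, 2500.5]})
other_df = pd.DataFrame({"AGE": [31, 42], "income": ["1000.0", "2500.5"]})

report = compare_frames(pandas_df, other_df)
print(report)
```

If the report is empty for the real data, the difference is more likely in how the records are sent to the endpoint than in the data itself.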
Code to read data using the Snowpark API:
    import dataiku
    from dataiku.snowpark import DkuSnowpark

    input_dataset = dataiku.Dataset("inference_data")
    dku_snowpark = DkuSnowpark()
    snowpark_session = dku_snowpark.create_session(
        connection_name="SNOWFLAKE_CONNECTION",
        project_key=dataiku.default_project_key()
    )
    # The result is a Snowpark DataFrame, not a pandas DataFrame
    dataset_dataframe = dku_snowpark.get_dataframe(dataset=input_dataset, session=snowpark_session)
Code to read data as a pandas df:
    import dataiku

    input_dataset = dataiku.Dataset("inference_data")
    # get_dataframe() loads the dataset as a pandas DataFrame
    dataset_dataframe = input_dataset.get_dataframe()
Code to run predictions:
    import dataikuapi

    # apinode_endpoint holds the URL of the API node
    client = dataikuapi.APINodeClient(apinode_endpoint, "Model_AutoMl")
    prediction = client.predict_records("Clustering_Model", dataset_dataframe)['results']
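The call above passes the DataFrame object straight to `predict_records`. As far as I understand the `dataikuapi` documentation, that method takes a list of records, each a dict with a `features` entry, so it may be worth building that list explicitly and sending the same payload for both read paths. The sketch below is an assumption-laden illustration: `dataframe_to_records` is a hypothetical helper, the sample DataFrame is invented, and the actual API call is left as a comment because it needs a live API node.

```python
import pandas as pd


def dataframe_to_records(df: pd.DataFrame) -> list:
    """Convert each row of a pandas DataFrame into {"features": {...}}."""
    records = []
    for row in df.to_dict(orient="records"):
        # Replace missing values (NaN/NaT) with None so they serialise as JSON null
        features = {key: (None if pd.isna(value) else value) for key, value in row.items()}
        records.append({"features": features})
    return records


# Invented sample standing in for the real inference data
sample_df = pd.DataFrame({"age": [31, None], "city": ["Paris", "Lyon"]})
records = dataframe_to_records(sample_df)
print(records)

# With a reachable API node, the list could then be sent like this
# (client and endpoint name taken from the snippet above):
# prediction = client.predict_records("Clustering_Model", records)["results"]
```

A Snowpark DataFrame would first need converting to pandas (for example with its `to_pandas()` method) before a helper like this can iterate over it.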
This is the data and result when using the Snowpark API
This is the data and result when using a pandas DataFrame
Any insights or solutions to resolve this inconsistency would be greatly appreciated. Please let me know if additional information is required.
Thank you for your support.
Operating system used: Mac OS