Unable to run data drift from the saved model page with a custom model
Hello everyone,
I have a custom model logged into Dataiku as a saved model via the MLflow integration. When I try to run the data drift analysis from the saved model page, I get the following error:
"FileNotFoundError: [Errno 2] No such file or directory: /app/dataiku/DSS_DATA_DIR/saved_models/DKU_PREDICTIVE_MAINTENANCE_1/hzGw5Zzl/versions/2023_03_01T05_23_29/split/split.json".
Here is the code I used to create the saved model:
import time
import dataiku
import mlflow
from sklearn.ensemble import RandomForestClassifier

project = dataiku.api_client().get_default_project()
mlflow_ext = project.get_mlflow_extension()
managed_folder = project.get_managed_folder(folder_id)
mlflow_handle = project.setup_mlflow(managed_folder)
mlflow.set_experiment(experiment_id)
mlflow.sklearn.autolog()

with mlflow.start_run(run_name=f"run_{round(time.time() * 1000)}") as run:
    # RandomForestClassifier hyperparameters must be passed by keyword
    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        random_state=random_state,
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        verbose=verbose,
        class_weight=class_weight,
    )
    clf.fit(X_train, y_train)
    run_id = run.info.run_id
    mlflow_ext.set_run_inference_info(prediction_type, run_id, classes, code_env_name, target)
    mlflow_ext.deploy_run_model(run_id, saved_m_id, evaluation_dataset)

mlflow_handle.clear()
Could someone help me understand why this error is occurring and how I can resolve it? Any suggestions or insights would be greatly appreciated.
Thank you in advance!
Best Answer
Pierre-MaëlM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3
If you wish to create a Standalone Evaluation Recipe on a dataset that was scored by your model (which we will call the input dataset - typically the recent data your model scored), you can add a reference (scored) dataset as a second input to the recipe (typically the validation dataset you used at train time, or even the dataset you trained the model on). The Standalone Evaluation Recipe will then compute the drift score (i.e. the drift between the input dataset and the reference dataset) and store this metric in a Model Evaluation, which is kept in the Model Evaluation Store.
You may also use the Evaluate Recipe instead of the Standalone Evaluation Recipe to evaluate MLflow models (and compute drift). The Evaluate Recipe takes two inputs: the MLflow model and an evaluation dataset. Be sure to use a Model Evaluation Store as the output of this recipe; every run of the Evaluate Recipe will add a Model Evaluation to the store. As with the Standalone Evaluation Recipe, the drift metric is stored for each Model Evaluation in the Model Evaluation Store. In the case of an Evaluate Recipe with an MLflow model, the input data drift is the metric quantifying the drift between the reference - in this case, the dataset that was used to evaluate the MLflow model when it was imported into DSS using
mlflow_ext.deploy_run_model(run_id, saved_m_id, evaluation_dataset)
- and the evaluation dataset, which is the second input of the Evaluate Recipe.
Also, from a Model Evaluation details screen, you can trigger a drift computation from the Drift Analysis > Input data drift section. (To access the Model Evaluation details screen, go to the Model Evaluation Store and click one of the Model Evaluation names displayed in the table.)
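DSS computes its drift score internally, but purely as an illustration of what an input data drift metric captures, here is a minimal, hedged sketch of a Population Stability Index (PSI) style comparison between a reference sample (e.g. train-time feature values) and a recent sample. This is not the exact metric DSS uses; all names and data below are hypothetical:

```python
import math
import random

def psi(reference, recent, bins=10):
    """Population Stability Index between two numeric samples.
    Illustration only -- DSS computes its own drift metrics internally."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first/last bin
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor at a tiny value to avoid log(0) and division by zero
        return [max(c / len(values), 1e-6) for c in counts]

    ref_pct = proportions(reference)
    new_pct = proportions(recent)
    return sum((n - r) * math.log(n / r) for r, n in zip(ref_pct, new_pct))

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical train-time feature
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]       # same distribution: no drift
shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]    # the mean has drifted

print(psi(reference, same))     # small value: little drift
print(psi(reference, shifted))  # larger value: noticeable drift
```

A common rule of thumb treats PSI above roughly 0.2 as significant drift; DSS surfaces its own (different) drift score and per-column details in the Model Evaluation's Drift Analysis screen.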
Please let me know if you have any more questions.
Regards,
Pierre-Maël
Answers
Pierre-MaëlM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3
Hello @Suhail
If I understand correctly, you seem to be using the data drift plugin to run the drift analysis on your custom model. That plugin does not support MLflow models and is deprecated.
Instead, I can direct you to Model Evaluations, which should help you run drift analysis on your model.
Do not hesitate if you have further questions.
Regards,
Pierre-Maël
Hi @Pierre-MaëlM
Thanks for the response.
I have a quick follow-up question: if I use the Model Evaluation Store or the Standalone Evaluation Recipe, could you help me with how to run the drift analysis between the data the model was trained on and a new dataset with recent data?
Hi @Pierre-MaëlM
Thanks for the help!