Unable to run data drift from the saved model page with a custom model

Suhail
Suhail Registered Posts: 18 ✭✭✭✭
edited July 16 in Using Dataiku

Hello everyone,

I have a custom model logged into Dataiku as a saved model via the MLflow integration. When I try to run the data drift analysis from the saved model page, I encounter an error. The error message reads:

"FileNotFoundError: [Errno 2] No such file or directory: /app/dataiku/DSS_DATA_DIR/saved_models/DKU_PREDICTIVE_MAINTENANCE_1/hzGw5Zzl/versions/2023_03_01T05_23_29/split/split.json".

The code I used to create the saved model:

import time

import dataiku
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

project = dataiku.api_client().get_default_project()
mlflow_ext = project.get_mlflow_extension()
managed_folder = project.get_managed_folder(folder_id)
mlflow_handle = project.setup_mlflow(managed_folder)
mlflow.set_experiment(experiment_id)
mlflow.sklearn.autolog()

with mlflow.start_run(run_name=f"run_{round(time.time() * 1000)}") as run:
    # Hyperparameters must be passed as keyword arguments
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state,
                                 max_depth=max_depth, min_samples_leaf=min_samples_leaf,
                                 verbose=verbose, class_weight=class_weight)
    clf.fit(X_train, y_train)
    run_id = run.info.run_id  # the run object exposes its id here
    # Keywords used to avoid positional mix-ups between these parameters
    mlflow_ext.set_run_inference_info(run_id=run_id, prediction_type=prediction_type,
                                      classes=classes, code_env_name=code_env_name,
                                      target=target)
    mlflow_ext.deploy_run_model(run_id, saved_m_id, evaluation_dataset=evaluation_dataset)
mlflow_handle.clear()

Could someone help me understand why this error is occurring and how I can resolve it? Any suggestions or insights would be greatly appreciated.

Thank you in advance! 

Best Answer

  • Pierre-MaëlM
    Pierre-MaëlM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 Dataiker
    edited July 17 Answer ✓

    If you wish to create a Standalone Evaluation Recipe on a dataset that was scored by your model (which we will call the input dataset - typically the recent data your model scored), you can add a reference (scored) dataset as a second input to this recipe (typically the validation dataset you used at train time, or even the dataset you trained the model on). The Standalone Evaluation Recipe will then compute the drift score (i.e. the drift between the input dataset and the reference dataset) and store this metric in a Model Evaluation, which is stored in the Model Evaluation Store.

    You may also use the Evaluate Recipe instead of the Standalone Evaluation Recipe to evaluate MLflow models (and compute drift). The Evaluate Recipe takes two inputs: the MLflow model and an evaluation dataset. Be sure to use a Model Evaluation Store as the output of this recipe; every run of the Evaluate Recipe will add a Model Evaluation to the store. As with the Standalone Evaluation Recipe, the drift metric is stored with each Model Evaluation in the Model Evaluation Store. In the case of an Evaluate Recipe with an MLflow model, the input data drift metric quantifies the drift between the reference - in this case, the dataset used to evaluate the MLflow model when it was imported into DSS using

     mlflow_ext.deploy_run_model(run_id, saved_m_id, evaluation_dataset)

    - and the evaluation dataset, which is the second input of the Evaluate Recipe.
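
    To build intuition for what such a drift metric quantifies, here is a minimal, self-contained sketch comparing a reference sample against recent data. Note this uses the Population Stability Index (PSI) purely as an illustration; it is not how DSS computes its drift score (DSS trains a classifier to distinguish the two datasets), and all names here are hypothetical.

    ```python
    import numpy as np

    def psi(reference, recent, bins=10):
        """Population Stability Index between two 1-D samples.

        Bin edges come from the reference sample; outer edges are widened
        to cover recent values outside the reference range, and a small
        epsilon avoids division by zero for empty bins.
        """
        edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf
        ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
        new_frac = np.histogram(recent, bins=edges)[0] / len(recent)
        eps = 1e-6
        ref_frac = np.clip(ref_frac, eps, None)
        new_frac = np.clip(new_frac, eps, None)
        return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))

    rng = np.random.default_rng(0)
    reference = rng.normal(0, 1, 5000)  # stand-in for the reference dataset
    same = rng.normal(0, 1, 5000)       # recent data, same distribution
    shifted = rng.normal(1, 1, 5000)    # recent data, drifted distribution

    print(psi(reference, same))     # small value: little drift
    print(psi(reference, shifted))  # large value: significant drift
    ```

    The idea is the same in both recipes: a low score means the recent data is distributed like the reference, and a high score means the model is now seeing data it was not evaluated on.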

    Also, from a Model Evaluation details screen, you can trigger a drift computation from the Drift Analysis > Input data drift section. (To access the Model Evaluation details screen, go to the Model Evaluation Store and click on one of the names of the Model Evaluation that are displayed in the table.)

    Please let me know if you have any more questions.

    Regards,
    Pierre-Maël

Answers

  • Pierre-MaëlM
    Pierre-MaëlM Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3 Dataiker

    Hello @Suhail

    If I understand correctly, you seem to be using the data drift plugin to run drift analysis on your custom model. This plugin is deprecated and does not support MLflow models.

    Instead, I can point you to Model Evaluations, which should help you perform drift analysis with your model.

    Do not hesitate if you have further questions.

    Regards,
    Pierre-Maël

  • Suhail
    Suhail Registered Posts: 18 ✭✭✭✭

    Hi @Pierre-MaëlM,

    Thanks for the response.

    I have a quick follow-up question: if I use the Model Evaluation Store or the Standalone Evaluation Recipe, how do I run the drift analysis between the data the model was trained on and a new dataset containing recent data?

  • Suhail
    Suhail Registered Posts: 18 ✭✭✭✭

    Hi @Pierre-MaëlM,

    Thanks for the help
