How to load an MLflow registered model into Dataiku DSS

vaishnavi
Level 3

Before using Dataiku, I used MLflow (hosted on an EC2 instance) to register my models. Now I want to load those models into Dataiku DSS.

I have read the document below to understand the process of importing MLflow models:

https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html

But my understanding is that only models saved to MLflow from Dataiku can be loaded back into DSS.

mlflow_version = saved_model.import_mlflow_version_from_path("version_id", model_directory, 'code-environment-to-use')

In this line of code, is model_directory the path of the MLflow registered model, which is on another EC2 instance? If yes, I tried that, but it gives me the error below:

DataikuException: com.dataiku.dip.io.SocketBlockLinkKernelException: Could not run command READ_META: : <class 'FileNotFoundError'> : [Errno 2] No such file or directory:
'<dataiku-install-dir-name>/saved_models/<Project_Name>/<id>/versions/version_id/MLmodel'

This is the value I provided for the model_directory argument: https://<host-name>/mlflow/#/models/<model_name>

AurelienL
Dataiker

Hi @vaishnavi 

import_mlflow_version_from_path only works with filesystem paths, not URLs. So your model would have to be on the DSS filesystem to use that method.

An alternative would be to copy all your model files to a DSS managed folder and use import_mlflow_version_from_managed_folder.
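A minimal sketch of that copy step, assuming the model files have already been downloaded locally. The commented-out upload loop is illustrative only: the managed folder id and the `put_file` call are assumptions to check against your DSS version.

```python
import os

def iter_model_files(local_dir):
    """Yield (relative_path, absolute_path) pairs for every file under
    local_dir, preserving the MLflow model layout (MLmodel, conda.yaml,
    artifacts/...), so the same structure is recreated in the managed folder."""
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            abs_path = os.path.join(root, name)
            yield os.path.relpath(abs_path, local_dir), abs_path

# Hypothetical upload loop -- requires a live DSS connection:
# import dataiku
# folder = dataiku.api_client().get_default_project().get_managed_folder("folder_id")
# for rel_path, abs_path in iter_model_files("./my_model"):
#     with open(abs_path, "rb") as f:
#         folder.put_file(rel_path, f)
```

Keeping the relative paths intact matters: DSS looks for the MLmodel descriptor at the path you later pass to import_mlflow_version_from_managed_folder.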

vaishnavi
Level 3
Author

@AurelienL But how do I copy the registered models from the MLflow instance to DSS?

AurelienL
Dataiker

@vaishnavi To import a model from an MLflow model registry into DSS, you can:

  1. Download the model to your local filesystem. For example, from your own machine:

    import mlflow

    # Point the client at your MLflow tracking server
    mlflow.set_tracking_uri("your_mlflow_tracking_uri")

    model_name = "your_model_name"
    model_version = "latest"
    local_dir = "./my_model"

    # Download all artifacts of the registered model version
    mlflow.artifacts.download_artifacts(
        artifact_uri=f"models:/{model_name}/{model_version}",
        dst_path=local_dir,
    )
  2. Copy all the files from the resulting folder to a managed folder in DSS:
    • From the flow, click on "+ DATASET" / "Folder"
    • Give the folder a label, then click Create
    • Click "Add a file", then select all the files from your local folder and upload them
    • Copy the managed folder id (you can find it in your browser's URL bar); you will need it in the next step
  3. From a DSS notebook, import the MLflow model with import_mlflow_version_from_managed_folder:

    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()

    # Assuming your model is a binary classification model
    dss_model = project.create_mlflow_pyfunc_model('your_model_name', 'BINARY_CLASSIFICATION')
    dss_model_version = dss_model.import_mlflow_version_from_managed_folder(
        "your_model_version",
        "your_managed_folder_id",
        "path_in_managed_folder",
        "your_code_environment",
    )

    # Don't forget to call dss_model_version.set_core_metadata and
    # dss_model_version.evaluate if you want access to the performance
    # and explainability features of DSS
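One extra check worth doing before step 3, since the FileNotFoundError in the original question came from a missing MLmodel file: verify that the folder you uploaded actually contains the MLmodel descriptor at its root. A minimal local sketch of that check:

```python
import os

def looks_like_mlflow_model(path):
    """Return True if `path` is a directory containing the MLmodel
    descriptor file that every exported MLflow model directory has."""
    return os.path.isfile(os.path.join(path, "MLmodel"))
```

The same idea applies inside the managed folder: the path you pass as path_in_managed_folder must be the directory that holds MLmodel, not a parent or child of it.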
vaishnavi
Level 3
Author

Thanks for your response, @AurelienL. I will try this and check.
