How to load an MLflow registered model into Dataiku DSS

vaishnavi Registered Posts: 40 ✭✭✭✭
edited July 16 in Using Dataiku

Before using Dataiku, I used MLflow (hosted on an EC2 instance) to register my models. Now I want to load those models into Dataiku DSS.

I have also read the following document to understand how to import MLflow models:

https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html

But my understanding is that only models saved to MLflow from Dataiku can be loaded back into DSS.

mlflow_version = saved_model.import_mlflow_version_from_path("version_id", model_directory, 'code-environment-to-use')

In this line of code, is model_directory the path of the MLflow registered model, which is on another EC2 instance? If so, I tried that, but it gives me the error below:

DataikuException: com.dataiku.dip.io.SocketBlockLinkKernelException: Could not run command READ_META: : <class 'FileNotFoundError'> : [Errno 2] No such file or directory:
'<dataiku-install-dir-name>/saved_models/<Project_Name>/<id>/versions/version_id/MLmodel'

This is the value I provided for the model_directory argument: https://<host-name>/mlflow/#/models/<model_name>

Answers

  • AurelienL
    AurelienL Dataiker Posts: 9 Dataiker

    Hi @vaishnavi

    import_mlflow_version_from_path only works with filesystem paths, not URLs, so your model files would have to be on the DSS server's filesystem to use that method.
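
A small pre-flight check can make this failure mode explicit before calling the import. This is a hypothetical helper sketch, not part of the Dataiku API: looks_like_url and is_local_mlflow_model_dir are illustrative names, and the check simply assumes that a valid local MLflow model directory contains an MLmodel file (the file DSS reports as missing in the error above).

```python
import os
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Return True if value has a URL scheme such as https:// or models:/.

    One-letter schemes are ignored so Windows paths like C:\\models are not
    mistaken for URLs.
    """
    return len(urlparse(value).scheme) > 1


def is_local_mlflow_model_dir(path: str) -> bool:
    """Return True if path is a local directory that contains an MLflow MLmodel file."""
    if looks_like_url(path):
        # A web UI or registry URL cannot be read as a directory
        return False
    return os.path.isfile(os.path.join(path, "MLmodel"))


if __name__ == "__main__":
    # The value from the question is a web UI URL, not a directory
    print(is_local_mlflow_model_dir("https://<host-name>/mlflow/#/models/<model_name>"))
```

With the URL from the question, is_local_mlflow_model_dir returns False, which matches the situation that led to the FileNotFoundError.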

    An alternative would be to copy all your model files to a DSS managed folder and use import_mlflow_version_from_managed_folder.
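
If the model files are reachable from a DSS notebook (for example, once they have been copied onto the DSS server), the copy into the managed folder could also be scripted. The sketch below assumes the internal dataiku package's Folder.upload_file(path, file_path) method; folder_path, iter_upload_pairs and upload_model_dir are hypothetical helper names, and the folder ID and prefix are placeholders. It keeps each file's path relative to the model directory so any subfolders are preserved.

```python
import os
from typing import Iterator, Tuple


def folder_path(rel_path: str, prefix: str = "") -> str:
    """Build a managed-folder path such as "/my_model/MLmodel" from a relative local path."""
    rel = rel_path.replace(os.sep, "/").lstrip("/")
    prefix = prefix.strip("/")
    return f"/{prefix}/{rel}" if prefix else f"/{rel}"


def iter_upload_pairs(local_dir: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path_in_folder, local_file_path) for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            yield folder_path(rel, prefix), local_path


def upload_model_dir(folder_id: str, local_dir: str, prefix: str = "") -> int:
    """Upload every file of local_dir to a DSS managed folder and return the count.

    Must run where the `dataiku` package is available (e.g. a DSS notebook).
    """
    import dataiku  # lazy import: the helpers above work without DSS

    folder = dataiku.Folder(folder_id)
    count = 0
    for path_in_folder, local_path in iter_upload_pairs(local_dir, prefix):
        # Assumed API: dataiku.Folder.upload_file(path, file_path)
        folder.upload_file(path_in_folder, local_path)
        count += 1
    return count
```

For example, upload_model_dir("your_managed_folder_id", "./my_model", prefix="my_model") would place a top-level MLmodel file at /my_model/MLmodel, so the path to pass to import_mlflow_version_from_managed_folder would presumably be "my_model".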

  • vaishnavi
    vaishnavi Registered Posts: 40 ✭✭✭✭

    @AurelienL
    But how do I copy the registered models from the MLflow instance to DSS?

  • AurelienL
    AurelienL Dataiker Posts: 9 Dataiker
    edited July 17

    @vaishnavi
    To import a model from an MLflow model registry into DSS, you can:

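    1. Download the model to your local filesystem. For example, from your own machine:
      import os
      import mlflow

      mlflow.set_tracking_uri("your_mlflow_tracking_uri")

      model_name = "your_model_name"
      model_version = "latest"  # or a specific version number, e.g. "3"
      local_dir = "./my_model"

      # download_artifacts expects the destination directory to already exist
      os.makedirs(local_dir, exist_ok=True)
      local_path = mlflow.artifacts.download_artifacts(artifact_uri=f"models:/{model_name}/{model_version}", dst_path=local_dir)
      print(local_path)  # local path of the downloaded model files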
    2. Copy all the files from the resulting folder to a managed folder in DSS:
      • From the Flow, click "+ DATASET" / "Folder".
      • Give the folder a label and click Create.
      • Click "Add a file", then select all the files from your local folder and upload them.
      • Copy the managed folder ID (you can find it in your browser's URL bar); you will need it in the next step.

    3. From a DSS notebook, import the MLflow model with import_mlflow_version_from_managed_folder:
      import dataiku

      client = dataiku.api_client()
      project = client.get_default_project()

      # Assuming your model is a binary classification model
      # (other prediction types: 'MULTICLASS', 'REGRESSION')
      dss_model = project.create_mlflow_pyfunc_model('your_model_name', 'BINARY_CLASSIFICATION')

      dss_model_version = dss_model.import_mlflow_version_from_managed_folder(
          "your_model_version",      # version ID to create in DSS
          "your_managed_folder_id",  # managed folder ID copied in step 2
          "path_in_managed_folder",  # folder containing the MLmodel file, inside the managed folder
          "your_code_environment"    # code env with mlflow and the model's libraries installed
      )
      # Don't forget to call dss_model_version.set_core_metadata and dss_model_version.evaluate
      # if you want access to the performance and explainability features of DSS.
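
The last comment could be made concrete with a sketch like the following. It assumes the version handle returned by import_mlflow_version_from_managed_folder exposes set_core_metadata(target_column_name, class_labels, get_features_from_dataset=...) and evaluate(dataset_ref), as described in the dataikuapi reference; check the exact signatures for your DSS version. The evaluation dataset name, target column, and the helpers class_labels_from_values and finish_mlflow_import are hypothetical.

```python
from typing import Any, Iterable, List


def class_labels_from_values(values: Iterable[Any]) -> List[Any]:
    """Return the distinct non-missing target values, sorted.

    Sorted order matches how scikit-learn orders classes_; confirm it also
    matches the probability output order of your own model.
    """
    # v == v is False for NaN, so this drops both None and NaN
    return sorted({v for v in values if v is not None and v == v})


def finish_mlflow_import(dss_model_version, target_column: str, eval_dataset: str) -> None:
    """Set core metadata and evaluate an imported MLflow version (run inside DSS)."""
    import dataiku  # only available inside DSS

    df = dataiku.Dataset(eval_dataset).get_dataframe()
    labels = class_labels_from_values(df[target_column].tolist())
    # Assumed signature: set_core_metadata(target_column_name, class_labels, get_features_from_dataset=...)
    dss_model_version.set_core_metadata(target_column, labels, get_features_from_dataset=eval_dataset)
    # Assumed signature: evaluate(dataset_ref); scores the dataset so DSS can show performance metrics
    dss_model_version.evaluate(eval_dataset)
```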
    

  • vaishnavi
    vaishnavi Registered Posts: 40 ✭✭✭✭

    Thanks for your response, @AurelienL.
    I will try this and check.
