How to load an MLflow registered model into Dataiku DSS
Before using Dataiku, I used MLflow (hosted on an EC2 instance) to register my models. Now I want to load those models into Dataiku DSS.
I have also read the documentation below to understand the process of importing MLflow models.
https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html
But my understanding from it is that only models saved to MLflow from Dataiku can be loaded back into DSS.
mlflow_version = saved_model.import_mlflow_version_from_path("version_id", model_directory, 'code-environment-to-use')
In this line of code, is model_directory the path of the MLflow registered model, which is on another EC2 instance? If so, I tried that, but it gives me the error below:
DataikuException: com.dataiku.dip.io.SocketBlockLinkKernelException: Could not run command READ_META: : <class 'FileNotFoundError'> : [Errno 2] No such file or directory:
'<dataiku-install-dir-name>/saved_models/<Project_Name>/<id>/versions/version_id/MLmodel'
This is the value I provided for the model_directory argument: https://<host-name>/mlflow/#/models/<model_name>
Answers
-
Hi @vaishnavi
import_mlflow_version_from_path only works with filesystem paths, not URLs. So your model would have to be on the DSS filesystem to use that method.
An alternative would be to copy all your model files to a DSS managed folder and use import_mlflow_version_from_managed_folder.
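To illustrate the difference, here is a minimal sketch of a pre-flight check you could run before calling import_mlflow_version_from_path. The helper name check_model_directory is hypothetical (it is not part of the Dataiku or MLflow APIs). It only relies on the fact that an MLflow model directory contains a file named MLmodel, which is the file the error message above is looking for.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_model_directory(model_directory: str) -> str:
    """Hypothetical helper: say whether a value looks like a local MLflow model directory.

    Returns "ok", or a short reason why the value would not work as a filesystem path.
    """
    # A registry UI address such as https://host/mlflow/#/models/name is a URL, not a path.
    if urlparse(model_directory).scheme in ("http", "https"):
        return "url"
    path = Path(model_directory)
    if not path.is_dir():
        return "missing directory"
    # Every MLflow model directory has an MLmodel descriptor file at its root.
    if not (path / "MLmodel").is_file():
        return "no MLmodel file"
    return "ok"
```

With the value from the question, check_model_directory("https://<host-name>/mlflow/#/models/<model_name>") returns "url", which is why the import cannot find an MLmodel file.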
-
@AurelienL
But how do I copy the registered models from the MLflow instance to DSS?
-
@vaishnavi
To import a model from an MLflow model registry to DSS, you can:
- Download the model to your local filesystem. For example, from your own machine:
import mlflow

mlflow.set_tracking_uri("your_mlflow_tracking_uri")

model_name = "your_model_name"
model_version = "latest"
local_dir = "./my_model"

# Download all files of the registered model version into local_dir
mlflow.artifacts.download_artifacts(
    artifact_uri=f"models:/{model_name}/{model_version}",
    dst_path=local_dir,
)
- Copy all the files from the resulting folder to a managed folder in DSS:
- From the flow, click on "+ DATASET" / "Folder"
- Give the folder a label, click create
- Click "Add a file" then select all the files from your local folder and upload them
- Copy the managed folder id (you can find it in your browser's URL bar), you will need it in the next step
- Create a saved model in DSS and import the version from the managed folder:
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# Assuming your model is a binary classification model
dss_model = project.create_mlflow_pyfunc_model('your_model_name', 'BINARY_CLASSIFICATION')
dss_model_version = dss_model.import_mlflow_version_from_managed_folder(
    "your_model_version",
    "your_managed_folder_id",
    "path_in_managed_folder",
    "your_code_environment",
)

# Don't forget to call dss_model_version.set_core_metadata and dss_model_version.evaluate
# if you want to have access to the performance and explainability features of DSS
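If the downloaded model contains subdirectories, uploading through the UI can be tedious. As an alternative to the manual upload step, the sketch below walks the local folder and uploads each file under the same relative path. It is only a sketch under assumptions: the helper names list_model_files and upload_model_dir are hypothetical, and the folder argument is assumed to be a managed folder handle exposing put_file(path, fileobj), as the handle returned by get_managed_folder in the dataikuapi package does. The host, API key and project key in the usage comment are placeholders.

```python
import os


def list_model_files(local_dir):
    """Return sorted (path_in_folder, local_path) pairs for every file under local_dir."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            # Use forward slashes so the relative layout is the same on any OS.
            pairs.append((rel.replace(os.sep, "/"), local_path))
    return sorted(pairs)


def upload_model_dir(folder, local_dir, prefix=""):
    """Upload every file under local_dir, keeping the directory layout.

    `folder` is assumed to expose put_file(path, fileobj).
    `prefix` is an optional subpath inside the managed folder.
    """
    for rel_path, local_path in list_model_files(local_dir):
        target = f"{prefix.rstrip('/')}/{rel_path}" if prefix else rel_path
        with open(local_path, "rb") as f:
            folder.put_file(target, f)


# Hypothetical usage from your own machine (all values are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://your-dss-host", "your_api_key")
# folder = client.get_project("YOUR_PROJECT_KEY").get_managed_folder("your_managed_folder_id")
# upload_model_dir(folder, "./my_model")
```

Whatever prefix you upload under is then the value to pass as "path_in_managed_folder" when importing the version.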
-
Thanks for your response @AurelienL
I will try this and check once.