How to retrieve the test dataset used by a trained model with Python?
Hello everyone,
I am working on Dataiku, primarily using their API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. Despite trying several methods, including get_train_info(), I am unable to obtain the test dataset. I don't want to export it; I just want to access it within my script. I am working with a train_model_id().
Thank you for your help.
Operating system used: Windows
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Can you post your code snippet in a code block (the </> icon) and what errors/issues do you get?
-
I'm not encountering any errors, but I'm unable to find an API method to access the training dataset used when the model was trained.
<
from datetime import datetime

import dataiku

client = dataiku.api_client()
project_key = dataiku.default_project_key()
project = client.get_project(project_key)
data = []
saved_models = project.list_saved_models()
for model_info in saved_models:
    model_id = model_info.get('id', 'Unknown ID')
    model_name = model_info.get('name', 'Unknown name')
    saved_model = project.get_saved_model(model_id)
    # Get active version info (skip models without an active version)
    version_active = saved_model.get_active_version()
    if not version_active:
        continue
    version_active_id = version_active.get('id', 'Unknown version ID')
    is_active = version_active.get('active', False)
    train_date = version_active.get('trainDate', None)
    train_date_active = datetime.fromtimestamp(train_date / 1000).strftime('%Y-%m-%d %H:%M:%S') if train_date else "Unknown date"
    full_model_id_active = saved_model.get_version_details(version_active_id).get_raw().get('smOrigin', {}).get('fullModelId', 'Unknown full model ID')
    parts = full_model_id_active.split("-")
    if len(parts) > 5:
        analysis_id_active = parts[2]
        ml_task_id_active = parts[3]
        id_session = parts[4]
        # Get the analysis object
        analysis_obj = project.get_analysis(analysis_id_active)
        # Get the ML task
        mltask = analysis_obj.get_ml_task(ml_task_id_active)
        # Get trained model details
        details = mltask.get_trained_model_details(full_model_id_active)
        user_meta = details.get_user_meta()
        print(user_meta)
/>
Here is the result:
{'starred': False, 'clusterMetas': {}, 'customMeta': {'kv': {}}, 'tags': [], 'activeClassifierThreshold': 0.625, 'description': '', 'name': 'Decision Tree (s2)', 'labels': [{'key': 'trainDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'testDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'testDataset:creator', 'value': 'alex'}, {'key': 'evaluationDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'trainDataset:creator', 'value': 'alex'}, {'key': 'evaluationDataset:creator', 'value': 'alex'}, {'key': 'model:algorithm', 'value': 'DECISION_TREE_CLASSIFICATION'}, {'key': 'model:name', 'value': 'Decision Tree (s2)'}, {'key': 'model:date', 'value': '2024-12-04T14:28:31.686+0100'}, {'key': 'evaluation:date', 'value': '2024-12-04T14:28:31.686+0100'}]}
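The labels in this output already name the datasets involved. A minimal sketch, assuming `user_meta` always has the shape printed above (a `labels` list of `key`/`value` entries), for pulling out the test dataset name. The helper name `labels_to_dict` is hypothetical and not part of the Dataiku API:

```python
def labels_to_dict(user_meta):
    """Flatten the 'labels' list of {'key': ..., 'value': ...} entries into a dict."""
    return {label["key"]: label["value"] for label in user_meta.get("labels", [])}


# Trimmed copy of the user_meta printed above (assumed shape).
user_meta = {
    "name": "Decision Tree (s2)",
    "labels": [
        {"key": "trainDataset:dataset-name", "value": "customers_labeled"},
        {"key": "testDataset:dataset-name", "value": "customers_labeled"},
        {"key": "evaluationDataset:dataset-name", "value": "customers_labeled"},
    ],
}

labels = labels_to_dict(user_meta)
test_dataset_name = labels.get("testDataset:dataset-name")
print(test_dataset_name)
```

Inside DSS, that name could then be passed to `dataiku.Dataset(test_dataset_name).get_dataframe()` to load it in the script. Note that the train and test labels here point to the same dataset, which suggests the test rows were split off from it at training time, so loading the dataset this way would presumably return all rows rather than only the test split.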