How to retrieve the test dataset used by a trained model with Python?

Agoi Crispussia
Agoi Crispussia Registered Posts: 2 ✭✭

Hello everyone,

I am working in Dataiku, primarily through its API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. I have tried several methods, including get_train_info(), but none of them returns the test dataset. I don't want to export it; I just want to access it within my script. I am working with a train_model_id().

Thank you for your help.

Operating system used: Windows

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron

    Can you post your code snippet in a code block (the </> icon) and describe what errors or issues you get?

  • Agoi Crispussia
    Agoi Crispussia Registered Posts: 2 ✭✭

    I'm not encountering any errors, but I'm unable to find an API method to access the training dataset used when the model was trained.

    import dataiku
    from datetime import datetime

    client = dataiku.api_client()
    project_key = dataiku.default_project_key()
    project = client.get_project(project_key)

    data = []
    saved_models = project.list_saved_models()

    for model_info in saved_models:
        model_id = model_info.get('id', 'Unknown ID')
        model_name = model_info.get('name', 'Unknown name')

        # Get a handle on the saved model before querying its versions
        saved_model = project.get_saved_model(model_id)

        # Get active version info; skip models with no active version
        version_active = saved_model.get_active_version()
        if version_active is None:
            continue
        version_active_id = version_active.get('id', 'Unknown version ID')
        is_active = version_active.get('active', False)
        train_date = version_active.get('trainDate', None)
        # trainDate is in milliseconds since the epoch
        train_date_active = datetime.fromtimestamp(train_date / 1000).strftime('%Y-%m-%d %H:%M:%S') if train_date else "Unknown date"
        full_model_id_active = saved_model.get_version_details(version_active_id).get_raw().get('smOrigin', {}).get('fullModelId', 'Unknown full model ID')

        # The full model ID is split on "-" to recover the analysis, ML task and session IDs
        parts = full_model_id_active.split("-")
        if len(parts) > 5:
            analysis_id = parts[2]
            ml_task_id = parts[3]
            id_session = parts[4]

            # Get the analysis object
            analysis_obj = project.get_analysis(analysis_id)

            # Get the ML task
            mltask = analysis_obj.get_ml_task(ml_task_id)

            # Get the trained model details
            details = mltask.get_trained_model_details(full_model_id_active)
            user_meta = details.get_user_meta()
            print(user_meta)

    Here is the result:
    {'starred': False, 'clusterMetas': {}, 'customMeta': {'kv': {}}, 'tags': [], 'activeClassifierThreshold': 0.625, 'description': '', 'name': 'Decision Tree (s2)', 'labels': [{'key': 'trainDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'testDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'testDataset:creator', 'value': 'alex'}, {'key': 'evaluationDataset:dataset-name', 'value': 'customers_labeled'}, {'key': 'trainDataset:creator', 'value': 'alex'}, {'key': 'evaluationDataset:creator', 'value': 'alex'}, {'key': 'model:algorithm', 'value': 'DECISION_TREE_CLASSIFICATION'}, {'key': 'model:name', 'value': 'Decision Tree (s2)'}, {'key': 'model:date', 'value': '2024-12-04T14:28:31.686+0100'}, {'key': 'evaluation:date', 'value': '2024-12-04T14:28:31.686+0100'}]}
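
The labels in the result above already name the test dataset. As a minimal sketch, assuming user_meta has the shape printed above, the labels list can be flattened into a dictionary and the testDataset:dataset-name entry read from it. The helper labels_to_dict is a hypothetical name, not part of the Dataiku API, and the sample dictionary is a trimmed copy of the printed output. This yields only the dataset's name. Here the train and test labels both point at customers_labeled, so the exact test rows are presumably determined by the model's split settings rather than stored as a separate dataset.

```python
def labels_to_dict(user_meta):
    """Flatten the 'labels' list of {'key': ..., 'value': ...} entries into a dict.

    Hypothetical helper for illustration; not a Dataiku API function.
    """
    return {label["key"]: label["value"] for label in user_meta.get("labels", [])}


# Trimmed copy of the user_meta output shown above
user_meta = {
    "name": "Decision Tree (s2)",
    "labels": [
        {"key": "trainDataset:dataset-name", "value": "customers_labeled"},
        {"key": "testDataset:dataset-name", "value": "customers_labeled"},
        {"key": "model:algorithm", "value": "DECISION_TREE_CLASSIFICATION"},
    ],
}

labels = labels_to_dict(user_meta)
test_dataset_name = labels.get("testDataset:dataset-name")
print(test_dataset_name)

# Inside DSS, the named dataset could then be loaded as a DataFrame, for example:
# import dataiku
# df = dataiku.Dataset(test_dataset_name).get_dataframe()
```

Using labels.get() rather than indexing means a model without a testDataset label gives None instead of raising a KeyError.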
