API for Internal Metrics Dataset

joshi

I am able to create an Internal Metrics dataset directly from my Flow. How can I create one using Python code in DSS?

Answers

  • Alexandru (Dataiker)

    Hi,

    You can use create_dataset, as described here: https://doc.dataiku.com/dss/latest/python-api/projects.html#dataikuapi.dss.project.DSSProject.create_dataset. First create a dataset from the UI, then retrieve its type and params, and use those values to create similar datasets from Python code. Here is an example:

    import dataiku
    
    # retrieve the details of an existing dataset created from the UI
    client = dataiku.api_client()
    project = client.get_default_project()
    dataset_settings = project.get_dataset("project_only_metrics").get_settings()
    get_params = dataset_settings.get_raw_params()
    ds_type = dataset_settings.get_raw()['type']
    
    print(ds_type)
    print(get_params)
    
    # you can also hard-code these once you know the values from the existing dataset
    # params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
    # dataset_type = "JobsDB"
    # create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
    
    # create a new dataset with the same type and params
    project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
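    Once you know the values, a minimal sketch that hard-codes them instead of reading them from an existing dataset. It assumes the type `JobsDB` and the params `{"view": "METRICS_HISTORY", "scope": "PROJECT"}` from the comments above match what your own dataset prints; the helper names are hypothetical.

```python
# Hypothetical helpers. The type and params are the example values from the
# comments above; check them against what your existing dataset prints.
METRICS_DATASET_TYPE = "JobsDB"


def metrics_dataset_params(view="METRICS_HISTORY", scope="PROJECT"):
    """Build the params dict for an internal metrics dataset."""
    return {"view": view, "scope": scope}


def create_metrics_dataset(project, dataset_name, view="METRICS_HISTORY", scope="PROJECT"):
    """Create the dataset in `project` (a DSSProject handle) and return its handle."""
    return project.create_dataset(
        dataset_name,
        METRICS_DATASET_TYPE,
        params=metrics_dataset_params(view, scope),
    )


# Usage inside DSS:
# import dataiku
# project = dataiku.api_client().get_default_project()
# create_metrics_dataset(project, "new_metrics_dataset_5")
```

    Keeping the params in one helper makes it easy to create several datasets that differ only in name.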

  • joshi

    Hi Alex,

    Thank you so much for the response!

    I am able to use this code to create the internal metrics dataset for my current project. How can I create this dataset for another project from a notebook/recipe in my current project?

  • Alexandru (Dataiker)

    Hi Joshi,

    To create it in another project, you can use something like this (note that get_project() takes the project key):

    import dataiku
    
    # retrieve the details of the dataset in the source project
    client = dataiku.api_client()
    source_project = client.get_project('SOURCE_PROJECT_KEY')
    dataset_settings = source_project.get_dataset("project_only_metrics").get_settings()
    get_params = dataset_settings.get_raw_params()
    ds_type = dataset_settings.get_raw()['type']
    
    print(ds_type)
    print(get_params)
    
    # you can also hard-code these once you know the values from the existing dataset
    # params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
    # dataset_type = "JobsDB"
    # create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
    
    # create the new dataset in the destination project
    destination_project = client.get_project('DESTINATION_PROJECT_KEY')
    
    destination_project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
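    The same steps can be wrapped in one reusable function. This is a sketch only: the function names are hypothetical, it assumes `client` is the handle returned by `dataiku.api_client()`, and it copies only the dataset type and params (not format settings, schema, or data).

```python
def split_type_and_params(raw_settings):
    """Return (type, params) from a raw dataset settings dict (as from get_raw())."""
    return raw_settings["type"], dict(raw_settings.get("params", {}))


def copy_dataset_definition(client, source_key, source_name, dest_key, dest_name):
    """Re-create dataset `source_name` of project `source_key` as `dest_name` in `dest_key`.

    Only the dataset type and params are copied: no format, schema or data.
    `client` is the handle returned by dataiku.api_client().
    """
    raw = client.get_project(source_key).get_dataset(source_name).get_settings().get_raw()
    ds_type, params = split_type_and_params(raw)
    return client.get_project(dest_key).create_dataset(dest_name, ds_type, params=params)


# Usage inside DSS:
# import dataiku
# client = dataiku.api_client()
# copy_dataset_definition(client, "SOURCE_PROJECT_KEY", "project_only_metrics",
#                         "DESTINATION_PROJECT_KEY", "new_metrics_dataset_4")
```

    Copying the params dict avoids sharing one mutable object between the source settings and the new dataset.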

  • joshi123

    Alex, I am able to create the required dataset using this!

    However, I am not able to use the dataset because the Internal Metrics dataset has no schema. Is there a workaround for this?

  • Alexandru (Dataiker)

    Hi,

    Please try adding the following at the bottom of the existing script:

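    # get a handle on the dataset created above, then save its autodetected settings
    dataset = destination_project.get_dataset("new_metrics_dataset_4")
    settings = dataset.autodetect_settings()
    settings.save()

    Reference:
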
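    Since `create_dataset` returns a handle on the new dataset, creation and autodetection can also be done in one step. A sketch with a hypothetical function name; as reported further down the thread, autodetection may not work for Internal Metrics datasets, so this is mainly useful for regular datasets.

```python
def create_and_autodetect(project, dataset_name, ds_type, params):
    """Create a dataset, then save its autodetected settings.

    `project` is a DSSProject handle; create_dataset() returns the new
    dataset's handle, so no separate get_dataset() call is needed.
    """
    dataset = project.create_dataset(dataset_name, ds_type, params=params)
    settings = dataset.autodetect_settings()
    settings.save()
    return dataset


# Usage inside DSS:
# import dataiku
# project = dataiku.api_client().get_project("DESTINATION_PROJECT_KEY")
# create_and_autodetect(project, "my_new_dataset", ds_type, get_params)
```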
  • joshi123

    This works only for regular datasets. It throws an error for the Internal Metrics dataset.

  • Alexandru (Dataiker)

    Hi,

    Sorry for the delay. To work around this, you can either open the dataset's settings, click Preview, and then save.

    Or manually set the schema:

    import dataiku
    
    client = dataiku.api_client()
    project_key = 'OTHER_PROJECT'
    dataset_name = "my_dataset_name"
    
    # get a handle on the existing dataset in the other project
    dataset = client.get_project(project_key).get_dataset(dataset_name)
    
    # schema to set: list of columns with their names and types
    schema_to_set = {'columns': [
        {'name': 'connection', 'type': 'string'},
        {'name': 'task_type', 'type': 'string'},
        {'name': 'project_key', 'type': 'string'},
        {'name': 'task_data', 'type': 'string'},
        {'name': 'user', 'type': 'string'},
        {'name': 'start_time', 'type': 'bigint'},
        {'name': 'end_time', 'type': 'bigint'}],
        'userModified': True}
    dataset.set_schema(schema_to_set)
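    The two workarounds can be combined: try autodetection first and fall back to a manual schema when it fails. A sketch with hypothetical helper names; the thread does not say which error is raised for Internal Metrics datasets, so catching a generic `Exception` here is an assumption you may want to narrow.

```python
def build_schema(columns):
    """Turn [(name, type), ...] into the dict format passed to set_schema()."""
    return {
        "columns": [{"name": name, "type": col_type} for name, col_type in columns],
        "userModified": True,
    }


def ensure_schema(dataset, columns):
    """Try autodetection first; if it raises, set the schema manually.

    `dataset` is a DSSDataset handle. Returns which path was taken.
    """
    try:
        dataset.autodetect_settings().save()
        return "autodetected"
    except Exception:  # assumption: the exact error type is not known
        dataset.set_schema(build_schema(columns))
        return "manual"


# Usage inside DSS:
# import dataiku
# dataset = dataiku.api_client().get_project("OTHER_PROJECT").get_dataset("my_dataset_name")
# ensure_schema(dataset, [("connection", "string"), ("start_time", "bigint")])
```

    Writing the columns as (name, type) pairs keeps the schema short and avoids repeating the dict keys for every column.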