API for Internal Metrics Dataset
I am able to create Internal Metrics Dataset directly from my flow. How can I create this using a Python code in DSS?
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
You can use create_dataset, as documented here: https://doc.dataiku.com/dss/latest/python-api/projects.html#dataikuapi.dss.project.DSSProject.create_dataset. First create a dataset from the UI, then retrieve its type and parameters, and use them to create similar datasets from Python code. Here is an example:
import dataiku

# Retrieve the settings of an existing internal metrics dataset created from the UI
client = dataiku.api_client()
project = client.get_default_project()
dataset_settings = project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']
print(ds_type)
print(get_params)

# Once you know the values from the existing dataset, you can also hard-code them:
# params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
# dataset_type = "JobsDB"

# create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
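The dataset type and its parameters both come from the same raw settings dictionary: `get_raw()` holds the type under `type`, and, as far as I know, `get_raw_params()` simply returns its `params` entry. The sketch below isolates that extraction as a pure function so it can be checked without a DSS instance. `extract_create_args` is a hypothetical helper name, and the sample dictionary is illustrative only, mirroring the hard-coded values above; inspect your own `get_raw()` output before relying on it.

```python
import copy
from typing import Any, Dict, Tuple


def extract_create_args(raw_settings: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (dataset_type, params) from a dict shaped like DSSDatasetSettings.get_raw()."""
    ds_type = raw_settings["type"]
    # Deep-copy so later edits to the params do not mutate the source settings dict
    params = copy.deepcopy(raw_settings.get("params", {}))
    return ds_type, params


# Hypothetical sample, shaped like the values the script above prints
sample_raw = {
    "type": "JobsDB",
    "params": {"view": "METRICS_HISTORY", "scope": "PROJECT"},
}

ds_type, params = extract_create_args(sample_raw)
print(ds_type, params)
```

In DSS, the result would feed straight into `project.create_dataset("new_metrics_dataset_4", ds_type, params=params)`.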
-
Hi Alex,
Thank you so much for the response!
I am able to use this code to create the internal metrics dataset for my current project. How can I create this dataset for another project from a notebook/recipe in my current project?
-
Alexandru (Dataiker)
Hi Joshi,
To create it in another project, you can use something like this:
import dataiku

# Retrieve the settings of the existing metrics dataset in the source project
client = dataiku.api_client()
source_project = client.get_project('SOURCE_PROJECT_NAME')
dataset_settings = source_project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']
print(ds_type)
print(get_params)

# Once you know the values from the existing dataset, you can also hard-code them:
# params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
# dataset_type = "JobsDB"

# Create the dataset in the destination project
# create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
destination_project = client.get_project('DESTINATION_PROJECT_NAME')
destination_project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
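If the same metrics dataset is needed in several projects, the cross-project script can be wrapped in a function. This is a sketch, not an official helper: `copy_metrics_dataset` is a hypothetical name, it only uses calls from the script above (`get_project`, `get_dataset`, `get_settings`, `get_raw`, `create_dataset`), and it assumes the copied `params` are valid as-is in every destination project. If your params name a project explicitly, adjust them per destination.

```python
from typing import Any, Iterable, List


def copy_metrics_dataset(client: Any, source_project_key: str, source_dataset: str,
                         destination_project_keys: Iterable[str], new_name: str) -> List[str]:
    """Create `new_name` in each destination project, reusing the source dataset's type and params."""
    source_project = client.get_project(source_project_key)
    raw = source_project.get_dataset(source_dataset).get_settings().get_raw()
    created = []
    for key in destination_project_keys:
        destination_project = client.get_project(key)
        # Pass a fresh copy of the params to each create_dataset call
        destination_project.create_dataset(new_name, raw["type"], params=dict(raw["params"]))
        created.append(key)
    return created
```

Inside DSS, a call would look like `copy_metrics_dataset(dataiku.api_client(), "SOURCE_PROJECT_NAME", "project_only_metrics", ["DESTINATION_PROJECT_NAME"], "new_metrics_dataset_4")`. The function takes the client as a parameter so it can be exercised with a stub outside DSS.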
-
Alex, I am able to create the required dataset using this!
However, I am not able to use the dataset because the internal metrics dataset has no schema. Is there a workaround for this?
-
Alexandru (Dataiker)
Hi,
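Please try adding this at the bottom of the existing script:
# Get a handle on the newly created dataset, then autodetect and save its settings
dataset = destination_project.get_dataset("new_metrics_dataset_4")
settings = dataset.autodetect_settings()
settings.save()
-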
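Because schema autodetection may not be supported for every dataset type, a defensive variant can try it first and fall back to an explicit schema via `set_schema`. This is a sketch under assumptions: `ensure_schema` and `fallback_schema` are hypothetical names, and the exact exception class DSS raises when autodetection fails is not known here, so the code catches `Exception` broadly.

```python
from typing import Any, Dict


def ensure_schema(dataset: Any, fallback_schema: Dict[str, Any]) -> str:
    """Try autodetect_settings() + save(); if that raises, set fallback_schema explicitly."""
    try:
        settings = dataset.autodetect_settings()
        settings.save()
        return "autodetected"
    except Exception:  # broad on purpose: the specific DSS exception type is an assumption
        dataset.set_schema(fallback_schema)
        return "fallback"
```

In DSS, `dataset` would be the handle returned by `destination_project.get_dataset("new_metrics_dataset_4")`, and `fallback_schema` a dict with `columns` and `userModified` keys, in the format `set_schema` expects.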
This works only for regular datasets. It throws an error for the Internal Metrics dataset.
-
Alexandru (Dataiker)
Hi,
Sorry for the delay. To work around this, you can either open the dataset's settings, click Preview, and save.
Or set the schema manually:
import dataiku

client = dataiku.api_client()
project_key = 'OTHER_PROJECT'
dataset_name = "my_dataset_name"
# Equivalent to dataikuapi.dss.dataset.DSSDataset(client, project_key, dataset_name)
dataset = client.get_project(project_key).get_dataset(dataset_name)

# Adjust the columns so they match the data this dataset actually returns
schema_to_set = {
    'columns': [
        {'name': 'connection', 'type': 'string'},
        {'name': 'task_type', 'type': 'string'},
        {'name': 'project_key', 'type': 'string'},
        {'name': 'task_data', 'type': 'string'},
        {'name': 'user', 'type': 'string'},
        {'name': 'start_time', 'type': 'bigint'},
        {'name': 'end_time', 'type': 'bigint'},
    ],
    'userModified': True,
}
dataset.set_schema(schema_to_set)
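Writing the schema dict by hand is error-prone when there are many columns. The sketch below builds the same `{'columns': [...], 'userModified': True}` structure from `(name, type)` pairs and rejects duplicate names or unexpected types before anything is sent to DSS. `build_schema` is a hypothetical helper, and the set of storage type names is my assumption of what DSS accepts; check the DSS schema documentation for your version.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed list of DSS storage type names; verify against your DSS version
DSS_STORAGE_TYPES = {
    "string", "date", "boolean", "tinyint", "smallint", "int", "bigint",
    "float", "double", "geopoint", "geometry", "array", "map", "object",
}


def build_schema(columns: Sequence[Tuple[str, str]], user_modified: bool = True) -> Dict[str, object]:
    """Turn (name, type) pairs into the dict format passed to DSSDataset.set_schema()."""
    seen = set()
    schema_columns: List[Dict[str, str]] = []
    for name, col_type in columns:
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        if col_type not in DSS_STORAGE_TYPES:
            raise ValueError(f"unknown storage type {col_type!r} for column {name!r}")
        seen.add(name)
        schema_columns.append({"name": name, "type": col_type})
    return {"columns": schema_columns, "userModified": user_modified}


# Rebuild the schema from the script above
schema_to_set = build_schema([
    ("connection", "string"),
    ("task_type", "string"),
    ("project_key", "string"),
    ("task_data", "string"),
    ("user", "string"),
    ("start_time", "bigint"),
    ("end_time", "bigint"),
])
print(schema_to_set)
```

The result can then be passed to `dataset.set_schema(schema_to_set)` exactly like the hand-written dict.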