
API for Internal Metrics Dataset

joshi
Level 1

I am able to create Internal Metrics Dataset directly from my flow. How can I create this using a Python code in DSS?

7 Replies
AlexT
Dataiker

Hi,

You can use create_dataset, as suggested here: https://doc.dataiku.com/dss/latest/python-api/projects.html#dataikuapi.dss.project.DSSProject.create.... First create a dataset from the UI, then retrieve its type and params, and reuse those to create similar datasets from Python code. Here is an example:

import dataiku

# Retrieve the type and params of an existing internal metrics dataset
client = dataiku.api_client()
project = client.get_default_project()
dataset_settings = project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']

print(ds_type)
print(get_params)

# You can also hard-code these once you know the values from the existing dataset:
# params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
# dataset_type = "JobsDB"
# Signature: create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)

project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
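If you prefer the hard-coded route mentioned in the comments above, a minimal sketch could look like the following. The helper name create_internal_metrics_dataset is hypothetical (it is not part of the Dataiku API), and the "JobsDB" type and the view/scope values are simply the ones printed from the existing dataset in this thread, so check them against your own instance.

```python
def create_internal_metrics_dataset(project, dataset_name,
                                    view="METRICS_HISTORY", scope="PROJECT"):
    """Create an internal metrics dataset from hard-coded settings.

    `project` is assumed to be a DSSProject handle, e.g. the object returned
    by client.get_default_project(). The helper itself is hypothetical.
    """
    # Values copied from the existing dataset inspected earlier in the thread.
    params = {"view": view, "scope": scope}
    return project.create_dataset(dataset_name, "JobsDB", params=params)


# Usage inside DSS (commented out so the sketch can be read outside DSS):
# import dataiku
# project = dataiku.api_client().get_default_project()
# create_internal_metrics_dataset(project, "new_metrics_dataset_5")
```

This skips the lookup of an existing dataset, at the cost of breaking silently if the type or params differ on your DSS version.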

 

joshi
Level 1
Author

Hi Alex,

Thank you so much for the response!

I am able to use this code to create the internal metrics dataset for my current project. How can I create this dataset for another project from a notebook/recipe in my current project?

AlexT
Dataiker

Hi Joshi,

To create it for another project you can use something like this: 

import dataiku

# Retrieve the type and params of an existing dataset in the source project
client = dataiku.api_client()
source_project = client.get_project('SOURCE_PROJECT_NAME')
dataset_settings = source_project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']

print(ds_type)
print(get_params)

# You can also hard-code these once you know the values from the existing dataset:
# params_defined = {"view": "METRICS_HISTORY", "scope": "PROJECT"}
# dataset_type = "JobsDB"
# Signature: create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)

destination_project = client.get_project('DESTINATION_PROJECT_NAME')

destination_project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
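The same steps can be wrapped in a small reusable function. This is only a sketch: clone_dataset_definition is a hypothetical helper name, and it assumes both arguments are DSSProject handles obtained from client.get_project(...). It relies only on the calls already used above (get_dataset, get_settings, get_raw, get_raw_params, create_dataset).

```python
import copy


def clone_dataset_definition(source_project, source_name,
                             destination_project, new_name):
    """Create `new_name` in the destination project with the same type and
    params as `source_name` in the source project (hypothetical helper)."""
    settings = source_project.get_dataset(source_name).get_settings()
    ds_type = settings.get_raw()["type"]
    # Copy the params so the source settings object is never mutated.
    params = copy.deepcopy(settings.get_raw_params())
    return destination_project.create_dataset(new_name, ds_type, params=params)


# Usage inside DSS (commented out so the sketch can be read outside DSS):
# import dataiku
# client = dataiku.api_client()
# clone_dataset_definition(client.get_project("SOURCE_PROJECT_NAME"),
#                          "project_only_metrics",
#                          client.get_project("DESTINATION_PROJECT_NAME"),
#                          "new_metrics_dataset_4")
```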

 

joshi123
Level 1

Alex, I am able to create the required dataset using this!

However, I am not able to use the dataset, because the internal metrics dataset has no schema. Is there a workaround for this?

 

AlexT
Dataiker

Hi,

Please try adding this at the bottom of the existing script:

# dataset is the handle returned by create_dataset(...)
settings = dataset.autodetect_settings()
settings.save()
 
joshi123
Level 1

This works for regular datasets only. It throws an error for the Internal Metrics dataset.

AlexT
Dataiker

Hi,

Sorry for the delay. To overcome this, you can either go to the settings of the dataset, click on Preview, and save, or set the schema manually:

import dataiku
import dataikuapi

client = dataiku.api_client()
project_key = 'OTHER_PROJECT'
dataset_name = "my_dataset_name"

dataset = dataikuapi.dss.dataset.DSSDataset(client, project_key, dataset_name)
schema_to_set = {
    'columns': [
        {'name': 'connection', 'type': 'string'},
        {'name': 'task_type', 'type': 'string'},
        {'name': 'project_key', 'type': 'string'},
        {'name': 'task_data', 'type': 'string'},
        {'name': 'user', 'type': 'string'},
        {'name': 'start_time', 'type': 'bigint'},
        {'name': 'end_time', 'type': 'bigint'},
    ],
    'userModified': True,
}
dataset.set_schema(schema_to_set)
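If you need to do this for several datasets, the schema dictionary can be built from a plain list of (name, type) pairs. The sketch below is only an illustration: build_schema is a hypothetical helper, and the column list is the one from the snippet above, so adjust it to the columns your dataset actually exposes.

```python
def build_schema(columns):
    """Build a schema dict in the shape passed to set_schema above.

    `columns` is an iterable of (name, type) pairs. Hypothetical helper,
    not part of the Dataiku API.
    """
    return {
        "columns": [{"name": name, "type": col_type} for name, col_type in columns],
        "userModified": True,
    }


# Column list taken from the example above; adjust it for your dataset.
EXAMPLE_COLUMNS = [
    ("connection", "string"),
    ("task_type", "string"),
    ("project_key", "string"),
    ("task_data", "string"),
    ("user", "string"),
    ("start_time", "bigint"),
    ("end_time", "bigint"),
]

schema_to_set = build_schema(EXAMPLE_COLUMNS)

# Inside DSS, with `dataset` being a DSSDataset handle:
# dataset.set_schema(schema_to_set)
```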