I am able to create an Internal Metrics dataset directly from my flow. How can I create one using Python code in DSS?
Hi,
You can use create_dataset, as suggested here: https://doc.dataiku.com/dss/latest/python-api/projects.html#dataikuapi.dss.project.DSSProject.create.... First, create a dataset from the UI and retrieve its type and params. You can then reuse those values to create similar datasets from Python code. Here is an example:
import dataiku
import pandas as pd, numpy as np
# retrieve dataset details
client = dataiku.api_client()
project = client.get_default_project()
dataset_settings = project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']
print(ds_type)
print(get_params)
#you can also hard code these once you know the values based on the existing dataset
#params_defined = { "view": "METRICS_HISTORY", "scope": "PROJECT"}
#dataset_type = "JobsDB"
# create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
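Once the type and params are known, the hard-coded route mentioned in the comments can be wrapped in a small helper. This is only a sketch: the function name create_metrics_dataset is hypothetical, and the default values ("JobsDB", "METRICS_HISTORY", "PROJECT") are simply the ones shown in the commented lines above, so check them against what your own existing dataset prints.

```python
def create_metrics_dataset(project, dataset_name,
                           view="METRICS_HISTORY", scope="PROJECT",
                           dataset_type="JobsDB"):
    """Create an internal metrics dataset from hard-coded values.

    `project` is expected to be a project handle exposing
    create_dataset(name, type, params=...), as in the script above.
    The defaults are taken from the commented example, not from the
    DSS documentation, so treat them as assumptions.
    """
    params = {"view": view, "scope": scope}
    return project.create_dataset(dataset_name, dataset_type, params=params)
```

Inside DSS it could be called as create_metrics_dataset(dataiku.api_client().get_default_project(), "new_metrics_dataset_4").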
Hi Alex,
Thank you so much for the response!
I am able to use this code to create the internal metrics dataset for my current project. How can I create this dataset for another project from a notebook/recipe in my current project?
Hi Joshi,
To create it for another project you can use something like this:
import dataiku
import pandas as pd, numpy as np
# retrieve dataset details
client = dataiku.api_client()
source_project = client.get_project('SOURCE_PROJECT_NAME')
dataset_settings = source_project.get_dataset("project_only_metrics").get_settings()
get_params = dataset_settings.get_raw_params()
ds_type = dataset_settings.get_raw()['type']
print(ds_type)
print(get_params)
#you can also hard code these once you know the values based on the existing dataset
#params_defined = { "view": "METRICS_HISTORY", "scope": "PROJECT"}
#dataset_type = "JobsDB"
# create_dataset(dataset_name, type, params=None, formatType=None, formatParams=None)
destination_project = client.get_project('DESTINATION_PROJECT_NAME')
destination_project.create_dataset("new_metrics_dataset_4", ds_type, params=get_params)
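If you need to do this for several projects, the same steps can be folded into one function. The sketch below assumes only the calls already used in this thread (get_dataset, get_settings, get_raw, get_raw_params, create_dataset); the function name copy_dataset_definition is hypothetical.

```python
import copy


def copy_dataset_definition(source_project, destination_project,
                            source_name, new_name):
    """Create `new_name` in the destination project, reusing the type
    and params of `source_name` from the source project."""
    settings = source_project.get_dataset(source_name).get_settings()
    ds_type = settings.get_raw()["type"]
    # Copy the params so the source settings object is never mutated.
    params = copy.deepcopy(settings.get_raw_params())
    return destination_project.create_dataset(new_name, ds_type, params=params)
```

With the handles from the script above, a call would look like copy_dataset_definition(source_project, destination_project, "project_only_metrics", "new_metrics_dataset_4").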
Alex, I am able to create the required dataset using this!
However, I cannot use the dataset, because the newly created internal metrics dataset has no schema. Is there a workaround for this?
Hi,
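Please try adding this at the bottom of the existing script:
# get a handle on the dataset that was just created
dataset = destination_project.get_dataset("new_metrics_dataset_4")
settings = dataset.autodetect_settings()
settings.save()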
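When the dataset is created in the same script, the handle returned by create_dataset can be reused directly. The following is a sketch of that idea; the function name create_with_detected_schema is hypothetical, and whether autodetection actually yields a usable schema for an internal metrics dataset is not confirmed in this thread.

```python
def create_with_detected_schema(project, dataset_name, ds_type, params):
    """Create a dataset, then ask DSS to autodetect and save its settings."""
    # create_dataset returns a handle on the new dataset
    dataset = project.create_dataset(dataset_name, ds_type, params=params)
    # autodetect_settings returns a settings object that must be saved
    settings = dataset.autodetect_settings()
    settings.save()
    return dataset
```

If autodetection leaves the schema empty, setting the schema explicitly is the fallback.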
Hi,
Sorry for the delay. To work around this, you can either open the dataset's settings, click Preview, and save, or set the schema manually:
import dataiku
import dataikuapi
import pandas as pd, numpy as np
client = dataiku.api_client()
project_key = 'OTHER_PROJECT'
dataset_name = "my_dataset_name"
dataset = dataikuapi.dss.dataset.DSSDataset(client,project_key,dataset_name)
schema_to_set = {'columns': [{'name': 'connection', 'type': 'string'}, {'name': 'task_type', 'type': 'string'}, {'name': 'project_key', 'type': 'string'}, {'name': 'task_data', 'type': 'string'}, {'name': 'user', 'type': 'string'}, {'name': 'start_time', 'type': 'bigint'}, {'name': 'end_time', 'type': 'bigint'}], 'userModified': True}
dataset.set_schema(schema_to_set)
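The schema dictionary is easier to maintain when it is built from a list of (name, type) pairs. Here is a minimal sketch in plain Python; the helper name build_schema is hypothetical, and the column list is copied from the example above, so adjust it to the columns your dataset actually exposes.

```python
def build_schema(columns, user_modified=True):
    """Build a schema dict of the shape passed to set_schema above."""
    return {
        "columns": [{"name": name, "type": col_type}
                    for name, col_type in columns],
        "userModified": user_modified,
    }


# Same columns as in the manual example above
COLUMNS = [
    ("connection", "string"),
    ("task_type", "string"),
    ("project_key", "string"),
    ("task_data", "string"),
    ("user", "string"),
    ("start_time", "bigint"),
    ("end_time", "bigint"),
]

schema_to_set = build_schema(COLUMNS)
```

The result can then be passed to dataset.set_schema(schema_to_set) exactly as before.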