Get shared datasets using the Dataiku API
Hi there,
I am looking for a way to use the Dataiku API to get the keys and names of the datasets (database tables) that are shared into my project.
I tried the following:
```python
import dataiku

client = dataiku.api_client()
project = client.get_project('PROJECT_KEY')  # get_project expects the project key
datasets = project.list_datasets()
```
Using `datasets[index_of_dataset]['params']['table']`, I get the name of the underlying database table.
However, `list_datasets()` does not return the datasets that are shared into my project from other projects.
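For the datasets that `list_datasets()` does return, the table names can be collected defensively, since only SQL-type datasets carry a `table` parameter. This is a minimal sketch, assuming each list item is dict-like with `name` and `params` keys as used in the snippet above; the helper name `table_names` and the sample values are hypothetical.

```python
def table_names(datasets):
    """Map dataset name -> SQL table name, skipping datasets without a table.

    Assumes `datasets` is the list returned by project.list_datasets(),
    where each item is dict-like with 'name' and 'params' keys.
    """
    result = {}
    for ds in datasets:
        table = ds.get("params", {}).get("table")
        if table:  # non-SQL datasets (files, folders, ...) have no 'table' param
            result[ds["name"]] = table
    return result


# Made-up input mimicking the list_datasets() structure
sample = [
    {"name": "customers", "params": {"connection": "pg", "table": "CUSTOMERS"}},
    {"name": "raw_logs", "params": {"connection": "filesystem_managed"}},
]
print(table_names(sample))  # {'customers': 'CUSTOMERS'}
```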
The goal is to find dependencies between projects: for example, if dataset A from project A is shared into project B, then project A needs to be built first.
I am looking forward to your help.
Best,
Oliver
Best Answer
Hi, this code snippet lists, for every project, the datasets it shares (exposes) with other projects, along with each dataset's connection and the projects it is shared with.
```python
import dataiku

client = dataiku.api_client()
for project_key in client.list_project_keys():
    print("*** EXPOSED FROM PROJECT %s ***" % project_key)
    p = client.get_project(project_key)
    # Objects (datasets, managed folders, models, ...) this project shares with others
    for exposed_object in p.get_settings().get_raw()["exposedObjects"]["objects"]:
        # Only datasets have a dataset definition; assumes datasets are typed "DATASET"
        if exposed_object["type"] != "DATASET":
            continue
        definition = p.get_dataset(exposed_object["localName"]).get_definition()
        connection = definition.get("params", {}).get("connection")
        print("  Object id=%s type=%s db=%s is exposed to projects:"
              % (exposed_object["localName"], exposed_object["type"], connection))
        for rule in exposed_object["rules"]:
            print("    %s" % rule["targetProject"])
```
Cheers,
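The parsing part of this snippet can be isolated into a pure function over the raw settings dict, which makes it easier to test or reuse. This is a sketch assuming the `exposedObjects.objects[].rules[].targetProject` layout used above and the `"DATASET"` type label; the helper name `exposure_map` and the sample settings are hypothetical.

```python
def exposure_map(raw_settings):
    """Return {local dataset name: [target project keys]} from raw project settings.

    Assumes raw_settings["exposedObjects"]["objects"] is a list of entries like
    {"localName": ..., "type": ..., "rules": [{"targetProject": ...}, ...]}.
    """
    exposed = raw_settings.get("exposedObjects", {}).get("objects", [])
    result = {}
    for obj in exposed:
        if obj.get("type") != "DATASET":  # skip managed folders, saved models, ...
            continue
        result[obj["localName"]] = [rule["targetProject"] for rule in obj.get("rules", [])]
    return result


# Made-up settings fragment for illustration
raw = {
    "exposedObjects": {
        "objects": [
            {"localName": "sales", "type": "DATASET",
             "rules": [{"targetProject": "REPORTING"}, {"targetProject": "ML"}]},
            {"localName": "models_folder", "type": "MANAGED_FOLDER",
             "rules": [{"targetProject": "ML"}]},
        ]
    }
}
print(exposure_map(raw))  # {'sales': ['REPORTING', 'ML']}
```

Against a live instance it would be called as `exposure_map(client.get_project("PROJECT_KEY").get_settings().get_raw())`.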
Answers
Thanks a lot, Du. Very helpful!
If you want to check whether a shared (exported) dataset is actually used downstream (i.e. is an input of a recipe in the other project), you can use something like this:
```python
import re


def get_shared_datasets(client, project_key=None, direction='from'):
    # Returns all the shared datasets that are actually used:
    # 1. from a given project (direction='from'),
    #    i.e. the datasets exported (shared) from this project that are used.
    #    For example, if DS1 is exported from PRJA to PRJB, it is reported
    #    only if PRJB has a recipe reading PRJA.DS1.
    # 2. or to a given project (direction='to'),
    #    i.e. the datasets imported into this project that are used.
    #    For example, if DS1 is imported from PRJB into PRJA, it is reported
    #    only if PRJA has a recipe reading PRJB.DS1.
    # project_key can be a str or a list of str.
    # If project_key is None, shared datasets from every project are returned.
    # The result is a dict with the structure:
    # {'PROJECT_KEY_A':
    #      {'dataset_A': ['CHILD_PROJECT_A'],
    #       'dataset_B': ['CHILD_PROJECT_A', 'CHILD_PROJECT_B'],
    #       ...},
    #  'PROJECT_KEY_B':
    #      {...}
    # }
    # Usage: client = dataiku.api_client()
    projects = []
    if isinstance(project_key, str):
        projects = [project_key]
    elif isinstance(project_key, list):
        projects = project_key
    # Inputs coming from another project are referenced as "PROJECTKEY.dataset_name"
    patt = re.compile(r'\w+\.\w+')
    shared_datasets = {}
    for project in client.list_projects():
        consumer = project['projectKey']
        prj = client.get_project(consumer)
        for r in prj.list_recipes():
            # Only the 'main' input role of each recipe is inspected
            items = r.get('inputs', {}).get('main', {}).get('items', [])
            for inp in items:
                if not patt.match(inp['ref']):
                    continue  # local input, not a shared dataset
                source, dataset = inp['ref'].split('.', 1)
                if (project_key is None
                        or (direction == 'from' and source in projects)
                        or (direction == 'to' and consumer in projects)):
                    consumers = shared_datasets.setdefault(source, {}).setdefault(dataset, [])
                    if consumer not in consumers:
                        consumers.append(consumer)
    return shared_datasets
```
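To answer the original question (which project must be built first), the dictionary returned by `get_shared_datasets` can be read as a dependency graph and sorted topologically. This is a sketch assuming Python 3.9+ for the standard-library `graphlib` module; the helper name `build_order` and the sample project keys are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(shared_datasets):
    """Order project keys so that every source project precedes its consumers.

    `shared_datasets` has the structure returned by get_shared_datasets():
    {source_project: {dataset_name: [consumer_project, ...]}}.
    Raises graphlib.CycleError if projects depend on each other in a loop.
    """
    sorter = TopologicalSorter()
    for source, datasets in shared_datasets.items():
        sorter.add(source)  # include sources even if nothing depends on them yet
        for consumers in datasets.values():
            for consumer in consumers:
                sorter.add(consumer, source)  # consumer depends on source
    return list(sorter.static_order())


# Made-up chain: PRJA shares DS1 with PRJB, which shares DS2 with PRJC
shared = {"PRJA": {"DS1": ["PRJB"]}, "PRJB": {"DS2": ["PRJC"]}}
print(build_order(shared))  # ['PRJA', 'PRJB', 'PRJC']
```

A topological sort is used because a project can consume datasets from several others; any order it returns builds every source before the projects reading from it, and a cycle between projects surfaces as an error instead of a silently wrong order.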