Get shared projects using Dataiku API

osk
osk Registered Posts: 9 ✭✭✭✭
edited July 18 in Using Dataiku

Hi there,

I am looking for a way to get the database keys and names that are shared into my project using the Dataiku API.

I tried the following:


import dataiku

client = dataiku.api_client()
project = client.get_project('PROJECT_NAME')
datasets = project.list_datasets()

Indexing the result with datasets[index_of_database]['params']['table'] gives me the name of a database table.

However, this call does not return databases that are shared into my project.

The background is that I want to find dependencies between projects (e.g. if a database from project A is shared into project B, then project A needs to be built first).

I am looking forward to your help.

Best,

Oliver

Best Answer

  • UserBird
    UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
    edited July 18 Answer ✓

    Hi, this code snippet can help you get the list of shared datasets + their connections.


    import dataiku

    client = dataiku.api_client()
    for project_key in client.list_project_keys():
        print("*** EXPOSED FROM PROJECT %s ***" % project_key)
        p = client.get_project(project_key)
        for exposed_object in p.get_settings().get_raw()["exposedObjects"]["objects"]:
            # Only datasets have a connection; other exposed types (e.g. models) do not
            connection = None
            if exposed_object["type"] == "DATASET":
                connection = p.get_dataset(exposed_object["localName"]).get_definition().get('params').get('connection')
            print("  Object id=%s type=%s db=%s is exposed to projects:" % (exposed_object["localName"], exposed_object["type"], connection))
            for rule in exposed_object["rules"]:
                print("    %s" % rule["targetProject"])

    Cheers,
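
To answer the original question directly (which datasets are shared *into* one project), the exposed-object rules from the snippet above can be inverted. The following is only a minimal sketch: it assumes the raw settings have the `exposedObjects` layout used above, and `datasets_shared_into` and the sample data are hypothetical names made up for illustration. It works on plain dicts, so it can be tried outside DSS.

```python
def datasets_shared_into(target_project, raw_settings_by_project):
    """Return sorted (source_project, dataset_name) pairs exposed to target_project.

    raw_settings_by_project maps a project key to the dict returned by
    project.get_settings().get_raw() (layout assumed from the snippet above).
    """
    shared = []
    for source_project, raw in raw_settings_by_project.items():
        objects = raw.get("exposedObjects", {}).get("objects", [])
        for obj in objects:
            if obj.get("type") != "DATASET":
                continue  # skip exposed objects that are not datasets
            targets = {rule["targetProject"] for rule in obj.get("rules", [])}
            if target_project in targets:
                shared.append((source_project, obj["localName"]))
    return sorted(shared)


# Hypothetical sample data, not taken from a real instance.
sample = {
    "PRJA": {"exposedObjects": {"objects": [
        {"type": "DATASET", "localName": "DS1", "rules": [{"targetProject": "PRJB"}]},
        {"type": "DATASET", "localName": "DS2", "rules": [{"targetProject": "PRJC"}]},
        {"type": "SAVED_MODEL", "localName": "M1", "rules": [{"targetProject": "PRJB"}]},
    ]}},
    "PRJC": {"exposedObjects": {"objects": [
        {"type": "DATASET", "localName": "DS9",
         "rules": [{"targetProject": "PRJA"}, {"targetProject": "PRJB"}]},
    ]}},
}

print(datasets_shared_into("PRJB", sample))
```

Inside DSS, the mapping could presumably be built with `{k: client.get_project(k).get_settings().get_raw() for k in client.list_project_keys()}`, using the same calls as the snippet above.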

Answers

  • osk
    osk Registered Posts: 9 ✭✭✭✭
    Thanks a lot, Du. Very helpful!
  • Tomas
    Tomas Registered, Neuron 2022 Posts: 121 ✭✭✭✭✭
    edited July 18

    If you want to check whether a shared (exported) dataset is actually used downstream (i.e. is an input of a recipe in the other project), you can use something like this:


    import re

    def get_shared_datasets(client, project_key=None, direction='from'):
        # Returns all the shared datasets
        # 1. from a given project (direction = 'from')
        #    i.e. it returns all the datasets that are exported (shared) from this project
        #    and are used. So for example if DS1 is exported from PRJA to PRJB,
        #    it is reported only if in PRJB there is a recipe reading PRJA.DS1.
        # 2. or to a given project (direction = 'to')
        #    i.e. it returns all the datasets that are imported to this project
        #    and are used. So for example if DS1 is imported from PRJB to PRJA,
        #    it is reported only if in PRJA there is a recipe reading PRJB.DS1.
        # project_key can be <str> or <list> of <str>
        # If project_key is None, then returns exported datasets from every project
        # Result is a dict with structure:
        # {'PROJECT_KEY_A':
        #     {'dataset_A': ['CHILD_PROJECT_A'],
        #      'dataset_B': ['CHILD_PROJECT_A', 'CHILD_PROJECT_B'],
        #      ... },
        #  'PROJECT_KEY_B':
        #     { .. }
        # }
        # client = dataiku.api_client()
        projects = []
        if isinstance(project_key, str):
            projects = [project_key]
        if isinstance(project_key, list):
            projects = project_key
        # A reference of the form PROJECT.dataset points to another project
        patt = re.compile(r'\w+\.\w+')
        shared_datasets = {}
        for project in client.list_projects():
            consumer = project['projectKey']
            prj = client.get_project(consumer)
            for r in prj.list_recipes():
                # Missing 'inputs', 'main' or 'items' keys simply yield no inputs
                for inp in r.get('inputs', {}).get('main', {}).get('items', []):
                    if not patt.match(inp['ref']):
                        continue  # local dataset, no PROJECT. prefix
                    src_project, dataset = inp['ref'].split('.')[:2]
                    if project_key is None or (src_project in projects and direction == 'from') or \
                            (consumer in projects and direction == 'to'):
                        consumers = shared_datasets.setdefault(src_project, {}).setdefault(dataset, [])
                        if consumer not in consumers:
                            consumers.append(consumer)
        return shared_datasets
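
Since the original goal was to work out which project has to be built first, the dict described in the comment above can be turned into a build order. This is just a sketch under assumptions: `build_order` is a hypothetical helper, it uses the standard library's `graphlib` (Python 3.9+), and the example input is made up to match the documented result structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(shared_datasets):
    """Order projects so that every source project comes before its consumers.

    shared_datasets: {source_project: {dataset_name: [consumer_project, ...]}}
    Raises graphlib.CycleError if projects depend on each other circularly.
    """
    sorter = TopologicalSorter()
    for source_project, datasets in shared_datasets.items():
        sorter.add(source_project)
        for consumers in datasets.values():
            for consumer in consumers:
                if consumer != source_project:
                    # The consumer depends on the source project
                    sorter.add(consumer, source_project)
    return list(sorter.static_order())


# Hypothetical input following the documented result structure.
example = {
    "PRJA": {"DS1": ["PRJB"], "DS2": ["PRJB", "PRJC"]},
    "PRJB": {"DS3": ["PRJC"]},
}

print(build_order(example))  # ['PRJA', 'PRJB', 'PRJC']
```

Calling it as `build_order(get_shared_datasets(client))` would cover every project; a circular sharing setup surfaces as a `CycleError` instead of a silently wrong order.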
