Get shared projects using Dataiku API

osk
osk Registered Posts: 9 ✭✭✭✭

Hi there,

I am looking for a way to get the database keys and names that are shared into my project using the Dataiku API.

I tried the following:


project = client.get_project('PROJECT_NAME')
datasets = project.list_datasets()

When using datasets[index_of_database]['params']['table'], I get the name of a database table.

However, this call does not include datasets that are shared into my project.

The background is that I want to find dependencies between projects (e.g. if a dataset from project A is shared into project B, then project A needs to be built first).

I am looking forward to your help.

Best,

Oliver


Best Answer

  • UserBird
    UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
    Answer ✓

    Hi, this code snippet can help you get the list of shared datasets and their connections.


    import dataiku

    client = dataiku.api_client()
    for project_key in client.list_project_keys():
        print("*** EXPOSED FROM PROJECT %s ***" % project_key)
        p = client.get_project(project_key)
        for exposed_object in p.get_settings().get_raw()["exposedObjects"]["objects"]:
            connection = p.get_dataset(exposed_object["localName"]).get_definition().get('params').get('connection')
            print("  Object id=%s type=%s db=%s is exposed to projects:"
                  % (exposed_object["localName"], exposed_object["type"], connection))
            for rule in exposed_object["rules"]:
                print("    %s" % rule["targetProject"])

    Cheers,
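    The loop above prints its findings; if you would rather work with the result programmatically, the same traversal over the raw settings can be condensed into a small helper. This is just a sketch: `exposed_targets` and the sample dict are illustrative, not part of the Dataiku API.

```python
def exposed_targets(raw_settings):
    # Map each exposed object's localName to the list of projects
    # it is shared with, based on the "exposedObjects" structure.
    out = {}
    for obj in raw_settings["exposedObjects"]["objects"]:
        out[obj["localName"]] = [rule["targetProject"] for rule in obj["rules"]]
    return out

# Illustrative sample mimicking p.get_settings().get_raw()
sample = {"exposedObjects": {"objects": [
    {"localName": "DS1", "type": "DATASET",
     "rules": [{"targetProject": "PRJB"}, {"targetProject": "PRJC"}]},
]}}
print(exposed_targets(sample))  # {'DS1': ['PRJB', 'PRJC']}
```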

Answers

  • osk
    osk Registered Posts: 9 ✭✭✭✭
    Thanks a lot, Du. Very helpful!
  • tomas
    tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭

    If you want to check whether a shared (exported) dataset is used downstream (i.e. is an input of a recipe in another project), you can use something like this:


    import re

    def get_shared_datasets(client, project_key=None, direction='from'):
        # Returns all the shared datasets
        # 1. from a given project (direction='from'):
        #    all the datasets that are exported (shared) from this project
        #    and are actually used. For example, if DS1 is exported from PRJA
        #    to PRJB, it is reported only if PRJB has a recipe reading PRJA.DS1.
        # 2. to a given project (direction='to'):
        #    all the datasets that are imported into this project and are
        #    actually used. For example, if DS1 is imported from PRJB into
        #    PRJA, it is reported only if PRJA has a recipe reading PRJB.DS1.
        # project_key can be a <str> or a <list> of <str>.
        # If project_key is None, returns exported datasets from every project.
        # Result is a dict with structure:
        # {u'PROJECT_KEY_A':
        #     {u'dataset_A': [u'CHILD_PROJECT_A'],
        #      u'dataset_B': [u'CHILD_PROJECT_A', u'CHILD_PROJECT_B'],
        #      ...},
        #  u'PROJECT_KEY_B':
        #     {...}
        # }
        projects = []
        if isinstance(project_key, str):
            projects = [project_key]
        if isinstance(project_key, list):
            projects = project_key
        patt = re.compile(r'\w+\.\w+')
        shared_datasets = {}
        for project in client.list_projects():
            prj = client.get_project(project['projectKey'])
            for r in prj.list_recipes():
                if 'inputs' in r and 'main' in r['inputs'] and 'items' in r['inputs']['main']:
                    for inp in r['inputs']['main']['items']:
                        if patt.match(inp['ref']):
                            proj_ds = inp['ref'].split('.')
                            if project_key is None or \
                               (proj_ds[0] in projects and direction == 'from') or \
                               (project['projectKey'] in projects and direction == 'to'):
                                if proj_ds[0] not in shared_datasets:
                                    shared_datasets[proj_ds[0]] = {}
                                if proj_ds[1] not in shared_datasets[proj_ds[0]]:
                                    shared_datasets[proj_ds[0]][proj_ds[1]] = []
                                if project['projectKey'] not in shared_datasets[proj_ds[0]][proj_ds[1]]:
                                    shared_datasets[proj_ds[0]][proj_ds[1]].append(project['projectKey'])
        return shared_datasets
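    Since the original question was about build order, note that the dict returned above maps producer projects to consumer projects, which is exactly what a topological sort needs. A minimal sketch (the `build_order` helper and sample data are hypothetical, not part of the Dataiku API):

```python
from collections import defaultdict

def build_order(shared):
    # shared: {producer_project: {dataset: [consumer_projects]}},
    # the shape returned by get_shared_datasets above.
    # Kahn's algorithm: producers come before their consumers.
    edges = defaultdict(set)
    nodes = set(shared)
    for producer, datasets in shared.items():
        for consumers in datasets.values():
            for c in consumers:
                edges[producer].add(c)
                nodes.add(c)
    indeg = {n: 0 for n in nodes}
    for consumers in edges.values():
        for c in consumers:
            indeg[c] += 1
    ready = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for c in sorted(edges[n]):
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order

# PRJA feeds PRJB, PRJB feeds PRJC -> build PRJA first
shared = {'PRJA': {'DS1': ['PRJB']}, 'PRJB': {'DS2': ['PRJC']}}
print(build_order(shared))  # ['PRJA', 'PRJB', 'PRJC']
```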
