How to find shared-in datasets of a project with Python API?

Haoran
Haoran Registered Posts: 7 ✭✭✭

Thanks in advance for your time.

I have a project, and I want to find out with the Python API which of its datasets are shared in from other projects (the ones shown with black icons).


I looked through previous Q&As but only found a way to list the datasets that are shared out to other projects:

```python
def find_exposed_datasets(project):
    """Return the datasets of `project` that are shared out (exposed) to other projects."""
    result = []
    raw = project.get_settings().get_raw()
    exposed_objects = raw['exposedObjects']['objects']
    for obj in exposed_objects:
        if obj['type'] == 'DATASET':
            result.append(obj)
            # Optionally, collect the share-to (target) projects instead:
            # for item in obj['rules']:
            #     result.append(item['targetProject'])
    return result
```


Thanks for your help!

Operating system used: Win11 enterprise

Best Answer

  • Haoran
    Haoran Registered Posts: 7 ✭✭✭
    Answer ✓

    https://community.dataiku.com/discussion/3372/get-shared-projects-using-dataiku-api



    I found a solution: see Tomas's response in the linked thread. Thank you, Tomas.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,615 Neuron

    I don't believe it's possible to get this at instance level, so you will need to call client.list_project_keys() and loop through every project's exposed objects to collect the whole list. You can then do a lookup per project.
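A minimal sketch of that loop, assuming each entry in `exposedObjects.objects` has `type`, `localName` and `rules` keys and each rule has a `targetProject` key (`type` and `targetProject` appear in the question's code; `localName` is an assumption about the raw settings format). The helper names `load_raw_settings`, `collect_dataset_exposures` and `shared_in` are hypothetical. Parsing is kept separate from the API calls so it can be checked without a DSS instance.

```python
from typing import Dict, List, Tuple

# (source project key, dataset name, target project key)
Exposure = Tuple[str, str, str]


def load_raw_settings(client) -> Dict[str, dict]:
    """Fetch the raw settings of every project on the instance.

    Assumes `client` is a DSS API client such as dataiku.api_client().
    One API call per project, so this can be slow on large instances.
    """
    return {
        key: client.get_project(key).get_settings().get_raw()
        for key in client.list_project_keys()
    }


def collect_dataset_exposures(raw_settings_by_project: Dict[str, dict]) -> List[Exposure]:
    """List every (source, dataset, target) triple found in the exposed-objects settings.

    Assumption: exposed objects carry 'type', 'localName' and 'rules',
    and each rule carries 'targetProject'.
    """
    exposures = []
    for source_key, raw in raw_settings_by_project.items():
        objects = raw.get("exposedObjects", {}).get("objects", [])
        for obj in objects:
            if obj.get("type") != "DATASET":
                continue  # skip saved models, folders, etc.
            for rule in obj.get("rules", []):
                exposures.append((source_key, obj["localName"], rule["targetProject"]))
    return exposures


def shared_in(exposures: List[Exposure], project_key: str) -> List[str]:
    """Return 'SOURCEPROJECT.dataset' refs that are shared into `project_key`."""
    return sorted(
        f"{src}.{name}" for src, name, target in exposures if target == project_key
    )
```

For example, inside DSS: `shared_in(collect_dataset_exposures(load_raw_settings(dataiku.api_client())), "MY_PROJECT")`, where `MY_PROJECT` is a placeholder project key.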

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023, Circle Member Posts: 2,615 Neuron

    Indeed, that does what I suggested, which is to loop through every project. Personally, I wouldn't call this on demand in a function, since it can take some time to run on large instances. I would build a small flow that refreshes a table every hour or so; your function or report can then read this metadata instead.
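A rough sketch of the cached-table idea, assuming exposures have already been collected as (source project, dataset, target project) tuples. The column names, the one-hour `max_age` default, and the helpers `to_snapshot_rows`, `is_stale` and `shared_in_from_rows` are all hypothetical; scheduling the refresh would be left to a DSS scenario.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Exposure = Tuple[str, str, str]  # (source project, dataset, target project)


def to_snapshot_rows(exposures: List[Exposure], refreshed_at: datetime) -> List[Dict[str, str]]:
    """Flatten exposures into table rows, each stamped with the refresh time."""
    stamp = refreshed_at.isoformat()
    return [
        {"source_project": src, "dataset": name, "target_project": target, "refreshed_at": stamp}
        for src, name, target in exposures
    ]


def is_stale(refreshed_at: datetime, now: datetime, max_age: timedelta = timedelta(hours=1)) -> bool:
    """True when the cached snapshot is older than `max_age`."""
    return now - refreshed_at > max_age


def shared_in_from_rows(rows: List[Dict[str, str]], project_key: str) -> List[str]:
    """Look up shared-in datasets from the cached table instead of calling the API."""
    return sorted(
        f"{r['source_project']}.{r['dataset']}" for r in rows if r["target_project"] == project_key
    )
```

In a Python recipe run by an hourly scenario, the rows could be written with something like `dataiku.Dataset("shared_datasets_index").write_with_schema(pandas.DataFrame(rows))`, where `shared_datasets_index` is a placeholder dataset name.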

  • FlorentD
    FlorentD Dataiker, Dataiku DSS Core Designer, Registered Posts: 32 Dataiker

    There is also another trick that avoids iterating over projects.

    You could get the flow of your project and check whether its source datasets have a `.` in their name, since a dataset shared in from another project is referenced as PROJECTKEY.dataset_name. Something like:

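    ```python
    import dataiku

    client = dataiku.api_client()
    project = client.get_default_project()
    flow = project.get_flow()
    # Keep the Flow's source datasets whose name contains a '.', i.e. PROJECTKEY.dataset_name refs
    shared_in_datasets = [dataset for dataset in flow.get_graph().get_source_datasets() if '.' in dataset.dataset_name]
    ```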
    

    This solution only finds shared datasets that are used as inputs (sources) in the Flow, which is usually the case.
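To also see which project each shared dataset comes from, the names returned by the snippet above can be split on the first `.`. A small sketch, assuming source dataset names follow the PROJECTKEY.dataset_name pattern and that local dataset names contain no dot; `split_foreign_refs` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def split_foreign_refs(dataset_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (source project key, dataset name) for names like 'PROJECTKEY.dataset'.

    Names without a '.' are treated as local datasets and skipped.
    """
    pairs = []
    for name in dataset_names:
        if "." in name:
            project_key, local_name = name.split(".", 1)
            pairs.append((project_key, local_name))
    return pairs
```

Inside DSS this could be fed with `(d.dataset_name for d in flow.get_graph().get_source_datasets())`.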

    Best
