How to find shared-in datasets of a project with Python API?
Thanks in advance for your time.
I have a project and I want to know which datasets are shared in from other projects (black icons) using the Python API.
I looked through the previous Q&A but only found a way to list the datasets that are shared out to other projects:
def find_exposed_datasets(project):
    result = []
    raw = project.get_settings().get_raw()
    exposed_objects = raw['exposedObjects']['objects']
    for obj in exposed_objects:
        if obj['type'] == 'DATASET':
            result.append(obj)
            # # print out share-to projects
            # for item in obj['rules']:
            #     result.append(item['targetProject'])
    return result
Thanks for your help!
Operating system used: Win11 enterprise
Best Answer
-
https://community.dataiku.com/discussion/3372/get-shared-projects-using-dataiku-api
I found a solution: see Tomas's response at the link above. Thank you, Tomas.
Answers
-
Turribeach: I don't believe it's possible to get this at the instance level, so you will need to call client.list_project_keys() and loop through every project's exposed objects to collect the whole list. You can then do a lookup per project.
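The loop described above could be sketched roughly as follows. This is only a minimal sketch, not a confirmed solution: it assumes the account can read the settings of every project, and that each exposed object carries a 'localName' field holding the dataset name (the 'type', 'rules' and 'targetProject' fields come from the snippet in the question, 'localName' is an assumption). The helper names are made up for illustration.

```python
def collect_exposed_objects(client):
    """Map each project key to the list of objects that project exposes."""
    exposed = {}
    for key in client.list_project_keys():
        raw = client.get_project(key).get_settings().get_raw()
        # Projects that expose nothing may lack these keys, so default to empty.
        exposed[key] = raw.get("exposedObjects", {}).get("objects", [])
    return exposed


def find_shared_in_datasets(exposed_by_project, target_key):
    """Return (source_project, dataset_name) pairs shared into target_key."""
    shared_in = []
    for source_key, objects in exposed_by_project.items():
        for obj in objects:
            if obj.get("type") != "DATASET":
                continue
            targets = {rule.get("targetProject") for rule in obj.get("rules", [])}
            if target_key in targets:
                # 'localName' is assumed to hold the dataset name.
                shared_in.append((source_key, obj.get("localName")))
    return shared_in
```

Inside DSS it would be called along the lines of find_shared_in_datasets(collect_exposed_objects(dataiku.api_client()), "MY_PROJECT"), where "MY_PROJECT" is a placeholder project key.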
-
Turribeach: Indeed, that does what I said you should do, which is to loop through every project. Personally I wouldn't call this in a function, since it can take some time to run on large instances. I would build a small flow that refreshes a table every hour or so. Then your function or report can use this metadata.
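For the table idea, one possible shape is one row per (source project, dataset, target project). The sketch below only does the flattening and is an illustration, not a tested recipe: it takes a dict mapping each project key to its raw['exposedObjects']['objects'] list, the function name is hypothetical, and 'localName' is again an assumed field name.

```python
def exposed_dataset_rows(exposed_by_project):
    """Flatten exposed datasets into one row per sharing rule."""
    rows = []
    for source_key, objects in exposed_by_project.items():
        for obj in objects:
            if obj.get("type") != "DATASET":
                continue
            for rule in obj.get("rules", []):
                rows.append({
                    "source_project": source_key,
                    "dataset": obj.get("localName"),  # assumed field name
                    "target_project": rule.get("targetProject"),
                })
    return rows
```

A scheduled Python recipe could turn these rows into a pandas DataFrame and write it to an output dataset, so that finding the shared-in datasets of a project becomes a simple filter on target_project.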
-
There is also another trick, rather than iterating over projects.
You could get the flow from your project and check whether it contains source datasets with a . in the name, something like:
import dataiku

client = dataiku.api_client()
project = client.get_default_project()
flow = project.get_flow()
# Shared-in datasets show up as PROJECTKEY.dataset_name
[dataset for dataset in flow.get_graph().get_source_datasets() if '.' in dataset.dataset_name]
This solution requires the shared dataset to be used as an input in the flow, which is usually the case.
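If you also want to know which project each dataset comes from, the dotted names can be split. This is a small sketch that assumes the part before the first dot is the source project key and the rest is the dataset name; both function names are hypothetical.

```python
def split_shared_name(name):
    """Split 'PROJECTKEY.dataset' into (project_key, dataset_name).

    Returns None for names without a usable project prefix.
    """
    project_key, sep, dataset_name = name.partition(".")
    if not sep or not project_key or not dataset_name:
        return None
    return project_key, dataset_name


def shared_in_from_names(names):
    """Keep only the names that look like shared-in datasets."""
    pairs = (split_shared_name(name) for name in names)
    return [pair for pair in pairs if pair is not None]
```

You would feed it the dataset_name values of the source datasets returned by the flow graph.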
Best


