
List all project's datasets dependencies (Shared and HDFS)

ephemeral
Level 1

Hi there,

I'll soon be tackling a major compliance revamp on a project.
To plan it, I need a view of all the dependencies: the datasets in use, the project each one originally comes from, and any intermediate projects they pass through.

We are looking at 10+ projects and a couple hundred datasets.

Some datasets in the final target project are shared, while others do not use the built-in share functionality and are simply imported (and stored) as HDFS files.

The HOME > PROJECTS > Graph view does not help: it cannot be filtered to show only the projects related by dependency, and because it is global it carries too much unrelated information to be usable.
In addition, IMO the light gray to dark gray link color used to highlight dependencies is not easily visible.

Are there any plugins, scripts, or features that could help me plan this and get a precise view of what I need?

Thanks,

1 Reply
SarinaS
Dataiker

Hi @ephemeral,

Would you mind providing a single detailed example of exactly the information you are looking for, including screenshots highlighting what you want to pull from your flow? That should help us suggest an approach!

Is the following question and response similar to what you are looking for at all?
https://community.dataiku.com/t5/Using-Dataiku/python-api-Select-all-upstream-and-downstream-dataset... 
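
For illustration only, here is a minimal sketch of the kind of upstream traversal this comes down to. It is plain Python over a hand-written recipe list, not Dataiku API code: the function name `upstream_dependencies`, the `(inputs, outputs)` recipe format, and the sample dataset and project names are all hypothetical, and it assumes that a dataset shared from another project is referenced as `PROJECTKEY.dataset_name`.

```python
from collections import defaultdict, deque


def upstream_dependencies(recipes, target, home_project):
    """Return every dataset upstream of `target`, grouped by owning project.

    `recipes` is a list of (inputs, outputs) pairs of dataset names.
    ASSUMPTION: a name containing a dot is "PROJECTKEY.dataset", i.e. a
    dataset owned by another project; bare names belong to `home_project`.
    """
    # Map each output dataset to the inputs of the recipe(s) that build it.
    producers = defaultdict(set)
    for inputs, outputs in recipes:
        for out in outputs:
            producers[out].update(inputs)

    # Breadth-first walk from the target back through its producers.
    seen, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for parent in producers.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    # Group the upstream datasets by the project that owns them.
    by_project = defaultdict(list)
    for name in sorted(seen):
        project, _, dataset = name.rpartition(".")
        by_project[project or home_project].append(dataset)
    return dict(by_project)


# Hypothetical flow: two recipes feed the final dataset, one is unrelated.
recipes = [
    (["SOURCE_A.customers", "orders_hdfs"], ["orders_enriched"]),
    (["orders_enriched", "STAGING_B.calendar"], ["final_report"]),
    (["unrelated_in"], ["unrelated_out"]),
]
deps = upstream_dependencies(recipes, "final_report", "TARGET")
for project, datasets in deps.items():
    print(project, datasets)
```

Datasets that were copied in as HDFS files would show up here as local to the target project, so their true origin would still need to be traced separately, for example by comparing storage paths.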

Thank you,
Sarina
