
Research by data

Hello community,

I know there is already a data catalog in DSS, but I think it would be very useful to have this functionality at the project level:

Capture_dss.PNG

I can find a particular recipe or dataset, but I would also like to see where a given piece of data is stored and used across the entire Flow.

Have a nice day 🙂

6 Comments
ElisaS
Dataiker

Hi @Tuong-Vi

I'm not sure I fully understand the feature request. Would you like to see all available datasets from this search bar (the same results as in the Data Catalog), or is it something else related to the datasets used in the project?

Elisa 

ElisaS
Dataiker
Status changed to: Needs Info
 
Tuong-Vi
Neuron

Hello @ElisaS ,

Currently, at the project level, when I want to check whether a particular piece of data exists or is used, I have to pick one dataset, click on it, and use the panel to check whether the column name (for example "product_name") is in the schema:

 

Sans titre.png

 

Some users have told me it would be useful to have a search bar at the project level (or perhaps another "data" option in the Data Catalog) to quickly see all the datasets that have "product_name" in their schema.

I hope this helps,

Tuong-Vi

ElisaS
Dataiker
Status changed to: In Backlog
 
ElisaS
Dataiker

Hi @Tuong-Vi

Thanks for the clarification. When you enter a column name in the Catalog's search bar, you get the datasets that have this column, but this may not be very clear in the UI since we don't display the column in question.

I have added the request to have this at the project level in our backlog. We can't provide a timeline at this point, but be sure to check back for updates!

Elisa 

natejgardner
Neuron

I agree, this would be a really helpful feature, especially if computed columns were also searchable here. One of my biggest challenges in large projects with hundreds of datasets is finding my custom columns so they can be copied onto other datasets. But in general, being able to quickly check which datasets in my project have a particular field would narrow down searches a lot. I've run into the need to search for datasets that contain a particular column within my project at least three times in the last month. It'd be really powerful if the search feature could be upgraded to index column-level metadata, like names, types, source, and, if already collected, column-level metrics. My current main project has over 500 datasets, spread over about 20 flow zones, so any indexing and search features to make projects of that scale more navigable are very welcome in my team!
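
For anyone who wants to prototype the column-level search discussed in this thread, here is a minimal sketch of the idea, not a description of how DSS indexes anything. It assumes each dataset schema is available as a dict with a "columns" list of {"name", "type"} entries; the function names, the `example_schemas` data, and the case-insensitive matching are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed input shape: dataset name -> schema dict with a "columns" list,
# where each column looks like {"name": ..., "type": ...}.
Schema = Dict[str, List[Dict[str, str]]]
ColumnIndex = Dict[str, List[Tuple[str, str]]]


def build_column_index(schemas: Dict[str, Schema]) -> ColumnIndex:
    """Map each lower-cased column name to sorted (dataset, type) pairs."""
    index = defaultdict(list)
    for dataset_name, schema in schemas.items():
        for column in schema.get("columns", []):
            entry = (dataset_name, column.get("type", "unknown"))
            index[column["name"].lower()].append(entry)
    return {name: sorted(hits) for name, hits in index.items()}


def datasets_with_column(index: ColumnIndex, column_name: str) -> List[str]:
    """Return the sorted, de-duplicated datasets that contain column_name."""
    hits = index.get(column_name.lower(), [])
    return sorted({dataset for dataset, _ in hits})


# Made-up schemas, for illustration only.
example_schemas = {
    "orders": {"columns": [
        {"name": "order_id", "type": "bigint"},
        {"name": "product_name", "type": "string"},
    ]},
    "products": {"columns": [
        {"name": "Product_Name", "type": "string"},
        {"name": "price", "type": "double"},
    ]},
    "customers": {"columns": [
        {"name": "customer_id", "type": "bigint"},
    ]},
}

index = build_column_index(example_schemas)
print(datasets_with_column(index, "product_name"))  # ['orders', 'products']
```

In a real project, the `schemas` dict could probably be filled from the DSS public Python API, for example by looping over `project.list_datasets()` and calling `project.get_dataset(name).get_schema()`. Please check those calls against your DSS version, since I have not verified them here. With several hundred datasets, building the index once and querying it repeatedly should be cheaper than opening each dataset's schema by hand.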