DSS Catalog: how to add descriptions to the columns?

Solved!
Ignacio_Toledo
Hi community,

We are currently working on our strategy for data governance, and we have been looking around for data catalogs to help us in the process.

DSS has a very nice Catalog tool, which allows us to search tables in external databases (that is, tables stored in external database engines/servers), provided that they can be indexed by DSS. We have indexed our SQL and Hive data sources so far, and the potential of the Catalog tool seems to be huge.

There is one feature that we have not been able to figure out how to use: adding "descriptions" to the columns of an external table:

catalog.png

The question is: is it possible to add descriptions to the external tables' columns from within the Catalog interface? Or should they be added to the metadata of the sources from outside DSS?

We ran a test with managed datasets stored in the external databases, adding information to the "description" field of the Schema (as in the following screenshot), and even then the description does not show up in the Catalog.

schemainfo.png
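For reference, the same schema edit could probably be scripted instead of typed into the UI. The following is only a minimal sketch: it assumes the dataset schema is a dict with a `columns` list and that each column's description lives under a `comment` key, which should be checked against your DSS version. The helper name `set_column_descriptions`, the project key, the dataset name and the column names are all hypothetical. It only touches DSS datasets, not external tables in the Catalog.

```python
import copy


def set_column_descriptions(schema, descriptions):
    """Return a copy of a schema dict with column descriptions filled in.

    Assumption: descriptions are stored under the "comment" key of each
    entry in schema["columns"]. Columns not listed are left unchanged.
    """
    updated = copy.deepcopy(schema)
    for column in updated.get("columns", []):
        if column["name"] in descriptions:
            column["comment"] = descriptions[column["name"]]
    return updated


# Hypothetical usage against a DSS instance through the public API client
# (host, API key, project key and dataset name are placeholders):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
#   dataset = client.get_project("MYPROJECT").get_dataset("my_dataset")
#   new_schema = set_column_descriptions(
#       dataset.get_schema(), {"obs_id": "Observation identifier"}
#   )
#   dataset.set_schema(new_schema)
```

Keeping the helper free of API calls means it can be tried on a plain dict before pointing it at a real dataset.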

Thanks in advance

Ignacio. 

1 Solution
Mark_Treveil
Dataiker Alumni

Ignacio

I do not believe it is currently possible to add metadata to external tables.  My understanding is that our metadata is only held by DSS objects. 

This is an area I am looking at for potential further development so I would be very interested to discuss, offline, your use cases.

Many thanks 

Mark

5 Replies

@Mark_Treveil 

Is there a way for the schema to keep track of the original source of the data, that is, the name of the external data object that originally provided it? If the data is modified, the schema would then carry some indication that the column is a calculated column.

It appears that, to some extent, the object descriptions are passed through the schema. However, the rules being used are a little bit opaque to me right now.

--Tom
0 Kudos
Ignacio_Toledo
Author

Hi Mark,

As you said, for DSS objects like datasets the metadata is available when browsing the "DSS ITEMS" in the Catalog, including the column descriptions. But I was afraid that for external tables it was going to be different.

Still, I'm happy to hear that you are looking at potential further developments! I will send you a private message to discuss offline what we have in mind. It is something similar to the feature for adding column descriptions in tools like Amundsen.

Cheers! Ignacio

Andrey
Dataiker Alumni

Hi @Ignacio_Toledo ,

Have you tried adding a description in the field that says "+ Add description"?

The information you add there is stored on the DSS side, and once you re-index the connection containing that table, the description becomes searchable in the Catalog.

Screenshot 2020-11-08 at 12.52.11.png

Screenshot 2020-11-08 at 12.52.20.png

Regards

Andrey Avtomonov
R&D Engineer @ Dataiku
0 Kudos
Ignacio_Toledo
Author

Hi @Andrey. Thanks for the suggestion, and yes, we have not only tried that, we use it, along with the tags, to make data discovery easier. However, what we are after is a step further: adding descriptions to the columns of the tables. Some of our tables have more than 20 columns, and in those cases having quick access to the description or meaning of each column is really important.

As I mentioned to @Mark_Treveil, it is a feature like the one available in this particular tool:

amundsen_columns.png

Cheers!
