DSS Catalog: how to add descriptions to the columns?

Ignacio_Toledo

Hi community,

We are currently working on our strategy for data governance, and we have been looking around for data catalogs to help us in the process.

DSS has a very nice Catalog tool, which lets us search tables in external databases (external meaning stored in external database engines/servers), provided that DSS can index them. We have indexed our SQL and Hive data sources so far, and the potential of the Catalog tool seems huge.

There is one feature that we have not been able to figure out how to use: adding "descriptions" to the columns of an external table:

catalog.png

The question is: is it possible to add descriptions to the external tables' columns from within the Catalog interface? Or should they be added to the metadata of the sources from outside DSS?

We ran a test with managed datasets stored in the external databases, adding information to the "description" field of the Schema (as in the following screenshot), and even then the description doesn't show up in the Catalog.

schemainfo.png
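
For readers who want to fill in the Schema "description" field in bulk rather than through the UI, here is a minimal sketch. It assumes the schema has the shape returned by `DSSDataset.get_schema()` in the `dataikuapi` client (a dict with a `columns` list) and that the column description is stored under the `comment` key; both are assumptions to verify against your DSS version. The helper `with_column_descriptions` and the column names are hypothetical. This only affects DSS datasets and does not change what the Catalog shows for external tables.

```python
from copy import deepcopy


def with_column_descriptions(schema, descriptions):
    """Return a copy of a DSS-style schema dict with column descriptions set.

    Assumption: each column is a dict with a "name" key, and the
    description is stored under the "comment" key.
    """
    updated = deepcopy(schema)
    for column in updated.get("columns", []):
        if column["name"] in descriptions:
            column["comment"] = descriptions[column["name"]]
    return updated


# Example schema shaped like the assumed get_schema() result.
example_schema = {
    "columns": [
        {"name": "antenna_id", "type": "string"},
        {"name": "obs_date", "type": "date"},
    ]
}
example_updated = with_column_descriptions(
    example_schema,
    {"antenna_id": "Identifier of the antenna (hypothetical column)"},
)

# Against a live instance the round trip would look roughly like this
# (host, API key, project key and dataset name are placeholders):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
#   dataset = client.get_project("MYPROJECT").get_dataset("my_dataset")
#   new_schema = with_column_descriptions(dataset.get_schema(), {...})
#   dataset.set_schema(new_schema)
```

The helper works on a copy, so the schema can be inspected before anything is written back with `set_schema`.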

Thanks in advance

Ignacio. 

Best Answer

  • Mark_Treveil
    Mark_Treveil Dataiker Alumni Posts: 30 ✭✭✭✭✭
    Answer ✓

    Ignacio

    I do not believe it is currently possible to add metadata to external tables. My understanding is that our metadata is only held by DSS objects.

    This is an area I am looking at for potential further development so I would be very interested to discuss, offline, your use cases.

    Many thanks

    Mark

Answers

  • Andrey
    Andrey Dataiker Alumni Posts: 119 ✭✭✭✭✭✭✭

    Hi @Ignacio_Toledo,

    Have you tried adding a description in the field that says "+ Add description"?

    The information you add there is stored on the DSS side, and once you re-index the connection containing that table, the description becomes searchable in the Catalog.

    Screenshot 2020-11-08 at 12.52.11.png

    Screenshot 2020-11-08 at 12.52.20.png

    Regards

  • tgb417

    @Mark_Treveil

    Is there a way for the schema to keep track of the original source of the data, that is, the name of the external data object that originally provided it? If the data is modified, it would carry some indication that the column is a calculated column.

    It appears that, to some extent, the object descriptions are passed through the schema. However, the rules being used are a little bit opaque to me right now.

  • Ignacio_Toledo

    Hi Mark,

    As you said, for DSS objects like datasets the metadata is available when browsing the "DSS ITEMS" in the Catalog, including the column descriptions. But I was afraid that for external tables it was going to be different.

    Still, I'm happy to hear that you are looking at potential further developments! I will send you a private message to discuss offline what we have in mind. It is something similar to the feature for adding column descriptions in tools like Amundsen.

    Cheers! Ignacio

  • Ignacio_Toledo

    Hi @Andrey. Thanks for the suggestion, and yes, we have not only tried that, we use it, along with tags, to make data discovery easier. However, what we want is a step further: adding descriptions to the columns of the tables. Some of our tables have more than 20 columns, and in those cases having quick access to each column's description or meaning is really important.

    As I mentioned to @Mark_Treveil, it is a feature like the one available in this particular tool:

    amundsen_columns.png

    Cheers!
