Implement OpenLineage as part of the Dataiku Catalog

tgb417

Epic User Story:

As a data analyst who cares about the lineage of the data I'm receiving and sharing, I want OpenLineage implemented as part of the Dataiku Catalog. This would provide an interoperable way to track metadata about data as it flows into Dataiku, around Dataiku, and out to other tools.
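To make the idea a bit more concrete, here is a minimal sketch of the kind of OpenLineage RunEvent an integration could emit when a Dataiku recipe runs. It only builds the JSON payload and does not send it anywhere. Assumptions: the `2-0-2` schema URL, the `dataiku://` job namespace, the producer URI, and all project, recipe, and dataset names are hypothetical placeholders, and nothing here reflects an existing Dataiku API.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed spec version; check the current schema URL on openlineage.io.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
# Hypothetical producer URI identifying the emitting integration.
PRODUCER = "https://example.com/hypothetical-dataiku-openlineage-emitter"


def dataset(namespace: str, name: str) -> dict:
    """Minimal OpenLineage dataset reference with no facets attached."""
    return {"namespace": namespace, "name": name, "facets": {}}


def recipe_run_event(event_type: str, run_id: str, project: str, recipe: str,
                     inputs: list, outputs: list) -> dict:
    """Build a RunEvent for one run of a (hypothetical) Dataiku recipe."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id, "facets": {}},
        # Hypothetical naming: one job namespace per Dataiku project.
        "job": {"namespace": f"dataiku://{project}", "name": recipe, "facets": {}},
        "inputs": inputs,
        "outputs": outputs,
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# Usage: a START/COMPLETE pair sharing one runId, with made-up datasets.
run_id = str(uuid.uuid4())
inputs = [dataset("postgres://warehouse.example.com:5432", "sales.orders_raw")]
outputs = [dataset("dataiku://SALES_PROJECT", "orders_prepared")]

start = recipe_run_event("START", run_id, "SALES_PROJECT", "prepare_orders", inputs, [])
complete = recipe_run_event("COMPLETE", run_id, "SALES_PROJECT", "prepare_orders", inputs, outputs)

print(json.dumps(complete, indent=2))
```

Because the START and COMPLETE events share a `runId`, any OpenLineage-aware consumer could stitch them into a single run and trace `orders_prepared` back to `sales.orders_raw` without knowing anything about Dataiku internals, which is the interoperability this idea is after.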

COS:

  • This is a new, still-developing standard. Commit to keeping the implementation current as the standard continues to evolve.

Notes:

13 votes

In the Backlog

Comments

  • natejgardner

    This would also add rocket fuel to the integration with Alation, if Alation can also read OpenLineage data.

    One of our biggest problems at my company is finding the underlying sources of data and understanding how the data has been transformed and when it was materialized. Implementing this would mean that, for datasets created with Dataiku, that wouldn't be a problem at all, even for users outside Dataiku. It would enable both finding the ways data has been used and understanding its ultimate sources and transformations. Dataiku transformations wouldn't be a black box to data consumers; instead, it would be clear what data was transformed, how, and when to get to an end result. Great idea!

  • tgb417

    @natejgardner,

    Thanks for your comments. Do you know anyone else in the Dataiku DSS community who would find this useful? If so, ask them to upvote the idea.

    What products with OpenLineage support are you currently using? What advantages are you finding in having this kind of metadata?

  • natejgardner

    As far as I know, we don't have any other tools that currently interface with OpenLineage (though it looks possible to implement for Airflow jobs?). But that wouldn't prevent us from taking advantage of this metadata: as long as it's universally accessible, we can leverage it for the datasets that provide it, and encourage teams using other tools (and our other vendors) to provide it as well.

  • fsergot (Dataiker, Product Ideas Manager)

    Hello,

    We already integrate with data-management-specific solutions such as Alation and Collibra. I am not aware of them supporting OpenLineage, though.

    In this context, we will indeed look into this technology and see whether there are options for it or traction behind it.

    In the meantime, anyone who is interested or has a specific use case, do not hesitate to share.

  • tgb417

    @fsergot,

    Thanks for changing the status to Investigating. The Alation and Collibra solutions appear to be closed source and are not yet de facto standards.

    In my experience, that means interoperability comes from those two vendors, for their own platforms, and each of them is invested in building a moat around its customers and business model.

    As a customer, this situation is often expensive for me. What interests me about OpenLineage is the value of a cross-vendor standard, the multiplier effect it brings to innovation, and the lower cost of entry.
