Implement OpenLineage as part of the Dataiku Catalog
Epic User Story:
As a data analyst who cares about the lineage of the data I'm receiving and sharing, I would like Dataiku to implement OpenLineage as part of the Dataiku Catalog. This will provide an interoperable way to track metadata about data as it flows into Dataiku, around Dataiku, and out to other tools.
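To make the request more concrete, here is a rough sketch of the kind of run event a Dataiku recipe might emit under OpenLineage. The top-level field names (eventType, eventTime, run, job, inputs, outputs, producer, schemaURL) follow the OpenLineage run event model as I understand it. Everything else is an assumption for illustration only: the namespaces, the job and dataset names, the producer URI, and the spec version in the schema URL. This is not how Dataiku does or would implement it.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical values, not real Dataiku or OpenLineage identifiers.
PRODUCER = "https://example.com/dataiku-openlineage-sketch"
# Assumed spec version; check it against the OpenLineage spec you target.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def dataset(namespace, name):
    """Describe a dataset by the namespace + name pair OpenLineage uses."""
    return {"namespace": namespace, "name": name}


def build_run_event(job_name, inputs, outputs,
                    event_type="COMPLETE", job_namespace="dataiku-demo"):
    """Build one run event as a plain dict, ready to serialize as JSON."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ns, name) for ns, name in inputs],
        "outputs": [dataset(ns, name) for ns, name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


# A hypothetical prepare recipe reading one table and writing another.
event = build_run_event(
    job_name="SALES_PROJECT.prepare_orders",
    inputs=[("postgres://warehouse:5432", "raw.orders")],
    outputs=[("postgres://warehouse:5432", "analytics.orders_clean")],
)
print(json.dumps(event, indent=2))
```

Any tool that reads events shaped like this could show where analytics.orders_clean came from and when it was built, without needing access to Dataiku itself.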
COS:
- This is a new and still developing standard. Maintain a commitment to advance with the standard as it continues to develop.
Notes:
- The Data Engineering Podcast had an interesting episode about this called "Unlocking The Power of Data Lineage In Your Platform with OpenLineage"
- Here is another talk about the project: "Observability and Pipelines: OpenLineage and Marquez", by Data Matt Turck VC at FirstMark.
Comments
-
This would also add rocket fuel to the integration with Alation, if Alation can also read OpenLineage data.
One of our biggest problems at my company is finding the underlying sources of data and understanding how it's been transformed and when it was materialized. Implementing this would mean that for datasets created with Dataiku, that won't be a problem at all, even for users outside Dataiku. It'd enable both finding the ways data has been used and understanding the ultimate data sources and transformations. Dataiku transformations wouldn't be a black box to data consumers. Instead, it'd be clear what data was transformed, how, and when, to get to an end result. Great idea.
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
Thanks for your comments. Do you know anyone else in the Dataiku DSS community who would find this useful? If so, please ask them to upvote the idea.
Which products with OpenLineage support are you currently using? What advantages are you finding from having this kind of metadata?
-
As far as I know, we don't have any other tools that currently interface with OpenLineage (though it looks possible to implement for Airflow jobs?). But that wouldn't prevent us from taking advantage of this metadata - as long as it's universally accessible, we can leverage it for datasets that provide it, and encourage teams using other tools (and our other vendors) to provide it as well.
-
Hello,
We are already integrating with data-management-specific solutions such as Alation & Collibra. I am not aware of them supporting OpenLineage, though.
In this context, we will indeed look at this technology and see whether there are options or traction for it.
In the meantime, if anyone is interested or has a specific use case, do not hesitate to share.
-
tgb417
@fsergot, thanks for changing the status to investigating. The Alation & Collibra solutions appear to be closed source and not yet de facto standards.
In my experience, this means that interoperability comes from those two groups, for their own platforms, and that each group is invested in creating a moat around its customers and business model.
As a customer, this situation is often expensive for me. What interests me about OpenLineage is the value of cross-vendor standards, the multiplier effect they bring to innovation, and the lowered cost of entry.