
Implement Open Lineage as part of the Dataiku Catalog

Epic User Story:

As a data analyst who cares about the lineage of the data I'm receiving and sharing, I would like Open Lineage implemented as part of the Dataiku Catalog. This would provide an interoperable way to track metadata about data as it flows into Dataiku, around Dataiku, and out to other tools.

COS:

  • This is a new and still-developing standard. Maintain a commitment to advance with the standard as it continues to develop.

Notes:
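To make the request more concrete, here is a minimal sketch of the kind of run event the OpenLineage standard describes: a job, a run, and the input and output datasets it touched. This is not an existing Dataiku feature or API. The function name `build_run_event`, the `dataiku://MY_PROJECT` namespace, the dataset names, and the producer URL are all hypothetical, and the spec version in `schemaURL` is illustrative and should be checked against the current OpenLineage specification.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, event_type="COMPLETE"):
    """Build a dict shaped like an OpenLineage run event.

    `inputs` and `outputs` are lists of (namespace, name) pairs.
    All namespaces and names used here are hypothetical examples.
    """
    def dataset(namespace, name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical identifier for the tool that emitted the event.
        "producer": "https://example.com/hypothetical-dataiku-openlineage-producer",
        # Illustrative spec version; verify against the published spec.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "dataiku://MY_PROJECT", "name": job_name},
        "inputs": [dataset(ns, n) for ns, n in inputs],
        "outputs": [dataset(ns, n) for ns, n in outputs],
    }


if __name__ == "__main__":
    event = build_run_event(
        job_name="compute_sales_clean",
        inputs=[("postgres://db:5432", "public.sales_raw")],
        outputs=[("dataiku://MY_PROJECT", "sales_clean")],
    )
    # Any lineage-aware catalog could consume this JSON document.
    print(json.dumps(event, indent=2))
```

An event like this, emitted each time a recipe builds a dataset, is what would let tools outside Dataiku see where a dataset came from and when it was materialized.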

5 Comments
natejgardner
Level 5

This would also add rocket fuel to the integration with Alation, if Alation can also read OpenLineage data. 

One of our biggest problems at my company is finding the underlying sources of data and understanding how it's been transformed and when it was materialized. Implementing this would mean that for datasets created with Dataiku, that won't be a problem at all, even for users outside Dataiku. It would let us both find the ways data has been used and understand its ultimate sources and transformations. Dataiku transformations wouldn't be a black box to data consumers; instead, it would be clear which data was transformed, how, and when, to get to an end result. Great idea.

tgb417
Neuron

@natejgardner ,

Thanks for your comments. Do you know anyone else in the Dataiku DSS community who would find this useful? If so, please ask them to upvote the idea.

Which products with OpenLineage support are you currently using? What advantages are you finding in having this kind of metadata?

 

natejgardner
Level 5

As far as I know, we don't have any other tools that currently interface with OpenLineage (though it looks possible to implement for Airflow jobs?). But that wouldn't prevent us from taking advantage of this metadata - as long as it's universally accessible, we can leverage it for datasets that provide it, and encourage teams using other tools (and our other vendors) to provide it as well. 

fsergot
Dataiker
Status changed to: Investigating

Hello,

We are already integrating with data-management-specific solutions such as Alation and Collibra. I am not aware of either of them supporting Open Lineage, though.

In this context, we will indeed look at this technology and see whether there are options or traction for it.

In the meantime, if anyone is interested or has a specific use case, please do not hesitate to share.

tgb417
Neuron

@fsergot ,

Thanks for changing the status to Investigating. The Alation and Collibra solutions appear to be closed source and are not yet de facto standards.

In my experience, that means interoperability comes only from those two vendors, and only for their own platforms, and that each is invested in creating a moat around its customers and business model.

As a customer, I often find this situation expensive. What interests me about Open Lineage is the value of cross-vendor standards, the multiplier effect they bring to innovation, and the lower cost of entry.
