Implement OpenLineage as part of the Dataiku Catalog

Epic User Story:

As a data analyst who cares about the lineage of the data I'm receiving and sharing, I would like OpenLineage implemented as part of the Dataiku Catalog. This will provide an interoperable way to track metadata about data as it flows into Dataiku, around Dataiku, and out to other tools.
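
For anyone new to the standard, here is a rough sketch of the kind of run event OpenLineage describes: a job, a run ID, and the input and output datasets. This is only an illustration. Dataiku does not emit this today, the namespace, producer URL, and job/dataset names are all made up, and the event is simplified (the spec defines further fields, such as a schema URL and facets, that are left out here).

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_name, inputs, outputs,
                    namespace="dataiku://example-instance",      # hypothetical namespace
                    producer="https://example.com/dss-lineage"):  # hypothetical producer URI
    """Build a simplified OpenLineage-style run event as a plain dict."""

    def as_dataset(name):
        # Datasets are identified by a namespace plus a name.
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": producer,
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [as_dataset(n) for n in inputs],
        "outputs": [as_dataset(n) for n in outputs],
    }


# Hypothetical recipe that reads one dataset and writes another.
event = build_run_event(
    "COMPLETE",
    "PROJECT.compute_orders_prepared",
    inputs=["PROJECT.orders_raw"],
    outputs=["PROJECT.orders_prepared"],
)
print(json.dumps(event, indent=2))
```

Any tool that understands this shape could then answer "where did this dataset come from?" without needing access to the Dataiku flow itself.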

COS:

  • This is a new and still developing standard.  Maintain a commitment to advance with the standard as it continues to develop.

Notes:

--Tom
6 Comments

This would also add rocket fuel to the integration with Alation, if Alation can also read OpenLineage data. 

One of our biggest problems at my company is finding the underlying sources of data and understanding how it's been transformed and when it was materialized. Implementing this would mean that for datasets created with Dataiku, that won't be a problem at all, even for users outside Dataiku. It'd enable both finding the ways data has been used and understanding the ultimate data sources and transformations. Dataiku transformations wouldn't be a black box to data consumers; instead, it'd be clear what data was transformed, how, and when to get to an end result. Great idea!

@natejgardner ,

Thanks for your comments. Do you know anyone else in the Dataiku DSS community who would find this useful? If so, please ask them to upvote the idea.

What products with OpenLineage support are you currently using? What advantages are you finding in having this kind of metadata?

 

--Tom

As far as I know, we don't have any other tools that currently interface with OpenLineage (though it looks possible to implement for Airflow jobs?). But that wouldn't prevent us from taking advantage of this metadata - as long as it's universally accessible, we can leverage it for datasets that provide it, and encourage teams using other tools (and our other vendors) to provide it as well. 

fsergot
Dataiker

Hello,

We already integrate with dedicated data management solutions such as Alation & Collibra. I am not aware of them supporting OpenLineage, though.

In this context, we will indeed look at this technology and see whether there are options or traction for it.

In the meantime, anyone who is interested or has a specific use case should not hesitate to share.

Status changed to: Parked

@fsergot ,

Thanks for changing status to investigating. The Alation & Collibra solutions appear to be closed source and are not yet de facto standards.

In my experience, that means interoperability comes from those two vendors only for their own platforms, and each is invested in creating a moat around its customers and business model.

As a customer, I often find this situation expensive. What interests me about OpenLineage is the value of cross-vendor standards: the multiplier effect they have on innovation and the lower cost of entry.

--Tom

MichaelG
Community Manager
Status changed to: In the Backlog