Excelion Partners - Building a Free Plugin to Efficiently Catalog and View Data Lineage

Team members: Ryan Moore & Tony Olson

Country: United States

Organization: Excelion Partners

Excelion Partners is a consulting organization with cloud data architects, data scientists, data engineers, and data analysts who are passionate about finding answers and building solutions with data. We help you "Decide with Data."

Awards Categories:

  • Partner Acceleration
  • Moonshot Pioneers

Business Challenge:

At Excelion Partners, we work with numerous customers who utilize Dataiku in their Data Science and Analytics practice. Many of these organizations and analytics groups have not yet invested in an enterprise data cataloging tool or data lineage tool, which are often cost-prohibitive.

As part of the productionization process for these customers, we have often seen them create "homegrown" data cataloging solutions that typically combine spreadsheets, Dataiku, and their preferred visualization tool. These solutions are labor-intensive to maintain and do not fit into the workflow of their developers, who work hands-on in the Dataiku projects.

Additionally, our clients struggle with data lineage. They create numerous downstream datasets in Dataiku, and we often hear them ask, “Where did that column come from?” Without visibility into upstream data lineage, our clients lose trust in the data and, ultimately, in the solution’s business outcomes.

 

Business Solution:

Because of this cataloging and lineage challenge, Excelion has created a free Dataiku plugin called Thread. Thread is a lightweight catalog and lineage tool that integrates directly with Dataiku and its datasets. It provides a single place to document the data connected to Dataiku and makes the catalog's contents easy to consume in day-to-day business use.

Thread is implemented as a Dataiku web app plugin. It is easy to install and can securely scan all or part of a Dataiku node to build the lineage view and documentation. The indexes and metadata that Thread generates are saved as Dataiku datasets in a project flow, which makes them easy to export for use in third-party visualization tools such as Power BI or Tableau.
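As a rough illustration of why a flat, dataset-shaped index is easy to export, the sketch below flattens a nested scan result into one row per column and serializes it as CSV. The `scan_result` structure, the field names, and the helpers `flatten_index` and `to_csv` are hypothetical; they are not Thread's actual schema or code, and no Dataiku API is called.

```python
import csv
import io

# Hypothetical scan result: project -> dataset -> column -> description.
scan_result = {
    "SALES": {
        "orders": {"order_id": "Unique order key", "amt": ""},
        "customers": {"cust_id": "Unique customer key"},
    },
}

FIELDS = ["project", "dataset", "column", "description"]


def flatten_index(scan):
    """Return one row per column so the index can be stored as a flat table."""
    rows = []
    for project, datasets in sorted(scan.items()):
        for dataset, columns in sorted(datasets.items()):
            for column, description in sorted(columns.items()):
                rows.append({
                    "project": project,
                    "dataset": dataset,
                    "column": column,
                    "description": description,
                })
    return rows


def to_csv(rows):
    """Serialize the flat index to CSV text that a BI tool could import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


index_rows = flatten_index(scan_result)
csv_text = to_csv(index_rows)
```

Keeping one row per column means the same table can feed both a data dictionary view and a documentation report without reshaping.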

Use Case Stage: In Production

Value Generated:

Thread has already been deployed on hundreds of projects at multiple joint Excelion and Dataiku clients. Here are some areas of business value Thread users have obtained:

Creates Efficiencies

  • Fewer clicks and less time spent, because the data definition is available at the time and place it is needed.
  • More and better insights during exploratory data analysis through better-documented columns.

image3.png

  • This all leads to faster solution building and data enrichment through documentation and improved data understanding.

Improves Governance

  • Clear measurement of governance through KPIs showing the percentage of columns documented in any dataset.

image1.png

  • Creates a repository for data documentation.
  • Easier to keep definitions up to date.
  • Allows definitions to be easily auditable (exportable).

image2.png

  • Integrates natively with Dataiku permissions, so only users with access can edit data definitions.
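The documentation KPI mentioned above can be thought of as a simple coverage ratio. The following is a minimal sketch, assuming a dataset's columns are available as a mapping from column name to description; the function name `documented_percent` and the sample data are hypothetical and do not reflect how Thread computes its KPIs.

```python
def documented_percent(columns):
    """Percent of columns with a non-blank description (0.0 if there are no columns)."""
    if not columns:
        return 0.0
    documented = sum(
        1 for description in columns.values()
        if description and description.strip()
    )
    return round(100.0 * documented / len(columns), 1)


# Hypothetical column descriptions for one dataset; empty or None means undocumented.
orders_columns = {
    "order_id": "Unique order key",
    "amt": "",
    "region": None,
    "order_date": "Date the order was placed",
}

coverage = documented_percent(orders_columns)
```

Treating blank and whitespace-only descriptions as undocumented keeps the KPI from being inflated by placeholder entries.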

Improves Trust

  • Creates easy transparency for data analysts, data engineers, data scientists, and business leaders to see:
    • What data was used in a project (data catalog).
    • Where it was used (upstream/downstream data lineage). 
    • How that data is defined throughout the project (data dictionary).

image4.png

  • Builds a common language between the business and analysts.
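To make the idea of upstream and downstream lineage concrete, here is a minimal sketch of a lineage lookup over a small dependency graph. It assumes lineage is stored as a mapping from each dataset to the datasets it is built from; the `parents` mapping, the dataset names, and the `upstream` and `downstream` helpers are illustrative assumptions, not Thread's implementation.

```python
from collections import deque

# Hypothetical lineage edges: each dataset maps to the datasets it is built from.
parents = {
    "sales_report": ["orders_enriched"],
    "orders_enriched": ["orders", "customers"],
    "orders": [],
    "customers": [],
}


def upstream(dataset, parents):
    """Return every dataset that feeds `dataset`, nearest ancestors first."""
    seen = []
    queue = deque(parents.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


def downstream(dataset, parents):
    """Return every dataset built directly or indirectly from `dataset`."""
    return sorted(name for name in parents if dataset in upstream(name, parents))


report_sources = upstream("sales_report", parents)
orders_consumers = downstream("orders", parents)
```

A breadth-first walk answers "where did this come from?" by listing the closest sources first, which is usually the order an analyst wants to check them in.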

Training & Onboarding Efficiencies

  • Helps new team members learn company-specific jargon and abbreviations faster.
  • Streamlines onboarding and training by keeping everyone in Dataiku instead of scattered across spreadsheets and code documentation.

Saves Money and Labor

  • Saves analytics leaders $200k+ on purchasing, implementing, and supporting an enterprise-grade data catalog and data lineage tool for their Dataiku environment.

 

Value Brought by Dataiku:

Thread is built on top of Dataiku! All the value Thread creates is an extension of Dataiku and is possible because of it.

Dataiku’s flexible and extensible platform allows the community to contribute and share solutions easily across organizations and industries. The ability to write custom plugins and integrate with the Python API provides the capability to achieve exceptional business value through custom integrations.

The native security integration removes governance concerns about building application solutions on top of Dataiku and thus speeds up the innovation process.

Value Type:

  • Reduce risk
  • Save time
  • Increase trust
Comments

Amazing work, congratulations! By the way, is it possible to install the plugin somehow?
