Snow Fox Data - Building a Free Plugin to Efficiently Catalog and View Data Lineage

rmoore

Team members: Ryan Moore & Tony Olson

Country: United States

Organization: Snow Fox Data

Snow Fox Data is a premier data strategy, data science, and analytics solutions provider. Headquartered in Wisconsin and serving customers worldwide, we provide a vast landscape of knowledge that supports your success through data-driven decision-making. A passionate team of data architects, data scientists, data engineers, and data analysts, Snow Fox Data empowers you to make clearer decisions through clever data solutions.

Awards Categories:

  • Partner Acceleration
  • Moonshot Pioneers

Business Challenge:

At Snow Fox Data, we work with numerous customers who utilize Dataiku in their Data Science and Analytics practice. Many of these organizations and analytics groups have not yet invested in an enterprise data cataloging tool or data lineage tool, which are often cost-prohibitive.

As part of the productionization process for these customers, we have often seen them build "homegrown" data cataloging solutions that typically combine spreadsheets, Dataiku, and their preferred visualization tool. These "homegrown" solutions are labor-intensive to maintain and do not fit into the workflow of the developers who work hands-on in the Dataiku projects.

Additionally, our clients struggle with data lineage. They create numerous downstream datasets in Dataiku, and we often hear them ask, “Where did that column come from?” Without visibility into upstream data lineage, our clients lose trust in the data and, ultimately, in the solution’s business outcomes.

Business Solution:

To address this cataloging and lineage challenge, Snow Fox Data has created a free Dataiku plugin called THREAD™. THREAD™ is a lightweight catalog and lineage tool that integrates directly with Dataiku and its datasets. It provides a single place to document the data connected to Dataiku and to browse the catalog's contents in a form that business users can consume easily.

THREAD™ is implemented as a Dataiku web app plugin that is easy to install and can securely scan all or part of a Dataiku node to build the lineage view and documentation. The indexes and metadata that THREAD™ generates are saved as Dataiku datasets in a project flow, which makes them easy to export for use in third-party visualization tools such as Power BI or Tableau.
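
The post does not describe how THREAD™ performs its scan, so the following is only a minimal sketch of what a node scan could look like with the Dataiku Python API. It assumes a client object such as the one returned by `dataikuapi.DSSClient(host, api_key)` or `dataiku.api_client()`, and it assumes that a column's description is stored under the `comment` key of the schema. The function name `build_column_index` and the output row layout are hypothetical.

```python
from typing import Any, Dict, List


def build_column_index(client: Any) -> List[Dict[str, str]]:
    """Flatten every dataset schema the client can see into one row per column.

    `client` is expected to behave like a Dataiku DSSClient. Only projects and
    datasets visible to the client's credentials are returned, which is how the
    scan stays within existing permissions.
    """
    rows: List[Dict[str, str]] = []
    for project_key in client.list_project_keys():
        project = client.get_project(project_key)
        for item in project.list_datasets():
            dataset_name = item["name"]
            schema = project.get_dataset(dataset_name).get_schema()
            for column in schema.get("columns", []):
                rows.append({
                    "project": project_key,
                    "dataset": dataset_name,
                    "column": column["name"],
                    "type": column.get("type", ""),
                    # Assumption: the column description lives under "comment".
                    "description": column.get("comment", "") or "",
                })
    return rows
```

Rows in this shape could then be written to a dataset in a project flow, which matches the export path described above.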

Use Case Stage: In Production

Value Generated:

THREAD™ has already been deployed on hundreds of projects at multiple clients shared by Snow Fox Data and Dataiku. Here are some areas of business value THREAD™ users have obtained:

Creates Efficiencies

  • Fewer clicks and less time spent, because the data definition is available when and where the information is needed.
  • More and better insights during exploratory data analysis through better-documented columns.

image3.png

  • This all leads to faster solution building and data enrichment through documentation and improved data understanding.

Improves Governance

  • Clear measurement of governance through KPIs showing the percentage of columns documented in any dataset.

image1.png

  • Creates a repository for data documentation.
  • Easier to keep definitions up to date.
  • Allows definitions to be easily auditable (exportable).

image2.png

  • Integrates natively with Dataiku permissions, so only users with the appropriate access can edit data definitions.
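
As an illustration of the documentation KPI mentioned in this list, here is a minimal sketch of how a "percent of columns documented" figure could be computed. The row layout (a `description` field per column) and the function name `documentation_coverage` are assumptions for this example, not THREAD™'s actual implementation.

```python
from typing import Dict, List


def documentation_coverage(columns: List[Dict[str, str]]) -> float:
    """Return the percentage of columns that have a non-blank description.

    An empty column list yields 0.0 rather than raising a division error.
    """
    if not columns:
        return 0.0
    documented = sum(
        1 for column in columns
        if (column.get("description") or "").strip()
    )
    return 100.0 * documented / len(columns)
```

Treating whitespace-only descriptions as undocumented keeps the KPI from being inflated by placeholder entries.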

Improves Trust

  • Creates easy transparency for data analysts, data engineers, data scientists, and business leaders to see:
    • What data was used in a project (data catalog).
    • Where it was used (upstream/downstream data lineage).
    • How that data is defined throughout the project (data dictionary).

image4.png

  • Builds a common language between the business and analysts.
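
To make the idea of upstream lineage concrete, the sketch below walks a flow backwards from one dataset to everything that feeds it. It works at the dataset level only and assumes the flow has already been reduced to a hypothetical mapping from each dataset to its direct inputs. How THREAD™ builds and stores lineage is not described in this post, so this is an illustration of the concept rather than the plugin's method.

```python
from typing import Dict, List, Set


def upstream_datasets(target: str, inputs_of: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that feeds `target`, directly or indirectly.

    `inputs_of` maps a dataset name to the datasets its producing recipe reads.
    Source datasets simply have no entry in the mapping.
    """
    seen: Set[str] = set()
    stack = list(inputs_of.get(target, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited through another path
        seen.add(name)
        stack.extend(inputs_of.get(name, []))
    return seen
```

Answering "where did that column come from?" at column level would need the same traversal over column-to-column mappings, which this sketch does not attempt.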

Training & Onboarding Efficiencies

  • Helps new team members learn company-specific jargon and abbreviations faster.
  • Streamlines onboarding and training by keeping everyone in Dataiku instead of spread across a myriad of spreadsheets and code documentation.

Saves Money and Labor

  • Saves analytics leaders $200k+ in purchasing, implementing, and supporting an enterprise-grade data catalog and data lineage tool for their Dataiku environment.

Value Brought by Dataiku:

THREAD™ is built on top of Dataiku! All the value THREAD™ creates is an extension of Dataiku and is possible only because of it.

Dataiku’s flexible and extensible platform allows the community to contribute and share solutions easily across organizations and industries. The ability to write custom plugins and integrate with the Python API provides the capability to achieve exceptional business value through custom integrations.

The native security integration removes governance concerns about building application solutions on top of Dataiku and thus speeds up the innovation process.

Value Type:

  • Reduce risk
  • Save time
  • Increase trust

Comments

  • Ignacio_Toledo

    Amazing work, congratulations! By the way, is it possible to install the plugin somehow?

  • harsh9127

    Great work! Are external users able to download this plug-in?

  • rmoore

    Hi @Ignacio_Toledo and @harsh9127, the Dataiku team is reviewing the plugin for approval in the store right now. Send me a DM if you'd like "early access" and I can get you the plugin directly.
