7 Lessons to Ensure Successful ML Projects: The Dataiku Take
Originally posted by Dataiku on May 5, 2021 (author: @cgrasso88)
The many reasons that data science and machine learning (ML) projects fail get cited regularly. Popular ones include data quality issues (missing or incomplete data, for example), tooling problems (e.g., confronting legacy systems and processes during digital transformation), and a lack of operationalization (meaning the ML model was never deployed across the organization to drive tangible value and impact).
Instead of dwelling on the failures, we're going to flip the script and use an article from MIT Sloan School of Management to highlight seven lessons to ensure successful ML projects. We take each lesson listed by the author, Michelle Lee, vice president of machine learning solutions at Amazon Web Services and full-term member of the MIT Corporation, as a starting point and detail how Dataiku's all-in-one platform supports it.
1. Make sure you have easy access to necessary data — and a comprehensive data strategy.
Most organizations don't have a central repository for their data efforts; instead, their data is scattered across legacy tools and spreadsheets. Further, they lack the infrastructure, or even a strategy, for improving data access and quality.
Dataiku's unique end-to-end approach lets organizations centralize all AI efforts in a single product and interface, which in turn enables teams to easily document and reuse data project elements with other teams or individuals across the organization. Because everything lives in one place, business stakeholders and technical profiles can collaborate, breaking down silos while robust access controls and monitoring stay in place.
With Dataiku, organizations can connect to any data source, regardless of where or how it's stored. This lets teams use the latest and best data science technologies, while centralization allows multiple projects to run simultaneously, aiding scalability.
2. Carefully select machine learning use cases and set success metrics.
Dataiku can help organizations prioritize use cases according to three main criteria: the business value, the level of necessary effort, and the likelihood of success (including any risk). Particularly during the global health crisis, many companies realized that agility and elasticity in AI efforts are critical. Businesses must be prepared to pursue a wide spectrum of AI use cases depending on the company’s immediate needs — from self-serve analytics to operationalized models in production — and Dataiku facilitates and supports every use case across every industry.
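To make the three prioritization criteria concrete, here is a minimal Python sketch that ranks candidate use cases. It is not a Dataiku feature: the 1-to-5 scales, the probability-of-success field, the value × likelihood ÷ effort formula, and the example use cases and scores are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float  # hypothetical scale: 1 (low) to 5 (high)
    effort: float          # hypothetical scale: 1 (low) to 5 (high)
    likelihood: float      # hypothetical probability of success in [0, 1], after accounting for risk

    def priority(self) -> float:
        # Hypothetical heuristic: risk-adjusted value per unit of effort.
        return self.business_value * self.likelihood / self.effort


def rank(use_cases: list[UseCase]) -> list[UseCase]:
    # Highest-priority use cases first.
    return sorted(use_cases, key=lambda uc: uc.priority(), reverse=True)


# Invented example candidates, for illustration only.
candidates = [
    UseCase("churn prediction", business_value=5, effort=3, likelihood=0.6),
    UseCase("self-serve sales dashboard", business_value=3, effort=1, likelihood=0.9),
    UseCase("claims fraud detection", business_value=5, effort=5, likelihood=0.4),
]

for uc in rank(candidates):
    print(f"{uc.name}: {uc.priority():.2f}")
```

Dividing by effort favors quick wins, which suits early momentum-building; a team focused on strategic impact might instead weight business value more heavily or treat risk as a separate go/no-go gate.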
Insurance company and Dataiku customer Aviva is a great example of how a company can use a data science platform to achieve strategic business objectives by pursuing multiple use cases across different departments while overcoming underlying complexity. Working in Dataiku, with its built-in connectors, APIs, and plugins, helped the team provide transparency into data that previously resided in many different legacy systems.
As Aviva began its transition to the cloud, Dataiku's flexibility to deploy both on-prem and in the cloud meant that the team could continue working on key automation and predictive analytics use cases with minimal disruption to its workflows, rather than having to tear everything down and start again from scratch. All of this helped the team become five times more efficient end to end; it can now build a model and push it to production in two days instead of two weeks.
This success has led to Dataiku being used for different use cases across multiple business units (including People Analytics, Claims, Risk Analytics, and Audit Analytics) and across three continents, with a wide range of users from full-code (e.g., data scientists working in Python) to no-code (e.g., claims handlers using Dataiku APIs to leverage insights into claimants).
Once the use cases are selected, organizations should implement a use case qualification framework, appoint a specific person or team to own this function going forward, and track success metrics at least quarterly. A regular rundown of successes and learnings will help build internal momentum and inspire other parts of the organization, while also highlighting takeaways from the initial use cases that can be improved upon in the future.
3. Make sure technical experts and domain experts work side by side.
Often, data analysts, data scientists, IT, and business teams aren't empowered to work together. Sometimes that is because each person works independently on his or her own machine with local copies of files; other times it's because some people have domain expertise but no formal data science training, and vice versa.
Further, some team members (e.g., analysts) prefer low- or no-code visual tools to work with data, while others (e.g., data scientists) prefer a code-first approach. This leaves everyone frustrated and, likely, without a clear understanding of what others are doing in their parts of the data science workflow.
At Dataiku, we believe that collaborative data science is not only possible, but critical — it should be the status quo for organizations that want to scale AI. By embracing the different strengths and technical skills of various contributors and enabling them to consolidate their work in a governed and organized way, we ease these challenges and enable teams to develop data products faster and more efficiently. In addition to the ability to use either code or a visual interface, the platform provides:
- Integrated documentation and knowledge sharing, including project to-do lists, commenting, and sharing
- Robust project change management and rollback
- Advanced team activity monitoring for effective data project management
4. Ensure executive sponsorship and a culture of experimentation.
Companies in the early stages of scaling analytics use cases — 32% of organizations according to an ESI Thought Lab Survey — must meticulously lay out the top three to five feasible use cases that can create the greatest value quickly, ideally within the first year. This will establish momentum and encourage buy-in for future analytics investments. A helpful way to decide on the use cases is by analyzing the entire business value chain, from supplier to purchase to post-sales, to identify the ones with the highest potential value.
Further, when setting up any self-service data initiative, Dataiku customer GE Aviation always works with the business lines to make sure their needs are incorporated into the project. To ensure ongoing success, the team involves even more people, combining grassroots efforts within the business with executive buy-in and support to increase the self-service program's visibility, exposure, and word-of-mouth advocacy.
Executive buy-in and support, then, are critical not only in the early stages of investing in AI, but throughout each project. Transforming business processes and technology is one thing, but establishing a data culture where every employee feels inspired and empowered to incorporate data into his or her day-to-day tasks is the real challenge. In fact, in the NewVantage Partners Big Data and AI Executive Survey 2021, 92% of business leaders reported that their challenges to becoming data-driven are predominantly related to people and culture. To mitigate this, organizations can create a culture of experimentation. Simply put, this means that data leaders:
- Engage their data professionals to help make strategic decisions
- Reward them for innovation and collaboration (e.g., by tying performance and compensation metrics to these behaviors)
- Redefine the traditional boundaries of what's considered a success (e.g., embracing curiosity and treating failures as learnings to better prepare for the future)
5. Assess and address any skill gaps.
Educating and upskilling employees is critical to AI staffing, which is in turn critical to scaling AI successfully, yet most companies woefully overlook it. According to a McKinsey survey, only 35% of high-performing companies, and a meager 10% of all others, say they have an active continuous learning program on AI for employees.
At Dataiku, we believe that companies should create a blueprint for their talent hiring, retention, and upskilling so that they find the right people, retain them, and ultimately, design a scalable, multi-persona workforce. They should:
- Communicate the value of data science, ML, and AI to staff, so they can more easily see how upskilling comes into play and how data efforts fit into the company’s greater business strategy
- Invest in technology that supports upskilling and collaboration: For example, Dataiku enables non-technical personas to participate in the data science process (e.g., data exploration and visualization without needing to know how to code).
- Implement a top-down and bottom-up strategy: This means providing more education around AI initiatives and upskilling employees so they can see the power and value that data can deliver to each individual role.
6. Free your team from unnecessary heavy lifting and invest in the right infrastructure.
Dataiku connects to existing infrastructure, so there is no need to move data for processing, and format and schema detection allows instant access to data wherever it's stored (analytical MPP databases, cloud databases, operational databases, NoSQL stores, Hadoop, cloud object storage, remote data sources, and more).
Especially in the remote work environment that resulted from the global health crisis, this means that the way people work with data stays consistent and secure, regardless of changes in underlying systems, staff, etc. Also, as alluded to earlier, Dataiku removes the “heavy lifting” by offering the ability to use either code or a visual interface for effective contributions from all skill levels at all stages of a data project.
7. Plan for the long term.
With regard to tooling specifically, Dataiku is ideal for organizations that want to make AI projects and systems sustainable over time. It allows organizations to leverage their existing data architecture as well as invest in best-of-breed technologies in terms of storage, compute, algorithms, languages, frameworks, and more, making it highly future-proof.
But long-term planning for AI doesn't stop at technology. Scaling AI requires the right framework, one that embeds AI in all business processes. The onus to do so has to come from the business lines themselves, which need to be supported by an organizational model that strikes a balance between bottom-up empowerment and top-down support. At a minimum, that model must include upskilling existing business professionals, access to tooling and data, identification of priorities, and strategic roadmap setting.
Go Further: Setting Up for AI Success
In this ebook, discover how to define a successful AI project with a helpful framework for choosing the right use cases (e.g., how will it improve outcomes, how will it be measured, who does it benefit, and more).