Chat Your Way to Data Insights with Dataiku Answers

Introduction
Asking questions of your data shouldn’t be hard. With Dataiku Answers, you can unlock the power to explore your data conversationally, turning natural language questions into instant insights. In this blog, we will demonstrate how you can quickly create an AI chatbot for your business users, or lazy techies, to ask questions of their data. Dataiku Answers provides organizations with a repeatable framework for creating and deploying enterprise-ready AI chatbots.
While there are many applications where AI Chatbots can be valuable to the business, one increasingly common use case is for business users to be able to interact with data through natural language. The remainder of this blog will include step-by-step instructions on how to accomplish this in Dataiku.
Video Recording
If you prefer to consume content through video, you can find a video recording of this demonstration here:
Otherwise… happy reading!
Installation
At the time of writing, Dataiku Answers is installed via the Plugin Store. If you’re new to Dataiku, don’t sweat, because this is the best kind of store: the kind where everything is free :). To access the Plugin Store, navigate to the waffle menu in the upper right corner of your Dataiku application.
Any Dataiku user can view the Plugin Store, but only administrators can install plugins on your instance. If you do not have this capability, please reach out to your administrator for help. To avoid compatibility issues, it’s also a good idea to make sure your Dataiku version is up to date.
Example Project
Once we have successfully installed the Answers plugin, we can begin asking questions of our data. This is often referred to as text-to-SQL because the questions we ask our AI chatbot are converted to SQL statements and executed against the available data. Because of this, the data we want to query must be stored in a SQL database. In our example, we will be working with data stored in Snowflake.
My example project simulates a loan approval process for a loan advisor. The data contains comprehensive information on loan requests, including applicants’ financial and credit details, and is designed to assist in evaluating loan applications by providing key metrics such as the amount requested, loan purpose, and applicant’s financial status. The flow, shown below, demonstrates the steps performed to prepare the data prior to training a machine learning model. The output data contains each loan applicant’s probability of default, a critical input when approving or rejecting a loan.
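To make the text-to-SQL idea concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database in place of Snowflake, and the table name, column names, sample rows, and the SQL statement standing in for the LLM’s output are all hypothetical. It illustrates the general pattern only, not how Dataiku Answers is implemented internally.

```python
import sqlite3

# Hypothetical stand-in for a loan table; the real project stores its data in Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_requests ("
    "applicant_id INTEGER, loan_purpose TEXT, amount_requested REAL)"
)
conn.executemany(
    "INSERT INTO loan_requests VALUES (?, ?, ?)",
    [(1, "car", 12000.0), (2, "home", 250000.0), (3, "car", 8000.0)],
)

question = "What is the average amount requested for car loans?"

# In a real text-to-SQL system, an LLM would generate this statement from the question.
# It is hard-coded here purely for illustration.
generated_sql = (
    "SELECT AVG(amount_requested) FROM loan_requests WHERE loan_purpose = 'car'"
)

# Execute the generated SQL against the database and read back the single result.
average_car_loan = conn.execute(generated_sql).fetchone()[0]
print(f"{question} -> {average_car_loan}")
```

The key point is that the answer comes from running SQL against the stored data, which is why the dataset has to live in a SQL database.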
In our example, we want to create an AI Chatbot that allows a loan advisor to quickly look up information about a loan applicant or generate summary statistics about a subpopulation. Someone proficient in SQL could write code to do this but we want to provide this capability to our loan advisors and other business users through natural language.
Our first step will be to ensure that we have a metadata-enriched dataset. Augmenting our data with additional context is critical to getting accurate results when asking questions of our data. One of my favorite new features from Dataiku is the ability to use AI to quickly and accurately enrich data with descriptions!
As with any AI-generated content, we want to validate the results against our own knowledge. We can quickly and easily modify any of the AI-generated descriptions before confirming the changes and writing them back to the dataset schema. This saves hours of time when generating metadata across numerous datasets. In reality, this level of documentation simply wasn’t being done, resulting in poor natural language interactions, duplication of data assets, and lack of collaboration across teams.
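To see why column descriptions matter, consider this rough sketch of how schema metadata could be turned into context for an LLM. The column names, descriptions, and the build_schema_context helper are hypothetical and written only for illustration; Dataiku Answers handles this step for you, and its internal prompt format is not shown here.

```python
# Hypothetical column descriptions, similar to what might be stored in a dataset schema.
column_descriptions = {
    "amount_requested": "Loan amount requested by the applicant.",
    "loan_purpose": "Stated reason for the loan, e.g. car, home, education.",
    "default_probability": "Model-predicted probability (0 to 1) that the applicant defaults.",
}


def build_schema_context(table_name, descriptions):
    """Format column names and descriptions as text an LLM could read with a question."""
    lines = [f"Table: {table_name}"]
    for column, description in descriptions.items():
        lines.append(f"- {column}: {description}")
    return "\n".join(lines)


schema_context = build_schema_context("loan_requests", column_descriptions)
print(schema_context)
```

Without the descriptions, the model would only see bare column names and would have to guess what each one means, which is where inaccurate SQL tends to come from.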
Dataiku Answers Setup
Now that we have enriched our dataset with our business context and additional documentation, we are ready to begin building our AI Chatbot. By navigating to our Webapps, we can create a new Visual Webapp. If you followed the previous steps, you should see Answers as one of your options!
Welcome to Dataiku Answers! There are many settings and configuration options available for customizing your AI chatbots, but we will focus on a few to get started. First, we need to choose our main LLM. Assuming we have an LLM connection configured, we will see it available in the dropdown. Dataiku Answers is plugged directly into the LLM Mesh, meaning you get full flexibility in your choice of LLM to ensure you’re using the best one for the job. In my example, I will be leveraging OpenAI’s GPT-4o. Also, notice that you need to create two datasets within your SQL data warehouse for storing chat history and profile information. My selections were as follows.
The next step is the most important for our desired use case. We must augment our AI chatbot with the dataset we previously enriched. In order to do this, we must set the retrieval method to “use dataset retrieval (for specific answers from a table)”.
From here, we will provide the AI Chatbot with the data we want to ask questions of. In our example, we are only leveraging a single dataset but this is not a limitation of the software. We will point to our previously enriched dataset as shown below.
As previously mentioned, there are many more configuration changes and adjustments we can make to our AI chatbot, but the last one we will highlight here is the toggle for displaying the SQL code as a source for our answer. To find this toggle, select “Manage advanced dataset retrieval settings for the knowledge retrieval” and scroll to the bottom. I personally enable this because it allows me to validate the accuracy of the responses by checking the SQL code that was used to generate the output.
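If you want to go one step beyond reading the displayed SQL, you can re-run it and compare the result to an independent calculation. The sketch below assumes a hypothetical table, hypothetical sample rows, and a hypothetical displayed query, with SQLite standing in for the real warehouse.

```python
import math
import sqlite3

# Hypothetical sample rows: (applicant_id, default_probability).
rows = [(1, 0.10), (2, 0.40), (3, 0.25)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loan_requests (applicant_id INTEGER, default_probability REAL)"
)
conn.executemany("INSERT INTO loan_requests VALUES (?, ?)", rows)

# SQL as it might be displayed as the source of a chatbot answer (hypothetical).
displayed_sql = "SELECT AVG(default_probability) FROM loan_requests"
sql_answer = conn.execute(displayed_sql).fetchone()[0]

# Recompute the same statistic directly from the raw rows, without using the SQL.
expected = sum(probability for _, probability in rows) / len(rows)

# Compare with a tolerance, since both values are floating-point numbers.
answers_match = math.isclose(sql_answer, expected, rel_tol=1e-9)
print("Answer matches independent check:", answers_match)
```

A tolerance-based comparison is used because floating-point averages computed by two different engines may differ in the last few digits.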
Dataiku Answers
We can now begin asking questions of our data by moving over to our View panel.
Here are some example questions that I was able to ask our newly created AI Chatbot. Each of these generated quick and accurate responses to my questions! I asked a few simple lookup questions but also a couple of questions that required aggregate calculations to be performed across rows. We are able to validate their accuracy by exploring the data visually within Dataiku.
Question:
Validation:
Question:
Validation:
Question:
Validation:
Conclusion
As mentioned throughout this blog, we are simply scratching the surface of what is possible. I encourage you to explore and experiment with the various settings within Dataiku Answers. You can even apply your corporate logo and themes to make the AI chatbot match your corporate color palette. Please feel free to leave a comment if you have questions or feedback!