Dataiku Webapp "Answers" not working with Google Cloud LLMs

Jorge Carlos · September 2024

Hi, I am currently testing the feature of the LLMs in Dataiku DSS and I am using the visual Webapp "Answers" to test some models. I created the connection to Google Vertex AI and loaded the Gemini Pro and the Bison LLMs, the connection seems to work but when I test it in the Webapp it does not load.

I am using the dataset retrieval setting and whenever I ask a question such as "which product has the most sales?" The webapp starts working but never displays a response, it just stays in "Querying LLM…" and nothing happens.

I have been checking the backend logs and it looks like it gives a response, but somehow it is not displayed in the webapp.

Backend logs:

Human:         Evaluate the following human query and SQL Table Description and Schema to determine if a SQL queries should be generated.

        Generate the response as a JSON object following the JSON query format using the main JSON keys only.

        Provide a JSON object only. Do not add any other explanations, decorations, or additional information beyond the JSON object.

        # SQL Table Description and Schema

Name: inventory_of_stores

SQL Database type: PostgreSQL

Datasource description: None

## Columns description

- Name: 'date' | Type: 'date' | Description: 'Date when the sale was done, this represents a day in which a certain product was sold in the specific store. Store in datetime ISO8601 format (ex. 2023-05-10T00:00:00.000Z which represents May 10th, 2023).'

- Name: 'store_id' | Type: 'bigint' | Description: 'Unique identifier of the store where the sale was done'

- Name: 'product_name' | Type: 'string' | Description: 'The name of a product. (ex. ‘FERRERO  CHOCOLATE  ROCHER T4X10 [12.5 Grs]’)'

- Name: 'unit_price' | Type: 'double' | Description: 'Unit price of the product. It represent a currency amount.'

- Name: 'sales_amount' | Type: 'double' | Description: 'Quantity of products sold in the day on the specific store. It represent integer values.'

- Name: 'sales' | Type: 'double' | Description: 'Total profit of the sale, is the result pf multiplying unit_price and sales_amount. It represents a currency amount'

- Name: 'stock' | Type: 'bigint' | Description: 'Quantity of products in-stock in the specific store and date.'

- Name: 'market_description' | Type: 'string' | Description: 'General market to which the product belongs . (e.g. 'Dulces')'

- Name: 'division_description' | Type: 'string' | Description: 'Category to which the product belongs. (e.g 'Confiteria')'

- Name: 'category_description' | Type: 'string' | Description: 'Division to which the product belongs . (e.g. 'Paleta')'

- Name: 'brand' | Type: 'string' | Description: 'Brand to which the product belongs. (e.g. 'CORONADO')'

- Name: 'unit_measure' | Type: 'string' | Description: 'Unit of measure to which the product belongs. (e.g. Grs)'

-- End of Description --


        # CHAT HISTORY:

        []

        # --- END OF CHAT HISTORY ---

        human query: which product has the most sales?

        your JSON object:


2024-09-12 19:55:59,275 - llm_assist.logging - DEBUG - Creating initial query prompt ...

2024-09-12 19:55:59,277 INFO Executing completion on DKULLM with settings: {'temperature': 0, 'maxOutputTokens': 2048}

2024-09-12 19:56:01,758 - llm_assist.logging - DEBUG - Generated response for DB query: ``` json

{'selectList': [{'expr': {'type': 'COLUMN', 'name ': 'product_name'}}, {'expr': {'type': 'OPERATOR',  'op': 'SUM', 'args': [{'type': 'COLUMN', 'name': 'sales'}]}, 'alias': 'total_sales'}], 'where ': null, 'groupBy': [{'expr': {'type': 'COLUMN', 'name': 'product_name'}}], 'having': null, 'orderBy ': [{'expr': {'type': 'COLUMN', 'name': 'total_sales'}, 'orderType': 'DESC'}], 'limit': 1, 'justification': 'To find the product with the most sales, we need to  group the sales by product name, sum the sales for each product, and then order the results in descending order of total sales. The limit clause is used to return only the product with the highest total sales.'}

``` 

2024-09-12 19:56:01,758 - llm_assist.logging - DEBUG - get_retrieval_query performed for db_query with: [which product has the most sales?], computed db_query is [None]

2024-09-12 19:56:01,758 - llm_assist.logging - DEBUG - use_db_retrieval: False

2024-09-12 19:56:01,770 - llm_assist.logging - DEBUG - Question language is English with confidence value of 0.71

2024-09-12 19:56:01,771 - llm_assist.logging - DEBUG - ([StringPromptValue(text="Each row of the dataset consist of a record per day of each individual product and the total quantity that was sold by store that day. Some questions may require you to respond to which product or products were the most sold on a time range, remember to group by dates in that interval and order by sales (money-related) or by sales_amount (total quantity of products) date related questions.\n\n        To perform the following queries, use these guidelines:\n\n       1. When the user asks you for a product, they are referring to 'product_name'.\n       2. If the user asks for anything involving best selling, they are probably expecting for you to find the product / store / category_description / brand and analysis should be made to look into the best of those categories. These analysis could be like the following guidelines:\n             2.1 If the use asks for the best selling product, the are expecting you to perform an aggregation in 'sales' using the sum operator and assigning it the alias 'total', then Grouping by the condition the user requested and, finally, sorting the query using the computed 'total'.\n             2.2. If the user asks for the best selling store, they are expecting and analysis similar as the previous guideline, aggregate 'sales' with sum and alias 'total', groupBy 'store' and sortBy 'total'.\n       3. If the user wants to know something in any date, remember that each record is a representation of the sales of a product in each store available for a certain day, consider the following guidelines to give a better response:\n             3.1 If the user asks for the best selling product in a given month, they expect an analysis similar to guideline 2.1, but it must include a WHERE clause using a comparison in DATE_FORMAT to retrieve only the information of the given month, for weeks you could use YEARWEEK, also for any other date range a similar analysis should be made.\n             3.2 If the user asks for the month with the most sales they expect an analysis in which you made an aggregation in 'date' column using an operator like DATE_FORMAT or YEARWEEK or any other similar for other date ranges alias 'date_range', also you must inlcude an aggregation in 'sales' using sum operator with alias 'total', then grouping by the computed 'date_range' and ordering by the 'total'.\n\nRemember to use a similar analysis for queries that are similar.\n\n        Be aware that you are part of a team that can retrieve information from external data sources. \n        The data source description is;\n        # SQL Table Description and Schema\nName: inventory_of_stores\nSQL Database type: PostgreSQL\nDatasource description: None\n\n## Columns description\n- Name: 'date' | Type: 'date' | Description: 'Date when the sale was done, this represents a day in which a certain product was sold in the specific store. Store in datetime ISO8601 format (ex. 2023-05-10T00:00:00.000Z which represents May 10th, 2023).'\n- Name: 'store_id' | Type: 'bigint' | Description: 'Unique identifier of the store where the sale was done'\n- Name: 'product_name' | Type: 'string' | Description: 'The name of a product. (ex. ‘FERRERO  CHOCOLATE  ROCHER T4X10 [12.5 Grs]’)'\n- Name: 'unit_price' | Type: 'double' | Description: 'Unit price of the product. It represent a currency amount.'\n- Name: 'sales_amount' | Type: 'double' | Description: 'Quantity of products sold in the day on the specific store. It represent integer values.'\n- Name: 'sales' | Type: 'double' | Description: 'Total profit of the sale, is the result pf multiplying unit_price and sales_amount. It represents a currency amount'\n- Name: 'stock' | Type: 'bigint' | Description: 'Quantity of products in-stock in the specific store and date.'\n- Name: 'market_description' | Type: 'string' | Description: 'General market to which the product belongs . (e.g. 'Dulces')'\n- Name: 'division_description' | Type: 'string' | Description: 'Category to which the product belongs. (e.g 'Confiteria')'\n- Name: 'category_description' | Type: 'string' | Description: 'Division to which the product belongs . (e.g. 'Paleta')'\n- Name: 'brand' | Type: 'string' | Description: 'Brand to which the product belongs. (e.g. 'CORONADO')'\n- Name: 'unit_measure' | Type: 'string' | Description: 'Unit of measure to which the product belongs. (e.g. Grs)'\n-- End of Description --\n\n        However, the your team mate decided not make any query for the following reason.\n        \n        You do not need to mention this unless you are asked directly about it.\n        - Do not attempt to write any SQL\n        --- End of justification ---\n\n\n        Current conversation:\n        []\n        --- End of Current conversation ---\n\n        Human: which product has the most sales? \n It is essential that you reply to my question in English. Do not mention this.\n        Assistant:")], None)

2024-09-12 19:56:01,771 - llm_assist.logging - DEBUG - Time ===> taken by Computing prompt for conversational: 0.00 secs

2024-09-12 19:56:01,771 - llm_assist.logging - DEBUG - Final prompt:  Each row of the dataset consist of a record per day of each individual product and the total quantity that was sold by store that day. Some questions may require you to respond to which product or products were the most sold on a time range, remember to group by dates in that interval and order by sales (money-related) or by sales_amount (total quantity of products) date related questions.

        To perform the following queries, use these guidelines:

       1. When the user asks you for a product, they are referring to 'product_name'.

       2. If the user asks for anything involving best selling, they are probably expecting for you to find the product / store / category_description / brand and analysis should be made to look into the best of those categories. These analysis could be like the following guidelines:

             2.1 If the use asks for the best selling product, the are expecting you to perform an aggregation in 'sales' using the sum operator and assigning it the alias 'total', then Grouping by the condition the user requested and, finally, sorting the query using the computed 'total'.

             2.2. If the user asks for the best selling store, they are expecting and analysis similar as the previous guideline, aggregate 'sales' with sum and alias 'total', groupBy 'store' and sortBy 'total'.

       3. If the user wants to know something in any date, remember that each record is a representation of the sales of a product in each store available for a certain day, consider the following guidelines to give a better response:

             3.1 If the user asks for the best selling product in a given month, they expect an analysis similar to guideline 2.1, but it must include a WHERE clause using a comparison in DATE_FORMAT to retrieve only the information of the given month, for weeks you could use YEARWEEK, also for any other date range a similar analysis should be made.

             3.2 If the user asks for the month with the most sales they expect an analysis in which you made an aggregation in 'date' column using an operator like DATE_FORMAT or YEARWEEK or any other similar for other date ranges alias 'date_range', also you must inlcude an aggregation in 'sales' using sum operator with alias 'total', then grouping by the computed 'date_range' and ordering by the 'total'.

Remember to use a similar analysis for queries that are similar.

        Be aware that you are part of a team that can retrieve information from external data sources. 

        The data source description is;

        # SQL Table Description and Schema

Name: inventory_of_stores

SQL Database type: PostgreSQL

Datasource description: None

## Columns description

- Name: 'date' | Type: 'date' | Description: 'Date when the sale was done, this represents a day in which a certain product was sold in the specific store. Store in datetime ISO8601 format (ex. 2023-05-10T00:00:00.000Z which represents May 10th, 2023).'

- Name: 'store_id' | Type: 'bigint' | Description: 'Unique identifier of the store where the sale was done'

- Name: 'product_name' | Type: 'string' | Description: 'The name of a product. (ex. ‘FERRERO  CHOCOLATE  ROCHER T4X10 [12.5 Grs]’)'

- Name: 'unit_price' | Type: 'double' | Description: 'Unit price of the product. It represent a currency amount.'

- Name: 'sales_amount' | Type: 'double' | Description: 'Quantity of products sold in the day on the specific store. It represent integer values.'

- Name: 'sales' | Type: 'double' | Description: 'Total profit of the sale, is the result pf multiplying unit_price and sales_amount. It represents a currency amount'

- Name: 'stock' | Type: 'bigint' | Description: 'Quantity of products in-stock in the specific store and date.'

- Name: 'market_description' | Type: 'string' | Description: 'General market to which the product belongs . (e.g. 'Dulces')'

- Name: 'division_description' | Type: 'string' | Description: 'Category to which the product belongs. (e.g 'Confiteria')'

- Name: 'category_description' | Type: 'string' | Description: 'Division to which the product belongs . (e.g. 'Paleta')'

- Name: 'brand' | Type: 'string' | Description: 'Brand to which the product belongs. (e.g. 'CORONADO')'

- Name: 'unit_measure' | Type: 'string' | Description: 'Unit of measure to which the product belongs. (e.g. Grs)'

-- End of Description --

        However, the your team mate decided not make any query for the following reason.


        You do not need to mention this unless you are asked directly about it.

        - Do not attempt to write any SQL

        --- End of justification ---

        Current conversation:

        []

        --- End of Current conversation ---

        Human: which product has the most sales? 

 It is essential that you reply to my question in English. Do not mention this.

        Assistant:

2024-09-12 19:56:01,772 - llm_assist.logging - DEBUG - {'step': <LLM_STEP.QUERING_LLM_WITHOUT_RETRIEVAL: 7>}

The webapp looks like this:

It happens with both Bison and Gemini Pro, hope you can help me.

Operating system used: Ubuntu

Alexandru · September 2024

Can you confirm version of the LLM Answers plugin and the DSS version you currently using?

Jorge Carlos · September 2024

https://community.dataiku.com/discussion/comment/44497#Comment_44497

My DSS version is 12.6.5, for the LLM Answers I am not using a plugin, just using the connection to Vertex AI in Administration>Connections and then using the default visual Webapp that is displayed when I click on New Webapp>Visual

Dataiku Webapp "Answers" not working with Google Cloud LLMs

Setup Info

Answers

Welcome!

Welcome!

Quick Links

Categories

Sign up to take part

Dataiku Webapp "Answers" not working with Google Cloud LLMs

Setup Info

Answers

Welcome!

Welcome!

Quick Links

Categories