Using large context for a Gen AI prompt

joseangel — Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1

Hi,

I'm trying to create a prompt to ask questions to an LLM and get an answer based on 5,000 reviews for a product. I know there are ways to classify or perform sentiment analysis, but what I want to do is ask the LLM a question about the whole set of reviews.

I tried using RAG, but it is my understanding that this method retrieves only some chunks of information to feed the LLM, whereas I need the LLM to be informed about the whole set of reviews.

Finally, I've tried to join all the rows of reviews into a single row using the Group recipe, so that in Prompt Studios I can ask the LLM about that one row, but I get an error saying the input limit has been reached based on the number of tokens.

Does somebody know how, by any means, I can get the LLM to use all of the information so that it can answer my question?

Thanks in advance

Answers

  • AdrienL — Dataiker, Alpha Tester, Posts: 196

    For an LLM to take all of the reviews into account in a single prompt, you would need an LLM with a very large context window that can accommodate all of the concatenated text. That is currently not easy for such a large amount of text.

    Among the large-context-window models out there, you can try Claude 3 Opus (200k tokens IIRC) or GPT-4o (128k). But even a 200k context window would only fit about 40 tokens per review, since you need to fit 5,000 of them. That is maybe 25–30 English words per review, which seems very short.
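To make the budget concrete, here is a minimal sketch of that arithmetic. The reserved headroom for the question and answer, and the ~0.75 words-per-token ratio, are assumptions, not fixed properties of any model:

```python
# Rough token-budget check for fitting 5,000 reviews in one prompt.
CONTEXT_WINDOW = 200_000   # e.g. Claude 3 Opus (per the answer above)
NUM_REVIEWS = 5_000
RESERVED = 2_000           # assumed headroom for the question + the answer

tokens_per_review = (CONTEXT_WINDOW - RESERVED) // NUM_REVIEWS
# Rule of thumb: roughly 0.75 English words per token.
words_per_review = int(tokens_per_review * 0.75)

print(tokens_per_review, words_per_review)  # ~39 tokens, ~29 words per review
```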

    If this is not enough, you can first use a text summarization recipe to summarize the reviews, one by one, before concatenation. Or you can try a prompt recipe (still row by row, before concatenation) asking the LLM to yield just a couple of key words…
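That per-row-summarize-then-concatenate approach can be sketched as a simple map step followed by a join. In this sketch the `summarize` function is a stand-in: in practice it would be the summarization recipe or an LLM call, not the truncation used here for illustration:

```python
from textwrap import shorten

def summarize(text: str, max_words: int = 12) -> str:
    # Placeholder for the real per-row summarization (LLM call or
    # Dataiku summarization recipe); here we just truncate.
    return shorten(text, width=max_words * 6, placeholder=" ...")

def build_prompt_context(reviews: list[str]) -> str:
    # "Map" step: compress each review independently, row by row.
    compressed = [summarize(r) for r in reviews]
    # Then concatenate the compressed rows into one prompt context.
    return "\n".join(compressed)

reviews = [
    "The battery life is great but the screen scratches easily.",
    "Shipping was slow; the product itself works as advertised.",
    "Stopped working after two weeks, support was unhelpful.",
]
context = build_prompt_context(reviews)
```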

  • Zoey — Registered Posts: 1

    Trying to fit 5K reviews into one prompt is a tough ask with the token limits on most LLMs, even the ones with big context windows. What’s worked for me in a similar situation is breaking the reviews into smaller chunks and summarizing them first. Once you’ve got summaries, you can combine those and feed them to the LLM—it’s way easier for it to handle.


    If you’re dealing with a lot of data like this, an AI Content Platform can help too. These tools make it simpler to sort, summarize, and prep the info so the LLM can focus on giving you good answers.
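The chunk-then-summarize idea above can be sketched as batching the reviews and summarizing each batch before the final combine. The `summarize_batch` helper here is hypothetical; it stands in for whatever LLM call or recipe does the actual summarization:

```python
def batches(items: list[str], size: int):
    # Split the reviews into fixed-size chunks.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def summarize_batch(reviews: list[str]) -> str:
    # Hypothetical stand-in for an LLM summarization call on one chunk.
    return f"summary of {len(reviews)} reviews"

reviews = [f"review {i}" for i in range(5000)]
batch_summaries = [summarize_batch(b) for b in batches(reviews, 100)]
# 5,000 reviews -> 50 batch summaries, which fit comfortably in one prompt.
final_context = "\n".join(batch_summaries)
```

If the 50 summaries are still too long, the same step can be applied again to the summaries themselves (a second reduce pass).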
