Agents used by many concurrent users
If we develop a multi-agent application, how does it perform with, say, 100 users all asking questions at the same time?
I notice with some experimental webapps that the first use of an agent can take considerably longer than subsequent uses. I would infer that there is a backend agent server process that lives for a while waiting for other callers.
Is this so, and does that mean that agents scale well with many concurrent users?
In terms of scalability and concurrency, how does a bespoke project webapp compare with Agent Connect/Answers/Hub (assuming everything runs on the backend, not containerized)?
Operating system used: RHEL
Best Answer
Alexandru (Dataiker)
Hi John,
That's correct: a call to an agent starts a new "LLM" kernel, hence the "cold start" you observed. These kernels can be reused by concurrent or new requests and have a default TTL of 10 minutes.
If these are all local, with no containers, the start-up should be pretty fast, but it can still be slower than subsequent requests. The kernels scale horizontally: if there are more concurrent requests than the existing kernels can handle, additional kernels are started. The default maximum number of requests per kernel is 16.
These parameters can be configured, but the defaults should cater to the vast majority of cases.
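As a rough back-of-the-envelope check for the 100-user scenario, the figures above can be turned into a small calculation. This is only a sketch: it assumes requests are spread evenly across kernels, which is not confirmed behaviour, and the function and constant names are made up for illustration.

```python
import math

# Defaults quoted in this answer; both are configurable.
MAX_REQUESTS_PER_KERNEL = 16
KERNEL_TTL_MINUTES = 10

def kernels_needed(concurrent_requests, per_kernel=MAX_REQUESTS_PER_KERNEL):
    """Lower bound on kernels required.

    Assumption: requests are packed evenly onto kernels. The real
    scheduler may distribute them differently.
    """
    if concurrent_requests <= 0:
        return 0
    return math.ceil(concurrent_requests / per_kernel)

print(kernels_needed(100))  # 7
```

So under that assumption, 100 simultaneous requests would need at least 7 kernels, and any kernel that stays idle past the TTL would incur a cold start again on its next use.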
If you encounter specific issues during testing, please submit details via a support ticket.
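For that kind of testing, a minimal concurrency harness along these lines may help collect latency figures to attach to a ticket. It is a sketch only: `call_agent` is a hypothetical stand-in that just sleeps, and you would replace its body with however your webapp actually invokes the agent.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def call_agent(question):
    """Hypothetical stand-in; replace with the real call to your agent."""
    time.sleep(0.01)  # simulates the agent doing some work
    return f"answer to: {question}"

def timed_call(question):
    """Return the wall-clock latency of a single agent call, in seconds."""
    start = time.perf_counter()
    call_agent(question)
    return time.perf_counter() - start

def run_load(n_users):
    """Fire n_users calls concurrently and return the per-call latencies."""
    questions = [f"question {i}" for i in range(n_users)]
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return list(pool.map(timed_call, questions))

first_call = timed_call("warm-up")  # includes any cold start
latencies = run_load(100)           # 100 simultaneous users
print(f"first call: {first_call:.3f}s")
print(f"median under load: {statistics.median(latencies):.3f}s")
print(f"max under load: {max(latencies):.3f}s")
```

Comparing the first call with the median under load gives a rough measure of the cold-start cost, and the maximum shows whether some requests waited for extra kernels to start.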
Generally speaking, Answers/Agent Connect/Hub scale well with default settings. Keep in mind the recommended settings for starting these, notably: "number of Processes must always be set to 0".
Thanks