Description
Preamble
Currently, EPP operates in an active-passive mode, and many of us are looking for a way to enable active-active. The biggest obstacle is the Approximate Prefix Cache, which keeps its state purely in memory. Two independent analyses of these results have been done:
Additionally, llm-d is investigating whether its precise prefix cache scorer can distribute its data across multiple replicas, which would allow active-active operation. Should this prove fruitful, the default EPP algorithm could provide an active-active strategy, making for a much more resilient system.
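To make the in-memory problem concrete, here is a minimal sketch of why per-replica state gets in the way of active-active. The `InMemoryPrefixIndex` class, the pod name, and the prompt are all hypothetical and are not taken from the EPP or llm-d code; the real prefix cache plugin is more involved. The sketch only shows that a prefix recorded on one replica is invisible to the other.

```python
from typing import Dict, Optional


class InMemoryPrefixIndex:
    """Hypothetical per-replica index: prompt prefix -> pod that served it."""

    def __init__(self) -> None:
        # State lives only in this process; nothing is shared or persisted.
        self._pod_by_prefix: Dict[str, str] = {}

    def record(self, prefix: str, pod: str) -> None:
        self._pod_by_prefix[prefix] = pod

    def lookup(self, prefix: str) -> Optional[str]:
        return self._pod_by_prefix.get(prefix)


# Two active replicas, each with its own private index.
replica_a = InMemoryPrefixIndex()
replica_b = InMemoryPrefixIndex()

# The first request is handled by replica A, which remembers the pod it chose.
replica_a.record("You are a helpful assistant.", "pod-1")

# A follow-up with the same prefix may land on either replica.
hit_on_a = replica_a.lookup("You are a helpful assistant.")
hit_on_b = replica_b.lookup("You are a helpful assistant.")

print(hit_on_a)  # pod-1
print(hit_on_b)  # None
```

Under these assumptions, replica B has no record of the prefix, so it cannot prefer the pod that already holds the cached prefix. A plugin whose scoring depends on state like this would need that state shared or replicated before it could be considered active-active compliant.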
Task
This issue tracks the documentation effort. We should document the exploration above so that a user implementing their own plugin can build on our work, determine whether their plugin is active-active compliant, and learn ways to make it compliant. Essentially, we want to provide guidance and share our experience so that users do not need to explore the space as we have.