Document HA Active-Active/Active-Passive Guidance #2102

@kfswain

Preamble

EPP currently operates in active-passive mode, and many of us are looking for a way to enable active-active. The biggest obstacle is the Approximate Prefix Cache, which keeps its state purely in memory. Two independent analyses of these results have been done:

Additionally, llm-d is investigating whether its precise prefix cache scorer can distribute its data across multiple replicas, which would allow active-active. Should this prove fruitful, the default EPP algorithm could provide an active-active strategy, making the system much more resilient.
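To make the obstacle concrete, here is a minimal toy sketch of why purely in-memory prefix state gets in the way of active-active. It is not the EPP implementation: `ToyPrefixIndex`, `route`, and the pod names are hypothetical and exist only for illustration. The sketch assumes each replica keeps its own private map from prompt prefix to model server.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixIndex:
    """Hypothetical stand-in for a scorer that remembers, in process memory
    only, which model server last handled each prompt prefix."""
    seen: dict[str, str] = field(default_factory=dict)

    def pick(self, prefix: str, servers: list[str]) -> tuple[str, bool]:
        """Return (server, cache_hit); fall back to the first server on a miss."""
        server = self.seen.get(prefix)
        if server is not None:
            return server, True
        return servers[0], False


def route(replica: ToyPrefixIndex, prefix: str, servers: list[str]) -> tuple[str, bool]:
    """Pick a server for the prefix and remember the choice in this replica only."""
    server, hit = replica.pick(prefix, servers)
    replica.seen[prefix] = server
    return server, hit


servers = ["pod-a", "pod-b"]  # hypothetical model servers
replica_1 = ToyPrefixIndex()
replica_2 = ToyPrefixIndex()

# Active-passive: every request goes through replica_1, so the second
# request with the same prefix is recognised as a hit.
first = route(replica_1, "system-prompt-x", servers)
second = route(replica_1, "system-prompt-x", servers)

# Active-active without shared state: the same prefix arriving at
# replica_2 looks brand new, because replica_1's index is invisible to it.
third = route(replica_2, "system-prompt-x", servers)

print(first, second, third)
```

In this toy model the second replica reports a miss for a prefix the first replica has already seen. That is the kind of divergence a distributed or shared index would need to remove.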

Task

This issue tracks the documentation effort. We should document the above exploration so that a user implementing their own plugin can build on our efforts, determine whether their plugin is active-active compliant, and learn ways to make it compliant. Essentially, we should provide guidance and share our experience so that users do not need to explore the space as we have.

Metadata

Assignees

No one assigned

Labels

triage/accepted: Indicates an issue or PR is ready to be actively worked on.
