| 1 | +# Pluggable Body-Based Routing (BBR) Framework |
| 2 | + |
| 3 | +Author(s): @davidbreitgand @srampal |
| 4 | + |
| 5 | +## Proposal Status |
| 6 | + |
| 7 | +***Draft*** |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +The Gateway API Inference Extension (v1.2.1) includes an initial implementation of Body-Based Routing (BBR). Currently, BBR provides a single capability: it extracts the model name from the request body and adds it to the `X-Gateway-Model-Name` header. This header is then used to route the request to the appropriate InferencePool and its associated Endpoint Picker Extension (EPP) instances. |
| 12 | + |
| 13 | +The current BBR implementation is limited and lacks extensibility. Similar to the [pluggability introduced in the scheduling subsystem](../0845-scheduler-architecture-proposal/README.md), BBR should support custom extensions without requiring modifications to the GIE code base. |
| 14 | + |
| 15 | +This proposal introduces a plugin architecture for BBR that allows developers to implement custom logic. Plugins could be organized into a chain or DAG for ordered and concurrent execution. |
| 16 | + |
| 17 | +See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg/edit?tab=t.0#heading=h.55jwocr94axs) for additional context and reference. |
| 18 | + |
| 19 | +## Goals |
| 20 | + |
| 21 | +The pluggable BBR framework aims to address the following goals: |
| 22 | + |
| 23 | +### Immediate Goals |
| 24 | + |
| 25 | +- Avoid monolithic architecture |
| 26 | +- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two |
| 27 | +- Limit changes to the BBR feature to avoid any changes in the rest of the code base |
| 28 | +- Follow the best practices and lessons learned from the scheduling subsystem |
| 29 | + pluggability effort. For example, the system should be extended |
| 30 | + by implementing well-defined `Plugin` interfaces and registering |
| 31 | + them in the BBR subsystem; configuration should be done in the |
| 32 | + same way (e.g., in code and/or a configuration file) |
| 33 | +- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but do not reuse specialized code that carries non-BBR functionality, to prevent misuse |
| 34 | +- Provide reference plugin implementation(s). |
| 35 | + |
| 36 | +### Extended Goals |
| 37 | + |
| 38 | +- Enable organizing plugins into a topology for sequential and concurrent execution. Note that BBR stands for Body-Based Routing and this proposal does not aim at general payload processing; nevertheless, routing decisions might require pre-processing and post-processing operations |
| 39 | +- Avoid repeated body parsing across the plugins in a topology, for the sake of performance |
| 40 | +- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling subsystem |
| 41 | + |
| 42 | +## Non-Goals |
| 43 | + |
| 44 | +- Modify existing GIE abstractions |
| 45 | +- Fully align plugins, registries, and factories across BBR and EPP |
| 46 | +- Dynamically reconfigure plugins and plugin topologies at runtime |
| 47 | +- Enable extensibility of the `BBRPlugin` registration mechanisms in third-party extensions |
| 48 | + |
| 49 | +## Proposal |
| 50 | + |
| 51 | +### Overview |
| 52 | + |
| 53 | +The `BBRPlugin` interface embeds the `Plugin` interface adopted from EPP and must be implemented by every BBR plugin. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` is the string representing the type of the plugin and `TypedName().Name` is the string representing the plugin's implementation. BBR is refactored to use the factory pattern with registered plugin factories. |
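The registration flow described above could look roughly like the following. This is a minimal sketch, not the proposed implementation: the local `TypedName` and `Plugin` definitions stand in for the EPP types, and `Registry`, `NewRegistry`, `Register`, `New`, and `newSimpleModelSelector` are hypothetical names. Keying the registry by implementation name is also an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypedName is a local stand-in for the EPP type (assumed: two string fields).
type TypedName struct {
	Type string
	Name string
}

// Plugin is a local stand-in for the EPP Plugin interface.
type Plugin interface {
	TypedName() TypedName
}

// BBRPlugin embeds Plugin and adds the Execute method from this proposal.
type BBRPlugin interface {
	Plugin
	Execute(requestBodyBytes []byte) (headers map[string]string, mutatedBodyBytes []byte, err error)
}

// PluginFactoryFunc constructs a BBRPlugin.
type PluginFactoryFunc func() (BBRPlugin, error)

// Registry maps an implementation name to its factory (hypothetical helper).
type Registry struct {
	factories map[string]PluginFactoryFunc
}

func NewRegistry() *Registry {
	return &Registry{factories: map[string]PluginFactoryFunc{}}
}

// Register rejects duplicate names so that a later registration cannot
// silently replace an earlier one.
func (r *Registry) Register(name string, f PluginFactoryFunc) error {
	if _, exists := r.factories[name]; exists {
		return fmt.Errorf("plugin %q is already registered", name)
	}
	r.factories[name] = f
	return nil
}

// New looks up the factory for name and builds a fresh plugin instance.
func (r *Registry) New(name string) (BBRPlugin, error) {
	f, ok := r.factories[name]
	if !ok {
		return nil, fmt.Errorf("unknown plugin %q", name)
	}
	return f()
}

// simpleModelSelector copies the "model" field of the body into the
// X-Gateway-Model-Name header and leaves the body untouched.
type simpleModelSelector struct{}

func newSimpleModelSelector() (BBRPlugin, error) { return simpleModelSelector{}, nil }

func (simpleModelSelector) TypedName() TypedName {
	return TypedName{Type: "MetadataExtractor", Name: "simple-model-selector"}
}

func (simpleModelSelector) Execute(body []byte) (map[string]string, []byte, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return map[string]string{}, body, err
	}
	headers := map[string]string{}
	if req.Model != "" {
		headers["X-Gateway-Model-Name"] = req.Model
	}
	return headers, body, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register("simple-model-selector", newSimpleModelSelector); err != nil {
		panic(err)
	}
	p, err := reg.New("simple-model-selector")
	if err != nil {
		panic(err)
	}
	headers, _, err := p.Execute([]byte(`{"model":"llama-3","prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.TypedName().Type, p.TypedName().Name, headers["X-Gateway-Model-Name"])
}
```

Rejecting duplicate registrations is one possible design choice; a real registry might instead allow overrides, depending on how configuration is resolved.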
| 54 | + |
| 55 | +In addition, as extended functionality, a `PluginsChain` interface defines the order in which plugins execute. In the future, `PluginsChain` might be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency. |
| 56 | + |
| 57 | +`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically. |
| 58 | + |
| 59 | +Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To reduce parsing overhead, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into the appropriate struct from the official `openai-go` library (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain, and each `BBRPlugin` receives it by value. In the initial implementation, a plugin that needs to mutate the body MUST work on its own copy, and each plugin returns its mutated body separately. |
| 60 | + |
| 61 | +Even simple BBR plugin implementations can differ considerably in latency and memory use. This justifies different implementations of BBR plugins in different contexts. |
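The parse-once rule could be sketched as follows. This is a hedged illustration only: `parsedBody` is a simplified stand-in for the `openai-go` request structs so that the example stays self-contained, and `chainNeedsFullParsing`, `parseOnce`, `headerOnly`, and `guardrail` are hypothetical names. The marker interface is detected with an ordinary Go type assertion.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BBRPlugin is reduced to the Execute method for this sketch.
type BBRPlugin interface {
	Execute(requestBodyBytes []byte) (map[string]string, []byte, error)
}

// NeedsFullParsing is the optional marker interface from this proposal.
type NeedsFullParsing interface {
	FullParsingNeeded()
}

// parsedBody is a simplified stand-in for the openai-go request structs.
type parsedBody struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

// chainNeedsFullParsing reports whether any plugin carries the marker method.
func chainNeedsFullParsing(plugins []BBRPlugin) bool {
	for _, p := range plugins {
		if _, ok := p.(NeedsFullParsing); ok {
			return true
		}
	}
	return false
}

// parseOnce parses the body a single time, and only when some plugin in the
// chain asks for full parsing. The result is returned by value, so handing it
// to a plugin gives that plugin its own copy of the struct.
func parseOnce(plugins []BBRPlugin, body []byte) (parsed parsedBody, ok bool, err error) {
	if !chainNeedsFullParsing(plugins) {
		return parsedBody{}, false, nil
	}
	if err := json.Unmarshal(body, &parsed); err != nil {
		return parsedBody{}, false, err
	}
	return parsed, true, nil
}

// headerOnly does not need full parsing, so it has no marker method.
type headerOnly struct{}

func (headerOnly) Execute(b []byte) (map[string]string, []byte, error) {
	return map[string]string{}, b, nil
}

// guardrail declares that it needs the fully parsed body.
type guardrail struct{}

func (guardrail) Execute(b []byte) (map[string]string, []byte, error) {
	return map[string]string{}, b, nil
}
func (guardrail) FullParsingNeeded() {}

func main() {
	body := []byte(`{"model":"llama-3","prompt":"hello"}`)
	_, ok, _ := parseOnce([]BBRPlugin{headerOnly{}}, body)
	fmt.Println("parsed for header-only chain:", ok)
	parsed, ok, err := parseOnce([]BBRPlugin{headerOnly{}, guardrail{}}, body)
	if err != nil {
		panic(err)
	}
	fmt.Println("parsed for guardrail chain:", ok, parsed.Model)
}
```

Note that passing a struct by value copies only its top-level fields; slices and maps inside it would still be shared, so a real implementation would need a deep copy before mutation.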
| 62 | + |
| 63 | + |
| 64 | + |
| 65 | +[The benchmark details and code can be found here](https://github.com/davidbreitgand/scripts/tree/main/benchmarks). |
| 66 | + |
| 67 | +### Suggested Components |
| 68 | + |
| 69 | +The sketch of the proposed framework is shown in the figure below. |
| 70 | + |
| 71 | + |
| 72 | +### Suggested BBR Pluggable Framework Interfaces |
| 73 | + |
| 74 | +```go |
| 75 | +// ------------------------------------ Defaults ------------------------------------------ |
| 76 | + |
| 77 | +const DefaultPluginType = "MetadataExtractor" |
| 78 | +const DefaultPluginImplementation = "simple-model-selector" |
| 79 | + |
| 80 | +// BBRPlugin defines the interface for plugins in the BBR framework |
| 81 | +type BBRPlugin interface { |
| 82 | + plugins.Plugin |
| 83 | + |
| 84 | + // Execute runs the plugin logic on the request body. |
| 85 | + // A plugin's implementation MAY mutate the body of the message. |
| 86 | + // A plugin's implementation MUST return a map of headers. |
| 87 | + // If the implementation sets no headers, the map MUST be empty. |
| 88 | + // The value of a header set by an extended implementation NEED NOT be identical to the value |
| 89 | + // that a default implementation would set for the same header. |
| 90 | + // Example: the body of a request sets model to "semantic-model-selector", |
| 91 | + // which, say, stands for "select the best model for this request at minimal cost". |
| 92 | + // A plugin implementation of "semantic-model-selector" sets X-Gateway-Model-Name to any valid |
| 93 | + // model name from the inventory of the backend models and also mutates the body accordingly. |
| 94 | + |
| 95 | + Execute(requestBodyBytes []byte) (headers map[string]string, mutatedBodyBytes []byte, err error) |
| 96 | +} |
| 97 | + |
| 98 | + |
| 99 | +// NeedsFullParsing is an optional capability interface. |
| 100 | +// Plugins that require full body parsing implement this marker method. |
| 101 | +// The method has no return value; presence of the method is the signal. |
| 102 | +type NeedsFullParsing interface { |
| 103 | + FullParsingNeeded() |
| 104 | +} |
| 105 | + |
| 106 | +// PluginFactoryFunc is the constructor type for BBRPlugin implementations. |
| 107 | +// Concrete constructors are assigned to this type. |
| 108 | + |
| 109 | +type PluginFactoryFunc func() (bbrplugins.BBRPlugin, error) |
| 110 | +``` |
| 111 | +### Defaults |
| 112 | + |
| 113 | +A default plugin instance that sets `X-Gateway-Model-Name` header will always be configured automatically if a specific plugin is not configured. The default plugin will only set the header without body mutation. |
| 114 | + |
| 115 | +### Current BBR reimplementation as BBRPlugin |
| 116 | + |
| 117 | +The current BBR logic will be reimplemented as a `BBRPlugin` according to this proposal, following the phased approach detailed in the next section. |
| 118 | + |
| 119 | +### Implementation Phases |
| 120 | + |
| 121 | +The pluggable framework will be implemented iteratively over several phases and a series of small PRs. |
| 122 | + |
| 123 | +1. Introduce the `BBRPlugin` interface, the `MetadataExtractor` plugin type, the registry, the default plugin implementation (`simple-model-selector`), and its factory. Plugin configuration will be implemented via environment variables set in the Helm chart |
| 124 | +1. Introduce a plugins topology (initially a `PluginsChain`) |
| 125 | +1. Introduce a shared struct (shared among the plugins of a plugins chain) to avoid repeated body parsing |
| 126 | +1. Introduce an interface for guardrail plugins, add a simple reference implementation, and experiment with plugin chains on request and response messages |
| 127 | +1. Refactor metrics as needed to work with the new pluggable framework |
| 128 | +1. Implement configuration via manifests similar to those in EPP |
| 129 | +1. Implement `PluginsDAG` to allow for more complex topological order and concurrency. |
| 130 | +1. Continuously apply lessons from this implementation and from the scheduling framework to improve the implementation |
| 131 | +1. Aim for alignment and cross-pollination with the [AI GW WG](https://github.com/kubernetes-sigs/wg-ai-gateway). |
| 132 | + |
| 133 | +## Open Questions |
| 134 | + |
| 135 | +1. More elaborate topology definition and execution |
| 136 | +1. More elaborate shared memory architecture for the best performance |
| 137 | +1. Considerations for handling newer OpenAI APIs |
| 138 | +1. The OpenAI API continues to evolve; most recently, the Responses API, which has some stateful logic, was added alongside the ChatCompletions endpoint. The design will also be extended to cover the OpenAI Responses API. For example, `PluginsChain` might be extended to provide common utilities that help with state caching, or plugins might handle state entirely on their own. |
| 139 | +1. TBA |
| 140 | + |
| 141 | +## Note 1 |
| 142 | + |
| 143 | +The proposed interfaces may differ slightly from those implemented in the [initial PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981). |
| 144 | +The initial PR will be refactored into a series of small PRs which should be evaluated in reference to this proposal. |