@@ -20,36 +20,43 @@ See [this document](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3
20
20
21
21
The pluggable BBR Framework aims at addressing the following goals
22
22
23
+
### Immediate Goals
24
+
23
25
- Avoid monolithic architecture
24
26
- Mimic pluggability and configurability of the scheduling subsystem without coupling between the two
25
-
- Enable organizing plugins into a topology for ordered and concurrent execution
26
-
- Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance
27
27
- Limit changes to the BBR feature to avoid any changes in the rest of the code base
28
28
- Follow best practices and experience from the Scheduling subsystem
29
29
pluggability effort. For example, extending the system to support the above
30
30
should be through implementing well defined `Plugin` interfaces and registering
31
31
them in the BBR subsystem; any configuration would be done in the
32
32
same way (e.g., code and/or configuration file)
33
33
- Reuse common code from EPP, such as `TypedName`, wherever it makes sense, but avoid reusing specialized code that carries non-BBR functionality, to prevent misuse
34
+
- Provide reference plugin implementation(s).
35
+
36
+
### Extended Goals
37
+
38
+
- Enable organizing plugins into a topology for sequential and concurrent execution. Note that BBR stands for Body-Based Routing and this proposal does not aim at general payload processing; nevertheless, routing decisions might require pre-processing/post-processing operations
39
+
- Avoid redundant recurrent body parsing across plugins in a topology for the sake of performance
34
40
- Enable extensible collection and registration of metrics using lessons from the pluggable scheduling sub-system
35
-
- Provide reference plugin implementations.
36
41
37
42
## Non-Goals
38
43
39
44
- Modify existing GIE abstractions
40
45
- Fully align plugins, registries, and factories across BBR and EPP
41
46
- Dynamically reconfigure plugins and plugin topologies at runtime
47
+
- Enable extensibility of the BBRPlugin registration mechanisms in third party extensions
42
48
43
49
## Proposal
44
50
45
51
### Overview
46
52
47
-
There is an embedded `BBRPlugin` interface building on the `Plugin` interface adopted from EPP. This interface should be implemented by any BBR plugin. Each pluigin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name()` returns the string representing the plugins implementation. BBR is refactored to implement the registered factory pattern. To that end, a `PluginRegistry` interface and its implementation are added to register `BBRPlugin` factories and concrete implementations created by the factories.
48
-
In addition, a `PluginsChain` interface is defined to define an order of plugin executions. In the future, `PluginsChain` will be replaced by `PluginsDAG` to allow for more complex topological order and concurrency.
53
+
The `BBRPlugin` interface embeds the `Plugin` interface adopted from EPP and should be implemented by every BBR plugin. Each plugin is identified by its `TypedName` (adopted from EPP), where `TypedName().Type` gives the string representing the type of the plugin and `TypedName().Name` gives the string representing the plugin's implementation. BBR is refactored to implement the registered factory pattern.
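The registered factory pattern described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `TypedName` and `Plugin` are local stand-ins for the EPP types, and the `Execute` signature, `FactoryFunc`, the struct-based `PluginRegistry`, and the `noop` plugin are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// TypedName is a local stand-in for the EPP type of the same name.
type TypedName struct {
	Type string // plugin type, e.g. "noop"
	Name string // name of this particular plugin instance
}

// Plugin is a local stand-in for the EPP Plugin interface.
type Plugin interface {
	TypedName() TypedName
}

// BBRPlugin embeds Plugin. The Execute signature is an assumption.
type BBRPlugin interface {
	Plugin
	Execute(body []byte) (headers map[string]string, mutatedBody []byte, err error)
}

// FactoryFunc builds a named plugin instance (hypothetical signature).
type FactoryFunc func(name string) (BBRPlugin, error)

// PluginRegistry keeps factories by plugin type and instances by name.
// The proposal describes an interface; a struct keeps the sketch short.
type PluginRegistry struct {
	factories map[string]FactoryFunc
	instances map[string]BBRPlugin
}

func NewPluginRegistry() *PluginRegistry {
	return &PluginRegistry{
		factories: map[string]FactoryFunc{},
		instances: map[string]BBRPlugin{},
	}
}

// RegisterFactory rejects a second registration for the same plugin type.
func (r *PluginRegistry) RegisterFactory(pluginType string, f FactoryFunc) error {
	if _, exists := r.factories[pluginType]; exists {
		return errors.New("factory already registered: " + pluginType)
	}
	r.factories[pluginType] = f
	return nil
}

// CreatePlugin looks up the factory, builds the instance, and stores it.
func (r *PluginRegistry) CreatePlugin(pluginType, name string) (BBRPlugin, error) {
	f, ok := r.factories[pluginType]
	if !ok {
		return nil, errors.New("no factory for type: " + pluginType)
	}
	p, err := f(name)
	if err != nil {
		return nil, err
	}
	r.instances[name] = p
	return p, nil
}

// noopPlugin returns the body untouched and sets no headers.
type noopPlugin struct{ tn TypedName }

func (p *noopPlugin) TypedName() TypedName { return p.tn }

func (p *noopPlugin) Execute(body []byte) (map[string]string, []byte, error) {
	return map[string]string{}, body, nil
}

func newNoop(name string) (BBRPlugin, error) {
	return &noopPlugin{tn: TypedName{Type: "noop", Name: name}}, nil
}

func main() {
	reg := NewPluginRegistry()
	if err := reg.RegisterFactory("noop", newNoop); err != nil {
		panic(err)
	}
	p, err := reg.CreatePlugin("noop", "my-noop")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.TypedName().Type, p.TypedName().Name)
}
```

Keeping factories keyed by type and instances keyed by name is one possible layout; the actual registry may organize them differently.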
54
+
55
+
In addition, as extended functionality, a `PluginsChain` interface defines the order of plugin execution. In the future, `PluginsChain` might be replaced by `PluginsDAG` to allow for more complex topological ordering and concurrency.
49
56
50
57
`PluginsChain` only contains ordered `BBRPlugin` types registered in the `PluginRegistry`. `RequestPluginsChain` and `ResponsePluginsChain` are optionally configured for handling requests and responses respectively. If no configuration is provided, default `PluginsChain` instances will be configured automatically.
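As a rough sketch of ordered execution with a default fallback, the example below runs plugins in insertion order and merges the headers they return. The reduced `BBRPlugin` interface, the struct-based `PluginsChain`, the merge rule (later plugins overwrite earlier ones), and the `chainOrDefault` helper are assumptions for illustration only; registry lookup is omitted.

```go
package main

import "fmt"

// BBRPlugin is reduced to what the chain needs (hypothetical signature).
type BBRPlugin interface {
	Name() string
	Execute(body []byte) (headers map[string]string, err error)
}

// PluginsChain runs plugins strictly in the order they were added.
type PluginsChain struct {
	plugins []BBRPlugin
}

// Add appends a plugin and returns the chain to allow call chaining.
func (c *PluginsChain) Add(p BBRPlugin) *PluginsChain {
	c.plugins = append(c.plugins, p)
	return c
}

// Run executes each plugin in order and merges the returned headers.
// On a key conflict the later plugin wins; the first error stops the chain.
func (c *PluginsChain) Run(body []byte) (map[string]string, error) {
	merged := map[string]string{}
	for _, p := range c.plugins {
		h, err := p.Execute(body)
		if err != nil {
			return nil, fmt.Errorf("plugin %s: %w", p.Name(), err)
		}
		for k, v := range h {
			merged[k] = v
		}
	}
	return merged, nil
}

// chainOrDefault falls back to a single default plugin when nothing is configured.
func chainOrDefault(configured *PluginsChain, def BBRPlugin) *PluginsChain {
	if configured == nil || len(configured.plugins) == 0 {
		return (&PluginsChain{}).Add(def)
	}
	return configured
}

// staticHeader is a toy plugin that always sets one header.
type staticHeader struct{ name, key, value string }

func (s staticHeader) Name() string { return s.name }

func (s staticHeader) Execute([]byte) (map[string]string, error) {
	return map[string]string{s.key: s.value}, nil
}

func main() {
	chain := (&PluginsChain{}).
		Add(staticHeader{name: "first", key: "X-Example", value: "a"}).
		Add(staticHeader{name: "second", key: "X-Example", value: "b"})
	headers, err := chainOrDefault(chain, staticHeader{name: "default"}).Run([]byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(headers["X-Example"])
}
```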
51
58
52
-
Depending on a `BBRPlugin` functionality and implementation, the plugin might require full or selective body parsing. To save the parsing overhead, if there is at least one `BBRPlugin` in the `PluginsChain` that requires full body parsing, the parsing is performed only once into a shared official appropriate `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams` depending on the request endpoint). This struct is shared for read-only to all plugins in the chain. Each `BBRplugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation, it MUST work on its own copy, and the a mutated body is returned separately by each plugiin.
59
+
Depending on its functionality and implementation, a `BBRPlugin` might require full or selective body parsing. To save parsing overhead, if at least one `BBRPlugin` in the `PluginsChain` requires full body parsing, the body is parsed only once into the appropriate official `openai-go` struct (either `openai.CompletionNewParams` or `openai.ChatCompletionNewParams`, depending on the request endpoint). This struct is shared read-only with all plugins in the chain. Each `BBRPlugin` receives the shared struct by value. If a plugin needs to mutate the body, in the initial implementation it MUST work on its own copy, and each plugin returns its mutated body separately.
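The parse-once, share-by-value idea can be illustrated with a small sketch. To stay self-contained it uses a hypothetical `chatRequest` struct in place of the `openai-go` types, and the `plugin` function type, `parseOnce`, and `runChain` are assumptions rather than the proposed API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest is a hypothetical stand-in for the openai-go request struct.
type chatRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream,omitempty"`
}

// parseCount exists only to show that parsing happens once per request.
var parseCount int

func parseOnce(raw []byte) (chatRequest, error) {
	parseCount++
	var req chatRequest
	err := json.Unmarshal(raw, &req)
	return req, err
}

// plugin receives the parsed request by value (hypothetical signature)
// and returns a mutated body, or nil if it does not mutate anything.
type plugin func(req chatRequest) (mutatedBody []byte, err error)

// readOnly inspects the request and returns no mutated body.
func readOnly(req chatRequest) ([]byte, error) {
	_ = req.Model
	return nil, nil
}

// rewriteModel changes its own copy and returns it as a separate body.
// Note: a Go struct copy is shallow, so slice or map fields would need
// an explicit deep copy before being modified.
func rewriteModel(req chatRequest) ([]byte, error) {
	req.Model = "rewritten-" + req.Model // affects only the local copy
	return json.Marshal(req)
}

// runChain parses the body once and hands the same value to every plugin.
func runChain(raw []byte, chain []plugin) (chatRequest, [][]byte, error) {
	shared, err := parseOnce(raw)
	if err != nil {
		return chatRequest{}, nil, err
	}
	var bodies [][]byte
	for _, p := range chain {
		b, err := p(shared) // passed by value
		if err != nil {
			return shared, nil, err
		}
		if b != nil {
			bodies = append(bodies, b)
		}
	}
	return shared, bodies, nil
}

func main() {
	shared, bodies, err := runChain([]byte(`{"model":"llama"}`), []plugin{readOnly, rewriteModel})
	if err != nil {
		panic(err)
	}
	fmt.Println(shared.Model, string(bodies[0]), parseCount)
}
```

Passing by value protects scalar fields such as `Model`, but it does not by itself protect nested slices or maps, which is one reason the text requires plugins to work on their own copy.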
53
60
54
61
### Suggested Components
55
62
@@ -60,19 +67,16 @@ The sketch of the proposed framework is shown in the figure below.
// ModelHeader is a constant defined in ./pkg/bbr/plugins/interfaces
219
-
h[ModelHeader] = requestBody.Model
107
+
A default plugin instance that sets the `X-Gateway-Model-Name` header will always be configured automatically if no specific plugin is configured. The default plugin only sets the header and does not mutate the body.
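A minimal sketch of such a default plugin is shown below, assuming selective parsing of only the `model` field. The function name `extractModelHeader`, its signature, and the error returned for a missing model are assumptions; only the `ModelHeader` constant name and the header value come from the text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ModelHeader is the header set by the default plugin.
const ModelHeader = "X-Gateway-Model-Name"

// extractModelHeader reads only the "model" field and returns the header
// to set together with the original, unmodified body bytes.
func extractModelHeader(requestBodyBytes []byte) (map[string]string, []byte, error) {
	// Selective parsing: fields other than "model" are ignored.
	var partial struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(requestBodyBytes, &partial); err != nil {
		return nil, requestBodyBytes, err
	}
	if partial.Model == "" {
		// Treating a missing model as an error is an assumption of this sketch.
		return nil, requestBodyBytes, errors.New("request body has no model field")
	}
	// The body is intentionally returned as-is, without mutation.
	return map[string]string{ModelHeader: partial.Model}, requestBodyBytes, nil
}

func main() {
	headers, body, err := extractModelHeader([]byte(`{"model":"llama","prompt":"hi"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(headers[ModelHeader], string(body))
}
```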
220
108
221
-
// Body is not mutated in this implementation hence returning original requestBodyBytes. This is intentional.
Will be done according to this proposal and phased approach detailed in the next section.
229
112
230
113
### Implementation Phases
231
114
232
-
The pluggable framework will be implemented iteratively over several phases.
115
+
The pluggable framework will be implemented iteratively over several phases and a series of small PRs.
233
116
234
-
1. Introduce `BBRPlugin``MetadataExtractor`, interface, registry, plugins chain, sample plugin implementation (`SimpleModelExtraction`) and its factory. Plugin configuration will be implemented via environment variables set in helm chart
235
-
1. Introduce a second plugin interface, `ModelSelector` and sample plugin implementation
236
-
1. Introduce shared struct (shared among the plugins of a plugins chain)
117
+
1. Introduce the `BBRPlugin` `MetadataExtractor` interface, the registry, a default plugin implementation (`simple-model-selector`), and its factory. Plugin configuration will be implemented via environment variables set in the helm chart
118
+
1. Introduce plugins topology (initially a `PluginsChain`)
119
+
1. Introduce shared struct (shared among the plugins of a plugins chain) to
237
120
1. Introduce an interface for guardrail plugin, introduce simple reference implementation, experiment with plugins chains on request and response messages
238
121
1. Refactor metrics as needed to work with the new pluggable framework
239
122
1. Implement configuration via manifests similar to those in EPP
@@ -243,9 +126,13 @@ The pluggable framework will be implemented iteratively over several phases.
243
126
244
127
## Open Questions
245
128
129
+
1. More elaborate topology definition and execution
246
130
1. More elaborate shared memory architecture for the best performance
131
+
1. Considerations for handling newer OpenAI API
132
+
1. The OpenAI API continues to evolve; most recently the Responses API was added, which has some stateful logic, in addition to the ChatCompletions endpoint. The design will be extended to cover the OpenAI Responses API as well. For example, the `PluginsChain` might be extended to provide common utilities that help with state caching, or it might leave state handling entirely to the plugins.
247
133
1. TBA
248
134
249
135
## Note
250
136
251
-
The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981)
137
+
The proposed interfaces can slightly change from those implemented in the initial [PR 1981](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1981).
138
+
The initial PR will be refactored into a series of small PRs which should be evaluated in reference to this proposal.