Commit fd95218
[feat] Introduce high-level key-value (KV) interface (#28)
## Summary
This PR introduces a **High-Level Key-Value (KV) Interface** to
TransferQueue, offering a Redis-style API that exposes most of the
advanced features provided by TransferQueue.
## Background
In previous versions of TransferQueue, the learning curve was relatively
steep for new users. To perform basic operations, users had to:
1. Understand the `BatchMeta`, `SampleMeta`, and `FieldMeta` design (as
illustrated in
[tutorial/02_metadata_concepts.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/02_metadata_concepts.py)).
2. Navigate the flexible but complex
[`TransferQueueClient`](https://github.com/Ascend/TransferQueue/blob/main/transfer_queue/client.py)
API.
Although PR #26 simplified
the initialization process, the core interaction still required exposing
low-level details. This PR bridges that gap by providing a familiar,
easy-to-use KV abstraction.
## TransferQueue API Architecture
With this PR, TransferQueue now supports a two-level API architecture to
satisfy different user needs.
| Level | Tier | Style | Fine-Grained Access | Streaming | Sampler | Multiple-Backends |
|---|---|---|---|---|---|---|
| High | **KV Interface** (this PR) | Put/Get/List/Clear | ✓ | ○ | ✗ | ✓ |
| High | **StreamingDataLoader** (#23) | PyTorch DataLoader | ✓ | ✓ | ✓ | ✓ |
| Low | **TransferQueueClient** | Metadata-based | ✓ | ✓ | ✓ | ✓ |
### High-Level API
#### Key-Value based API (This PR)
**Methods**
- **(async_)kv_put**: Insert/Update a multi-column sample by key, with
optional metadata tag
- **(async_)kv_batch_put**: Put multiple key-value pairs efficiently in
batch
- **(async_)kv_batch_get**: Retrieve samples (by keys), supporting
column selection (by fields)
- **(async_)kv_list**: List keys and tags (metadata) in a partition
- **(async_)kv_clear**: Remove key-value pairs from storage
**Key Features**
- **Redis-style Semantics**: Familiar KV interface (Put/Get/List) for
a minimal learning curve
- **Fine-grained Access**: Update or retrieve specific fields (columns)
within a key (row) without reading or writing the full row
- **Partition Isolation**: Logical separation of storage namespaces
- **Metadata Tags**: Lightweight metadata for status tracking
- **Pluggable Backends**: Supports multiple backends
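
To make the semantics listed above concrete, here is a minimal sketch of a toy in-memory store. It is not TransferQueue's implementation: the class `ToyKVStore` and its method names (`put`, `get`, `list_keys`, `clear`) are hypothetical and only mirror the behavior described in this PR (partition isolation, column-level updates, and per-key tags).

```python
from collections import defaultdict


class ToyKVStore:
    """Hypothetical in-memory model of the KV semantics; not TransferQueue code."""

    def __init__(self):
        # partition_id -> key -> {field name: value}
        self._rows = defaultdict(dict)
        # partition_id -> key -> tag dict
        self._tags = defaultdict(dict)

    def put(self, key, partition_id, fields, tag=None):
        # Merge into the existing row so untouched fields are preserved.
        self._rows[partition_id].setdefault(key, {}).update(fields)
        if tag is not None:
            self._tags[partition_id][key] = dict(tag)

    def get(self, key, partition_id, fields=None):
        # Return all columns, or only the requested ones.
        row = self._rows[partition_id][key]
        names = list(row) if fields is None else fields
        return {name: row[name] for name in names}

    def list_keys(self, partition_id):
        # Keys and tags come back as two parallel lists.
        keys = list(self._rows[partition_id])
        tags = [self._tags[partition_id].get(k, {}) for k in keys]
        return keys, tags

    def clear(self, keys, partition_id):
        for k in keys:
            self._rows[partition_id].pop(k, None)
            self._tags[partition_id].pop(k, None)


store = ToyKVStore()
store.put("1_0", "train", {"input_ids": [4, 5, 6]}, tag={"status": "running"})
store.put("1_0", "train", {"reward": 0.5})  # adds one column, keeps input_ids
print(store.get("1_0", "train", fields=["reward"]))
print(store.list_keys("val"))  # a different partition sees no keys
```

The second `put` touches only the `reward` column, which is the kind of fine-grained update the feature list describes; the real interface operates on tensors and pluggable storage backends rather than plain dictionaries.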
#### StreamingDataLoader API
Refer to our [RoadMap](#1)
and the related PR (#23).
The usage example can be found in
[tutorial/06_streaming_dataloader.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/06_streaming_dataloader.py).
### Low-Level API
Directly manipulate the `TransferQueueClient`. Refer to
[tutorial/03_metadata_concepts.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/03_metadata_concepts.py),
[tutorial/04_understanding_controller.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/04_understanding_controller.py)
and
[tutorial/05_custom_sampler.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/05_custom_sampler.py)
for details.
## Usage Example
Please refer to
[tutorial/02_kv_interface.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/02_kv_interface.py)
and
[tests/e2e/test_kv_interface_e2e.py](https://github.com/Ascend/TransferQueue/blob/main/tests/e2e/test_kv_interface_e2e.py)
for details.
```python3
import torch
from tensordict import TensorDict
import transfer_queue as tq

# initialize TQ
tq.init()

# prepare data
batch_input_ids = torch.tensor(
    [
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12],
        [13, 14, 15],
    ]
)
batch_attention_mask = torch.ones_like(batch_input_ids)
data_batch = TensorDict(
    {
        "input_ids": batch_input_ids,
        "attention_mask": batch_attention_mask,
    },
    batch_size=batch_input_ids.size(0),
)
keys = ["1_0", "1_1", "1_2", "2_0"]  # 4 keys for 4 samples
tags = [{"global_steps": 1, "status": "running", "model_version": 1} for _ in range(len(keys))]
partition_id = "test"

# use kv interface to put into TQ
tq.kv_batch_put(keys=keys, partition_id=partition_id, fields=data_batch, tags=tags)

# list all keys and tags
all_keys, all_tags = tq.kv_list(partition_id=partition_id)
for k, t in zip(all_keys, all_tags, strict=False):
    print(f" - key='{k}' | tag={t}")

# retrieve all data
retrieved_all = tq.kv_batch_get(keys=all_keys, partition_id=partition_id)
print(f" Fields: {list(retrieved_all.keys())}")
```
## Use Cases & Limitations
**Best For**:
- Scenarios requiring fine-grained data access (e.g., updating a reward
score for a specific prompt).
- Integration with external ReplayBuffers or Single-Controller
architectures that manage sample dispatching logic.
**Limitations (vs. Streaming/Low-level APIs):**
- No built-in production/consumption tracking: Users must manually check
status via tags or manage logic externally.
- No built-in sampler: Data dispatch must be implemented externally, by a
ReplayBuffer or a single controller.
- Not Fully Streaming: Consumers typically wait for a controller to
dispatch `keys` before fetching, rather than a continuous stream.
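
Since status tracking is left to the caller, one possible pattern is to filter the key and tag lists on the client side before fetching. The sketch below assumes the two parallel lists have the shape returned by `kv_list` in the usage example; `select_keys_by_tag` is a hypothetical helper and not part of TransferQueue, and the `"finished"` status value is made up for illustration.

```python
def select_keys_by_tag(keys, tags, **conditions):
    """Return the keys whose tag matches every given name=value condition.

    Hypothetical helper: `keys` and `tags` are parallel lists, as in the
    (all_keys, all_tags) pair from the usage example.
    """
    return [
        key
        for key, tag in zip(keys, tags)
        if all(tag.get(name) == value for name, value in conditions.items())
    ]


# Stand-in data; in practice these lists would come from tq.kv_list(...).
all_keys = ["1_0", "1_1", "1_2", "2_0"]
all_tags = [
    {"global_steps": 1, "status": "finished"},
    {"global_steps": 1, "status": "running"},
    {"global_steps": 1, "status": "finished"},
    {"global_steps": 2, "status": "running"},
]

ready = select_keys_by_tag(all_keys, all_tags, status="finished", global_steps=1)
print(ready)
```

The selected keys could then be passed to `kv_batch_get`, which keeps the dispatch decision in the external controller as the limitations above describe.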
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
1 parent 7c9e970 commit fd95218
23 files changed
Lines changed: 3117 additions & 865 deletions
File tree
- .github/workflows
- tests
- e2e
- transfer_queue
- storage/managers
- utils
- tutorial