
Tcp End #98

Open
SHshenhao wants to merge 2 commits into main from tcp-v3


@SHshenhao

add Tcp End

root and others added 2 commits May 14, 2026 13:24
Introduce a standalone-asio-based TcpEndpoint in dlslime/csrc/engine/tcp/
with four async communication primitives, all supporting timeout (default 30s).

Architecture highlights:
- 17-byte SessionHeader (Mooncake-aligned): {size, addr, opcode} with 3 opcodes
  (OP_SEND, OP_READ, OP_WRITE) supporting 4 primitives (recv matched passively)
- TcpContext: shared io_context + connection pool + background thread,
  multiple endpoints can share one context to reduce thread count
- TcpConnectionPool: (host, port)-keyed connection reuse, 60s idle timeout
- ServerSession: async_read callback chain (readHeader->dispatch->readBody loop)
  with 64KB chunked reads for large payloads
- Symmetric connection rendezvous (is_initiator by host:port comparison)
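
As a rough illustration of two items in the list above, here is a minimal Python sketch of a 17-byte header and of a symmetric initiator choice. The field widths (u64 size, u64 addr, u8 opcode), the little-endian byte order, the opcode values, and the direction of the comparison are all assumptions made for the example; the commit message only states the total size, the field names, and that the comparison is on host:port.

```python
import struct

# Assumed layout: u64 size, u64 addr, u8 opcode, little-endian, no padding.
# 8 + 8 + 1 = 17 bytes, matching the size quoted in the commit message.
HEADER = struct.Struct("<QQB")

# Illustrative values only; the commit names the opcodes but not their numbers.
OP_SEND, OP_READ, OP_WRITE = 0, 1, 2


def pack_header(size: int, addr: int, opcode: int) -> bytes:
    """Serialize one header into exactly HEADER.size bytes."""
    return HEADER.pack(size, addr, opcode)


def unpack_header(raw: bytes) -> tuple:
    """Parse a header back into (size, addr, opcode)."""
    return HEADER.unpack(raw)


def is_initiator(local: tuple, remote: tuple) -> bool:
    """Decide which peer dials out by comparing (host, port) pairs.

    Both peers compare the same two pairs, so for distinct endpoints
    exactly one side gets True. Using '<' is an assumption here.
    """
    return local < remote
```

With this scheme neither side needs to be configured as client or server: each peer evaluates `is_initiator` with its own address first, and the results are complementary as long as the two addresses differ.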

Async primitives:
- async_send(chunk, timeout_ms=30000): post to io_ctx, async_write, signal future
- async_recv(chunk, timeout_ms=30000): FIFO registration, ServerSession matches
  incoming OP_SEND, memcpy to user buffer, signal future
- async_read(assign, timeout_ms=30000): post OP_READ header, async_read response
  data, connection reserved until response arrives
- async_write(assign, timeout_ms=30000): post OP_WRITE header+payload via
  async_write, signal future
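
The FIFO matching described for async_recv could look roughly like the sketch below. `RecvMatcher`, `register_recv`, and `on_send_payload` are hypothetical names, not the PR's API, and the sketch assumes all calls happen on one thread (as they would on a single io_context thread), so it takes no locks.

```python
from collections import deque
from concurrent.futures import Future


class RecvMatcher:
    """Pairs posted receives with incoming OP_SEND payloads in FIFO order."""

    def __init__(self) -> None:
        self._pending = deque()  # (buffer, future) in registration order

    def register_recv(self, buffer: bytearray) -> Future:
        """Post a receive; the future resolves to the number of bytes copied."""
        fut = Future()
        self._pending.append((buffer, fut))
        return fut

    def on_send_payload(self, payload: bytes) -> bool:
        """Deliver one incoming payload to the oldest pending receive.

        Returns False when no receive is posted; a real implementation
        would have to buffer or reject the payload in that case.
        """
        if not self._pending:
            return False
        buffer, fut = self._pending.popleft()
        if len(payload) > len(buffer):
            fut.set_exception(ValueError("payload larger than receive buffer"))
            return True
        buffer[: len(payload)] = payload  # stands in for the memcpy
        fut.set_result(len(payload))
        return True
```

Because matching is purely by arrival order, the sender and receiver must agree on the sequence of messages; there is no tag or address in this path to disambiguate them.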

Timeout: send/write set SO_SNDTIMEO on the socket; recv/read rely on
future.wait_for(ms), a timed busy-spin (machnet_pause).  All four primitives
return a TcpFuture with wait() and wait_for(seconds) -> int|None.
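
To make the `wait()` / `wait_for(...) -> int|None` contract concrete, here is a small sketch of a future with that shape. `SketchFuture` and `set_status` are hypothetical names, and the sketch blocks on a `threading.Event` instead of the busy-spin the commit describes; the timeout is taken in seconds purely as an assumption.

```python
import threading


class SketchFuture:
    """Completion handle: an int status once done, None if the wait times out."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._status = None

    def set_status(self, status: int) -> None:
        """Called by the I/O side when the operation finishes."""
        self._status = status
        self._done.set()

    def wait(self) -> int:
        """Block until the operation completes and return its status."""
        self._done.wait()
        return self._status

    def wait_for(self, timeout_s: float):
        """Return the status, or None if timeout_s elapses first."""
        if self._done.wait(timeout_s):
            return self._status
        return None
```

Returning None on timeout keeps the status integer free for real completion codes, so a caller can tell "not finished yet" apart from "finished with an error".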

Files: 16 new (10 in tcp/), 5 modified (CMakeLists chain + bind.cpp)
Tests: 5 Python cases (send/recv, write/read, recv timeout, send timeout,
default timeout) all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove void* stream from all 4 async_* methods (RDMA leftover, never used)
- Remove timeout_ms from async_recv (recv timeout via future.wait_for())
- Remove ineffective SO_SNDTIMEO calls (no effect on asio::async_write)
- Update pybind11 bindings and tests to match
- Add tcp/plan.md with v3 architecture documentation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@SHshenhao SHshenhao requested a deployment to self-hosted-rdma May 14, 2026 14:19 — with GitHub Actions Waiting
@JimyMa JimyMa self-requested a review May 14, 2026 14:22
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
