Symptom
Normal session shutdown produces tracebacks that look like errors but happen on every clean teardown:
```
livepeer_gateway.trickle_publisher.TrickleSegmentWriteError: Trickle segment writer timed out after 5.0s
Trickle segment close suppressed seq=0
...
TimeoutError
```
Root cause (revised after investigation)
Not an asyncio.wait_for cancellation race (the initial hypothesis was wrong). Timestamped logs captured after migrating to asyncio.timeout confirm the timeouts still fire because they are legitimate:
- Mid-stream, the HTTP post-body consumer for the active segment becomes wedged or closed (the orchestrator-side reader stops reading, or the connection drops).
- SegmentWriter's internal queue fills up.
- The next await self.queue.put(data) in _stream_pipe_to_trickle blocks waiting for queue space.
- After 5s, the timeout fires → TrickleSegmentWriteError → MediaPublish logs "dropped segment ... mid-stream" and enters drain mode.
- On /stream/stop, SegmentWriter.close() calls queue.put(None) to signal end-of-stream; but the consumer is gone, so it hangs.
- After 5s the timeout fires → the "Trickle segment close suppressed" warning is logged.
So during shutdown, every active segment with a wedged consumer adds 5s plus a misleading-looking traceback before teardown completes; the standalone sketch below isolates the blocking mechanism. PR #7 / live_grayscale/test.sh reproduces it consistently.
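A minimal repro of the mechanism, not project code (the queue size and sentinel value are assumptions): a bounded queue whose consumer is gone, so the end-of-stream put() can only end in a timeout.

```python
import asyncio

async def main() -> None:
    # Bounded queue standing in for SegmentWriter's internal queue.
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
    queue.put_nowait(b"segment-data")  # queue is now full; no consumer exists

    try:
        # Mirrors the close() path: enqueue the end-of-stream sentinel under
        # a 5s timeout. With nothing draining the queue, this always fires.
        async with asyncio.timeout(5.0):
            await queue.put(None)
    except TimeoutError:
        print("close suppressed: consumer gone, sentinel never delivered")

asyncio.run(main())
```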
Proposed fix (needs work)
SegmentWriter needs to know when its associated HTTP post-task is dead and short-circuit close() / write() rather than block on queue.put forever. Options (a combined sketch follows the list):
- Track the post-task on the segment; on close() check post_task.done() before queue.put(None).
- Use put_nowait() and absorb QueueFull.
- Reduce the close-path timeout to e.g. 0.5s (still misleading but faster).
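A sketch combining the first two options. The field names (post_task, queue, maxsize) are assumptions about SegmentWriter's internals, not the actual implementation.

```python
import asyncio

class SegmentWriter:
    """Sketch only; attribute names are assumptions about the real class."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=8)
        self.post_task: asyncio.Task | None = None  # HTTP post-body consumer

    async def close(self) -> None:
        # Option 1: if the consumer task is already dead, nothing will ever
        # drain the queue, so blocking on put() can only end in a timeout.
        if self.post_task is not None and self.post_task.done():
            return
        # Option 2: enqueue the end-of-stream sentinel without blocking and
        # absorb QueueFull instead of stalling the shutdown path for 5s.
        try:
            self.queue.put_nowait(None)
        except asyncio.QueueFull:
            pass  # queue is wedged; drop the sentinel rather than hang
```

The done() check covers the common case (consumer task has exited); put_nowait covers the race where the task is still alive but the queue is wedged.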
What PR #11 does not fix
PR #11 migrates wait_for → asyncio.timeout, which is modernization, not a fix for this issue. The shutdown noise persists after that migration; the investigation tracked here is separate.
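To see why: the two forms are behaviorally equivalent on this path (a sketch, not the actual gateway code). Both raise TimeoutError after 5s when the queue never drains.

```python
import asyncio

async def close_old(queue: asyncio.Queue) -> None:
    # Pre-PR #11: wrap the blocking put in a cancellable future.
    await asyncio.wait_for(queue.put(None), timeout=5.0)

async def close_new(queue: asyncio.Queue) -> None:
    # Post-PR #11: cancel the in-progress await via a scoped timeout.
    # Either way, a wedged consumer means TimeoutError after 5 seconds.
    async with asyncio.timeout(5.0):
        await queue.put(None)
```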
Reproduction
```
cd examples/runner/live_grayscale
docker compose up -d --wait --build
SKIP_VIEWER=1 bash test.sh    # passes
docker logs live_grayscale    # shows the tracebacks during shutdown
```
Severity
Cosmetic: teardown completes correctly, but the noise causes false alarms during debugging and adds 5s per active segment to shutdown latency.