Add erlang.sleep() function that works in both async and sync contexts:
- Async: returns asyncio.sleep() which uses Erlang timer system
- Sync: uses erlang.call('_py_sleep') callback with receive/after,
truly releasing the dirty scheduler for cooperative yielding
Remove unused _erlang_sleep NIF which only released the GIL but blocked
the pthread. The callback approach properly suspends the Erlang process.
Changes:
- Add sleep() to _erlang_impl and export to erlang module
- Add _py_sleep callback in py_event_loop.erl
- Remove py_erlang_sleep NIF and dispatch_sleep_complete
- Remove sync_sleep fields from event loop struct
- Remove sleep handlers from py_event_worker
- Update tests to use erlang.sleep()
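The dual-mode dispatch described above can be sketched in plain Python. This is a hypothetical stand-in, not the library's implementation: the real sync path suspends the Erlang process via a receive/after callback, which is approximated here with time.sleep().

```python
import asyncio
import time

def sleep(seconds):
    """Sketch of a dual-context sleep (hypothetical; not erlang.sleep itself).

    Async context: return asyncio.sleep() for the caller to await.
    Sync context: block in place (the real API instead calls back into
    Erlang and waits on receive/after, releasing the dirty scheduler).
    """
    try:
        asyncio.get_running_loop()      # raises RuntimeError outside a loop
    except RuntimeError:
        time.sleep(seconds)             # sync stand-in for the Erlang callback
        return None
    return asyncio.sleep(seconds)       # caller must await this
```

The single entry point keeps call sites identical in both contexts; only the async caller needs an `await`.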
Update docstring and asyncio.md to clarify:
- Both sync and async modes release the dirty NIF scheduler
- Async: yields to event loop via asyncio.sleep()/call_later()
- Sync: suspends Erlang process via receive/after callback
Also fix outdated architecture diagram that referenced the removed sleep_wait/dispatch_sleep_complete NIF.
- Make erlang.call() blocking (no replay)
- Add erlang.schedule(), schedule_py(), consume_time_slice()
- Add ScheduleMarker type for explicit dirty scheduler release
Add documentation for:
- erlang.call() now blocks (no replay)
- erlang.schedule() for Erlang callback continuation
- erlang.schedule_py() for Python function continuation
- erlang.consume_time_slice() for cooperative scheduling
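The continuation pattern behind the ScheduleMarker type can be illustrated with a small sketch. The names here are hypothetical; the real API returns markers from erlang.schedule()/schedule_py() so the NIF can release the dirty scheduler between steps.

```python
class ScheduleMarker:
    """Sentinel a task returns to request continuation in a later slice.

    Hypothetical sketch: the real marker lets the runtime release the
    dirty scheduler instead of blocking inside one long NIF call.
    """
    def __init__(self, callback, args=()):
        self.callback = callback
        self.args = args

def run_task(fn, *args):
    # Drive a task to completion, re-invoking it each time it yields a
    # marker (the real driver reschedules on a fresh dirty scheduler slice).
    result = fn(*args)
    while isinstance(result, ScheduleMarker):
        result = result.callback(*result.args)
    return result
```

A plain return value ends the task; a marker hands control back to the scheduler with the next step recorded explicitly.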
Implement uvloop-inspired thread-safe task submission for running async Python coroutines from any Erlang dirty scheduler thread.
Core changes:
- Add task queue (ErlNifIOQueue) to event loop for atomic operations
- nif_call_soon_threadsafe: serialize and enqueue task, send wakeup
- nif_process_ready_tasks: dequeue and schedule tasks on event loop
- py_event_worker handles task_ready message to process queue
High-level Erlang API:
- py_event_loop:run/3,4 - blocking run, wait for result
- py_event_loop:create_task/3,4 - non-blocking, returns ref
- py_event_loop:spawn/3,4 - fire-and-forget with optional notify
- py_event_loop:await/1,2 - wait for task result
Uses enif_send() for thread-safe wakeup from any dirty scheduler, avoiding the thread-local event loop issues with asyncio.
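The enqueue-then-wake pattern can be sketched in pure Python. Everything here is a stand-in: a queue.Queue plays the ErlNifIOQueue, and loop.call_soon_threadsafe plays the enif_send wakeup that the C code uses precisely because asyncio's thread-local loop lookup is unavailable on a dirty scheduler thread.

```python
import asyncio
import queue
import threading

task_queue = queue.Queue()   # stands in for the ErlNifIOQueue

def submit_task(loop, coro_fn, args, results):
    # Safe from any thread: enqueue the task, then wake the loop thread
    # (analogue of serialize-and-enqueue plus the enif_send wakeup).
    task_queue.put((coro_fn, args))
    loop.call_soon_threadsafe(process_ready_tasks, loop, results)

def process_ready_tasks(loop, results):
    # Runs on the loop thread: drain the queue and schedule coroutines.
    while True:
        try:
            coro_fn, args = task_queue.get_nowait()
        except queue.Empty:
            return
        task = loop.create_task(coro_fn(*args))
        task.add_done_callback(lambda t: results.append(t.result()))
```

The submitting thread never touches the event loop directly; it only enqueues and signals, which is what makes the path safe from arbitrary scheduler threads.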
This reverts commit cba1903.
Implements a thread-safe async task queue that works from dirty schedulers:
- Add task_queue (ErlNifIOQueue) and py_loop fields to erlang_event_loop_t
- nif_submit_task: thread-safe task submission via enif_ioq and enif_send
- nif_process_ready_tasks: dequeue tasks, create coroutines, schedule on loop
- py_event_worker handles the task_ready wakeup message
- High-level Erlang API: run/3,4, create_task/3,4, await/1,2, spawn_task/3,4
- Python ErlangEventLoop registers with the global loop via _set_global_loop_ref
- Register callbacks early in the supervisor to ensure availability
- Add uvloop-style lazy Python loop creation in process_ready_tasks
- Only call _run_once when coroutines are scheduled (not for sync functions)
- Use enif_send directly for sync function results (faster path)
- Fix queue size tracking in task processing loop
Before: 1003 ms/task (1 task/sec)
After: 0.009 ms/task (117K tasks/sec)
Pass timeout_hint=0 to _run_once() when coroutines are scheduled, preventing the event loop from blocking for up to 1 second when work is already pending. This matches uvloop's approach of computing exact sleep times.
Changes:
- Add timeout_hint parameter to ErlangEventLoop._run_once()
- Update C code to pass timeout=0 after scheduling coroutines
- Add bench_channel_async.erl for sync vs async comparison
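The "compute exact sleep times" idea reduces to a small pure function. This is an illustrative sketch with hypothetical names, not the loop's actual code: never block when callbacks are already queued, otherwise sleep only until the nearest timer, capped at the old 1-second maximum.

```python
def compute_timeout(ready_count, next_timer_due, now, max_timeout=1.0):
    """Pick the poll timeout for one event-loop iteration (hypothetical helper).

    ready_count    -- callbacks already queued for this iteration
    next_timer_due -- monotonic deadline of the earliest timer, or None
    now            -- current monotonic time
    """
    if ready_count > 0:
        return 0.0                       # work is pending: do not block at all
    if next_timer_due is None:
        return max_timeout               # nothing scheduled: default cap
    return max(0.0, min(next_timer_due - now, max_timeout))
```

Passing 0 after scheduling coroutines is just the first branch: the wakeup itself proves there is ready work, so any positive timeout is wasted latency.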
Reduces async task API overhead by:
- Exiting early, before GIL acquisition, when the task queue is empty
- Caching the asyncio module and _run_and_send function across calls
- Only calling _run_once when coroutines are actually scheduled
Performance improvements:
- create_task + await: ~40% faster (157K vs 113K tasks/sec)
- Concurrent tasks: ~30% faster (360K vs 275K tasks/sec)
uvloop-style optimizations for the Python event loop:
- Handle pooling: reuse Handle objects in call_soon() instead of allocating
- Time caching: cache time.monotonic() at the start of each _run_once iteration
- Clear context references when returning handles to the pool
These reduce allocations and syscalls in the hot path.
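Handle pooling can be sketched independently of asyncio. The Handle and HandlePool below are hypothetical simplifications (the real asyncio.Handle carries a loop and context); the point is the reuse-or-allocate shape and the reference clearing on release mentioned above.

```python
class Handle:
    """Minimal stand-in for an event-loop callback handle."""
    __slots__ = ("callback", "args", "context")

    def __init__(self, callback, args, context):
        self.callback, self.args, self.context = callback, args, context

class HandlePool:
    """Reuse Handle objects instead of allocating one per call_soon().

    Hypothetical sketch of the pooling described in the commit above.
    """
    def __init__(self):
        self._free = []

    def acquire(self, callback, args, context):
        if self._free:
            h = self._free.pop()               # recycle a retired handle
            h.callback, h.args, h.context = callback, args, context
            return h
        return Handle(callback, args, context)  # pool empty: allocate

    def release(self, h):
        # Drop references before reuse so pooled handles don't keep
        # callbacks, arguments, or contexts alive.
        h.callback = h.args = h.context = None
        self._free.append(h)
```

In the hot path, acquire() becomes a list pop plus three attribute stores instead of an object allocation.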
Restructure nif_process_ready_tasks into two phases:
- Phase 1: dequeue all tasks WITHOUT the GIL (NIF operations only)
- Phase 2: acquire the GIL once, process the entire batch, release
Benefits:
- GIL held only during Python operations, not NIF operations
- Batches up to 64 tasks per GIL acquisition
- Task queue mutex released before the GIL is acquired (no lock overlap)
SuspensionRequiredException inherits from BaseException, not Exception, so the except Exception block didn't catch it. This caused the suspension mechanism to replay the entire function, making time measurements show ~0 elapsed time instead of the actual sleep duration. The fix catches BaseException and falls back to time.sleep() for correct timing behavior in py:call contexts. For dirty scheduler release in sync contexts, py:exec/py:eval should be used instead.
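The class of bug is easy to reproduce with stdlib Python alone. The handlers below are illustrative (the real fix additionally falls back to time.sleep() in py:call contexts); they show only the hierarchy mistake: `except Exception` cannot see a BaseException subclass.

```python
class SuspensionRequiredException(BaseException):
    """Mirrors the bug: BaseException subclasses escape 'except Exception'."""

def broken_handler():
    try:
        raise SuspensionRequiredException()
    except Exception:           # does NOT match -- the exception escapes
        return "caught"

def fixed_handler():
    try:
        raise SuspensionRequiredException()
    except BaseException:       # the fix: catch at the BaseException level
        return "caught"
```

Deriving from BaseException is the same trick KeyboardInterrupt and SystemExit use to bypass generic `except Exception` blocks, which is exactly why a control-flow exception like this one must be caught explicitly.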
Add 15 new tests covering:
- Stdlib operations (math.sqrt, pow, floor, ceil)
- Operator module functions (add, mul)
- Error handling (invalid module, function, timeout)
- Concurrency (multiple processes, batch tasks)
- Edge cases (empty args, large results, nested data)
Tests use stdlib modules to avoid context issues with __main__.
- Add growable pending queue with capacity doubling (256 to 16384 max)
- Port snapshot-detach pattern to py_get_pending and py_run_once_for to reduce mutex contention during PyList building
- Add callable cache (64 slots) to avoid PyImport/GetAttr per task
- Add task wakeup coalescing with atomic task_wake_pending flag
- Add drain-until-empty loop in py_event_worker for task processing
- Replace enif_make_ref with ATOM_UNDEFINED in fd reselect hot paths
- Remove unused _readers_by_cid, _writers_by_cid, _timer_heap from Python
- Add wakeup coalescing to call_soon_threadsafe
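The callable cache in the list above amounts to memoizing import-plus-getattr per (module, function) pair. A minimal Python sketch, with a hypothetical crude eviction in place of the real 64-slot C-side cache:

```python
import importlib

# (module, name) -> callable; stands in for the 64-slot C-side cache
_callable_cache = {}

def resolve_callable(module_name, func_name, max_slots=64):
    """Resolve module.func once, then serve repeats from the cache
    (avoiding the per-task PyImport_ImportModule + PyObject_GetAttr)."""
    key = (module_name, func_name)
    fn = _callable_cache.get(key)
    if fn is None:
        fn = getattr(importlib.import_module(module_name), func_name)
        if len(_callable_cache) >= max_slots:
            _callable_cache.pop(next(iter(_callable_cache)))  # crude eviction
        _callable_cache[key] = fn
    return fn
```

Task workloads tend to call a handful of functions very often, so even a small fixed-size cache removes nearly all lookup cost from the hot path.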
Two bugs in the coalesced wakeup mechanism:
1. The early return at task_count==0 didn't clear task_wake_pending, blocking all future wakeups once a stale task_ready hit this path.
2. The MAX_TASK_BATCH limit of 64 caused bursts of more than 64 tasks to stall after the first batch, since no new task_ready was sent for the remaining tasks.
Fix: clear task_wake_pending before the task_count check, and return the 'more' atom when tasks remain. The Erlang drain loop sends task_ready to self and returns, yielding to the mailbox so select/timer messages aren't starved under sustained task traffic.
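The corrected protocol can be sketched as a small state machine. This is a Python analogy with hypothetical names: a lock plays the atomic task_wake_pending flag, a counter plays the task_ready messages, and the tuple result plays the 'more' atom that tells the drain loop to re-signal itself.

```python
import threading

class CoalescedWakeup:
    """Sketch of the fixed coalescing protocol described above."""

    def __init__(self, max_batch=64):
        self._pending = False
        self._lock = threading.Lock()    # stands in for the atomic flag
        self.max_batch = max_batch
        self.wakeups_sent = 0

    def notify(self):
        # Producer side: send at most one wakeup while one is in flight.
        with self._lock:
            if self._pending:
                return                    # coalesce with the in-flight wakeup
            self._pending = True
        self.wakeups_sent += 1            # stands in for sending task_ready

    def on_wakeup(self, task_count):
        # Consumer side: clear the flag BEFORE checking the count (bug 1),
        # so a wakeup that races with an empty queue is never lost.
        with self._lock:
            self._pending = False
        processed = min(task_count, self.max_batch)
        remaining = task_count - processed
        # 'more' (bug 2) tells the drain loop to send itself another
        # task_ready for the leftover tasks instead of silently stalling.
        return ("more", remaining) if remaining else ("ok", 0)
```

Clearing the flag first means the worst race costs one spurious wakeup; clearing it last, as in the original code, could cost all future wakeups.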
Two issues with handle pooling:
1. TimerHandle objects were being returned to the pool. asyncio.sleep keeps a reference to its timer handle and cancels it in a finally block; once the handle is recycled, that cancel() affects the wrong callback, causing concurrent tasks to hang.
2. Context was set to None for pooled handles instead of copying the current context (matching Handle.__init__ behavior). This caused an AttributeError when running callbacks.
Also fix the missing Ctx variable in the test_asyncio_gather test.
- Use stdlib functions (math.sqrt, etc.) instead of __main__ module functions that weren't available in the global interpreter
- Reduce timeouts from 5s to 1s since the tests should succeed quickly
- Remove acceptance of timeout as a valid result - the tests should pass
The test suite now runs in ~13s instead of ~28s.
- dup() the fd before registering it in py_reactor_context to avoid a tcp_inet driver conflict on FreeBSD
- Add a set_event_loop_priv_dir NIF to ensure sys.path is set in subinterpreter contexts before importing _erlang_impl._loop
- Add {schedule, binary(), tuple()} to context_call/5 and context_eval/3 specs
- Add 'more' to process_ready_tasks/1 spec
- Remove dead atom clause from handle_schedule/3 (callback_name is always binary)
- Add Python version compatibility table (3.9-3.14)
- Document Python 3.14 SHARED_GIL subinterpreter support
- Document FreeBSD fd handling improvements
- Add new Python APIs: erlang.sleep(), channel.receive(), erlang.spawn_task()
- Add Async Task API from the Erlang side
- Add Virtual Environment Management section
- Add Dual Pool Support section
- Add troubleshooting for Python 3.14 and FreeBSD issues
- Update ensure_venv examples to use correct signature:
py:ensure_venv(Path, RequirementsFile, Opts)
- Fix venv_info return format to use binary keys:
#{<<"active">> := true, <<"venv_path">> := Path}
Summary
Adds the py_event_loop module:
- py_event_loop:run/3,4 - blocking run of async Python functions
- py_event_loop:create_task/3,4 - non-blocking task submission
- py_event_loop:await/1,2 - wait for task result with timeout
- py_event_loop:spawn_task/3,4 - fire-and-forget task execution
Uses enif_send for thread-safe wakeup from dirty schedulers. The task queue is protected by a mutex, and results are delivered via Erlang message passing.
Also adds erlang.sleep() and erlang.call() for sync contexts, plus an explicit scheduling API (erlang.schedule(), erlang.schedule_py()).