Skip to content

Add serial (UART) distribution support#2249

Open
pguyot wants to merge 1 commit intoatomvm:release-0.7from
pguyot:w13/serial-dist
Open

Add serial (UART) distribution support#2249
pguyot wants to merge 1 commit intoatomvm:release-0.7from
pguyot:w13/serial-dist

Conversation

@pguyot
Copy link
Copy Markdown
Collaborator

@pguyot pguyot commented Mar 30, 2026

Implement distributed Erlang over serial/UART connections, enabling disterl between devices connected via serial lines.

Also fix net_kernel accept handler to support alternative carriers by wildcarding the family/protocol atoms instead of hardcoding inet/tcp.

Also add os.beam to estdlib.avm fixing dialyzer errors.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@pguyot
Copy link
Copy Markdown
Collaborator Author

pguyot commented Mar 31, 2026

CI currently fails on newly merged riscv64 jit builds

%%
%% During the handshake phase, `send/2' and `recv/3' are called
%% synchronously. Packets are framed with a 2-byte big-endian length
%% prefix, matching the TCP distribution handshake format.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really suggest adding at the end of a serial frame always a CRC32 at the end.
A more typical scenario is:
Start of Frame byte (like 0x55 or 0xAA) + Length + Payload + CRC32

@pguyot pguyot force-pushed the w13/serial-dist branch from c7eb4e9 to 2c6dd85 Compare April 1, 2026 21:01
Implement distributed Erlang over serial/UART connections, enabling
disterl between devices connected via serial lines.

Also fix net_kernel accept handler to support alternative carriers by
wildcarding the family/protocol atoms instead of hardcoding inet/tcp.

Also add `os.beam` to `estdlib.avm` fixing dialyzer errors.

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
@pguyot pguyot force-pushed the w13/serial-dist branch from 2c6dd85 to 448883b Compare April 2, 2026 05:41
@petermm
Copy link
Copy Markdown
Contributor

petermm commented Apr 2, 2026

usual caveats, pick and choose:
https://ampcode.com/threads/T-019d4d8d-a66a-7178-b26c-8597ca51214e

PR Review: Add serial (UART) distribution support

Commit: 448883b6d243af96390f2897127d1a8fbf60a765
Author: Paul Guyot
Reviewer: AI-assisted review
Verdict: ⚠️ Request changes — solid design, but several issues should be addressed before merge.


Summary

Implements distributed Erlang over serial/UART connections, enabling disterl between devices connected via serial lines. Also fixes net_kernel accept handler to wildcard family/protocol atoms, and adds os.beam to estdlib.avm.

Files changed: 14 files, +1814 / -3 lines


🔴 High Priority Issues

1. Simultaneous connect / stale setup race condition

File: libs/estdlib/src/serial_dist.erl — link_manager loop

The link manager checks mailbox for {setup, SetupPid} before reading UART data. If both nodes autoconnect simultaneously, both can enter setup mode before either reads inbound handshake data. On TCP this is resolved by having two sockets; on serial there is only one link.

Additionally, do_setup/6 waits a hardcoded 5000ms for {link_manager, DistController}, but the link manager may process the queued setup later. If SetupPid already exited, the link manager can create a controller, send to a dead pid, and wait forever on 'DOWN'.

Suggested fix: Include target node in setup message for tie-breaking, prefer accept if inbound frame data is present, and monitor SetupPid before handoff:

%% In serial_dist.erl — link_manager/4
link_manager(Kernel, UartPort, UartMod, Buffer) ->
    %% Read UART first, then check for setup
    UartMod:write(UartPort, ?SYNC_MAGIC),
    NewBuffer =
        case UartMod:read(UartPort, 500) of
            {ok, Data} -> <<Buffer/binary, Data/binary>>;
            {error, timeout} -> Buffer;
            {error, Reason} -> exit({uart_read_error, Reason})
        end,
    case serial_dist_controller:scan_frame(NewBuffer, 16) of
        {ok, _Payload, _Rest} ->
            do_accept_handshake(Kernel, UartPort, UartMod, NewBuffer);
        {crc_error, _} ->
            link_manager(Kernel, UartPort, UartMod, <<>>);
        {need_more, Trimmed} when byte_size(Trimmed) > 4 ->
            do_accept_handshake(Kernel, UartPort, UartMod, Trimmed);
        {need_more, Trimmed} ->
            %% Only check setup AFTER confirming no inbound data
            receive
                {setup, SetupPid} ->
                    case is_process_alive(SetupPid) of
                        true -> do_setup_handshake(Kernel, UartPort, UartMod, SetupPid);
                        false -> link_manager(Kernel, UartPort, UartMod, Trimmed)
                    end
            after 0 -> ok
            end,
            link_manager(Kernel, UartPort, UartMod, Trimmed)
    end.

2. UART write errors silently ignored

File: libs/estdlib/src/serial_dist_controller.erl

send_framed/3 return value is ignored in tick/1 cast and send_data_loop/1. The sent counter increments even on failure. A broken TX line will appear as a live connection.

Suggested fix:

%% handle_call({send, ...})
handle_call(
    {send, Data},
    _From,
    #state{uart = Uart, uart_module = UartMod, sent = Sent} = State
) ->
    DataBin = iolist_to_binary(Data),
    DataSize = byte_size(DataBin),
    case send_framed(UartMod, Uart, <<DataSize:16, DataBin/binary>>) of
        ok ->
            {reply, ok, State#state{sent = Sent + 1}};
        {error, Reason} ->
            {stop, {serial_write_error, Reason}, {error, Reason}, State}
    end;

%% handle_cast(tick, ...)
handle_cast(tick, #state{uart = Uart, uart_module = UartMod, sent = Sent} = State) ->
    case send_framed(UartMod, Uart, <<0:32>>) of
        ok ->
            {noreply, State#state{sent = Sent + 1}};
        {error, Reason} ->
            {stop, {serial_write_error, Reason}, State}
    end;

%% send_data_loop/1
send_data_loop(#state{dhandle = DHandle, uart = Uart, uart_module = UartMod, sent = Sent} = State) ->
    case erlang:dist_ctrl_get_data(DHandle) of
        none ->
            ok = erlang:dist_ctrl_get_data_notification(DHandle),
            State;
        Data ->
            DataBin = ?PRE_PROCESS(Data),
            DataSize = byte_size(DataBin),
            case send_framed(UartMod, Uart, <<DataSize:32, DataBin/binary>>) of
                ok ->
                    send_data_loop(State#state{sent = Sent + 1});
                {error, Reason} ->
                    exit({serial_write_error, Reason})
            end
    end.

3. net_kernel accept wildcard removes validation

File: libs/estdlib/src/net_kernel.erl lines 288–294

The change from inet, tcp to _Family, _Protocol removes all transport validation. Any local process can now trigger accept_connection/5 with arbitrary metadata. The {Kernel, unsupported_protocol} response in serial_dist:do_accept_handshake/4 also becomes dead code.

%% Before (too strict):
handle_info({accept, AcceptPid, SocketPid, inet, tcp}, ...

%% After (too loose):
handle_info({accept, AcceptPid, SocketPid, _Family, _Protocol}, ...

Suggested fix: Validate AcceptPid matches the state's accept pid:

handle_info(
    {accept, AcceptPid, SocketPid, _Family, _Protocol},
    #state{accept_pid = AcceptPid, proto_dist = ProtoDist} = State
) ->

🟡 Medium Priority Issues

4. Documentation describes wrong wire protocol

File: doc/src/distributed-erlang.md lines ~158–180

The docs describe handshake packets as <<Length:16, Payload>> and data packets as <<Length:32, Payload>>, but the actual wire format is:

<<16#AA, 16#55, Length, Payload, CRC32:32>>

The sync marker and CRC32 framing are the actual protocol but are described separately as a different layer. The three-layer breakdown is misleading — there are really two phases of the same framing protocol.

5. Documentation loop order doesn't match code

File: doc/src/distributed-erlang.md lines ~188–199

The doc says the link manager reads UART, strips sync, then maybe accepts, then maybe handles setup. The code checks for {setup, ...} first, before any UART read. This mismatch matters because it affects simultaneous-connect analysis.

6. Example code/docs node naming mismatch

File: examples/erlang/serial_disterl.erl

Comments say names like ttys003@myhost but code generates basename(lowercase(peripheral)) ++ "@serial.local" — so actual names are like ttys003@serial.local.

7. Large MAX_DATA_FRAME_SIZE on MCU targets

File: libs/estdlib/src/serial_dist_controller.erl line 66

MAX_DATA_FRAME_SIZE = 16#100000 (1 MiB). On memory-constrained targets (ESP32 has ~300 KB usable RAM), corrupted data with a plausible length could force large buffer allocation. Consider making this configurable or lowering the default.

8. recv/3 ignores the Length argument

File: libs/estdlib/src/serial_dist_controller.erl lines 125–129

The Length parameter is accepted but never used — the implementation always reads one complete framed packet. This is likely correct for the serial carrier but should be documented.

9. select/1 always returns true

File: libs/estdlib/src/serial_dist.erl lines 277–278

A single UART can only reach one peer, but select/1 claims it can connect to any node. Should document this as "one point-to-point peer per UART".

10. Singleton link manager process

File: libs/estdlib/src/serial_dist.erl lines 118–123

serial_dist_link_manager is a global registered name — only one serial distribution link per VM. This constraint should be documented.


🟢 Low Priority / Nits

11. Partial handshake bytes lost on timeout

In recv_handshake_packet/4, if a later read/2 returns {error, timeout}, the partially accumulated bytes from that recv attempt are discarded (the gen_server state retains the old buffer). This is fine if timeout is always fatal, but worth a comment.

12. CRC32 security note

CRC32 is correctly used for corruption detection, but docs should explicitly note it provides no authentication/security — it only detects accidental line noise.

13. No BEAM interop test

The commit tests AtomVM↔AtomVM serial distribution. The general docs claim BEAM compatibility, but there is no BEAM serial carrier test. OtherVersion = 6 is hardcoded in setup.


Test Coverage Gaps

Missing tests for:

  • CRC mismatch in handshake and data frames
  • Sync marker (AA 55) appearing inside payload data
  • Garbage/noise before a valid frame
  • Oversized/bogus length field rejection
  • UART write failure propagation
  • Timeout with partial frame accumulated
  • Simultaneous autoconnect from both nodes
  • Reconnect after controller crash
  • Link manager stale setup / dead SetupPid
  • Cookie mismatch behavior

What's Good

  • Clean separation between serial_dist (connection management) and serial_dist_controller (framing/protocol)
  • Well-designed mock UART HAL for testing without hardware
  • Good test suite covering happy paths (roundtrip, fragmentation, tick, getstat, initial data)
  • Comprehensive integration tests using socat virtual serial ports
  • Thorough documentation with examples for both ESP32 and Unix
  • The framing protocol with sync+length+CRC is a reasonable choice for point-to-point serial
  • scan_frame correctly handles false sync matches via length check + CRC validation

Suggested Test Implementations

The following tests can be added to tests/libs/estdlib/test_serial_dist.erl. They follow the existing conventions (mock UART, make_frame/make_handshake_frame helpers, stop_controller/1).

Wire to test/0 — add these calls to the AtomVM branch:

%% In test/0, after existing controller tests:
            ok = test_scan_frame_crc_error(),
            ok = test_scan_frame_sync_in_payload(),
            ok = test_scan_frame_garbage_before_frame(),
            ok = test_scan_frame_oversized_length(),
            ok = test_scan_frame_trailing_sync_byte(),
            ok = test_controller_recv_crc_error(),
            ok = test_controller_multiple_frames_in_buffer(),

Test: CRC mismatch detected by scan_frame

test_scan_frame_crc_error() ->
    Payload = <<"hello">>,
    Len = byte_size(Payload),
    LenAndPayload = <<Len:16, Payload/binary>>,
    GoodCRC = erlang:crc32(LenAndPayload),
    BadCRC = GoodCRC bxor 16#FFFFFFFF,
    Frame = <<16#AA, 16#55, LenAndPayload/binary, BadCRC:32>>,
    {crc_error, <<>>} = serial_dist_controller:scan_frame(Frame, 16),
    %% Also test with trailing data after the bad frame
    FrameWithTrail = <<Frame/binary, "trail">>,
    {crc_error, <<"trail">>} = serial_dist_controller:scan_frame(FrameWithTrail, 16),
    ok.

Test: Sync marker (AA 55) inside payload

test_scan_frame_sync_in_payload() ->
    %% Payload contains the sync magic bytes — must not confuse the parser
    Payload = <<16#AA, 16#55, "data_after_sync">>,
    Frame = make_handshake_frame(Payload),
    {ok, Payload, <<>>} = serial_dist_controller:scan_frame(Frame, 16),
    %% Two sync markers inside payload
    Payload2 = <<"before", 16#AA, 16#55, "middle", 16#AA, 16#55, "end">>,
    Frame2 = make_handshake_frame(Payload2),
    {ok, Payload2, <<>>} = serial_dist_controller:scan_frame(Frame2, 16),
    ok.

Test: Garbage/noise before a valid frame

test_scan_frame_garbage_before_frame() ->
    Payload = <<"hello">>,
    Frame = make_handshake_frame(Payload),
    %% Random garbage before the valid frame
    Garbage = <<"random noise bytes">>,
    Buffer = <<Garbage/binary, Frame/binary>>,
    {ok, Payload, <<>>} = serial_dist_controller:scan_frame(Buffer, 16),
    %% Garbage that contains a lone 0xAA (not followed by 0x55)
    Garbage2 = <<16#AA, "more_garbage">>,
    Buffer2 = <<Garbage2/binary, Frame/binary>>,
    {ok, Payload, <<>>} = serial_dist_controller:scan_frame(Buffer2, 16),
    ok.

Test: Oversized/bogus length field rejected

test_scan_frame_oversized_length() ->
    %% Handshake max is 8192. Create a frame with length > max.
    %% scan_frame should skip this false sync and return need_more.
    BogusLen = 9000,
    LenAndPayload = <<BogusLen:16, "x">>,
    CRC = erlang:crc32(LenAndPayload),
    BogusFrame = <<16#AA, 16#55, LenAndPayload/binary, CRC:32>>,
    {need_more, _} = serial_dist_controller:scan_frame(BogusFrame, 16),
    %% But if a valid frame follows after, it should be found
    ValidFrame = make_handshake_frame(<<"ok">>),
    Buffer = <<BogusFrame/binary, ValidFrame/binary>>,
    {ok, <<"ok">>, <<>>} = serial_dist_controller:scan_frame(Buffer, 16),
    ok.

Test: Trailing 0xAA byte preserved across reads

test_scan_frame_trailing_sync_byte() ->
    %% A lone 0xAA at the end might be the start of a sync marker.
    %% scan_frame should keep it in the buffer.
    Buffer = <<16#AA>>,
    {need_more, <<16#AA>>} = serial_dist_controller:scan_frame(Buffer, 16),
    %% Empty buffer
    {need_more, <<>>} = serial_dist_controller:scan_frame(<<>>, 16),
    %% Complete sync but no length bytes yet
    {need_more, <<16#AA, 16#55>>} = serial_dist_controller:scan_frame(<<16#AA, 16#55>>, 16),
    ok.

Test: Controller recv with CRC error returns error

test_controller_recv_crc_error() ->
    {A, B} = mock_uart_hal:create_pair(),
    {ok, Ctrl} = serial_dist_controller:start(A, mock_uart_hal),
    %% Send a frame with bad CRC
    Payload = <<"hello">>,
    Len = byte_size(Payload),
    LenAndPayload = <<Len:16, Payload/binary>>,
    BadCRC = erlang:crc32(LenAndPayload) bxor 1,
    BadFrame = <<16#AA, 16#55, LenAndPayload/binary, BadCRC:32>>,
    ok = mock_uart_hal:write(B, BadFrame),
    {error, crc_error} = serial_dist_controller:recv(Ctrl, 0, 2000),
    stop_controller(Ctrl),
    mock_uart_hal:close(A),
    mock_uart_hal:close(B),
    ok.

Test: Multiple back-to-back frames in one buffer read

test_controller_multiple_frames_in_buffer() ->
    %% When two handshake frames arrive in a single UART read,
    %% successive recv calls should return them in order.
    {A, B} = mock_uart_hal:create_pair(),
    {ok, Ctrl} = serial_dist_controller:start(A, mock_uart_hal),
    Frame1 = make_handshake_frame(<<"first">>),
    Frame2 = make_handshake_frame(<<"second">>),
    %% Write both frames as a single chunk
    ok = mock_uart_hal:write(B, <<Frame1/binary, Frame2/binary>>),
    {ok, P1} = serial_dist_controller:recv(Ctrl, 0, 2000),
    <<"first">> = iolist_to_binary(P1),
    {ok, P2} = serial_dist_controller:recv(Ctrl, 0, 2000),
    <<"second">> = iolist_to_binary(P2),
    stop_controller(Ctrl),
    mock_uart_hal:close(A),
    mock_uart_hal:close(B),
    ok.

Helper needed in make_frame (already exists)

The tests above reuse the existing make_frame/1 and make_handshake_frame/1 helpers. A data-phase helper would be useful for future data-phase tests:

%% Build a data-phase frame (32-bit length prefix).
make_data_frame(Payload) ->
    Len = byte_size(Payload),
    make_frame(<<Len:32, Payload/binary>>).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants