Enhancements for the esp32_flash task#52
Conversation
5685f2a to
91a5fc9
Compare
91a5fc9 to
783d3ad
Compare
586a1ef to
d334c12
Compare
3a391b9 to
4d4e57e
Compare
9b7d519 to
fe79208
Compare
|
|
||
| %% @private | ||
| get_part_tempfile(State) -> | ||
| OutDir = filename:absname(rebar_dir:base_dir(State)), |
There was a problem hiding this comment.
We probably should call mktemp(1) here
There was a problem hiding this comment.
Indeed. When I was first implementing this I intended to look into OTP methods for using temp files, but by the time I got the tricky bits working I forget to go back and address this.
There was a problem hiding this comment.
I did use mktemp here, which prompted me to add atomvm_rebar3_plugin:external_command/3 for better re-usability (making the transition to replace use of os:cmd/1,2 with open_port/2 in other providers). I also added a temp logfile for esptool.py output. It turns out that the dumpfile created by mktemp causes esptool.py an error, so it needs to be deleted immediately, but the tempfile generated name is still used. Both the esptool.py output log and dumpfile are cleaned up before exiting successfully, but retained, and the full path to the logfile displayed if there is a failure.
885f434 to
db356bd
Compare
5cfb5c7 to
15f60c3
Compare
dbbf440 to
54d23fd
Compare
54d23fd to
c0862f4
Compare
Changes error reporting for the esp32_flash task to include stack trace when rebar3 is executed with DIAGNOSTIC=1 or DEBUG=1. Signed-off-by: Winford <winford@object.stream>
c0862f4 to
9afe127
Compare
The `offset` for the beam application partition is now read from the partition table on the attached esp32 device. When a custom partition table is used that does not use `main.avm` for the beam app partition name the `app_partition` parameter should be used to specify the application partition to be flashed. Valid application partition sub-types (the type is `data`) are `phy` or `0xAA`. If the `offset` parameter is specified it will be used to verify that the offset address of the application partition matched the expected value. This may be used to prevent flashing to a standard build of AtomMV for application that require a custom partition table. The `port` that the ESP32 is attached to is now auto discovered by default. When more than one ESP32 device is plugged into USB the port should be specified to control which device is flashed. Error reporting has been improved with descriptive error messages. Dialyzer warnings for the esp32_flash task have been fixed when analyzing atomvm_rebar3_plugin. Signed-off-by: Winford <winford@object.stream>
| ### Fixed | ||
| - The `esp32_flash` task aborts when an error occurs, rather than attempt to continue after a step | ||
| has failed. | ||
| >>>>>>> ddc33c5 (Enhancements for the esp32_flash task) |
|
AMP review: (want to land this before adding https://github.com/benoitc/erlang-python flashing etc.) usual caveats pick and choose.. PR Review —
|
| # | Severity | Issue |
|---|---|---|
| 1 | Blocker | Stray merge-conflict marker in CHANGELOG.md |
| 2 | Major | Partition entry parser mis-decodes offsets (wrong byte layout / endianness) |
| 3 | Major | wait_for_exit raises over_2_minutes, but the decoder matches min_5 |
| 4 | Major | os:find_executable("esptool.py") can return false, propagated into open_port |
| 5 | Major | Baud is only normalised in env_opts/0; rebar.config strings crash integer_to_list/1 |
| 6 | Major | Partition-read command does not pass --chip / --baud even though the user configured them |
| 7 | Major | Temp file is deleted then re-used (race) and is leaked on every error path |
| 8 | Minor | no_dump_file raised but never decoded |
| 9 | Minor | Dead code: read_app_offset/2, partition_at_offset/5, several unreachable decode_abort_reason clauses |
| 10 | Minor | ?DEFAULT_OPTS re-evaluates os:find_executable/1 on every reference |
| 11 | Minor | Env var name ATOMVM_REBAR3_PLUGIN_ESP32_APP_PARTITION breaks the existing _ESP32_FLASH_* convention |
| 12 | Minor | Catch clause C:rebar_abort:S is correct but obscure; prefer throw:rebar_abort:S |
| 13 | Nit | Cleanup nits in esp_part_dump.erl (huge case ladder, unused fs_type() constructors, etc.) |
1. [Blocker] Merge-conflict marker in CHANGELOG.md
CHANGELOG.md contains a literal >>>>>>> marker. This will break any Markdown renderer and almost certainly was unintentional:
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -38,7 +38,6 @@
### Fixed
- The `esp32_flash` task aborts when an error occurs, rather than attempt to continue after a step
has failed.
->>>>>>> ddc33c5 (Enhancements for the esp32_flash task)
## [0.7.5] (2025.05.27)2. [Major] Partition-entry parser mis-decodes offsets
The clauses in parse_data_partitions/2 use a hand-rolled bit layout that is partly wrong:
<<16#aa50:16, 16#01, 16#01, 0, Address:4/binary, _:3/binary, PName:16/binary, 0:32>>The ESP-IDF on-disk layout for a partition entry is:
| Bytes | Field |
|---|---|
| 0-1 | magic 0xAA 0x50 (i.e. 0x50AA LE) |
| 2 | type |
| 3 | subtype |
| 4-7 | offset (little-endian uint32) |
| 8-11 | size (little-endian uint32) |
| 12-27 | label (16 bytes, NUL-padded) |
| 28-31 | flags (little-endian uint32) |
The current pattern instead consumes one literal 0 byte (offset's LSB) and then four bytes (Address:4/binary) that span the upper 3 bytes of offset + the LSB of size, and decodes them with binary:decode_unsigned/1 (big-endian by default).
This "accidentally" returns the right value when:
- the offset is 256-byte aligned (so the LSB is
0, which is what the literal0enforces); and - the offset fits in the upper 3 bytes (i.e. < 16 MiB); and
- the size is 256-byte aligned (so the byte that bleeds into
Addressis0).
It silently breaks for:
- Offsets ≥ 16 MiB — e.g. an app partition at
0x1000000will be decoded as0x100. Real for ESP32-S3 / 32 MiB boards. - Partitions whose size LSB ≠ 0 — e.g.
size=0x1234corrupts the offset value. - Partitions whose offset LSB ≠ 0 — match fails entirely, the catch-all
corrupt_partition_datafires even though the table is valid.
There is also code smell: the same 9-clause ladder duplicates the entire bit pattern just to dispatch on SubType. Decode once and dispatch on values:
parse_data_partitions(<<>>, Acc) ->
lists:reverse(Acc);
parse_data_partitions(
<<16#AA, 16#50, Type:8, SubType:8,
Offset:32/little, _Size:32/little,
PName:16/binary,
_Flags:32/little, Rest/binary>>, Acc) ->
[Name | _] = binary:split(PName, <<0>>),
Entry = {Offset, binary_to_list(Name), classify(Type, SubType)},
parse_data_partitions(Rest, [Entry | Acc]);
parse_data_partitions(
<<16#EB, 16#EB, _Padding:14/binary, _MD5:16/binary, Rest/binary>>, Acc) ->
%% MD5 marker — partition table proper has ended. Remaining bytes are 0xFF padding.
_ = Rest,
lists:reverse(Acc);
parse_data_partitions(<<16#FF, _/binary>>, Acc) ->
%% Erased flash beyond the last entry.
lists:reverse(Acc);
parse_data_partitions(_, _) ->
error(corrupt_partition_table).
classify(16#01, 16#00) -> {reserved, ota};
classify(16#01, 16#01) -> phy;
classify(16#01, 16#02) -> {reserved, nvs};
classify(16#01, 16#03) -> {reserved, coredump};
classify(16#01, 16#04) -> {reserved, nvs_keys};
classify(16#01, 16#05) -> {reserved, efuse};
classify(16#01, 16#06) -> undefined;
classify(16#01, 16#81) -> {reserved, fat};
classify(16#01, 16#82) -> {reserved, spiffs};
classify(16#01, 16#83) -> {reserved, littlefs};
classify(16#01, 16#AA) -> avm_app;
classify(16#01, Sub) -> Sub; %% any other data subtype
classify(_, _) -> protected. %% non-data partition (type != 0x01)This is far shorter, easier to test, and actually correct for any aligned offset.
Note: the existing tests use a fixture with all-aligned 64 KiB-aligned, sub-16 MiB offsets, so this bug is invisible in CI. Adding a fixture with a partition at e.g.
0x1000000(or just unit-testingparse_data_partitions/1directly on hand-written bytes) would catch it.
3. [Major] Timeout error atom mismatch
atomvm_rebar3_plugin:external_command/3 raises:
error({timeout, CmdName, over_2_minutes})…but the decoder matches a different atom:
{timeout, Name, min_5} ->
rebar_api:abort("External command ~s failed (5 minute timeout exceeded)", [Name]);The result is that a real timeout falls all the way through decode_abort_reason/1 to the generic Reason -> rebar_api:abort("Unexpected abort! Reason ~w", ...) branch, giving a confusing message.
- {timeout, Name, min_5} ->
- rebar_api:abort("External command ~s failed (5 minute timeout exceeded)", [Name]);
+ {timeout, Name, over_2_minutes} ->
+ rebar_api:abort("External command ~s failed (2 minute timeout exceeded)", [Name]);(Or change the raiser to match the decoder — but the decoder is the user-facing string, so update both consistently.)
4. [Major] ?DEFAULT_OPTS esptool default can be false
-define(DEFAULT_OPTS, #{
esptool => os:find_executable("esptool.py"),
...
}).If esptool.py is not on PATH, os:find_executable/1 returns false. That value flows through env_opts/0 (because os:getenv("…", false) returns false when the env var is unset) into external_command(false, …), and finally into open_port({spawn_executable, false}, …), which crashes with a generic badarg rather than the much friendlier {enoent, "esptool.py"} clause that already exists.
The default also doesn't fall back to esptool (without .py) — newer ESP-IDF installs ship it under that name on Linux.
%% atomvm_esp32_flash_provider.erl
default_esptool() ->
case os:find_executable("esptool.py") of
false ->
case os:find_executable("esptool") of
false -> "esptool.py"; %% let external_command produce {enoent, ...}
Path -> Path
end;
Path -> Path
end.
default_opts() ->
#{
esptool => default_esptool(),
chip => "auto",
port => "auto",
baud => 115200,
offset => auto,
app_partition => "main.avm"
}.Then in env_opts/0:
env_opts() ->
Defaults = default_opts(),
#{
esptool => os:getenv("ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_ESPTOOL",
maps:get(esptool, Defaults)),
...
}.This also fixes finding #10 (no repeated find_executable per ?DEFAULT_OPTS expansion).
5. [Major] Baud not normalised after merging rebar.config
env_opts/0 runs values through maybe_integer_from_string/1, but get_opts/1 then layers RebarOpts and ParsedOpts over the result:
maps:merge(
maps:merge(env_opts(), RebarOpts),
ParsedOpts
).A user putting {baud, "921600"} (string) into rebar.config will reach integer_to_list(Baud) later in do_flash/8 and crash. Same for offset.
Normalise at the very end of get_opts/1:
get_opts(State) ->
{ParsedArgs, _} = rebar_state:command_parsed_args(State),
rebar_api:debug("ParsedArgs = ~w", [ParsedArgs]),
RebarOpts = atomvm_rebar3_plugin:get_atomvm_rebar_provider_config(State, ?PROVIDER),
ParsedOpts = atomvm_rebar3_plugin:proplist_to_map(ParsedArgs),
- maps:merge(
- maps:merge(env_opts(), RebarOpts),
- ParsedOpts
- ).
+ Merged = maps:merge(maps:merge(env_opts(), RebarOpts), ParsedOpts),
+ Merged#{
+ baud := maybe_integer_from_string(maps:get(baud, Merged)),
+ offset := maybe_integer_from_string(maps:get(offset, Merged))
+ }.Once that is in place, the maybe_integer_from_string(maps:get(offset, Opts)) call in do/1 becomes redundant.
6. [Major] Partition-read does not honour configured --chip / --baud
do_flash/8 flashes with the user's --chip and --baud:
StdArgs = ["--chip", Chip, "--baud", integer_to_list(Baud), ..., "write_flash", ...]…but esp_part_dump:dump_device_partition/4 reads the partition table with none of those flags:
BaseArgs = ["read_flash", "0x8000", "0xC00", TempFile],
Args = case Port of "auto" -> BaseArgs; _ -> ["--port", Port | BaseArgs] end,This:
- forces
esptool.pyto auto-detect chip and use its default 115200 baud, even when the user explicitly configured a non-default value (slow on big tables, can fail on flaky cables); and - diverges from how the actual flash will run, so partition discovery may succeed/fail differently from the flash itself.
-dump_device_partition(Esptool, Port, TempFile, Logfile) ->
+dump_device_partition(Esptool, Chip, Port, Baud, TempFile, Logfile) ->
BaseArgs = [
+ "--chip", Chip,
+ "--baud", integer_to_list(Baud),
"read_flash",
"0x8000",
"0xC00",
TempFile
],…and thread Chip / Baud through read_app_offset/7 and partition_at_offset/7. The provider already has the values handy in do_flash/8.
7. [Major] Tempfile race and leak
get_part_tempfiles/0:
TempFile = create_temp_file("partitions.bin.XXXXX"),
%% we are generating the file name as a side-effect, the partition table dump file should not actually exist yet.
file:delete(TempFile),Two problems:
- Race:
mktempcreates the file atomically as a security measure; immediatelyunlink-ing it and then later havingesptool.pyre-create the same path defeats that. Another process canmktempand grab the freed name. - Leak on failure: the only
file:delete(TempFile)after the work happens at the bottom ofdo_flash/8. Any abort (rebar_api:abortthrows) bypasses it, so every failed flash leavespartitions.bin.XXXXXandesptool.log.XXXXXfiles behind in$TMPDIR.
Use a temp directory and clean up in after:
do_flash(ProjectApps, EspTool, Chip, Port, Baud, Offset, Partition, _State) ->
TempDir = create_temp_dir("atomvm_esp32_flash.XXXXXX"),
TempFile = filename:join(TempDir, "partitions.bin"),
EspToolLog = filename:join(TempDir, "esptool.log"),
try
Address = resolve_address(EspTool, Chip, Port, Baud,
Offset, Partition, TempFile, EspToolLog),
...
after
_ = file:delete(TempFile),
_ = file:delete(EspToolLog),
_ = file:del_dir(TempDir)
end.
create_temp_dir(Format) ->
{ok, Path} = atomvm_rebar3_plugin:external_command(
os:find_executable("mktemp"), ["-d", Format], none),
Path.(If you want to keep the log file when esptool fails — which the existing code does via the error/3 re-raise carrying LogFile — copy it out of the temp dir into a stable location before cleaning up.)
8. [Minor] no_dump_file is never decoded
esp_part_dump.erl raises error(no_dump_file) in read_app_offset/2 and partition_at_offset/2, but decode_abort_reason/1 only handles no_device_dump. A no_dump_file would fall through to the generic clause.
These two arities aren't called from the plugin today (see finding #9), but if you keep them as a public API:
no_device_dump ->
rebar_api:abort("No partition table retrieved from device.", []);
+ no_dump_file ->
+ rebar_api:abort("No partition dump file available.", []);…or change esp_part_dump.erl to raise no_device_dump consistently.
9. [Minor] Dead code
esp_part_dump:read_app_offset/2andesp_part_dump:partition_at_offset/5are exported but never called inside the plugin and have no tests. Either add tests + document them as public API or drop them.decode_abort_reason/1has several clauses that nothing in the codebase raises:invalid_partition_tableno_data_partitions{"esptool.py failure", Status}
Delete them or wire them up — the dead branches are misleading when debugging.
10. [Minor] ?DEFAULT_OPTS re-evaluates on every reference
Because ?DEFAULT_OPTS is a macro, every occurrence (maps:get(esptool, ?DEFAULT_OPTS), etc.) re-builds the map and re-runs os:find_executable/1. Cheap, but easy to avoid:
-env_opts() ->
- #{
- esptool => os:getenv("...", maps:get(esptool, ?DEFAULT_OPTS)),
- ...
- }.
+env_opts() ->
+ Defaults = default_opts(),
+ #{
+ esptool => os:getenv("...", maps:get(esptool, Defaults)),
+ ...
+ }.(Combine with finding #4.)
11. [Minor] Env var name breaks the existing convention
All other esp32 env vars are prefixed ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_*:
ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_ESPTOOL
ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_CHIP
ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_PORT
ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_BAUD
ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_OFFSET
The new one drops _FLASH_:
ATOMVM_REBAR3_PLUGIN_ESP32_APP_PARTITION
Rename to ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_APP_PARTITION for consistency (and update README.md + test/run.sh). If you'd rather accept both for a release cycle, fall back to the legacy name with a deprecation log.
12. [Minor] Obscure exception clause
catch
C:rebar_abort:S ->
...This does work — rebar_api:abort/2 ends up as throw(rebar_abort), so C is bound to throw and the literal rebar_abort matches the reason. But it reads as if rebar_abort were a variable. Spell it explicitly:
- C:rebar_abort:S ->
- rebar_api:error("A fatal error occurred in the ~p task.", [?PROVIDER]),
- rebar_api:debug("Class=~p, Error=~p~nSTACKTRACE:~n~p~n", [C, rebar_abort, S]),
+ throw:rebar_abort:S ->
+ rebar_api:error("A fatal error occurred in the ~p task.", [?PROVIDER]),
+ rebar_api:debug("STACKTRACE:~n~p~n", [S]),
{error, rebar_abort};13. [Nit] Misc
esp_part_dump.erl'sfs_type()advertisesbyte()so any unknown subtype is abytevalue — but the type spec lists explicit constructors first. Either keep the catch-all and remove redundancy, or dropbyte()and force the catch-all to map to{custom, byte()}.dump_device_partition/4returnsPartition_data— Erlang convention isPartitionData(snake-cased variables are rare in OTP source).read_app_offset/5simply returns whatlookup_partition_by_name/2returns; the intermediateOffset = ..., Offset.can belookup_partition_by_name(...)directly. Same forpartition_at_offset/5.- The 32-byte literal in the MD5-marker pattern uses
16#ffffffffffffffffffffffffffff:112(hard to count)._Padding:14/binaryis identical and readable. external_command/3's spec advertisesok | {ok, iodata()}depending onLogFile; that "polymorphic by argument" return type makes callers either ignore the payload or pattern-match defensively. Two functions (run_command/2+run_command_with_log/3) would be clearer.test/driver/scripts/esptool.shunsetsATOMVM_REBAR3_PLUGIN_PARTITION_DATAinside the script, but that runs in a subshell so the unset is a no-op. Either drop it or unset from the calling test.Makefile:etestnow exportsTEST=1only to make the provider emit the verboseesptool.py ...line. Consider a more specific name (ATOMVM_REBAR3_PLUGIN_DEBUGor similar) — a bareTEST=1env var is a common shell convention that other tools might also react to.
Suggested fix-up commit (single small change)
If you want one focused follow-up commit that clears the blocker plus the two highest-impact correctness issues (timeout atom and os:find_executable returning false), the smallest such patch is:
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@
- The `esp32_flash` task aborts when an error occurs, rather than attempt to continue after a step
has failed.
->>>>>>> ddc33c5 (Enhancements for the esp32_flash task)
## [0.7.5] (2025.05.27)
diff --git a/src/atomvm_esp32_flash_provider.erl b/src/atomvm_esp32_flash_provider.erl
@@
- {timeout, Name, min_5} ->
- rebar_api:abort("External command ~s failed (5 minute timeout exceeded)", [Name]);
+ {timeout, Name, over_2_minutes} ->
+ rebar_api:abort("External command ~s failed (2 minute timeout exceeded)", [Name]);
@@
- esptool => os:find_executable("esptool.py"),
+ esptool => case os:find_executable("esptool.py") of
+ false -> "esptool.py"; %% defer to {enoent, _} reporting
+ P -> P
+ end,The remaining items (partition parser, baud/offset normalisation, tempfile cleanup, chip/baud on read_flash) deserve their own commits with accompanying tests.
Verification checklist for a follow-up
-
grep '>>>>>>>' CHANGELOG.mdreturns nothing. -
make eteststill green after baud/offset normalisation + partition-parser rewrite. - New unit test for
esp_part_dump:parse_data_partitions/1covering offsets0x9000,0x1000000, and a partition whose size has a non-zero low byte. - New test that forces the 2-minute timeout path (e.g. an
esptool.shthatsleeps) and confirms the user-visible message mentions the actual timeout. -
unset ATOMVM_REBAR3_PLUGIN_ESP32_FLASH_APP_PARTITIONintest/run.shonce the env var is renamed.
Changes include automatic detection of the beam application partition (default
main.avm) to prevent overwriting the wrong flash sectors. This prevents overwriting Elixir library modules for Elixir supported builds, and assures that the application will be written to the same address that AtomVM will load and run. When flashing to a device with a custom partition table theapp_partitionparameter should be supplied if the app partition is not namedmain.avm.Other changes include: