fix(homeassistant): bootstrap state via REST instead of WS get_states#383
Conversation
|
Warning Review limit reached
More reviews will be available in 19 minutes and 11 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe pull request migrates Home Assistant powermeter initialization from a WebSocket get_states request to a per-entity REST bootstrap scheduled after auth_ok, with task cancellation on stop/reconnect and updated tests and changelog. ChangesREST-based initial state bootstrap for Home Assistant powermeter
Sequence DiagramsequenceDiagram
participant WebSocket
participant HomeAssistant
participant REST
participant Cache
WebSocket->>HomeAssistant: auth_ok
HomeAssistant->>HomeAssistant: cancel prior _fetch_states_task
HomeAssistant->>HomeAssistant: schedule _fetch_initial_states task
loop per tracked entity
HomeAssistant->>REST: GET /api/states/{entity_id} (Bearer)
REST-->>HomeAssistant: { state: "...", ... }
HomeAssistant->>Cache: update cached entity state
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/astrameter/powermeter/homeassistant_test.py (1)
581-599: ⚡ Quick winCover the post-await stale-write race in this test.
This only proves
cancel()is called on a hanging task. It never drives the real_fetch_initial_states()path where cancellation can arrive after the lastawaitand still let a stale REST payload write into_entity_values. Please add a regression that lets the GET complete, calls_reset_for_reconnect(), and asserts the old bootstrap cannot reseed the cache.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/astrameter/powermeter/homeassistant_test.py` around lines 581 - 599, The test only verifies cancel() was invoked but doesn't cover the race where `_fetch_initial_states()` completes its network GET after `_reset_for_reconnect()` and then writes stale data into `_entity_values`; add a new regression within `test_reconnect_cancels_in_flight_fetch` (or a companion test) that launches the real `_fetch_initial_states()`-like coroutine as `pm._fetch_states_task`, arrange it to perform its final write after a short delay (e.g., await an Event or sleep) so you can call `pm._reset_for_reconnect()` before that write completes, then let the hanging GET finish and assert the old bootstrap did NOT reseed `pm._entity_values` (verify values remain whatever the reset set or remain empty). Use the actual method name `_fetch_initial_states` / task slot `_fetch_states_task` and check `_entity_values` to prove the stale payload cannot overwrite the cache.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/astrameter/powermeter/homeassistant.py`:
- Around line 159-163: The REST bootstrap coroutine cancelled via
_fetch_states_task can still run to completion and call _update_entity_value, so
add a generation token (e.g., self._conn_gen or self._bootstrap_gen) that is
incremented in _reset_for_reconnect() and captured at the start of the bootstrap
coroutine; before applying each response update call (the place that currently
calls _update_entity_value and writes into _entity_values) re-check that the
captured generation still equals the current token and that the entity is still
unset/fresh (e.g., not already present in self._entity_values or newer than the
bootstrap) and abort the write if not; apply the same pattern for other
bootstrap REST sections that call _update_entity_value (the other places noted)
to avoid stale overwrites and races with subscribe_entities updates.
---
Nitpick comments:
In `@src/astrameter/powermeter/homeassistant_test.py`:
- Around line 581-599: The test only verifies cancel() was invoked but doesn't
cover the race where `_fetch_initial_states()` completes its network GET after
`_reset_for_reconnect()` and then writes stale data into `_entity_values`; add a
new regression within `test_reconnect_cancels_in_flight_fetch` (or a companion
test) that launches the real `_fetch_initial_states()`-like coroutine as
`pm._fetch_states_task`, arrange it to perform its final write after a short
delay (e.g., await an Event or sleep) so you can call
`pm._reset_for_reconnect()` before that write completes, then let the hanging
GET finish and assert the old bootstrap did NOT reseed `pm._entity_values`
(verify values remain whatever the reset set or remain empty). Use the actual
method name `_fetch_initial_states` / task slot `_fetch_states_task` and check
`_entity_values` to prove the stale payload cannot overwrite the cache.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 636270bd-a491-4d9c-8956-658898bf31ea
📒 Files selected for processing (3)
CHANGELOG.mdsrc/astrameter/powermeter/homeassistant.pysrc/astrameter/powermeter/homeassistant_test.py
| if self._fetch_states_task and not self._fetch_states_task.done(): | ||
| # An in-flight REST bootstrap from the previous connection | ||
| # could otherwise race in after the reset and resurrect a | ||
| # stale value. | ||
| self._fetch_states_task.cancel() |
There was a problem hiding this comment.
Guard REST bootstrap writes with a freshness check.
Cancelling the task is not enough here. If cancellation lands after await resp.json() returns, the coroutine will still execute _update_entity_value(...) once before hitting another await, so an old bootstrap can repopulate _entity_values after _reset_for_reconnect(). The same unconditional write can also overwrite a fresher subscribe_entities update that arrived while the GET was in flight.
Use a bootstrap/connection generation token and re-check that the entity is still unset before applying the REST response.
Proposed fix
class HomeAssistant(Powermeter):
def __init__(...):
...
self._fetch_states_task: asyncio.Task[None] | None = None
+ self._bootstrap_generation = 0
self._entities_ready = asyncio.Event()
self._message_event = asyncio.Event()
def _reset_for_reconnect(self) -> None:
...
+ self._bootstrap_generation += 1
if self._fetch_states_task and not self._fetch_states_task.done():
self._fetch_states_task.cancel()
...
async def _handle_message(self, ws: aiohttp.ClientWebSocketResponse, raw: str) -> None:
...
elif msg_type == "auth_ok":
...
+ generation = self._bootstrap_generation
if self._fetch_states_task and not self._fetch_states_task.done():
self._fetch_states_task.cancel()
- self._fetch_states_task = asyncio.create_task(self._fetch_initial_states())
+ self._fetch_states_task = asyncio.create_task(
+ self._fetch_initial_states(generation)
+ )
- async def _fetch_initial_states(self) -> None:
+ async def _fetch_initial_states(self, generation: int) -> None:
if not self._session:
return
headers = {"Authorization": f"Bearer {self.access_token}"}
for eid in sorted(self._tracked_entities):
if self._entity_values.get(eid) is not None:
continue
...
- if isinstance(data, dict):
+ if (
+ generation == self._bootstrap_generation
+ and self._entity_values.get(eid) is None
+ and isinstance(data, dict)
+ ):
self._update_entity_value(eid, data.get("state"))Also applies to: 237-239, 256-278
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/astrameter/powermeter/homeassistant.py` around lines 159 - 163, The REST
bootstrap coroutine cancelled via _fetch_states_task can still run to completion
and call _update_entity_value, so add a generation token (e.g., self._conn_gen
or self._bootstrap_gen) that is incremented in _reset_for_reconnect() and
captured at the start of the bootstrap coroutine; before applying each response
update call (the place that currently calls _update_entity_value and writes into
_entity_values) re-check that the captured generation still equals the current
token and that the entity is still unset/fresh (e.g., not already present in
self._entity_values or newer than the bootstrap) and abort the write if not;
apply the same pattern for other bootstrap REST sections that call
_update_entity_value (the other places noted) to avoid stale overwrites and
races with subscribe_entities updates.
There was a problem hiding this comment.
The race can't happen under asyncio's single-threaded model. _reset_for_reconnect() runs only from _ws_loop (a separate task), so when it calls _fetch_states_task.cancel(), the bootstrap task is necessarily suspended at an await (session.get / resp.json / async with exit). The synchronous _update_entity_value(...) write sits after those awaits with no intervening await, and cancel() sets _must_cancel so the next task step raises CancelledError at the suspension point even if the awaited result is ready — the write is never reached. If the task instead finished before reset, the reset's clear loop nulls the values anyway. A token would be dead code.
There was a problem hiding this comment.
@tomquist, you're right — I retract the concern. The asyncio single-threaded model does provide the guarantee here:
- Cooperative scheduling — when
_reset_for_reconnect()is executing inside_ws_loop,_fetch_initial_statesis necessarily suspended at one of itsawaitpoints (session.get,resp.json(), or theasync with __aexit__), so no "between awaits" execution is possible. _must_cancelsemantics —Task.cancel()sets the flag, and the injectedCancelledErrorfires at the very next resumption point (theasync withexit at minimum), which is before the synchronous_update_entity_valuewrite.except asyncio.CancelledError: raise— the cancellation propagates cleanly rather than being swallowed.- Completed-task path — if the task already finished before the reset, the
for eid … self._entity_values[eid] = Noneloop in_reset_for_reconnectcovers that case.
A generation token would genuinely be dead code. The existing implementation is correct as-is.
…t of #383) WebSocket `get_states` returns every entity in the Home Assistant state machine — wasteful for setups with many entities when we only need one or a handful. Per-entity REST GETs against `/api/states/<entity_id>` are far cheaper. Direct port of Python PR #383: * Drop the `State::get_states_id` field and the WS `get_states` send + result handling added in #382. * Add `http: Arc<dyn HttpClient>` and `bootstrap_task: Arc<Mutex<Option<JoinHandle<()>>>>` to `HomeAssistant`. The bootstrap task is what we abort on reconnect / stop so a late response can't resurrect a stale value after `_reset_for_reconnect` clears the cache. * On `auth_ok`, after sending `subscribe_entities`, spawn the bootstrap task: it iterates the tracked entity set, GETs `/api/states/<entity_id>` (with the same bearer token used by the WS handshake), and feeds the response `state` field through the existing `update_value` helper. * Per-entity errors (404, network failure, decode) are logged at debug and swallowed — a single missing entity must not break the whole bootstrap; the WS stream may still deliver a value later. * The bootstrap loop skips entities whose `entity_values` slot is already populated, so it doesn't waste requests (or clobber a fresher WS-snapshot value). Aborts the bootstrap task on `run_loop` reconnect-reset and on `stop()`. Matches Python's `_fetch_states_task` cancellation semantics.
The previous get_states bootstrap used the WebSocket get_states command, which returns every entity in the Home Assistant state machine — wasteful for setups with many entities when we only care about one or a handful. Swap to per-entity REST GET /api/states/<entity_id> requests, scheduled as a background task right after subscribe_entities. Cancel any in-flight bootstrap on disconnect or stop so a late response can't resurrect a stale value after _reset_for_reconnect clears the cache.
812156b to
5bf5713
Compare
Drive the real _fetch_initial_states with a response that resolves only after _reset_for_reconnect cancels the task, proving the post-await write never lands on the cleared cache.
The previous get_states bootstrap used the WebSocket get_states command, which returns every entity in the Home Assistant state machine — wasteful for setups with many entities when we only care about one or a handful.
Swap to per-entity REST GET /api/states/<entity_id> requests, scheduled as a background task right after subscribe_entities. Cancel any in-flight bootstrap on disconnect or stop so a late response can't resurrect a stale value after _reset_for_reconnect clears the cache.
Summary by CodeRabbit