Merged
13 changes: 11 additions & 2 deletions tests/msc4140/delayed_event_test.go
@@ -371,6 +371,11 @@ func TestDelayedEvents(t *testing.T) {

Collaborator Author

@MadLittleMods MadLittleMods Apr 24, 2026


This came up when running this test against the worker-based Synapse setup we use alongside the Synapse Pro Rust apps: https://github.com/element-hq/synapse-rust-apps/actions/runs/24910122124/job/72949760158?pr=360 (https://github.com/element-hq/synapse-rust-apps/pull/360)

Error encountered:

❌ TestDelayedEvents/delayed_state_events_are_kept_on_server_restart (10.12s)
      delayed_event_test.go:425: StopServer hs1
      delayed_event_test.go:429: StartServer hs1
      delayed_event_test.go:443: CSAPI.MustDo GET http://127.0.0.1:32978/_matrix/client/v3/rooms/%21MbDncghrqxTzEmQhCP:hs1/state/com.example.test/1 returned non-2xx code: 404 Not Found - body: {"errcode":"M_NOT_FOUND","error":"Event not found."}

I haven't actually checked whether this PR fixes the problem there (just theory)

Collaborator Author

@MadLittleMods MadLittleMods Apr 24, 2026


Why does this happen?

I guess this happens because the worker that processes delayed events and updates the room's state isn't necessarily the one that serves state requests. Is this even true?

It looks like the main process in Synapse handles processing delayed events.

And it looks like /state requests can be handled by workers. The regex there is slightly strange, though: ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$ only covers /state, not /state/{eventType}/{stateKey} requests. ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/ is also there but listed under "event sending requests", probably because GET and PUT share the same path. In the actual worker config we use in the workerized Synapse image for Complement, /state/{eventType}/{stateKey} isn't covered by any worker.
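To double-check the regex observation, here is a minimal Go sketch that runs both patterns against the two path shapes. The patterns are copied from the paragraph above; the `matches` helper and the example room ID are made up for illustration, and Go's `regexp` stands in for however Synapse's routing actually evaluates these patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the worker documentation as quoted above.
var (
	stateOnly   = regexp.MustCompile(`^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$`)
	statePrefix = regexp.MustCompile(`^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/`)
)

// matches is a hypothetical helper: it reports whether path matches the
// "/state$" pattern and whether it matches the "/state/" pattern.
func matches(path string) (only bool, prefix bool) {
	return stateOnly.MatchString(path), statePrefix.MatchString(path)
}

func main() {
	paths := []string{
		"/_matrix/client/v3/rooms/!room:hs1/state",
		"/_matrix/client/v3/rooms/!room:hs1/state/com.example.test/1",
	}
	for _, p := range paths {
		only, prefix := matches(p)
		fmt.Printf("%s\n  /state$ pattern: %v\n  /state/ pattern: %v\n", p, only, prefix)
	}
}
```

Under these assumptions, only the second pattern matches a `/state/{eventType}/{stateKey}` path, so whichever process that pattern is routed to (if any) serves the GET in the failing test.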

So I guess both are processed on the Synapse main process and this shouldn't be a problem? Perhaps this is a problem with Synapse itself 🤔

It shouldn't have anything to do with running with the Synapse Rust apps, as those currently only cover state federation servlets (/_matrix/federation/v1/state_ids/{roomId}, /_matrix/federation/v1/state/{roomId}, /_matrix/federation/v1/event/{eventId}), which aren't part of the client API.

Collaborator Author


cc @AndrewFerr any insight?

Contributor

@reivilibre reivilibre Apr 27, 2026


> It looks like the main process in Synapse handles processing delayed events.

Even if so, the state will have to get persisted on the correct event_persister for the room and the main process might be serving stale data until the invalidation comes through. So that could explain the difference.

Collaborator Author


Ahh, good catch on the event_persister wrench being thrown in here @reivilibre!

I'm going to merge, as this is already a step forward and got review attention/approval, but we should also update the other spots that check /state as a one-off (wherever user.Do(t, "GET", getPathForState(...)) is happening). The negative assertions are a bit tricky.

 numberOfDelayedEvents := 0
+
+// Start a sync loop (initial sync)
+since := user.MustSyncUntil(
+	t, client.SyncReq{},
+)

 // Send an initial delayed event that will be ready to send as soon as the server
 // comes back up.
 user.MustDo(
@@ -440,7 +445,9 @@ func TestDelayedEvents(t *testing.T) {
 remainingDelayedEventCount := countDelayedEvents(t, delayedEventResponse)
 // Sanity check that the room state was updated correctly with the delayed events
 // that were sent.
-user.MustDo(t, "GET", getPathForState(roomID, eventType, stateKey1))
+since = user.MustSyncUntil(t, client.SyncReq{Since: since}, client.SyncStateHas(roomID, func(ev gjson.Result) bool {
+	return ev.Get("type").Str == eventType && ev.Get("state_key").Str == stateKey1
+}))
Comment on lines +448 to +450
Collaborator Author

@MadLittleMods MadLittleMods Apr 30, 2026


Because of the way traditional /sync works (it de-duplicates events in state that are already in the timeline), this assertion was flawed and actually failed, so I ended up reverting this PR in #867.

> I haven't actually checked whether this PR fixes the problem there (just theory)

(should've actually tested this 🤦 but got distracted by the early approval)

Re-introducing the fixes in #869
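
To illustrate the de-duplication pitfall, here is a minimal sketch of a check that looks for the state event in both the `state` and `timeline` sections of a joined room, assuming the standard /sync v3 response layout (`rooms.join.<roomID>.state.events` and `rooms.join.<roomID>.timeline.events`). The types and `hasStateEvent` are hypothetical and use plain `encoding/json` rather than Complement's checkers; this is not necessarily how #869 approaches it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of a /sync response, just enough for this check.
type event struct {
	Type     string  `json:"type"`
	StateKey *string `json:"state_key"` // nil for non-state events
}

type eventList struct {
	Events []event `json:"events"`
}

type joinedRoom struct {
	State    eventList `json:"state"`
	Timeline eventList `json:"timeline"`
}

type syncResponse struct {
	Rooms struct {
		Join map[string]joinedRoom `json:"join"`
	} `json:"rooms"`
}

// hasStateEvent (hypothetical) reports whether the joined room in the sync
// response contains a state event with the given type and state key, in
// either the state section or the timeline section.
func hasStateEvent(raw []byte, roomID, evType, stateKey string) (bool, error) {
	var resp syncResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return false, err
	}
	room, ok := resp.Rooms.Join[roomID]
	if !ok {
		return false, nil
	}
	for _, evs := range [][]event{room.State.Events, room.Timeline.Events} {
		for _, ev := range evs {
			if ev.Type == evType && ev.StateKey != nil && *ev.StateKey == stateKey {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	// The state event appears only in the timeline, so a check limited to
	// the state section would miss it.
	raw := []byte(`{"rooms":{"join":{"!r:hs1":{
		"state":{"events":[]},
		"timeline":{"events":[{"type":"com.example.test","state_key":"1","content":{}}]}
	}}}}`)
	found, err := hasStateEvent(raw, "!r:hs1", "com.example.test", "1")
	fmt.Println(found, err)
}
```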


// Wait until we see another delayed event being sent (ensure things resumed and are continuing).
time.Sleep(10 * time.Second)
@@ -452,7 +459,9 @@ func TestDelayedEvents(t *testing.T) {
 // FIXME: Ideally, we'd check specifically for the last one that was sent but it
 // will be a bit of a juggle and fiddly to get this right so for now we just check
 // one.
-user.MustDo(t, "GET", getPathForState(roomID, eventType, stateKey2))
+since = user.MustSyncUntil(t, client.SyncReq{Since: since}, client.SyncStateHas(roomID, func(ev gjson.Result) bool {
+	return ev.Get("type").Str == eventType && ev.Get("state_key").Str == stateKey2
+}))
 })
 }
