Defer log_lines search_vector indexing off the insert path#4818
Open
stuartc wants to merge 3 commits into
Open
Defer log_lines search_vector indexing off the insert path#4818stuartc wants to merge 3 commits into
stuartc wants to merge 3 commits into
Conversation
The AFTER INSERT trigger on log_lines computed to_tsvector synchronously inside the insert transaction (it never actually deferred anything) and double-wrote every row via a self-UPDATE. It also lacked the program_limit_exceeded guard dataclips got, so an oversized message aborted the insert (rolling back the whole batch for insert_all). Remove the trigger so inserts leave search_vector NULL, and backfill it out-of-band via a new Oban worker (Lightning.LogLines.SearchVectorWorker) on a dedicated search_indexing queue. Adds a guarded safe_to_tsvector SQL function and a partial index (WHERE search_vector IS NULL) so draining pending rows stays cheap. Covers both the single (run:log) and batch (run:batch_logs) insert paths. Live log streaming is unaffected (push-based via PubSub); only full-text log search lags slightly behind ingestion.
Follow-up hardening on the search_vector deferral, from a review pass: - Fix the worker's snowball chain. Oban's default unique states include :executing and :completed, so a running snowball job matched itself when enqueueing its successor and the insert was silently deduped, breaking the chain after one hop. Restrict uniqueness to [:available, :scheduled]. - Make the pending-search index migration re-runnable: a failed CREATE INDEX CONCURRENTLY leaves an INVALID index that IF NOT EXISTS would skip and ATTACH would leave the parent invalid forever. Drop any invalid leftover first. - Make safe_to_tsvector NULL-defensive and re-runnable (CREATE OR REPLACE, drop STRICT, COALESCE the doc) so it never returns NULL and leaves a row stuck in the pending set. Also the template for the dataclip fix. - Add snowball uniqueness regression tests.
Trim hindsight/diff-narrating comments down to what's non-obvious, and rewrite the SearchVectorWorker moduledoc to read as documentation of the mechanism rather than a justification of the change.
f3f45af to
cae40bc
Compare
|
The single raw SQL call is parameterized with a module constant ( Security Review ✅
|
Member
|
@stuartc , fantastic find here. I noticed that the description seemed to reference inserting multiple log lines at once. Wanted to double check: batch log lines insert is turned off on app.openfn.org, right? I think we tried it a few months back and when the crashes started occurring we switched it off. I believe Joe has been tracking a bug related to log line ordering on the bulk log line feature too. (Just double checking that we're all on the same page here.) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This changes how
log_linesfull-text search gets indexed — taking it off thelog-insert hot path to cut down the
run:logchannel timeouts we've been seeingunder heavy log volume.
The old
AFTER INSERTtrigger computedto_tsvectorsynchronously inside theinsert transaction (so it never actually deferred anything) and double-wrote
every row via a self-
UPDATE. It also had noprogram_limit_exceededguard, soa single oversized message would abort the insert and roll back the whole batch
on the
insert_allpath.Now inserts leave
search_vectorNULL and a background Oban worker(
Lightning.LogLines.SearchVectorWorker) backfills it out-of-band on a dedicatedsearch_indexingqueue. There's a guardedsafe_to_tsvectorSQL function and apartial index (
WHERE search_vector IS NULL) so finding pending rows stayscheap. Live log streaming is unaffected — it's push-based over PubSub and never
reads
search_vector; only full-text log search now lags ingestion slightly,typically under a minute.
Three migrations, ordered 1a→1b→1c: add
safe_to_tsvector, add the partial index(built per-partition
CONCURRENTLYthen attached to the parent), then drop thetrigger.
Closes #4425
Validation steps
mix ecto.migrate). Confirm thelog_linesset_search_vectortrigger is gone,safe_to_tsvectorexists, and thepartial index is VALID across the parent + all 100 partitions. The
dataclipstrigger should be untouched.search_vectoris NULL and it isn't matched by logsearch yet.
Lightning.LogLines.SearchVectorWorker— the row'ssearch_vectorispopulated and now matches
to_tsquery('english_nostop', …).erroring, and doesn't get stuck retrying.
mix test test/lightning/log_lines/search_vector_worker_test.exs test/lightning/runs_test.exsAdditional notes for the reviewer
there's backlog, falling back to a 1-minute cron heartbeat. Concurrency is 1
on the dedicated queue (SKIP LOCKED makes bumping it safe later).
uniquestates arerestricted to
[:available, :scheduled]on purpose — the default includes:executing/:completed, which makes a running job dedup its own successorand kills the chain — and the index migration drops any INVALID leftover
before re-creating, so a failed
CONCURRENTLYbuild can be re-run.safe_to_tsvectoris also meant as the template for the dataclip trigger fix(Dataclip insert times out building the search vector, which loses the run #4800), which is separate.
AI Usage
Pre-submission checklist