
Harden generated media tags and batch tag merge#19

Merged
brianmeyer merged 1 commit into master from codex/rec-192-193-tag-followups
Mar 22, 2026

@brianmeyer
Owner

Summary

  • strip fenced markdown wrappers before parsing generated media tags
  • merge duplicate search_batch tag sets deterministically across contributing hits
  • add regressions for fenced JSON tag output and duplicate batch-hit tag merging
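
The first bullet can be illustrated with a minimal sketch. The PR does not show the actual parser, so `parse_generated_tags`, the regular expression, and the choice to return an empty list on unparseable output are all assumptions made for illustration, not the code in `recallforge`.

```python
import json
import re

# Three backticks, built programmatically so this example embeds cleanly.
FENCE = "`" * 3

# Matches a reply that is entirely one fenced block, with an optional
# language label such as "json" after the opening fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z0-9_-]*\s*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_generated_tags(raw: str) -> list[str]:
    """Hypothetical helper: parse a model reply expected to be a JSON list of tags."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        # Drop the markdown wrapper and keep only the fenced body.
        text = match.group(1).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # assumption: unparseable output yields no tags
    if not isinstance(data, list):
        return []
    seen: set[str] = set()
    tags: list[str] = []
    for item in data:
        if isinstance(item, str):
            tag = item.strip()
            if tag and tag not in seen:
                seen.add(tag)
                tags.append(tag)
    return tags
```

With this sketch, a reply wrapped in a fenced `json` block and the same reply without the wrapper parse to the same tag list.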

Testing

  • pytest -q tests/test_storage.py -k "generated_media_tags_strip_fenced_json or memory_lookup_surfaces_media_tags or index_image_stores_caption_in_text_body or index_video_stores_video_caption_in_text_body"
  • pytest -q tests/test_search_batch.py -k "same_document"
  • pytest -q tests/test_storage.py tests/test_search_batch.py tests/test_search_pipeline.py

Closes REC-192
Closes REC-193

@brianmeyer brianmeyer merged commit e309699 into master Mar 22, 2026
1 of 3 checks passed

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf4ec6fa1d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/recallforge/search.py
Comment on lines +1603 to +1604
    else:
        merged[filepath]['results'].append(result)
P2: Preserve query order when merging duplicate result tags

The new tag merge path appends duplicate hits in whatever order all_results is iterated, but all_results is populated via as_completed, so completion timing can reorder queries between runs. In batches where multiple queries hit the same filepath with different tags, tags=_merge_tags(data['results']) will produce different tag orders for identical inputs, which breaks reproducibility and can make deterministic-order assertions flaky.
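
One way to address this, sketched under assumptions: record each query's position when submitting it, collect results as they complete, then merge in submission order so completion timing cannot affect tag order. The names `search_batch`, `search_one`, and `merge_tags` below are hypothetical stand-ins and do not reproduce the code in `src/recallforge/search.py`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def merge_tags(results):
    """Union tags across hits, keeping first-seen order (hypothetical stand-in)."""
    seen, merged = set(), []
    for result in results:
        for tag in result.get("tags", []):
            if tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged


def search_batch(queries, search_one):
    """Run search_one per query in parallel; return {filepath: merged tags}."""
    with ThreadPoolExecutor() as pool:
        # Map each future to the index of the query that produced it.
        futures = {pool.submit(search_one, q): i for i, q in enumerate(queries)}
        by_index = {}
        for future in as_completed(futures):
            by_index[futures[future]] = future.result()

    merged = {}
    # Iterate in query order, not completion order, so output is reproducible.
    for i in range(len(queries)):
        for result in by_index[i]:
            entry = merged.setdefault(result["filepath"], {"results": []})
            entry["results"].append(result)
    return {fp: merge_tags(data["results"]) for fp, data in merged.items()}
```

Sorting the collected hits by query index before merging would work equally well; the key point is that the merge step never depends on the order in which `as_completed` yields futures.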

