
Avoid pulling too many objects into memory for indexing#7689

Open
labkey-jeckels wants to merge 6 commits into develop from fb_indexingLimit

@labkey-jeckels
Contributor

Rationale

I investigated a heap dump yesterday that had 1.6 million newly created materials/samples in memory in an attempt to index them. We should work in much smaller batches.

Changes

  • Adopt the pattern we've been using successfully in ExperimentServiceImpl to work in chunks of 1,000
  • Optimize samples, array runs, and batches
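
The chunked pattern in the first bullet can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code: `ChunkedIndexingSketch`, `BatchFetcher`, and `indexInBatches` are hypothetical names, and the fetcher stands in for a query that filters on `RowId > ?`, orders by RowId, and applies a row limit. It assumes RowIds are positive and that each batch comes back in ascending RowId order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of these names come from the LabKey codebase. */
class ChunkedIndexingSketch
{
    /** Stand-in for a query such as "WHERE RowId > ? ORDER BY RowId" with a row limit. */
    interface BatchFetcher
    {
        List<Long> fetchAfter(long lastRowId, int limit);
    }

    /** Indexes all rows in batches of at most batchSize; returns the number of non-empty batches. */
    static int indexInBatches(BatchFetcher fetcher, int batchSize, Consumer<Long> indexer)
    {
        long maxRowIdProcessed = 0; // assumption: RowIds are positive
        int batches = 0;
        while (true)
        {
            List<Long> batch = fetcher.fetchAfter(maxRowIdProcessed, batchSize);
            if (batch.isEmpty())
                break;
            for (long rowId : batch)
            {
                indexer.accept(rowId);
                maxRowIdProcessed = Math.max(maxRowIdProcessed, rowId);
            }
            batches++;
            if (batch.size() < batchSize)
                break; // a short batch means nothing is left to fetch
        }
        return batches;
    }

    public static void main(String[] args)
    {
        // In-memory "table" of RowIds 1..2500 in place of a database
        List<Long> table = new ArrayList<>();
        for (long i = 1; i <= 2500; i++)
            table.add(i);

        BatchFetcher fetcher = (last, limit) -> table.stream()
                .filter(id -> id > last)
                .sorted()
                .limit(limit)
                .collect(Collectors.toList());

        List<Long> indexed = new ArrayList<>();
        int batches = indexInBatches(fetcher, 1000, indexed::add);
        System.out.println(batches + " batches, " + indexed.size() + " rows");
    }
}
```

Only one batch is held in memory at a time, so peak heap use is bounded by the batch size rather than by the number of new rows.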

Tasks 📍

  • Claude Code Review
  • Manual Testing
  • Test Automation

@labkey-jeckels labkey-jeckels requested a review from a team May 23, 2026 22:37
@labkey-jeckels labkey-jeckels self-assigned this May 23, 2026
@labkey-jeckels labkey-jeckels added this to the 26.06 milestone May 23, 2026
if (modifiedSince != null)
    filterSQL.append(" AND ER.Modified > ?").add(modifiedSince);

List<? extends ExpRun> runs = ExperimentService.get().getExpRuns(filterSQL, _ -> true, queue.getContainer(), SearchService.INDEXING_LIMIT);
Contributor

I see that this uses ORDER BY RowId, but this contract isn't very explicit. I think maxRowIdProcessed is implicitly final; I like the readability of making it explicit.

materials.forEach(m -> {
    ExpMaterialImpl impl = new ExpMaterialImpl(m);
    impl.index(queue, null /* null tableInfo since samples may belong to multiple containers */);
    maxRowIdProcessed.setValue(Math.max(maxRowIdProcessed.longValue(), impl.getRowId()));
Contributor

(Applies to the above as well.) How would we detect that our assumption of ordering holds? Should we throw an exception if impl.getRowId() is ever < maxRowIdProcessed?
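
One possible shape for such a check, as a sketch only: `RowIdOrderGuardSketch` is a hypothetical helper that is not part of this PR, and it assumes each batch is expected in ascending RowId order. It fails fast when a RowId lower than one already seen arrives, instead of silently masking the problem with `Math.max`.

```java
/** Hypothetical helper, not part of the PR: fails fast if RowIds arrive out of order. */
class RowIdOrderGuardSketch
{
    private long maxRowIdProcessed = Long.MIN_VALUE;

    /** Records rowId, throwing if it is lower than a RowId already seen. */
    void accept(long rowId)
    {
        if (rowId < maxRowIdProcessed)
            throw new IllegalStateException(
                    "Expected rows ordered by RowId, but saw " + rowId + " after " + maxRowIdProcessed);
        maxRowIdProcessed = rowId;
    }

    /** Highest RowId accepted so far, or Long.MIN_VALUE if none. */
    long getMaxRowIdProcessed()
    {
        return maxRowIdProcessed;
    }
}
```

In the loop above, a call like `guard.accept(impl.getRowId())` would replace the `Math.max` update, and the next batch would start from `guard.getMaxRowIdProcessed()`.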
