Improve CO save performance for bulk updates #8057

acwhite211 wants to merge 7 commits into v7_12_0_6 from issue-8055
Conversation
📝 Walkthrough

A context-scoped caching mechanism for catalog-number preference lookups is introduced across bulk operations and workbench uploads to reduce redundant computations. The cache is initialized via a context manager and consulted by preference-checking logic before full computation. All bulk creation, bulk copy, and upload entry points are wired to activate this cache.

Changes: Catalog Number Preference Caching
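As a rough illustration of that pattern, the sketch below shows a ContextVar-backed cache activated by a context manager. The variable name `_unique_catnum_pref_cache` comes from the review excerpt further down; the context-manager name `cache_catnum_prefs` is hypothetical, and the PR's actual implementation may differ.

```python
# Minimal sketch of a context-scoped cache, assuming a ContextVar that holds
# a dict only while a bulk operation is active. `cache_catnum_prefs` is a
# hypothetical name for the context manager the entry points would wrap.
from contextlib import contextmanager
from contextvars import ContextVar

_unique_catnum_pref_cache: ContextVar[dict | None] = ContextVar(
    "unique_catnum_pref_cache", default=None
)

@contextmanager
def cache_catnum_prefs():
    # Activate an empty cache for the duration of the bulk operation.
    token = _unique_catnum_pref_cache.set({})
    try:
        yield
    finally:
        # Deactivate so ordinary single-record saves stay uncached.
        _unique_catnum_pref_cache.reset(token)
```

A bulk entry point would then wrap its loop in `with cache_catnum_prefs(): ...`, so every save within that operation shares one preference lookup per collection.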
@coderabbitai review

✅ Actions performed: Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
specifyweb/backend/businessrules/utils.py (1)
33-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid caching the fallback `False` after a preference read/parsing failure.

Right now any exception path leaves `unique_catnum_enabled = False` and still stores that value in the context cache. A single transient `get_app_resource(...)`/JSON failure would then disable the uniqueness rule for every later save in the same bulk operation.

Suggested fix:
```diff
 def get_unique_catnum_across_comp_co_coll_pref(collection, user) -> bool:
     import specifyweb.backend.context.app_resource as app_resource

     cache = _unique_catnum_pref_cache.get()
@@
-    unique_catnum_enabled: bool = False
+    unique_catnum_enabled: bool = False
+    cacheable = True
@@
     except json.JSONDecodeError:
+        cacheable = False
         logger.warning(f"Error: Could not decode JSON for collection preferences")
     except TypeError as e:
+        cacheable = False
         logger.warning(f"Error: Unexpected data structure in collection preferences: {e}")
     except Exception as e:
+        cacheable = False
         logger.warning(f"An unexpected error occurred: {e}")

-    if cache is not None:
+    if cache is not None and cacheable:
         cache[cache_key] = unique_catnum_enabled
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `specifyweb/backend/businessrules/utils.py` around lines 33-57: the current try/except always leaves unique_catnum_enabled=False on errors and then writes that fallback into cache; change this so the cache is only updated when the preference read/parse actually succeeded. For example, after calling app_resource.get_app_resource and json.loads and validating the nested keys (the logic that computes unique_catnum_enabled from collection_prefs_json, unique_catalog_number_pref and behavior), set a local success flag (or move the cache write) and only assign cache[cache_key] = unique_catnum_enabled when success is True and cache is not None; do not write to cache inside any except path that handles JSONDecodeError/TypeError/Exception. Ensure references: unique_catnum_enabled, cache, cache_key, app_resource.get_app_resource, collection_prefs_json, json.loads, unique_catalog_number_pref, behavior.
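A full-function version of the reviewer's suggestion might look roughly like the sketch below. It leans on the `_unique_catnum_pref_cache` ContextVar from the diff; the return-value unpacking of `get_app_resource` and the shape of `cache_key` are assumptions, not confirmed by the excerpt.

```python
# Hedged sketch of "only cache on success". The real function is
# get_unique_catnum_across_comp_co_coll_pref in
# specifyweb/backend/businessrules/utils.py; how get_app_resource's return
# value is unpacked and how cache_key is built are assumptions here.
import json
import logging

logger = logging.getLogger(__name__)

def get_unique_catnum_across_comp_co_coll_pref(collection, user) -> bool:
    import specifyweb.backend.context.app_resource as app_resource

    cache = _unique_catnum_pref_cache.get()
    cache_key = getattr(collection, "id", None)  # assumed key shape
    if cache is not None and cache_key in cache:
        return cache[cache_key]

    unique_catnum_enabled = False
    cacheable = True  # flipped off on any failure so the fallback is never cached
    try:
        # Assumed: the raw preference payload is the first element returned.
        raw = app_resource.get_app_resource(collection, user, "CollectionPreferences")[0]
        prefs = json.loads(raw)
        behavior = prefs["uniqueCatalogNumberAccrossComponentAndCO"]["behavior"]
        unique_catnum_enabled = bool(behavior.get("uniqueness", False))
    except Exception as e:  # covers JSONDecodeError, TypeError, KeyError, ...
        cacheable = False
        logger.warning(f"Could not read collection preferences: {e}")

    if cache is not None and cacheable:
        cache[cache_key] = unique_catnum_enabled
    return unique_catnum_enabled
```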
📒 Files selected for processing (5)
- specifyweb/backend/bulk_copy/bulk_copy.py
- specifyweb/backend/businessrules/rules/collectionobject_rules.py
- specifyweb/backend/businessrules/utils.py
- specifyweb/backend/workbench/upload/upload.py
- specifyweb/specify/api/dispatch.py
Thank you, Monsieur Lapin 🐰
If this fix isn't enough of a speed-up, I have another solution idea that involves a schema migration for adding an index on the Component.catalognumber field:

```python
operations = [
    migrations.AddIndex(
        model_name="component",
        index=models.Index(
            fields=["catalognumber"],
            name="ComponentCatNumIDX",
        ),
    ),
]
```
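For context, that snippet would sit inside a standard Django migration module, roughly as below; the app label and dependency name are placeholders, since the actual migration history isn't shown in this thread.

```python
# Hypothetical wrapper migration for the AddIndex operation above.
# The app label ("specify") and the dependency are illustrative
# placeholders, not taken from the real migration history.
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ("specify", "0001_initial"),  # placeholder: the real dependency will differ
    ]

    operations = [
        migrations.AddIndex(
            model_name="component",
            index=models.Index(
                fields=["catalognumber"],
                name="ComponentCatNumIDX",
            ),
        ),
    ]
```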
bhumikaguptaa left a comment
- In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.
- In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.
- Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.
- Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.
- Confirm regular Collection Object create/edit workflows still save successfully.
Everything works as expected, except that when I tried to upload a dataset with approximately 12,000 records via the Workbench, it took around 20 minutes (and was still going) to validate and then upload the dataset. The time it took is similar to what Demo fish/main took. I am unsure whether it is supposed to take this long.
kwhuber left a comment
- In Batch Edit, update a non-catalognumber field across many Collection Objects and confirm the operation completes noticeably faster than before.
- In Data Set commit, create or update Collection Objects and confirm progress advances steadily without the previous long delay.
- Create a Collection Object with a catalognumber that matches a Component in the same collection and confirm the duplicate warning still appears.
- Create or update a Collection Object with a catalog number that matches a Component in a different collection and confirm it is allowed.
- Confirm regular Collection Object create/edit workflows still save successfully.
- For Data Set commit, I noticed a marginal improvement, but the wait wasn't too long to begin with. I altered a comments field on ~4,800 CO records. Testing Data Set manipulation on uw_geo, main committed about 17 records every 3 seconds, while the issue-8055 branch committed about 20-25 records every 3 seconds, so slightly quicker.
I worked some more on trying to speed up the data set commit by adding some caching functionality. I updated the fix so the bulk/dataset context now caches Component catalog numbers per collection. Good to test out the speed of the data set commit again. If the caching solution looks good, then maybe we can discuss whether adding an index to the component.catalognumber field would be better.
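A rough sketch of what "caches Component catalog numbers per collection" could look like is below; the `Component` import path, the query's relation path, and the `_component_catnum_cache` name are assumptions for illustration, not the PR's actual code.

```python
# Illustrative per-collection cache of Component catalog numbers. The
# ContextVar would be populated by the same bulk/dataset context manager
# sketched earlier.
from contextvars import ContextVar

from specifyweb.specify.models import Component  # assumed import path

_component_catnum_cache: ContextVar[dict | None] = ContextVar(
    "component_catnum_cache", default=None
)

def component_catnums(collection) -> set:
    cache = _component_catnum_cache.get()
    if cache is not None and collection.id in cache:
        return cache[collection.id]
    # One query per collection instead of one per saved CollectionObject.
    catnums = set(
        Component.objects
        .filter(collectionobject__collection=collection)  # assumed relation path
        .values_list("catalognumber", flat=True)
    )
    if cache is not None:
        cache[collection.id] = catnums
    return catnums
```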
Updated 1153 COs in less than 5 min
Fixes #8055
The new CollectionObject/Component catalognumber uniqueness rule was running during every CollectionObject save, even when the catalognumber and collection were unchanged. These catalognumber checks seemed unnecessary and were likely the main cause of the slowdown during batch edits, data set commits, and bulk copy operations.
The most straightforward fix seemed to be skipping the component catalognumber duplicate check when a CO save does not change `catalognumber` or `collection`. I also added caching for the `CollectionPreferences` uniqueness setting during bulk create, bulk copy, and data set upload operations. One other thing that might add some speedup is scoping component duplicate lookups to just the current collection. I added that for now, but might remove it if it doesn't provide much speedup or if the number needs to remain unique between collections.

Still need to do some more testing and performance checking to confirm this solution completely solves the issue.
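A minimal sketch of that skip, assuming the rule can compare the incoming CO against its stored row (the helper name and field access are illustrative):

```python
# Hypothetical guard: only run the Component catalognumber duplicate
# check when the save actually changes catalognumber or collection.
def needs_catnum_check(co) -> bool:
    if co.id is None:
        return True  # brand-new CollectionObject: always check
    old = type(co).objects.only("catalognumber", "collection_id").get(id=co.id)
    return (
        old.catalognumber != co.catalognumber
        or old.collection_id != co.collection_id
    )
```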
Checklist
self-explanatory (or properly documented)
Testing instructions
#8055 (comment)
Test cat num uniqueness feat from #6708
Uniqueness: Preference to have catalog number unique across CO and Component table
"uniqueCatalogNumberAccrossComponentAndCO": {"behavior": {"uniqueness": true}}