Skip to content

perf(dir-catalog): avoid unchanged manifest reloads#7234

Open
jackye1995 wants to merge 1 commit into
lance-format:mainfrom
jackye1995:jack/dir-namespace-manifest-cache
Open

perf(dir-catalog): avoid unchanged manifest reloads#7234
jackye1995 wants to merge 1 commit into
lance-format:mainfrom
jackye1995:jack/dir-namespace-manifest-cache

Conversation

@jackye1995

Copy link
Copy Markdown
Contributor

Summary:

  • add dataset stale/successor helpers and commit-handler version probes
  • use the successor probe to skip unchanged directory manifest reloads
  • cover V1/V2 successor checks, external-manifest fallback, and cross-namespace manifest refresh

@github-actions github-actions Bot added A-namespace Namespace impls bug Something isn't working labels Jun 11, 2026
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.53488% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-namespace-impls/src/dir/manifest.rs 58.33% 2 Missing and 3 partials ⚠️
rust/lance/src/dataset.rs 80.95% 2 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@jackye1995 jackye1995 force-pushed the jack/dir-namespace-manifest-cache branch from 02db606 to aa031e7 Compare June 11, 2026 19:24
@jackye1995

Copy link
Copy Markdown
Contributor Author

Benchmark comparison against #7176

Compared this PR's unchanged-read benchmark with the prior copy-on-write __manifest benchmark from #7176: #7176 (comment)

Important caveat: #7176 measured write-create-namespace commit throughput on EC2 c7i.48xlarge; this run measured warm-read-describe-table on EC2 c7i.12xlarge. So this is not an apples-to-apples replacement benchmark. It is useful as a reference because #7176 established the CoW write-side cost, while this PR targets unchanged read-side reload cost.

This PR: unchanged warm reads

Setup: S3 jack-devland-build, us-east-1, inline optimization enabled, 80 measured ops, 10 warmup ops.

__manifest entries upstream main ops/s this PR ops/s throughput delta upstream avg ms this PR avg ms avg latency delta upstream p50 / p99 ms this PR p50 / p99 ms
1,000 6.342 21.959 3.46x 125.83 38.80 -69.2% 87.59 / 339.00 37.00 / 84.40
10,000 14.690 18.634 1.27x 57.87 44.65 -22.8% 49.29 / 158.08 40.90 / 92.98
100,000 9.908 13.900 1.40x 87.72 63.65 -27.4% 75.90 / 215.51 46.96 / 123.09

All runs had 0 errors.

Reference from #7176: CoW write throughput

For the same row counts, #7176 reported continuous single-process write-create-namespace numbers:

__manifest entries #7176 inline write ops/s #7176 inline p50 / p99 ms #7176 no-index write ops/s #7176 no-index p50 / p99 ms this PR warm-read ops/s this PR warm-read p50 / p99 ms
1,000 2.04 442 / 990 3.55 271 / 394 21.959 37.00 / 84.40
10,000 1.50 639 / 1034 2.43 408 / 520 18.634 40.90 / 92.98
100,000 1.13 811 / 1722 2.06 478 / 637 13.900 46.96 / 123.09

Takeaway: #7176 shows CoW commits remain single-writer and roughly O(rows). This PR does not change write-side commit behavior. It improves the unchanged read path by avoiding a full latest-manifest reload when current_version + 1 does not exist. In this S3 run, that gives 1.27x-3.46x higher unchanged-read throughput versus rebased upstream main, and keeps read p99 substantially below the CoW write baseline for the overlapping sizes.

@jackye1995 jackye1995 changed the title fix: avoid unchanged manifest reloads perf(dir-catalog): avoid unchanged manifest reloads Jun 11, 2026
@LuQQiu

LuQQiu commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

One thing, what if there is cleanup happen??
like my cached version is old 41, and dataset migrates so quick and suddenly now the latest version is 50, 42 version is cleaned up????

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-namespace Namespace impls bug Something isn't working performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants