
feat: add Lucene 9 index provider (oak-search-luceneNg)#2817

Open
bhabegger wants to merge 5 commits into apache:OAK-12089 from bhabegger:oak-12089-lucene9-core


@bhabegger (Contributor) commented Mar 26, 2026

Summary

Introduces oak-search-luceneNg, a new Oak module that provides a Lucene 9 based index engine. Indexes opt in explicitly via type=lucene9; all existing indexes are unaffected.
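Opting in is driven by the `type` property of an index definition. The sketch below models that dispatch in plain Java so the "existing indexes are unaffected" claim is easy to see: `TypeOptIn`, `IndexDef` and `selectForType` are hypothetical names, not classes from the module, and real Oak index definitions live in the node store rather than in maps.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of type-based opt-in; not code from oak-search-luceneNg.
class TypeOptIn {
    // Minimal stand-in for an index definition node: its path plus its properties.
    static final class IndexDef {
        final String path;
        final Map<String, String> props;

        IndexDef(String path, Map<String, String> props) {
            this.path = path;
            this.props = props;
        }

        String type() {
            return props.getOrDefault("type", "");
        }
    }

    // Returns the paths of the definitions an engine registered for engineType would handle.
    static List<String> selectForType(List<IndexDef> defs, String engineType) {
        return defs.stream()
                .filter(d -> engineType.equals(d.type()))
                .map(d -> d.path)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<IndexDef> defs = List.of(
                new IndexDef("/oak:index/legacyFulltext", Map.of("type", "lucene")),
                new IndexDef("/oak:index/newFulltext", Map.of("type", "lucene9")),
                new IndexDef("/oak:index/uuid", Map.of("type", "property")));
        // Only the explicitly opted-in definition goes to the new engine;
        // the others stay with their existing providers.
        System.out.println(selectForType(defs, "lucene9")); // prints [/oak:index/newFulltext]
    }
}
```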

  • New oak-search-luceneNg module: index editor, query index, tracker, index node, storage (Oak JCR node-based directory), and OSGi wiring
  • Full query parity with the legacy lucene engine: property restrictions, path/type filters, fulltext, sorting, and excerpts
  • Facet parity for all three ACL modes: insecure, statistical (TapeSampling), and secure (per-document access check), including Lucene 9 API adaptations and null-safe MatchingDocs.bits handling
  • LuceneNgFacetCommonTest extends the shared FacetCommonTest suite for end-to-end JCR-level facet coverage
  • AbstractIndexComparisonTest inlined into oak-search test-jar; oak-search-test module removed
  • README documents feature parity vs legacy Lucene and Elastic
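The difference between the secure and statistical ACL modes can be sketched as follows. Secure mode checks every matching document; statistical mode checks only a random sample and scales the raw counts by the readable fraction. This is a simplified model under stated assumptions: documents carry one label each, `canRead` stands in for Oak's access check, the PR's code reads labels from Lucene's `SortedSetDocValues` and samples with Oak's `TapeSampling`, and here selection sampling (Knuth's Algorithm S) is used as a stand-in sampler. The scaling rule is an assumption, not taken from the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.IntPredicate;

// Hypothetical sketch; class and method names are illustrative only.
class AclFacetSketch {
    // Secure mode: count a doc's label only if the caller may read that doc.
    static Map<String, Integer> secureCounts(List<String> docLabels, IntPredicate canRead) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = 0; doc < docLabels.size(); doc++) {
            if (canRead.test(doc)) {
                counts.merge(docLabels.get(doc), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Selection sampling (Algorithm S): one pass over 0..n-1 that picks exactly
    // min(k, n) indices uniformly at random, returned in increasing order.
    static int[] sample(int n, int k, Random rnd) {
        int[] picked = new int[Math.min(k, n)];
        int selected = 0;
        for (int t = 0; t < n && selected < picked.length; t++) {
            // Select t with probability (still needed) / (still remaining).
            if (rnd.nextInt(n - t) < picked.length - selected) {
                picked[selected++] = t;
            }
        }
        return picked;
    }

    // Statistical mode: run the access check on a sample only, then scale the
    // unfiltered label counts by the readable fraction of that sample.
    static Map<String, Integer> statisticalCounts(List<String> docLabels, IntPredicate canRead,
                                                  int sampleSize, Random rnd) {
        int[] sampled = sample(docLabels.size(), sampleSize, rnd);
        long readable = 0;
        for (int doc : sampled) {
            if (canRead.test(doc)) {
                readable++;
            }
        }
        double ratio = sampled.length == 0 ? 0.0 : (double) readable / sampled.length;
        Map<String, Integer> raw = new HashMap<>();
        for (String label : docLabels) {
            raw.merge(label, 1, Integer::sum);
        }
        Map<String, Integer> scaled = new HashMap<>();
        raw.forEach((label, c) -> scaled.put(label, (int) Math.round(c * ratio)));
        return scaled;
    }
}
```

The trade-off the sketch shows: secure counts are exact but cost one access check per hit, while statistical counts cost at most `sampleSize` checks and are approximate.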

Introduces oak-search-luceneNg, a new Oak module providing a Lucene 9
based index engine under type=lucene9, with full parity to the legacy
lucene implementation for property queries, fulltext, sorting, excerpts,
and facets (insecure, statistical, and secure ACL modes).

Key changes:
- New oak-search-luceneNg module: index editor, query index, tracker,
  index node, storage, and OSGi wiring
- Facet parity: LuceneNgSecure/StatisticalSortedSetDocValuesFacetCounts
  ported to Lucene 9 APIs with null-safe MatchingDocs.bits handling
- LuceneNgFacetCommonTest extends FacetCommonTest for JCR-level coverage
- AbstractIndexComparisonTest inlined into oak-search test-jar;
  oak-search-test module removed
- getRootBuilder removed from ContextAwareCallback and IndexUpdate
- leaf OSGi property removed from LuceneIndexProviderService
- README documents feature parity vs legacy Lucene and Elastic
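The commit does not say when `MatchingDocs.bits` ends up null; a related case in Lucene is that `DocIdSet.iterator()` may return null for an empty set. Either way, the null-safe pattern is to treat a missing match set as "no hits in this segment" instead of dereferencing it. The sketch below models this with a `java.util.BitSet` in place of Lucene's `DocIdSet`; `NullSafeBits` and `SegmentHits` are hypothetical names.

```java
import java.util.BitSet;
import java.util.List;

class NullSafeBits {
    // Hypothetical stand-in for Lucene's FacetsCollector.MatchingDocs; a BitSet
    // replaces the real DocIdSet, and it may be null.
    static final class SegmentHits {
        final String segment;
        final BitSet bits; // nullable

        SegmentHits(String segment, BitSet bits) {
            this.segment = segment;
            this.bits = bits;
        }
    }

    // Visits every matching doc across segments; a real facet counter would read
    // the doc's ordinals at the marked spot instead of just counting.
    static int countMatchingDocs(List<SegmentHits> hits) {
        int total = 0;
        for (SegmentHits h : hits) {
            if (h.bits == null) {
                continue; // no match set: skip the segment rather than throw an NPE
            }
            for (int doc = h.bits.nextSetBit(0); doc >= 0; doc = h.bits.nextSetBit(doc + 1)) {
                total++; // per-document facet work would go here
            }
        }
        return total;
    }
}
```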

Made-with: Cursor
@bhabegger force-pushed the oak-12089-lucene9-core branch from 28c304c to 66fb544 on March 27, 2026 16:59
@bhabegger changed the title from "feat: add lucene9 index provider as safe opt-in target" to "feat: add Lucene 9 index provider (oak-search-luceneNg)" on Mar 27, 2026
@bhabegger marked this pull request as ready for review on March 30, 2026 05:10
When indexNodeName=true, the index editor writes the namespace-stripped
local name of each node into FieldNames.NODE_NAME. The query engine maps
LOCALNAME() equality and LIKE restrictions to TermQuery/WildcardQuery on
that field.
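The node-name mapping can be sketched with two string helpers: one strips the namespace prefix before the value is written to the node-name field, and one turns a `LIKE` pattern into Lucene `WildcardQuery` syntax (`*` matches any sequence, `?` a single character, `\` escapes). Equality would then be a `TermQuery` on the stripped name. `LocalNameSketch` and its methods are hypothetical, and treating backslash as the `LIKE` escape character is an assumption.

```java
// Hypothetical helpers illustrating the mapping; not the PR's implementation.
class LocalNameSketch {
    // "jcr:content" -> "content"; names without a prefix are returned unchanged.
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
    }

    // Translates a LIKE pattern to WildcardQuery syntax: '%' -> '*', '_' -> '?',
    // a backslash makes the next LIKE character literal, and literal '*', '?'
    // or '\' are escaped for Lucene with a backslash.
    static String likeToWildcard(String like) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                appendLiteral(out, like.charAt(++i));
            } else if (c == '%') {
                out.append('*');
            } else if (c == '_') {
                out.append('?');
            } else {
                appendLiteral(out, c);
            }
        }
        return out.toString();
    }

    private static void appendLiteral(StringBuilder out, char c) {
        if (c == '*' || c == '?' || c == '\\') {
            out.append('\\');
        }
        out.append(c);
    }
}
```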

Function restrictions prefixed with "function*@" (e.g. "function*@:localname")
are generated alongside the dedicated ":localname" restriction by Oak's SQL2
parser; they are now silently dropped from plan evaluation, cost calculation,
and the Lucene query to prevent false negatives.
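A minimal sketch of the filtering rule, assuming restrictions are identified by their property-name strings; `FunctionRestrictionFilter`, `FUNCTION_PREFIX` and `usableRestrictions` are hypothetical names. Running the same filter before plan evaluation, cost calculation and query construction is one way to keep those three steps consistent, as the commit describes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of dropping "function*@" restrictions.
class FunctionRestrictionFilter {
    static final String FUNCTION_PREFIX = "function*@";

    // Keeps only restrictions the index handles directly; the dedicated
    // ":localname" restriction survives while its "function*@" twin is dropped.
    static List<String> usableRestrictions(List<String> propertyNames) {
        return propertyNames.stream()
                .filter(name -> !name.startsWith(FUNCTION_PREFIX))
                .collect(Collectors.toList());
    }
}
```

For example, `["function*@:localname", ":localname", "jcr:title"]` is reduced to `[":localname", "jcr:title"]`.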

Adds NodeNameCommonTest (shared) and LuceneNgNodeNameCommonTest.

Made-with: Cursor
…ADME

Address PR review comments from thomasmueller:
- Rename "Multi-index queries" to "Composite node store queries" and add a
  footnote explaining the composite node store scenario.
- Add a footnote for "Index augmentors" describing the IndexFieldProvider /
  FulltextQueryTermsProvider extension points.

Made-with: Cursor
@thomasmueller (Member) left a comment

Just some bikeshedding: I wonder if we should use the term "Lucene 9" at all.

@@ -0,0 +1,26 @@
# oak-search-luceneNg

Lucene 9 index provider for Oak (`type="lucene9"`).
Member:

What about:

Suggested change:
- Lucene 9 index provider for Oak (`type="lucene9"`).
+ Lucene NG index provider for Oak (`type="lucene-ng"`).

Hoping we can upgrade to Lucene 10 without having to change the type. I do assume the index storage version won't change, or that there is an option to add compatibility.

If not, and we do need to reindex for Lucene 10, we can still add "lucene-ng-10" if needed.

Contributor Author:

So the idea is that the code should be less sensitive to Lucene upgrades. However, I wasn't really sure about having a fixed name and then, at some point in the future, having to introduce a lucene-ng-ng ;) Here we could interpret it as "Lucene since 9" and keep using it for Lucene 10, 11, up to an imaginary Lucene 12 that breaks compatibility again (which might never happen, as the APIs are likely more stable now).

Personally, I'm fine with lucene-ng, as I had doubts myself.

Contributor Author:

Also, NG in the code is fine for now since we do have two versions, but at some point I expect the legacy code to be removed; NG will then no longer mean "new generation", and the code should be refactored with proper non-NG names. So in the code I'm fine with it; in the type, however, the name sticks much more. That's mostly what bothers me about ng in the type.

@bhabegger (Contributor Author) commented Apr 1, 2026

@thomasmueller What about lucene2026? It's less tied to 9 and doesn't contain an explicit ng.

Follows Maven/Oak convention of lowercase hyphenated artifact names.
The Java package (org.apache.jackrabbit.oak.plugins.index.luceneNg) is
unchanged as it is an internal implementation detail.

Made-with: Cursor
…etCommonTest

The three tests (basic faceting, multiple dimensions, facet with filter)
are all already exercised by FacetCommonTest via the JCR API.
LuceneNgFacetCommonTest runs that suite against Lucene 9 and is the
canonical coverage. The ignored class added no value.

Made-with: Cursor
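The shared-suite pattern behind FacetCommonTest / LuceneNgFacetCommonTest (and NodeNameCommonTest / LuceneNgNodeNameCommonTest) is why a separate engine-specific facet test adds nothing: the scenarios live once in an abstract base, and each engine subclass only supplies its engine. The sketch below shows the shape without JUnit; `SharedFacetChecks`, `InMemoryFacetChecks` and the `facetEngine()` hook are hypothetical, and the in-memory "engine" is a stand-in for a real index.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Engine-agnostic scenarios live in the abstract base class.
abstract class SharedFacetChecks {
    // Hypothetical hook: given the labels of the indexed docs, return facet counts.
    protected abstract Function<List<String>, Map<String, Long>> facetEngine();

    // Runs every shared scenario and returns how many passed.
    final int runAll() {
        int passed = 0;
        Map<String, Long> counts = facetEngine().apply(List.of("red", "blue", "red"));
        if (counts.equals(Map.of("red", 2L, "blue", 1L))) {
            passed++; // basic faceting
        }
        if (facetEngine().apply(List.of()).isEmpty()) {
            passed++; // no hits, no facets
        }
        return passed;
    }
}

// Stand-in for an engine-specific subclass such as LuceneNgFacetCommonTest.
class InMemoryFacetChecks extends SharedFacetChecks {
    @Override
    protected Function<List<String>, Map<String, Long>> facetEngine() {
        return labels -> labels.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Adding coverage for another engine then means adding one small subclass, not copying scenarios.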
2 participants