feat: add Lucene 9 index provider (oak-search-luceneNg) #2817
bhabegger wants to merge 5 commits into apache:OAK-12089
Force-pushed from e4bc5ad to dbccf97.
Introduces oak-search-luceneNg, a new Oak module providing a Lucene 9-based index engine under type=lucene9, with full parity with the legacy lucene implementation for property queries, fulltext, sorting, excerpts, and facets (insecure, statistical, and secure ACL modes).

Key changes:
- New oak-search-luceneNg module: index editor, query index, tracker, index node, storage, and OSGi wiring
- Facet parity: LuceneNgSecure/StatisticalSortedSetDocValuesFacetCounts ported to Lucene 9 APIs with null-safe MatchingDocs.bits handling
- LuceneNgFacetCommonTest extends FacetCommonTest for JCR-level coverage
- AbstractIndexComparisonTest inlined into oak-search test-jar; oak-search-test module removed
- getRootBuilder removed from ContextAwareCallback and IndexUpdate
- leaf OSGi property removed from LuceneIndexProviderService
- README documents feature parity vs legacy Lucene and Elastic

Made-with: Cursor
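To make the "statistical" facet mode concrete, here is a minimal plain-Java sketch. It assumes the mode behaves like Oak's legacy `statistical` facets: access is checked on a sample of matching documents, and the raw label counts are scaled by the readable fraction. `StatisticalFacetSketch`, `scale`, and the `canRead` predicate are hypothetical and do not reflect the API of the ported facet-count classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of "statistical" facet counting: the raw (insecure)
 * label counts are scaled by the fraction of sampled hits the caller can read.
 * This is NOT the Oak implementation; names and structure are illustrative.
 */
public final class StatisticalFacetSketch {

    public static Map<String, Long> scale(Map<String, Long> rawCounts,
                                          List<String> sampledPaths,
                                          Predicate<String> canRead) {
        Map<String, Long> result = new LinkedHashMap<>();
        if (sampledPaths.isEmpty()) {
            return result; // nothing sampled, so no estimate is possible
        }
        // Fraction of the sample that passes the (hypothetical) ACL check.
        long readable = sampledPaths.stream().filter(canRead).count();
        double ratio = (double) readable / sampledPaths.size();
        // Scale every raw label count by that fraction, rounding to the nearest long.
        for (Map.Entry<String, Long> e : rawCounts.entrySet()) {
            result.put(e.getKey(), Math.round(e.getValue() * ratio));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> raw = new LinkedHashMap<>();
        raw.put("red", 40L);
        raw.put("blue", 10L);
        raw.put("green", 4L);
        List<String> sample = List.of("/a", "/b", "/secret/c", "/secret/d");
        // Everything outside /secret is readable, so the ratio is 2/4 = 0.5.
        System.out.println(scale(raw, sample, p -> !p.startsWith("/secret")));
        // prints {red=20, blue=5, green=2}
    }
}
```

Under this assumption, the cost of ACL checks is bounded by the sample size rather than the result size, at the price of approximate counts.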
Force-pushed from 28c304c to 66fb544.
When indexNodeName=true, the index editor writes the namespace-stripped local name of each node into FieldNames.NODE_NAME. The query engine maps LOCALNAME() equality and LIKE restrictions to TermQuery/WildcardQuery on that field. Function restrictions prefixed with "function*@" (e.g. "function*@:localname") are generated alongside the dedicated ":localname" restriction by Oak's SQL2 parser; they are now silently dropped from plan evaluation, cost calculation, and the Lucene query to prevent false negatives. Adds NodeNameCommonTest (shared) and LuceneNgNodeNameCommonTest. Made-with: Cursor
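The node-name handling described above can be sketched without Lucene as three small steps: stripping the namespace prefix to get the value stored in the node-name field, translating a `LIKE` pattern on `LOCALNAME()` into wildcard syntax, and discarding `function*@` restrictions. This is a hypothetical illustration: `LocalNameSketch` and its methods are not Oak classes, the `%` to `*` and `_` to `?` mapping and `\` as the `LIKE` escape character are assumptions, and the real code builds `TermQuery`/`WildcardQuery` objects on `FieldNames.NODE_NAME`.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical, Lucene-free sketch of the node-name handling described above.
 * Names are illustrative only; the real code builds TermQuery/WildcardQuery
 * instances on FieldNames.NODE_NAME.
 */
public final class LocalNameSketch {

    /** Strips a namespace prefix: "jcr:content" -> "content", "foo" -> "foo". */
    public static String localName(String nodeName) {
        int colon = nodeName.indexOf(':');
        return colon < 0 ? nodeName : nodeName.substring(colon + 1);
    }

    /**
     * Translates a LIKE pattern into wildcard syntax: '%' -> '*', '_' -> '?'.
     * Assumes backslash is the LIKE escape character; wildcard
     * metacharacters that appear literally are escaped with a backslash.
     */
    public static String likeToWildcard(String like) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                appendLiteral(sb, like.charAt(++i)); // escaped char is taken literally
            } else if (c == '%') {
                sb.append('*');
            } else if (c == '_') {
                sb.append('?');
            } else {
                appendLiteral(sb, c);
            }
        }
        return sb.toString();
    }

    private static void appendLiteral(StringBuilder sb, char c) {
        if (c == '*' || c == '?' || c == '\\') {
            sb.append('\\'); // escape wildcard metacharacters
        }
        sb.append(c);
    }

    /** Drops "function*@..." restrictions, keeping dedicated ones such as ":localname". */
    public static List<String> usableRestrictions(List<String> propertyNames) {
        return propertyNames.stream()
                .filter(name -> !name.startsWith("function*@"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(localName("jcr:content"));   // content
        System.out.println(likeToWildcard("con%_t"));    // con*?t
        System.out.println(usableRestrictions(
                List.of(":localname", "function*@:localname"))); // [:localname]
    }
}
```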
…ADME

Address PR review comments from thomasmueller:
- Rename "Multi-index queries" to "Composite node store queries" and add a footnote explaining the composite node store scenario.
- Add a footnote for "Index augmentors" describing the IndexFieldProvider / FulltextQueryTermsProvider extension points.

Made-with: Cursor
thomasmueller left a comment
Just some bikeshedding: I wonder if we should use the term "Lucene 9" at all.
`@@ -0,0 +1,26 @@`

> # oak-search-luceneNg
>
> Lucene 9 index provider for Oak (`type="lucene9"`).
What about replacing `Lucene 9 index provider for Oak (type="lucene9").` with `Lucene NG index provider for Oak (type="lucene-ng").`?
Hoping we can upgrade to Lucene 10 without having to change the type. I do assume the index storage version won't change, or that there is an option to add compatibility.
If not, and we do need to reindex for Lucene 10, we can still add "lucene-ng-10" if that should be needed.
So the idea is that the code should be less sensitive to Lucene upgrades. However, I wasn't really sure about having a fixed name and then, at some point in the future, having to have a lucene-ng-ng ;) Here we could interpret `lucene9` as "Lucene since 9" and still use it for Lucene 10 and 11, up to an imaginary Lucene 12 that breaks compatibility again (which might never happen, as the APIs are likely more stable now).
Personally, I'm fine with lucene-ng, as I had doubts myself.
Also, NG in the code is fine for now since we do have two versions, but I expect that at some point the legacy code will be removed, the NG code will no longer be the "new generation", and it will be refactored with proper non-NG names. So in the code I'm fine with it; in the type, however, the name sticks more. That's mostly what bothers me about NG in the type.
@thomasmueller What about lucene2026? It's less tied to 9 and doesn't have an explicit NG.
Follows Maven/Oak convention of lowercase hyphenated artifact names. The Java package (org.apache.jackrabbit.oak.plugins.index.luceneNg) is unchanged as it is an internal implementation detail. Made-with: Cursor
…etCommonTest

The three tests (basic faceting, multiple dimensions, facet with filter) are all already exercised by FacetCommonTest via the JCR API. LuceneNgFacetCommonTest runs that suite against Lucene 9 and is the canonical coverage. The ignored class added no value.

Made-with: Cursor
Summary
Introduces `oak-search-luceneNg`, a new Oak module that provides a Lucene 9-based index engine. Indexes opt in explicitly via `type=lucene9`; all existing indexes are unaffected.

- New `oak-search-luceneNg` module: index editor, query index, tracker, index node, storage (Oak JCR node-based directory), and OSGi wiring
- Full parity with the legacy `lucene` engine: property restrictions, path/type filters, fulltext, sorting, and excerpts
- Facet parity: `LuceneNgSecure/StatisticalSortedSetDocValuesFacetCounts` ported to Lucene 9 APIs with null-safe `MatchingDocs.bits` handling
- `LuceneNgFacetCommonTest` extends the shared `FacetCommonTest` suite for end-to-end JCR-level facet coverage
- `AbstractIndexComparisonTest` inlined into `oak-search` test-jar; `oak-search-test` module removed
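Explicit opt-in by type can be pictured as each engine selecting only the index definitions whose `type` matches its own. The sketch is hypothetical: `TypeOptInSketch`, `indexesFor`, and the index paths are invented for illustration, and only the `lucene9` type string comes from the summary.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of explicit opt-in by index type: an engine only
 * handles definitions whose "type" matches its own type string, so existing
 * definitions (type=lucene, type=property, ...) are left to their own engines.
 */
public final class TypeOptInSketch {

    // Type string as given in the PR summary.
    public static final String LUCENE9_TYPE = "lucene9";

    public static List<String> indexesFor(String engineType, Map<String, String> typeByIndexPath) {
        return typeByIndexPath.entrySet().stream()
                .filter(e -> engineType.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up index definitions: path -> value of the "type" property.
        Map<String, String> defs = new LinkedHashMap<>();
        defs.put("/oak:index/legacyFulltext", "lucene");
        defs.put("/oak:index/uuid", "property");
        defs.put("/oak:index/newFulltext", LUCENE9_TYPE);
        System.out.println(indexesFor(LUCENE9_TYPE, defs)); // [/oak:index/newFulltext]
        System.out.println(indexesFor("lucene", defs));     // [/oak:index/legacyFulltext]
    }
}
```

Because selection is keyed on an exact type match, a definition only moves to the new engine when its `type` is changed deliberately.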