[#10445] improvement(lance): Upgrade lance-core to v2.0.1 and lance-namespace-core to v0.4.5#10637
[#10445] improvement(lance): Upgrade lance-core to v2.0.1 and lance-namespace-core to v0.4.5#10637bbiiaaoo wants to merge 3 commits intoapache:mainfrom
Conversation
Code Coverage Report
Files
|
…ance-namespace-core to v0.4.5 - Bump com.lancedb:lance-core to org.lance:lance-core v2.0.1 - Bump com.lancedb:lance-namespace-core to org.lance:lance-namespace-core v0.4.5 - Refactor all package imports from com.lancedb to org.lance - Adapt LanceTableOperations where ModeEnum was replaced by String - Fix corresponding unit and IT tests to adapt to the new APIs
There was a problem hiding this comment.
Pull request overview
Upgrades Gravitino’s Lance integration to the new org.lance artifact coordinates/versions and adapts REST server + common operations to upstream breaking API/protocol changes (models, builders, mode handling, error mapping), along with doc/test updates.
Changes:
- Upgraded Lance dependencies to
org.lance:lance-core:2.0.1andorg.lance:lance-namespace-core:0.4.5, migrating imports/packages fromcom.lancedb.*toorg.lance.*. - Updated Lance REST server endpoints and exception mapping to the new namespace SDK semantics (string modes, new error codes, additional root routes).
- Added/updated compatibility utilities and refactored tests/integration tests and docs to match the new client/server behavior.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/service/rest/TestLanceNamespaceOperations.java | Updates REST resource tests for new Lance models/error codes/mode strings. |
| lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/service/rest/TestGravitinoLanceTableOperations.java | Updates unit tests for alter/drop column request model changes. |
| lance/lance-rest-server/src/test/java/org/apache/gravitino/lance/integration/test/LanceRESTServiceIT.java | Refactors IT to new org.lance client APIs and avoids reflection for TableApi. |
| lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceTableOperations.java | Adapts table REST endpoints to new SDK (string modes, new models, error behavior). |
| lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/rest/LanceNamespaceOperations.java | Adapts namespace REST endpoints and adds “root” routes (/list, /describe). |
| lance/lance-rest-server/src/main/java/org/apache/gravitino/lance/service/LanceExceptionMapper.java | Remaps Gravitino exceptions to org.lance exception hierarchy + HTTP status mapping. |
| lance/lance-rest-server/build.gradle.kts | Updates excludes for new groupId and adds test dependency as needed. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/utils/SerializationUtils.java | Reimplements header properties JSON parsing using Gravitino’s Jackson utilities. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/LanceTableOperations.java | Updates common table ops interface to new SDK types/signatures (mode as string). |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/LanceNamespaceOperations.java | Updates common namespace ops interface to new SDK types/signatures (mode/behavior as string). |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/PageUtil.java | Adds internal pagination helper for list APIs. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/ObjectIdentifier.java | Adds internal identifier parsing helper compatible with new SDK behavior. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/JsonArrowSchemaConverter.java | Adds Arrow-schema-to-JSON conversion for new JsonArrowSchema models. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableOperations.java | Updates Gravitino-backed table ops implementation for new models/error types/mode normalization. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceTableAlterHandler.java | Updates alter-columns handling for AlterColumnsEntry model changes. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceNamespaceWrapper.java | Updates exception mapping when validating/loading catalogs with new SDK exceptions. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceNameSpaceOperations.java | Updates namespace operations + adds mode/behavior parsing compatible with string inputs. |
| lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/CommonUtil.java | Adds internal stacktrace formatting utility (replacement for upstream util). |
| lance/lance-common/build.gradle.kts | Updates excludes for new groupId and adds Lance core dependency. |
| gradle/libs.versions.toml | Bumps Lance/Lance-namespace versions and switches to org.lance coordinates. |
| docs/lance-rest-service.md | Updates docs/examples for new dependency coordinates and request fields. |
| core/src/main/java/org/apache/gravitino/stats/storage/LancePartitionStatisticStorage.java | Migrates to new org.lance dataset/fragment builder APIs. |
| core/build.gradle.kts | Updates dependency exclusion to new groupId. |
| catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/TestLanceTableOperations.java | Updates unit tests for new index API (IndexOptions). |
| catalogs/catalog-lakehouse-generic/src/test/java/org/apache/gravitino/catalog/lakehouse/lance/integration/test/CatalogGenericCatalogLanceIT.java | Updates ITs for new dataset open/write APIs. |
| catalogs/catalog-lakehouse-generic/src/main/java/org/apache/gravitino/catalog/lakehouse/lance/LanceTableOperations.java | Migrates table ops to new Lance dataset/index APIs. |
Comments suppressed due to low confidence (3)
docs/lance-rest-service.md:286
- The create-empty-table curl example still sends
propertiesin the JSON body, but the server implementation now reads table properties only from thex-lance-table-propertiesheader for/create-empty. Update the documentation to match the actual API behavior, or (preferably) keep bodypropertiessupported for backward compatibility.
# Create a new empty table
curl -X POST http://localhost:9101/lance/v1/table/lance_catalog%24schema%24table02/create-empty \
-H 'Content-Type: application/json' \
-d '{
"id": ["lance_catalog", "schema", "table02"],
"location": "/tmp/lance_catalog/schema/table02",
"properties": { "description": "This is table02" }
}'
docs/lance-rest-service.md:354
- In the Java example,
CreateTableRequestsetsidandmodebut does not set a table location (or any equivalent). Since the REST server requires the location (via header or request field depending on client), this snippet is likely to fail as written. Please update the example to include the location in the supported way for the new SDK.
// Create a table with schema inferred from Arrow IPC file
CreateTableRequest createTableRequest = new CreateTableRequest();
createTableRequest.setId(Lists.newArrayList("lance_catalog", "schema", "table03"));
createTableRequest.setMode("create");
org.apache.arrow.vector.types.pojo.Schema schema =
new org.apache.arrow.vector.types.pojo.Schema(
Arrays.asList(
Field.nullable("id", new ArrowType.Int(32, true)),
Field.nullable("value", new ArrowType.Utf8())));
byte[] body = ArrowUtils.generateIpcStream(schema);
ns.createTable(createTableRequest, body);
docs/lance-rest-service.md:397
- In the Python example,
CreateTableRequestsetsidandmodebut omitslocation. Unless the new client implicitly supplies location another way, this example will fail to create a table. Update the snippet to include the required location (or document the correct mechanism for supplying it).
# Create a table with schema inferred from Arrow IPC file
create_table_request = ln.CreateTableRequest(
id=['lance_catalog', 'schema', 'table03'],
mode='create'
)
with open('schema.ipc', 'rb') as f:
body = f.read()
ns.create_table(create_table_request, body)
|
@bbiiaaoo could you fix the CI and address the review comment to make this PR ready for review? |
261bb70 to
5f1ac2b
Compare
|
@bbiiaaoo |
…0.4.5 and polish validations/tests
- use explicit root namespace id ("") for /v1/namespace/list and /v1/namespace/describe
- keep create-empty backward compatibility: accept body properties and merge with header properties (header wins)
- clarify alter_columns validation message when rename is missing
- update Lance REST Java doc example to use "uri"/"delimiter" connection keys
- catch JsonProcessingException (instead of generic Exception) in SerializationUtils
- add TestSerializationUtils for valid/blank/invalid JSON cases
- replace FQN TableNotFoundException usages in TestLanceNamespaceOperations with import + simple name
- extend REST tests for root namespace endpoints and create-empty location/properties behaviors
| # Create a new empty table | ||
| curl -X POST http://localhost:9101/lance/v1/table/lance_catalog%24schema%24table02/create-empty \ | ||
| -H 'Content-Type: application/json' \ | ||
| -H "x-lance-table-properties: {\"description\":\"This is table02\"}" \ |
There was a problem hiding this comment.
Is the header x-lance-table-properties optional or required?
There was a problem hiding this comment.
x-lance-table-properties is optional for create-empty. If omitted, it defaults to an empty map. Request-body properties are still accepted for backward compatibility (header wins on key conflicts). I will clarify this in the documentation in the next commit.
|
|
||
| private CommonUtil() {} | ||
|
|
||
| static String formatCurrentStackTrace() { |
There was a problem hiding this comment.
What is the method used for? Can Throwables.getStackTraceAsString() meet your needs?
There was a problem hiding this comment.
formatCurrentStackTrace() is used to populate the Lance exception detail field. Good point on utility reuse; I will switch it to Throwables.getStackTraceAsString(...) in the next commit.
| private final GravitinoLanceNamespaceWrapper namespaceWrapper; | ||
| private final GravitinoClient client; | ||
|
|
||
| private enum CreateMode { |
There was a problem hiding this comment.
Why do we need to add those three enums? Can't we use enums in the Lance API?
There was a problem hiding this comment.
In lance-namespace 0.4.5, mode/behavior are now plain strings in generated request models (no public enums to reuse). We keep local enums only as normalized internal states for type-safe branching after parsing input strings.
| @PathParam("id") String tableId, | ||
| @QueryParam("delimiter") @DefaultValue(NAMESPACE_DELIMITER_DEFAULT) String delimiter, | ||
| CreateEmptyTableRequest request, | ||
| Map<String, Object> requestBody, |
There was a problem hiding this comment.
Why is the value type of requestBody Object?
There was a problem hiding this comment.
Good question. We intentionally use Map<String, Object> here for backward compatibility.
In lance-namespace 0.4.5, CreateEmptyTableRequest does not include a properties field, but we still accept legacy request-body properties from existing clients. Using Object allows us to safely normalize non-string JSON values (for example numbers/booleans) into strings before passing them downstream.
| * @return the response of the create table operation | ||
| */ | ||
| @SuppressWarnings("deprecation") | ||
| CreateEmptyTableResponse createEmptyTable( |
There was a problem hiding this comment.
I recalled that createEmptyTable has been replaced with declareTable, so will you plan to support it?
There was a problem hiding this comment.
Thanks for the reminder, you are absolutely right.
createEmptyTable is deprecated in lance-namespace 0.4.5, and declareTable is the preferred API now. We plan to add declareTable support in the next commit, while keeping create-empty for backward compatibility with existing clients.
| import org.lance.namespace.model.JsonArrowSchema; | ||
|
|
||
| /** Converts Arrow schema to Lance Namespace JsonArrowSchema model. */ | ||
| class JsonArrowSchemaConverter { |
There was a problem hiding this comment.
Is this class directly copied from Lance repo?
There was a problem hiding this comment.
It is not directly copied from the current Lance repo as-is.
During the upgrade, the old utility class we previously relied on was no longer available from the new Java artifacts, so we added a local converter in Gravitino to preserve compatible JsonArrowSchema behavior for this code path.
| return Stream.concat(setPropertiesStream, removePropertiesStream).toArray(arrayCreator); | ||
| } | ||
|
|
||
| private static CreateMode parseCreateMode(String instance, String mode) { |
There was a problem hiding this comment.
This one seems to be duplicated with normalizeCreateMode. Can you try to optimize it?
There was a problem hiding this comment.
Good catch. There is duplicated normalization logic between namespace parseCreateMode and table normalizeCreateMode.
They are not fully identical because namespace mode is converted to an internal enum, while table mode is normalized to a string property passed downstream (and register mode also accepts REGISTER as an alias).
I’ll optimize this by extracting the shared token-normalization logic and reusing it in both paths, while keeping the behavior-specific mappings unchanged.
… mode normalization
- add `declareTable` to `LanceTableOperations` and implement it in
`GravitinoLanceTableOperations` via metadata-only create flow
- add REST endpoint `POST /v1/table/{id}/declare` with request handling and validation
- deprecate `createEmptyTable` in common ops API and document `DeclareTable` as preferred
- update docs:
- add `DeclareTable` endpoint entry and usage example
- clarify `x-lance-table-properties` is optional for `create-empty`
- extract shared token normalization to `CommonUtil.normalizeToken(...)`
and reuse it in namespace/table mode parsing to remove duplication
- replace manual stacktrace building with
`Throwables.getStackTraceAsString(...)`
- extend REST unit tests and integration tests for declare-table behavior
| | DeregisterTable | Unregister a table from a namespace (metadata only, data remains) | POST | `/lance/v1/table/{id}/deregister` | 1.1.0 | | ||
| | CreateEmptyTable | Declare a table and store the metadata without touching lance table data, for more, please refer to [doc](https://docs.lancedb.com/api-reference/rest/table/create-an-empty-table) | POST | `/lance/v1/table/{id}/create-empty` | 1.1.0 | | ||
| | CreateEmptyTable | **Deprecated**: Use `DeclareTable` instead. Declare a table and store the metadata without touching lance table data, for more, please refer to [doc](https://docs.lancedb.com/api-reference/rest/table/create-an-empty-table) | POST | `/lance/v1/table/{id}/create-empty` | 1.1.0 | | ||
| | DeclareTable | Declare a table and store the metadata without touching lance table data. This is the preferred replacement for `CreateEmptyTable`. | POST | `/lance/v1/table/{id}/declare` | 1.1.0 | |
There was a problem hiding this comment.
The version should be 1.3.0
|
| return DropMode.FAIL; | ||
| } | ||
| String normalized = CommonUtil.normalizeToken(mode); | ||
| if ("FAIL".equals(normalized)) { |
There was a problem hiding this comment.
Could u use DropMode.valueOf(normalized)?
What changes were proposed in this pull request?
org.lance:lance-core:2.0.1org.lance:lance-namespace-core:0.4.5com.lancedb.*toorg.lance.*.Enum->String) and updated builder-style APIs.ObjectIdentifier,PageUtil,CommonUtil,JsonArrowSchemaConverter) to align with the new Lance namespace SDK behavior.Why are the changes needed?
com.lancedbtoorg.lanceand introduced breaking API changes in newer versions.Fix: #10445
Does this PR introduce any user-facing change?
org.lancedependencies and the new request style (for example, lowercase mode values such ascreate).How was this patch tested?
catalog-lakehouse-generic,lance-common, andlance-rest-server.curl -> Gravitinorequests, covering Lance namespace and table workflows on the upgraded dependencies.