Skip to content

Commit 9164ba0

Browse files
feat(backend/kernel): add use_kernel=True flag — route through the Rust kernel via PyO3 (#787)
* feat(backend/kernel): route use_sea=True through the Rust kernel Phase 2 of the PySQL × kernel integration plan (databricks-sql-kernel/docs/designs/pysql-kernel-integration.md). Wires `use_sea=True` to a new `backend/kernel/` module that delegates to the Rust kernel via the `databricks_sql_kernel` PyO3 extension (kernel PR #13). New module: `src/databricks/sql/backend/kernel/` - `client.py` — `KernelDatabricksClient(DatabricksClient)`. Lazy- imports `databricks_sql_kernel` so a connector install without the kernel wheel doesn't `ImportError` at startup; only `use_sea=True` surfaces the missing-extra message. Implements open/close_session, sync + async execute_command (async_op=True goes through `Statement.submit()` and stashes the handle in a dict keyed on `CommandId`), cancel/close_command, get_query_state, get_execution_result, and the metadata calls (catalogs / schemas / tables / columns) via `Session.metadata().list_*`. Real server-issued session and statement IDs flow through (no synthetic UUIDs). - `auth_bridge.py` — translate the connector's `AuthProvider` into kernel `Session` kwargs. PAT (including federation-wrapped PAT — `get_python_sql_connector_auth_provider` always wraps the base in `TokenFederationProvider`, so a naive isinstance check never matches) routes through `auth_type="pat"`. Everything else routes through `auth_type="external"` with a callback that delegates to `auth_provider.add_headers({})`. (External today is rejected by the kernel at `build_auth_provider`; the separate kernel-side enablement PR will flip it on.) - `result_set.py` — `KernelResultSet(ResultSet)`. Duck-typed over `databricks_sql_kernel.ExecutedStatement` (sync execute) and `ResultStream` (metadata + async await_result) since both expose `arrow_schema()` / `fetch_next_batch()` / `fetch_all_arrow()` / `close()`. Same FIFO batch buffer the prior ADBC POC used, so `fetchmany(n)` for n smaller than the kernel's natural batch size doesn't re-fetch. - `type_mapping.py` — Arrow → PEP 249 description-string mapper. Lifted from the prior ADBC POC; centralised here so future kernel-result wrappers reuse the same mapping. Kernel errors → PEP 249 exceptions: `KernelError.code` is mapped in a single table to `ProgrammingError` / `OperationalError` / `DatabaseError`. The structured fields (`sql_state`, `error_code`, `query_id`, …) are copied onto the re-raised exception so callers can branch on them without reaching through `__cause__`. Routing: `Session._create_backend` flips the `use_sea=True` branch to instantiate `KernelDatabricksClient` instead of the native `SeaDatabricksClient`. The native `backend/sea/` module is left in place (no users on `use_sea=True` after this PR; its long- term fate is out of scope here). Packaging: `[tool.poetry.extras] kernel = ["databricks-sql-kernel"]`. `pip install 'databricks-sql-connector[kernel]'` pulls in the kernel wheel; `use_sea=True` without the extra raises a pointed ImportError telling the user how to install it. Known gaps (acknowledged, will be follow-ups): - Parameter binding (`execute_command(parameters=[...])`) raises NotSupportedError — PyO3 `Statement.bind_param` lands in a follow-up. - Statement-level `query_tags` raises NotSupportedError. - `get_tables(table_types=[...])` returns unfiltered rows (the native SEA backend's filter is keyed on `SeaResultSet`; needs a small port to operate on `KernelResultSet`). - External-auth end-to-end blocked on the kernel-side `AuthConfig::External` enablement PR. - Volume PUT/GET (staging operations): kernel has no Volume API. Test plan: - Unit: 37 new tests across `tests/unit/test_kernel_auth_bridge.py` (auth provider → kwargs mapping, including federation-wrapped PAT and the External trampoline call-counter check), `tests/unit/test_kernel_type_mapping.py` (Arrow type mapping + description shape), and `tests/unit/test_kernel_result_set.py` (buffer semantics, fetchmany across batch boundaries, idempotent close, close() swallowing handle-close failures). All pass. - Full unit suite: 600 pre-existing tests still pass; one pre-existing failure (`test_useragent_header` — agent detection adds `agent/claude-code` in this env) was already failing on main, unrelated to this change. - Live e2e against dogfood with `use_sea=True`: SELECT 1, `range(10000)`, `fetchmany` pacing, `fetchall_arrow`, all four metadata calls (returned 75 catalogs / 144 schemas in main / 47 tables in `system.information_schema` / 15 columns), `session_configuration={'ANSI_MODE': 'false'}` round-trips, bad SQL surfaces as DatabaseError with `code='SqlError'` and `sql_state='42P01'` on the exception. All checks pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * refactor(backend/kernel): PAT-only auth, drop External trampoline The earlier auth_bridge routed OAuth/MSAL/federation through the kernel's External token-provider trampoline (a Python callable the kernel invoked per HTTP request). Removing that for now. Why: routing OAuth into the kernel inherently requires per-request token resolution to keep refresh working during a long-running session. Two viable mechanisms (kernel-native OAuth, or the External callback); both have costs (duplicate OAuth flows vs GIL-per-request). Punting the decision until there's actual demand on use_sea=True. Today: the bridge accepts PAT (including TokenFederationProvider- wrapped PAT, which is how `get_python_sql_connector_auth_provider` always shapes it). Any non-PAT auth_provider raises a clear NotSupportedError pointing the user at use_sea=False (Thrift). This shrinks the auth_bridge to ~50 lines and means the kernel- side External enablement PR is no longer on the connector's critical path — there's no kernel-side prerequisite for shipping use_sea=True for PAT users. Unit tests updated: - TokenFederationProvider-wrapped PAT still routes to PAT (kept). - Generic OAuth provider raises NotSupportedError (new). - ExternalAuthProvider raises NotSupportedError (new). - Silent non-PAT provider raises NotSupportedError (new) — reject the type itself rather than trying to extract a token we already know we can't use. Live e2e against dogfood with use_sea=True (PAT): all checks still pass (SELECT 1, range(10000), fetchmany pacing, four metadata calls, session_configuration round-trip, structured DatabaseError on bad SQL). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(e2e): live kernel-backend (use_sea=True) suite Moves the previously-ad-hoc /tmp/connector_smoke.py into the repo as a real pytest module under tests/e2e/ — same convention as the rest of the e2e suite. Uses the existing session-scoped `connection_details` fixture from the top-level conftest so it shares the credential surface with every other live test. 11 tests cover: - connect() with use_sea=True opens a session. - SELECT 1: rows + description shape (column name + dbapi type slug). - SELECT * FROM range(10000): multi-batch drain. - fetchmany() pacing across the buffer boundary. - fetchall_arrow() returns a pyarrow Table. - All four metadata methods (catalogs / schemas / tables / columns). - session_configuration={'ANSI_MODE': 'false'} round-trips. - Bad SQL surfaces as DatabaseError with `code='SqlError'` and `sql_state='42P01'` attached as exception attributes. Module-level skips: - `databricks_sql_kernel` not importable → whole module skipped via pytest.importorskip (the wheel hasn't been installed). - Live creds missing → fixture-level skip with a pointed message. Run: `pytest tests/e2e/test_kernel_backend.py -v`. All 11 pass against dogfood in ~20s. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): defer databricks-sql-kernel poetry dep declaration CI is failing across all jobs at \`poetry lock\` time: Because databricks-sql-connector depends on databricks-sql-kernel (^0.1.0) which doesn't match any versions, version solving failed. The kernel wheel isn't yet published to PyPI — we verified the name is available via the Databricks proxy, but the package itself hasn't been built and uploaded yet. Declaring it as a poetry dep (even an optional one inside an extra) requires the version to be resolvable, and \`poetry lock\` runs as the setup step for every CI job: unit tests, linting, type checks, all of them. Fix: drop the \`databricks-sql-kernel\` dep declaration and the \`[kernel]\` extra from pyproject.toml until the wheel is on PyPI. The lazy import in \`backend/kernel/client.py\` still raises a clear ImportError pointing at \`pip install databricks-sql-kernel\` (or local maturin) when use_sea=True is invoked without the kernel present. When the kernel is published, a small follow-up will add back: databricks-sql-kernel = {version = "^0.1.0", optional = true} [tool.poetry.extras] kernel = ["databricks-sql-kernel"] A pointed comment in pyproject.toml documents the deferred change. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): unit tests skip without pyarrow, mypy + black Three CI failures after the poetry-lock fix uncovered three real issues: 1. pyarrow is optional in the connector. The default-deps CI test job installs without it; the +PyArrow job installs with. The kernel backend's result_set.py + type_mapping.py import pyarrow eagerly (the kernel always returns pyarrow), and the unit tests import the backend at collection time — which crashes the default-deps job at ModuleNotFoundError. Fix: gate the three kernel unit tests on `pytest.importorskip( "pyarrow")` so they skip on default-deps and run on +PyArrow. Verified locally: 39 pass with pyarrow, 3 skipped without. No change to the backend module itself — nothing imports it until use_sea=True is invoked, and pyarrow is on the kernel wheel's runtime dep list so use_sea=True can't hit this either. 2. mypy: KernelDatabricksClient.open_session returns self._session_id, which mypy types as Optional[SessionId] because the field starts as None. Fix: bind the new id to a local non-Optional variable, assign to the field, return the local. CI's check-types runs cleanly on backend/kernel/ now; pre-existing mypy noise elsewhere isn't mine. 3. black --check: black 22.12.0 (the version CI pins) wants reformatting on result_set.py / type_mapping.py / client.py. Applied. Verified locally with the same black version. All 39 kernel unit tests + 619 pre-existing unit tests pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): make package importable without the kernel wheel The +PyArrow CI matrix installs pyarrow but not the databricks-sql-kernel wheel (the wheel isn't on PyPI yet, and the [kernel] extra is deferred — see commit 31ca581). The previous fix gated unit tests on `pytest.importorskip("pyarrow")` but test_kernel_auth_bridge.py was still pulled into a kernel-wheel ImportError because: src/databricks/sql/backend/kernel/__init__.py -> from databricks.sql.backend.kernel.client import KernelDatabricksClient -> import databricks_sql_kernel # ImportError on +PyArrow CI The eager re-export from `__init__.py` was a convenience that broke every consumer that only needed a submodule (type_mapping, result_set, auth_bridge) — they all triggered the kernel wheel import for no reason. Fix: - Drop the eager re-export from `kernel/__init__.py`. Comment documents why and points callers (= session.py::_create_backend, already this shape) at the direct `from .client import ...`. - Drop the no-longer-needed `pytest.importorskip("pyarrow")` / `importorskip("databricks_sql_kernel")` from test_kernel_auth_bridge.py — auth_bridge.py itself has neither dep, so the test now runs on every CI matrix variant. - test_kernel_result_set.py and test_kernel_type_mapping.py keep the pyarrow importorskip because they themselves use pyarrow. Verified locally across the three matrix shapes: - both pyarrow + kernel installed: 39 pass. - pyarrow only (no kernel wheel — the +PyArrow CI shape): 39 pass. - neither: 9 pass (auth_bridge only), 2 modules skip (the others use pyarrow). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * test(e2e): skip use_sea=True parametrized cases when kernel wheel missing The connector's coverage CI job runs the full e2e suite, several of whose test classes parametrize ``extra_params`` over ``{}`` and ``{"use_sea": True}``. With ``use_sea=True`` now routing through the Rust kernel via PyO3, those cases die at ``connect()`` with our pointed ImportError because the ``databricks-sql-kernel`` wheel isn't yet on PyPI — and that CI job (sensibly) doesn't try to build it from a sibling repo. Fix: ``pytest_collection_modifyitems`` hook in the top-level ``conftest.py`` that adds a ``skip`` marker to any parametrize case with ``extra_params={"use_sea": True, ...}`` when ``importlib.util.find_spec("databricks_sql_kernel")`` returns ``None``. Behavior change is CI-only — local dev with the kernel wheel installed (via ``maturin develop`` from the kernel repo) runs those cases as before. Once the kernel wheel is published, the [kernel] extra in pyproject.toml gets enabled (see comment block there) and the default-deps CI matrix will install it; the skip then becomes a no-op. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * refactor(backend/kernel): address review feedback — mechanical fixes Cleanup pass on the kernel-backend PR addressing reviewer feedback that doesn't change observable behaviour: - result_set.py: replace O(M²) `_buffered_rows` with running counter `_buffered_count` maintained by pull/take/drain (perf F6). - result_set.py: docstring corrections — drop nonexistent `fetch_all_arrow` from kernel-handle contract (F20); document `buffer_size_bytes` as no-op on the kernel backend (F21). - client.py: tighten `_reraise_kernel_error` signature to `_kernel.KernelError` only; drop dead passthrough branch and the defensive setattr try/except (F17). - client.py: drop unused `_use_arrow_native_complex_types` kwarg (F18). - client.py: collapse three `KernelResultSet(...)` construction sites through `_make_result_set` (renamed from `_metadata_result`) (F19). - client.py: drop `metadata-` prefix from synthetic CommandId; use a plain `uuid.uuid4().hex` so anything reading `cursor.query_id` downstream sees a UUID-shaped string (F14). - client.py: clear the raw access token from `_auth_kwargs` after the kernel session is constructed — kernel owns the credential from then on, no need to retain a cleartext copy on the connector instance (F24). - auth_bridge.py: reject bearer tokens containing ASCII control characters at extraction time (defense-in-depth against header injection if a misbehaving HTTP stack ever places the token back into a header without scrubbing) (F25). - tests/unit/test_kernel_auth_bridge.py: construct a real `TokenFederationProvider(http_client=Mock())` instead of bypassing `__init__` with `__new__` + monkey-patching `add_headers`. Exercises the real federation passthrough path the bridge sees in production (F12). Drop unused `MagicMock` import (F27). - tests/e2e/test_kernel_backend.py: drop misleading CloudFetch claim on `test_drain_large_range_to_arrow` — 10000 BIGINT rows is ~80 KB, single inline chunk on a typical warehouse (F26). All 39 existing kernel unit tests pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * feat(backend/kernel): introduce dedicated use_kernel flag + substantive review fixes Major change: route the kernel backend through a new ``use_kernel=True`` connection kwarg instead of repurposing ``use_sea=True``. ``use_sea=True`` once again routes to the native pure-Python SEA backend (no behaviour change); ``use_kernel=True`` routes to the Rust kernel via PyO3. The two flags are mutually exclusive. This addresses the largest reviewer concern from the multi-agent review: silently hijacking a documented public flag broke OAuth / federation / parameter-binding callers on ``use_sea=True`` who had no opt-out. With the new flag, the kernel backend is fully opt-in and existing ``use_sea=True`` users continue to get the native SEA backend they signed up for. Other substantive fixes: - session.py: restore ``SeaDatabricksClient`` import + routing. Reject ``use_kernel=True`` + ``use_sea=True`` together with a clear ``ValueError``. - client.py (kernel ``Cursor.columns``): update docstring to flag the ``catalog_name=None`` divergence — kernel requires a catalog, Thrift / native SEA do not (F13). - conftest.py: drop the collection-time ``pytest_collection_modifyitems`` hook that was skipping ``extra_params={"use_sea": True}`` cases. With ``use_sea=True`` back on the native SEA backend, those cases run as they did before this PR (F8). - kernel/client.py: ``get_tables`` now applies the ``table_types`` filter client-side using ``ResultSetFilter._filter_arrow_table`` (the same helper the native SEA backend uses), wrapped in a tiny ``_StaticArrowHandle`` that flows the filtered table back through the normal ``KernelResultSet`` path. Replaces the previous "log a warning and return unfiltered" behaviour (F4). - kernel/client.py: guard ``_async_handles`` with ``threading.RLock`` so concurrent cursors on the same connection don't race on submit / close / close-session (F15). - kernel/result_set.py: ``KernelResultSet.close()`` now drops the entry from ``backend._async_handles`` so async-submitted statements don't leave stale references behind (F5). - kernel/{__init__,client,auth_bridge}.py, tests/e2e/test_kernel_backend.py: update docstrings, error messages, and the e2e fixture to refer to ``use_kernel=True`` instead of ``use_sea=True``. - client.py (``Connection`` docstring): document the new ``use_kernel`` kwarg + its Phase-1 limitations. New tests: - tests/unit/test_kernel_client.py (38 cases): cover the 14-entry ``_CODE_TO_EXCEPTION`` table, ``_reraise_kernel_error`` attribute forwarding, the 6-entry ``_STATE_TO_COMMAND_STATE`` table, the no-open-session guards on every method, ``open_session`` double-open, ``parameters`` / ``query_tags`` rejection, ``get_columns``' catalog-required check, ``cancel_command`` / ``close_command`` no-handle tolerance, ``get_query_state`` sync-path SUCCEEDED, the Failed-state re-raise, the synthetic-command-id UUID shape, and ``close_session`` cleanup even when per-handle close errors fire. Uses a fake ``databricks_sql_kernel`` module installed into ``sys.modules`` so the test runs with no Rust extension dependency (F9). 77/77 kernel unit tests pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): CI-greening — mypy + e2e module skip - src/databricks/sql/backend/kernel/result_set.py: fix the 3 mypy errors at L237/239/241 by casting ``self.backend`` to ``KernelDatabricksClient`` (the base ``DatabricksClient`` doesn't declare ``_async_handles`` / ``_async_handles_lock``). Folds in gopalldb's nit (3249904284) — replace the explicit ``acquire()/try/finally/release()`` with a ``with`` block to match the rest of the file. - tests/e2e/test_kernel_backend.py: harden the module-level skip so the suite doesn't run when the kernel wheel is absent in CI. The unit suite installs a fake ``databricks_sql_kernel`` ``ModuleType`` into ``sys.modules`` so the connector's import-time ``import databricks_sql_kernel`` succeeds without the Rust extension; that fake leaks across into the same pytest session and ``pytest.importorskip`` happily returns it. A real wheel exposes ``__file__`` (compiled extension on disk); the fake does not. Skip the module when ``__file__`` is missing. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): address gopalldb minor review comments (m1, m4) m1 — install hint (comment 3249904266): The ``databricks-sql-kernel`` wheel is not yet published on PyPI; ``pip install databricks-sql-kernel`` either finds nothing or pulls a squatted package. Drop the misleading hint from ``ImportError`` and from the ``use_kernel`` docstring on ``databricks.sql.connect``; point users at the ``maturin develop --release`` dev path until the wheel ships. m4a — auth_bridge ValueError → ProgrammingError (comment 3249904276): Two sites in ``_extract_bearer_token`` / ``kernel_auth_kwargs`` were raising bare ``ValueError`` for caller-misuse cases (control chars in the token, PAT provider that produced no Authorization header). The rest of the kernel-backend error surface uses PEP 249 exception types — code paths that catch ``DatabaseError`` / ``ProgrammingError`` would miss these. Convert to ``ProgrammingError`` and update the unit test. m4b — description null_ok (comment 3249904282): ``description_from_arrow_schema`` was hardcoding the 7th tuple element to ``None`` even though ``pyarrow.Field.nullable`` is available. PEP 249 §Cursor.description defines ``null_ok`` as "True if NULL values are allowed"; callers branching on it would have lost useful information the kernel already provides. Now emits ``field.nullable``; added a unit test covering both nullable and non-nullable fields; updated the two existing tests that asserted the old all-``None`` shape. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): substantive review fixes — M1, M2, M3, m2, m3 Addresses gopalldb's four major / two minor remaining review comments. The shared error-mapping primitives move to a new ``_errors.py`` module so both ``client.py`` and ``result_set.py`` can use them without ``result_set.py`` importing from ``client.py``. M1 — async handle leak in get_execution_result (3249904251): ``ResultStream`` from ``await_result()`` is wrapped in ``KernelResultSet``; the underlying ``ExecutedAsyncStatement`` has no further role once the stream is in hand. Close it immediately, drop the entry from ``_async_handles``, and add the guid to ``_closed_commands``. A failed ``async_exec.close()`` is logged but doesn't break the result-set return — the kernel's Drop impl reaps server-side state. M2 — PyO3 native exceptions wrapped as OperationalError (3249904255): Added ``kernel_call(what)`` context manager (in the new ``_errors.py``). ``KernelError`` flows through ``reraise_kernel_error`` as before; anything else (``TypeError`` / ``OverflowError`` / ``ValueError`` from PyO3 argument conversion, extension-internal errors) is wrapped in ``OperationalError`` so DB-API callers only ever see PEP 249 exception types. Applied at every PyO3 call site: ``open_session``, ``execute_command``, ``cancel_command``, ``close_command``, ``get_query_state``, ``get_execution_result``, ``get_catalogs`` / ``get_schemas`` / ``get_tables`` / ``get_columns``, plus ``fetch_next_batch`` / ``arrow_schema`` in ``KernelResultSet``. M3 — KernelError during result-set construction (3249904259): ``KernelResultSet.__init__`` calls ``kernel_handle.arrow_schema()`` which can itself raise. Every call to ``_make_result_set`` is now inside a ``kernel_call`` scope so the schema-fetch error becomes a mapped PEP 249 exception instead of leaking raw ``KernelError``. m3 — get_query_state of a closed async command (3249904273): Added ``_closed_commands: Set[str]`` (guarded by the existing ``_async_handles_lock``). ``close_command`` records the guid; ``close_session`` records every swept guid; ``get_execution_result`` records its own command after closing the async_exec. ``get_query_state`` now returns ``CommandState.CLOSED`` instead of falling through to ``SUCCEEDED`` for these. m2 — unit test for get_tables table_types client-side filter (3249904269): Added ``test_get_tables_with_table_types_filters_rows`` and ``test_get_tables_without_table_types_returns_full_stream`` in ``tests/unit/test_kernel_client.py``. The first feeds a fake stream with mixed ``TABLE`` / ``VIEW`` rows and asserts only ``TABLE`` survives; the second confirms the no-filter path bypasses the drain-and-rewrap and returns all rows unchanged. Plus new tests for every change above: - test_pyo3_native_exception_wrapped_as_operational_error (M2) - test_pyo3_native_exception_wrapped_for_metadata_calls (M2) - test_kernel_error_during_result_set_construction_is_mapped (M3) - test_get_execution_result_closes_async_exec_and_drops_tracking (M1) - test_get_execution_result_does_not_raise_on_async_exec_close_failure (M1) - test_get_query_state_returns_closed_after_close_command (m3) - test_close_session_marks_swept_handles_as_closed (m3) 87/87 kernel unit tests pass (added 9). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * refactor(backend/kernel): replace kernel_call context manager with explicit try/except Using a ``with`` block for error translation is bad form: ``with`` conventionally signals resource lifecycle (locks, files), so ``with kernel_call("X"):`` hides the fact that the block raises a mapped exception. Replace with explicit ``try/except Exception as exc: raise _wrap_kernel_exception("X", exc) from exc`` at every PyO3 call site. What changed: - ``_errors.py``: drop the ``kernel_call`` context manager; export ``wrap_kernel_exception(what, exc)`` — a pure function that maps a raw exception to a PEP 249 one (KernelError → mapped class via ``reraise_kernel_error``; existing Error → passthrough; anything else → OperationalError). - ``client.py``: replace 12 ``with _kernel_call(...):`` blocks with inline try/except calling the helper. - ``result_set.py``: same for the 3 sites (arrow_schema on construct, fetch_next_batch in _pull_one_batch, fetch_next_batch in _drain). Behaviour is unchanged — same KernelError → PEP 249 mapping, same non-KernelError → OperationalError wrapping. Just spelled in a way that makes control flow visible at the call site and keeps tracebacks one frame shorter (no ``__exit__`` frame). 87/87 kernel unit tests still pass. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * style(backend/kernel): black format client.py Fixes ``check-linting`` CI: one long line in ``execute_command``'s async-submit branch needed to wrap (the ``CommandId.from_sea_statement_id`` call). Pure formatting; no behaviour change. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): address gopalldb's P1 review comments Local kernel-package fixes from the follow-up review pass on PR #787 (#787 (comment)). The two cross-cutting / pre-existing issues (P0 #1 async leak in shared ``Cursor.close()``, P1 #8/#9 fetch-after-close not raising ``InterfaceError``) are tracked separately as #791 and #792 — they affect Thrift and SEA equally and are out of scope for this PR. P1 #2 — call ``backend.close_command`` from ``KernelResultSet.close()``: The override previously bypassed the base ``ResultSet.close()`` entirely. Honour the contract by invoking ``backend.close_command(self.command_id)`` after the per-handle close + ``_async_handles`` pop. Our own ``close_command`` is tolerant of already-popped guids (no-op), so this is safe even though the per-handle close above already released server-side state. Doesn't go through ``super().close()`` directly because the base path warns when ``self.results`` is ``None`` (which it is for kernel result sets) — replicate the meaningful part of the base contract without the noisy warning. P1 #3 — case-insensitive ``Bearer`` prefix in auth_bridge: RFC 6750 §2.1 says the Authorization scheme is case-insensitive. Match leniently in case a federation proxy or future provider normalises the casing differently — failing closed would surface as a confusing ``ProgrammingError`` from the bridge. P1 #4 — drop redundant ``__cause__`` set in ``reraise_kernel_error``: ``raise wrap_kernel_exception(...) from exc`` already sets ``__cause__`` at the call site; the manual assignment in ``reraise_kernel_error`` was redundant. Updated the test that asserted on it; added ``test_kernel_error_chains_through_wrap`` to cover the end-to-end chain. P1 #5 — ``get_tables`` filter looks up TABLE_TYPE by name: Replaced ``schema.field(5).name`` (positional) with the literal ``"TABLE_TYPE"`` plus a missing-column guard. A future kernel reshape of ``SHOW TABLES`` now surfaces an explicit ``OperationalError`` instead of silently filtering the wrong column. The case-sensitive contract is now documented in the surrounding comment (matches SEA + warehouse). P1 #6 — ``KernelResultSet.close()`` guards on ``connection.open``: ``__del__``-driven close arriving after the parent connection is already closed previously issued a kernel call into a disposed session. Skip the kernel call entirely in that case; still mark the result set ``CLOSED`` locally so ``__del__`` is idempotent. P1 #7 — defer ``kernel_auth_kwargs`` to ``open_session``: ``KernelDatabricksClient.__init__`` previously called ``kernel_auth_kwargs(auth_provider)`` and stored the bearer token on ``self._auth_kwargs`` indefinitely. If ``open_session`` never ran (test paths, error paths, lazy retries) the token stayed resident on the connector object. Build the kwargs locally inside ``open_session`` now — local variable, GC-eligible the moment ``open_session`` returns. Also tightened the install-hint comment in ``pyproject.toml`` to match the rest of the codebase (the wheel isn't on PyPI; only the ``maturin develop`` path is supported today). 88/88 kernel unit tests pass (added 1). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> * fix(backend/kernel): address gopalldb's follow-up P1/P2 review comments Review comment: #787 (comment) P1.1 — get_query_state handles non-BaseException failure: Previously ``raise _reraise_kernel_error(failure) from failure`` would explode with ``TypeError: exception causes must derive from BaseException`` if the kernel's ``status()`` ever returned a ``failure`` that wasn't a real ``KernelError`` (struct, dict — kernel API drift). Now route through ``_wrap_kernel_exception``, which isinstance-checks and falls through to ``OperationalError`` for non-PEP-249-shaped values. New unit test ``test_get_query_state_handles_non_baseexception_failure``. P1.2 — regression tests for prior fixes: - ``test_bearer_prefix_is_case_insensitive`` (P1 #3 from the earlier review): parametrised over "Bearer "/"bearer "/ "BEARER "/"BeArEr ". RFC 6750 §2.1 compliance was added but not covered by a test. - ``test_close_skips_kernel_call_when_connection_already_closed`` (P1 #6 from the earlier review): exercises the ``connection.open is False`` branch in ``KernelResultSet.close()`` — asserts neither the kernel handle's close nor backend.close_command fire, but the result set still ends in ``CLOSED`` so ``__del__`` is idempotent. - ``test_token_with_control_chars_or_whitespace_rejected`` (pre-existing security guard): parametrised over NUL / CR / LF / DEL / space / tab — the regex previously missed space (0x20). Covered + extended. P2.1 — wrap session_id extraction in open_session: ``SessionId.from_sea_session_id(self._kernel_session.session_id)`` was outside the ``try/except _wrap_kernel_exception`` scope. A raw PyO3 attribute-conversion error on the ``self._kernel_session.session_id`` access could escape unwrapped. Now wrapped. P2.2 — drop redundant _async_handles.pop in result-set close: After the M1 fix (``get_execution_result`` pops the guid before constructing the result set), the pop in ``KernelResultSet.close`` is dead code — every call misses. Sync-execute and metadata paths never registered in ``_async_handles`` to begin with. Drop the per-close pop; rewrote the surrounding comment so ``backend.close_command`` is now the single bookkeeping seam. P2.3 — control-char regex includes whitespace: Extended ``[\x00-\x1f\x7f]`` → ``[\x00-\x20\x7f]`` and renamed ``_CONTROL_CHAR_RE`` → ``_TOKEN_REJECT_RE``. RFC 6750 forbids whitespace within the credential token itself; a token like ``"Bearer doubled-space-token"`` previously slipped past the injection guard. Test parametrised above. type_mapping reuse — use SqlType constants: Replaced literal type strings ("bigint", "string", …) in ``_arrow_type_to_dbapi_string`` with the ``SqlType`` constants from ``databricks.sql.backend.sea.utils.conversion`` — same single source of truth the SEA backend already uses, so the kernel and SEA backends emit byte-identical type-code strings. The Arrow → SqlType lookup itself stays local to the kernel (SEA receives type-text from the server and normalises it; the kernel receives Arrow schemas directly), but the names are now shared. 100/100 kernel unit tests pass (added 12). Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com> --------- Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
1 parent e8e4409 commit 9164ba0

14 files changed

Lines changed: 2667 additions & 1 deletion

pyproject.toml

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,20 @@ requests-kerberos = {version = "^0.15.0", optional = true}
3636

3737
[tool.poetry.extras]
3838
pyarrow = ["pyarrow"]
39+
# `[kernel]` extra is intentionally not declared here yet.
40+
# `databricks-sql-kernel` is built from the databricks-sql-kernel
41+
# repo and not yet published to PyPI; declaring it as a poetry dep
42+
# breaks `poetry lock` for every CI job. Once the wheel is on PyPI
43+
# the extra will be added back here:
44+
#
45+
# databricks-sql-kernel = {version = "^0.1.0", optional = true}
46+
# [tool.poetry.extras]
47+
# kernel = ["databricks-sql-kernel"]
48+
#
49+
# Until then, the wheel is not on PyPI and the only supported
50+
# install path is local dev:
51+
# cd databricks-sql-kernel/pyo3 && maturin develop --release
52+
# (into the same venv as databricks-sql-connector).
3953

4054
[tool.poetry.group.dev.dependencies]
4155
pytest = "^7.1.2"
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
"""Backend that delegates to the Databricks SQL Kernel (Rust) via PyO3.
2+
3+
Routed when ``use_kernel=True`` is passed to ``databricks.sql.connect``.
4+
The module's identity is "delegates to the kernel" — not the wire
5+
protocol the kernel happens to use today (SEA REST). The kernel may
6+
switch its default transport (SEA REST → SEA gRPC → …) without
7+
renaming this module.
8+
9+
This ``__init__`` deliberately does **not** re-export
10+
``KernelDatabricksClient`` from ``.client``. Importing ``.client``
11+
loads the ``databricks_sql_kernel`` PyO3 extension at module-import
12+
time; doing that eagerly here would make ``import
13+
databricks.sql.backend.kernel.type_mapping`` (used by tests / by
14+
``KernelResultSet`` consumers) require the kernel wheel even when
15+
the caller never plans to open a kernel-backed session. Callers
16+
that need the client import it directly:
17+
18+
from databricks.sql.backend.kernel.client import KernelDatabricksClient
19+
20+
``session.py::_create_backend`` already does this lazy import under
21+
the ``use_kernel=True`` branch.
22+
23+
See ``docs/designs/pysql-kernel-integration.md`` in
24+
``databricks-sql-kernel`` for the full integration design.
25+
"""
Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
"""Shared error-mapping primitives for the kernel backend.
2+
3+
The PyO3 boundary can produce two flavours of exception:
4+
5+
- ``databricks_sql_kernel.KernelError`` — the kernel's own
6+
structured error type. Carries ``code`` / ``message`` /
7+
``sql_state`` / ``query_id`` / ``http_status`` / ``retryable`` /
8+
``vendor_code`` / ``error_code`` as attributes; mapped to a PEP
9+
249 exception class via ``_CODE_TO_EXCEPTION`` with the
10+
attributes forwarded onto the re-raised exception so callers can
11+
branch on ``err.code`` / ``err.sql_state`` without reaching
12+
through ``__cause__``.
13+
- Anything else — ``TypeError`` / ``OverflowError`` /
14+
``ValueError`` from PyO3 argument conversion, or arbitrary
15+
extension-internal Python errors. These would otherwise propagate
16+
raw to connector callers, breaking the DB-API contract that says
17+
"only PEP 249 exception types cross the boundary". Wrapped in
18+
``OperationalError`` here.
19+
20+
These primitives live in their own module so both ``client.py``
21+
(which orchestrates PyO3 calls) and ``result_set.py`` (which calls
22+
``fetch_next_batch`` on the same kernel handles) can share them
23+
without ``result_set.py`` importing from ``client.py``.
24+
25+
Usage at every PyO3 call site is a plain try/except:
26+
27+
try:
28+
stmt.execute()
29+
except Exception as exc:
30+
raise wrap_kernel_exception("execute_command", exc) from exc
31+
32+
The helper returns the mapped exception; callers raise it. Plain
33+
``try/except`` is preferred over a context manager: the control
34+
flow is visible at the call site, the helper is a pure function
35+
(trivial to test), and tracebacks don't carry an extra
36+
``__exit__`` frame.
37+
"""
38+
39+
from __future__ import annotations
40+
41+
from databricks.sql.exc import (
42+
DatabaseError,
43+
Error,
44+
OperationalError,
45+
ProgrammingError,
46+
)
47+
48+
49+
try:
50+
import databricks_sql_kernel as _kernel # type: ignore[import-not-found]
51+
except ImportError as exc: # pragma: no cover - same hint as client.py
52+
raise ImportError(
53+
"use_kernel=True requires the databricks-sql-kernel extension, which "
54+
"is not yet published on PyPI. Build and install it locally from the "
55+
"databricks-sql-kernel repo:\n"
56+
" cd databricks-sql-kernel/pyo3 && maturin develop --release\n"
57+
"(into the same venv as databricks-sql-connector)."
58+
) from exc
59+
60+
61+
# Map a kernel `code` slug to the PEP 249 exception class that best
62+
# captures it. The match isn't a perfect 1:1 — PEP 249 has a
63+
# narrower taxonomy than the kernel — so several kernel codes
64+
# collapse onto the same Python exception. This table is the only
65+
# place that mapping lives.
66+
_CODE_TO_EXCEPTION = {
67+
"InvalidArgument": ProgrammingError,
68+
"Unauthenticated": OperationalError,
69+
"PermissionDenied": OperationalError,
70+
"NotFound": ProgrammingError,
71+
"ResourceExhausted": OperationalError,
72+
"Unavailable": OperationalError,
73+
"Timeout": OperationalError,
74+
"Cancelled": OperationalError,
75+
"DataLoss": DatabaseError,
76+
"Internal": DatabaseError,
77+
"InvalidStatementHandle": ProgrammingError,
78+
"NetworkError": OperationalError,
79+
"SqlError": DatabaseError,
80+
"Unknown": DatabaseError,
81+
}
82+
83+
84+
def reraise_kernel_error(exc: "_kernel.KernelError") -> "Error":
85+
"""Convert a ``databricks_sql_kernel.KernelError`` to a PEP 249
86+
exception with the kernel's structured attributes forwarded onto
87+
the new instance.
88+
89+
The returned exception is raised by callers with ``raise ... from
90+
exc``; the ``from`` clause is what sets ``__cause__``, so we don't
91+
touch it here.
92+
"""
93+
code = getattr(exc, "code", "Unknown")
94+
cls = _CODE_TO_EXCEPTION.get(code, DatabaseError)
95+
new = cls(getattr(exc, "message", str(exc)))
96+
for attr in (
97+
"code",
98+
"sql_state",
99+
"error_code",
100+
"vendor_code",
101+
"http_status",
102+
"retryable",
103+
"query_id",
104+
):
105+
setattr(new, attr, getattr(exc, attr, None))
106+
return new
107+
108+
109+
def wrap_kernel_exception(what: str, exc: BaseException) -> "Error":
110+
"""Map any exception from a PyO3 call site to a PEP 249 exception.
111+
112+
- ``KernelError`` → mapped class with structured attrs forwarded.
113+
- Already-PEP-249 ``Error`` (e.g. raised by an inner caller that
114+
already mapped) → passed through unchanged.
115+
- Anything else (``TypeError`` / ``ValueError`` / etc. from PyO3
116+
argument conversion, extension-internal errors) → wrapped in
117+
``OperationalError``.
118+
119+
Returned, not raised — the caller decides whether to ``raise``
120+
or ``raise ... from exc``. ``what`` is a short tag (the calling
121+
method name) used only in the ``OperationalError`` message.
122+
"""
123+
if isinstance(exc, _kernel.KernelError):
124+
return reraise_kernel_error(exc)
125+
if isinstance(exc, Error):
126+
return exc
127+
return OperationalError(
128+
f"Unexpected error from databricks_sql_kernel during {what}: {exc!r}"
129+
)
Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,118 @@
1+
"""Translate the connector's ``AuthProvider`` into ``databricks_sql_kernel``
2+
``Session`` auth kwargs.
3+
4+
This phase ships PAT only. The kernel-side PyO3 binding accepts
5+
``auth_type='pat'``; OAuth / federation / custom credentials
6+
providers are reserved but not yet wired in either layer. Non-PAT
7+
auth raises ``NotSupportedError`` from this bridge so the failure
8+
surfaces at session-open time with a clear message rather than
9+
deep inside the kernel.
10+
11+
Token extraction goes through ``AuthProvider.add_headers({})``
12+
rather than touching auth-provider-specific attributes, so the
13+
bridge works uniformly for every PAT shape — including
14+
``AccessTokenAuthProvider`` wrapped in ``TokenFederationProvider``
15+
(which ``get_python_sql_connector_auth_provider`` does for every
16+
provider it builds).
17+
"""
18+
19+
from __future__ import annotations
20+
21+
import logging
22+
import re
23+
from typing import Any, Dict, Optional
24+
25+
from databricks.sql.auth.authenticators import AccessTokenAuthProvider, AuthProvider
26+
from databricks.sql.auth.token_federation import TokenFederationProvider
27+
from databricks.sql.exc import NotSupportedError, ProgrammingError
28+
29+
logger = logging.getLogger(__name__)
30+
31+
32+
# RFC 6750 §2.1 defines the Authorization scheme as case-insensitive.
33+
# The connector's auth providers all emit ``Bearer `` exactly today,
34+
# but we match leniently in case a federation proxy or future provider
35+
# normalises the casing differently — failing closed here would surface
36+
# as a confusing ``ProgrammingError`` from the bridge.
37+
_BEARER_PREFIX_LEN = len("Bearer ")
38+
39+
# Defense-in-depth: reject tokens containing ASCII control characters
40+
# or whitespace. CR/LF/NUL in a token would let a misbehaving HTTP
41+
# stack split or terminate the Authorization header line, opening a
42+
# header-injection sink. Space (0x20) is included so leading-/
43+
# embedded-whitespace tokens (e.g. ``"Bearer doubled-space-token"``,
44+
# tab-prefixed token) get rejected too — RFC 6750 §2.1 forbids
45+
# whitespace within the credential token itself.
46+
_TOKEN_REJECT_RE = re.compile(r"[\x00-\x20\x7f]")
47+
48+
49+
def _is_pat(auth_provider: AuthProvider) -> bool:
50+
"""Return True iff this provider ultimately wraps an
51+
``AccessTokenAuthProvider``.
52+
53+
``get_python_sql_connector_auth_provider`` always wraps the
54+
base provider in a ``TokenFederationProvider``, so an
55+
``isinstance`` check against ``AccessTokenAuthProvider`` alone
56+
never matches in practice. We peek through the federation
57+
wrapper to find the real type.
58+
"""
59+
if isinstance(auth_provider, AccessTokenAuthProvider):
60+
return True
61+
if isinstance(auth_provider, TokenFederationProvider) and isinstance(
62+
auth_provider.external_provider, AccessTokenAuthProvider
63+
):
64+
return True
65+
return False
66+
67+
68+
def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
69+
"""Pull the current bearer token out of an ``AuthProvider``.
70+
71+
The connector's ``AuthProvider.add_headers`` mutates a header
72+
dict and writes the ``Authorization: Bearer <token>`` value.
73+
Going through that public surface keeps us insulated from
74+
provider-specific internals.
75+
76+
Returns ``None`` if the provider did not write an Authorization
77+
header or wrote a non-Bearer scheme — neither is representable
78+
in the kernel's PAT auth surface.
79+
"""
80+
headers: Dict[str, str] = {}
81+
auth_provider.add_headers(headers)
82+
auth = headers.get("Authorization")
83+
if not auth:
84+
return None
85+
if not auth[:_BEARER_PREFIX_LEN].lower() == "bearer ":
86+
return None
87+
token = auth[_BEARER_PREFIX_LEN:]
88+
if _TOKEN_REJECT_RE.search(token):
89+
raise ProgrammingError(
90+
"Bearer token contains ASCII control characters or whitespace; "
91+
"refusing to forward it to the kernel auth bridge."
92+
)
93+
return token
94+
95+
96+
def kernel_auth_kwargs(auth_provider: AuthProvider) -> Dict[str, Any]:
97+
"""Build the kwargs passed to ``databricks_sql_kernel.Session(...)``.
98+
99+
PAT (including ``TokenFederationProvider``-wrapped PAT) routes
100+
through the kernel's PAT path. Anything else raises
101+
``NotSupportedError`` — the kernel binding doesn't accept OAuth
102+
today, and routing OAuth through PAT would silently break
103+
token refresh during long-running sessions.
104+
"""
105+
if _is_pat(auth_provider):
106+
token = _extract_bearer_token(auth_provider)
107+
if not token:
108+
raise ProgrammingError(
109+
"PAT auth provider did not produce a Bearer Authorization "
110+
"header; cannot route through the kernel's PAT path"
111+
)
112+
return {"auth_type": "pat", "access_token": token}
113+
114+
raise NotSupportedError(
115+
f"The kernel backend (use_kernel=True) currently only supports PAT auth, "
116+
f"but got {type(auth_provider).__name__}. Use the Thrift backend "
117+
"(default) for OAuth / federation / custom credential providers."
118+
)

0 commit comments

Comments
 (0)