[WIP] Add session mode for dbt-databricks adapter for 1.10.x version #1311
Open
alexeyegorov wants to merge 2 commits into databricks:main
This commit updates the Databricks adapter to version 1.10.15+session, introducing support for session mode execution. Key changes include:
- Added `DatabricksSessionHandle` and `SessionCursorWrapper` for handling SparkSession-based execution.
- Enhanced `DatabricksCredentials` to manage connection methods and validate session mode configurations.
- Updated connection management to support session mode, including automatic selection of submission methods for Python models.

Improve SparkSession retrieval in Databricks adapter.

This commit enhances the `DatabricksSessionHandle` and `SessionPythonJobHelper` classes to improve the retrieval of the existing SparkSession. It introduces multiple methods to obtain the SparkSession, ensuring compatibility with various Databricks environments. Additionally, it refactors method signatures for consistency and readability.
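To illustrate what "multiple methods to obtain the SparkSession" could look like, here is a minimal sketch of a fallback chain. It is not the code from this PR: `find_spark_session` and the two getter helpers are hypothetical names, and the only PySpark calls used are `SparkSession.getActiveSession()` and `SparkSession.builder.getOrCreate()`. The actual lookups in `DatabricksSessionHandle` may differ.

```python
from typing import Any, Callable, Iterable, Optional


def _from_active_session() -> Optional[Any]:
    """Return the session active in the current thread, if PySpark has one."""
    from pyspark.sql import SparkSession  # lazy import; needs pyspark installed

    return SparkSession.getActiveSession()


def _from_builder() -> Optional[Any]:
    """Fall back to the builder, which reuses an existing session or creates one."""
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()


def find_spark_session(
    getters: Iterable[Callable[[], Optional[Any]]] = (_from_active_session, _from_builder),
) -> Any:
    """Try each getter in order and return the first non-None session.

    Hypothetical helper for illustration; not part of dbt-databricks.
    """
    errors = []
    for getter in getters:
        try:
            session = getter()
        except Exception as exc:  # a getter may be unavailable in this environment
            errors.append(f"{getter.__name__}: {exc}")
            continue
        if session is not None:
            return session
    raise RuntimeError("No SparkSession available; tried: " + "; ".join(errors))
```

Passing the getters as a parameter keeps the lookup order explicit and makes it possible to add environment-specific lookups without touching the loop.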
0ee4e71 to da8f70b
Execute DBT in session mode (e.g. on a job cluster)
This PR was created with the help of Claude/Cursor to implement session mode for the `dbt-databricks` adapter.
It should allow:
- `dbt-spark`
Asset Bundle DBT task
The native `dbt_task` on Databricks does not provide a Spark session, and it is not possible to retrieve one from within the task.
It is up to Databricks to allow retrieving a Spark session within `dbt_task` in order to keep the Asset Bundle deployment as simple as it is right now.
The workaround to still use session mode is to define the dbt tasks as Python scripts or notebooks.
This approach could be added as a template via Databricks Asset Bundles.
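A minimal sketch of what such a Python script task could look like, assuming dbt-core 1.5 or later, where `dbt.cli.main.dbtRunner` allows invoking dbt in-process (so it runs in the same process as the cluster's Spark session). The helper names `invoke_dbt`, `run_steps` and `main` are hypothetical and not taken from this PR.

```python
from typing import Callable, List, Sequence


def invoke_dbt(args: List[str]) -> bool:
    """Run one dbt command in-process and report whether it succeeded."""
    from dbt.cli.main import dbtRunner  # lazy import; requires dbt-core >= 1.5

    return bool(dbtRunner().invoke(args).success)


def run_steps(
    steps: Sequence[List[str]],
    invoke: Callable[[List[str]], bool] = invoke_dbt,
) -> int:
    """Run dbt commands in order, stopping at the first failure.

    Returns the number of steps that completed successfully.
    """
    for done, args in enumerate(steps):
        if not invoke(args):
            return done
    return len(steps)


def main() -> int:
    """Entry point for a script task: exit code 0 only if every step passed."""
    steps = [["deps"], ["seed"], ["run"], ["test"]]
    return 0 if run_steps(steps) == len(steps) else 1
```

In a script task, `main()` would typically be called via `sys.exit(main())` so that a failed dbt step fails the job run.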
In our job example, this looks as below:
With the job cluster selected, it now executes as expected:

- `dbt deps` and `dbt seed`
- `dbt run` with optional `--full-refresh` and a passed `--select` (e.g. state, specific model, or empty for full selection)
- `dbt test`
Example of dbt CLI for run execution
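A minimal sketch of how the `dbt run` arguments described above could be assembled from job parameters. `build_run_args` is a hypothetical helper, not code from this PR; `--full-refresh` and `--select` are the standard dbt CLI flags, and the resulting list is in the form accepted by dbt's programmatic runner or, prefixed with `dbt`, by a subprocess call.

```python
from typing import List, Optional


def build_run_args(select: Optional[str] = None, full_refresh: bool = False) -> List[str]:
    """Build the argument list for `dbt run`.

    An empty or missing `select` means the full project is selected.
    """
    args = ["run"]
    if full_refresh:
        args.append("--full-refresh")
    if select:
        # Pass the selector as a single argument, e.g. "state:modified" or a model name.
        args.extend(["--select", select])
    return args
```

Keeping the selector optional mirrors the "empty for full selection" case: when no value is passed, no `--select` flag is emitted at all.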
Pros/Cons
Pros:
- `profiles.yml` and `requirements.txt` in the asset bundles repository
- `profiles.yml` needs to be uploaded manually to an available path on e.g. Databricks; changes require updating and re-uploading the file
Cons:
- `dbt_task` provided by Databricks -> could be adjusted by Databricks
Description
Checklist
- I have updated the `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.