
Add Notebook-scoped packages for command submits or notebook job run #1321

Open

fedemgp wants to merge 3 commits into databricks:main from fedemgp:add_notebook_scoped_packages

Conversation


@fedemgp fedemgp commented Feb 5, 2026

Resolves #1320

Description

Add a feature to install packages in a notebook-scoped environment for the PythonCommandSubmitter and PythonNotebookUploader classes

Execution Test

I made the changes and tested them in our environment, verifying that the compiled code now has the prepended package installation.

All purpose cluster: [screenshot]
Serverless cluster: [screenshot]
Job cluster: [screenshot]

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Collaborator

@tejassp-db tejassp-db left a comment


Have you taken this into account - The Databricks docs state "Starting with Databricks Runtime 13.0 %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages."

Also, have you considered the recommendation on driver node sizes for notebook-scoped libraries?

Collaborator

@sd-db sd-db left a comment


Thanks for the PR. Took a look at the code changes (didn't review the tests yet) and added comments.

logger.debug("Submitting Python model using the Command API.")

# Prepare code with notebook-scoped package installation if needed
code_to_execute = self._prepare_code_with_packages(compiled_code)
Collaborator


We should ensure we only call this path if notebook_scoped_libraries is set.

Author


Now the private function checks whether notebook_scoped_libraries is set before adding this header.

parsed_model.config.packages if parsed_model.config.notebook_scoped_libraries else []
)

def _prepare_code_with_packages(self, compiled_code: str) -> str:
Collaborator


Instead of duplicating this, it would be better to extract it into a reusable function. Maybe something like:

def prepare_code_with_packages(compiled_code: str, packages: list[str], separator: str = "\n\n") -> str:
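A minimal sketch of what such a shared helper could look like (the body is an assumption based on this thread, not the final implementation):

```python
def prepare_code_with_packages(
    compiled_code: str, packages: list[str], separator: str = "\n\n"
) -> str:
    """Prepend a notebook-scoped %pip install command to compiled model code."""
    if not packages:
        # No notebook-scoped packages configured; leave the code untouched.
        return compiled_code
    pip_install_cmd = "%pip install " + " ".join(packages)
    return pip_install_cmd + separator + compiled_code
```

Any submitter could then call this with its own package list instead of carrying its own copy of the prepend logic.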

Author


Added it to the base class PythonSubmitter; I had to make some adjustments so any submitter is able to use it.

# Only add packages to cluster-level libraries if not using notebook-scoped
if not notebook_scoped_libraries:
for package in packages:
if index_url:
Collaborator


Let us look at adding support for index_url in notebook-scoped packages as well. Since this is supported today, it would be surprising if a model silently broke when a user who has set it switches to using notebook-scoped packages.

Author

@fedemgp fedemgp Feb 11, 2026


Already added support for index-url in notebook-scoped packages as well, here.

logger.debug(f"Adding notebook-scoped package installation: {pip_install_cmd}")

# Prepend the pip install command to the compiled code
return f"{pip_install_cmd}\n\n# COMMAND ----------\n{compiled_code}"
Collaborator


As recommended by @tejassp-db, let us add dbutils.library.restartPython() after the pip_install_cmd
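A hedged sketch of what the prepend could look like with the restart added (the helper name is hypothetical; the cell separator follows the existing code above):

```python
def prepend_install_and_restart(compiled_code: str, pip_install_cmd: str) -> str:
    """Prepend the %pip install cell plus a Python restart to the compiled code."""
    # On Databricks Runtime 13.0+, %pip no longer restarts the Python process
    # automatically, so the restart must be explicit for the newly installed
    # packages to be importable by the model code.
    restart_cmd = "dbutils.library.restartPython()"
    separator = "\n\n# COMMAND ----------\n"
    return separator.join([pip_install_cmd, restart_cmd, compiled_code])
```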

Author


added

return compiled_code

# Build the %pip install command for notebook-scoped packages
pip_install_cmd = "%pip install " + " ".join(self.packages)
Collaborator


Maybe we can try %pip install -q {packages} to reduce noise in the notebook output.

Author


sure thing


fedemgp commented Feb 10, 2026

Have you taken this into account - The Databricks docs state "Starting with Databricks Runtime 13.0 %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages."

Also have you considered the recommendation on driver node sizes for notebook scoped libraries

I didn't take into account the recommendation to restart the Python process, thanks for that. I will refactor this draft PR to make it more production-ready (avoiding duplicated code) and consider that.

Regarding how large the driver node should be, this solution is agnostic about cluster size. It depends on the team that previously defined the cluster (whether an all-purpose cluster, a job cluster, or anything else).

@fedemgp fedemgp force-pushed the add_notebook_scoped_packages branch from e96a5f0 to 19d86fc Compare February 11, 2026 22:23
Signed-off-by: Federico Manuel Gomez Peter <federico.gomez@payclip.com>
@fedemgp fedemgp force-pushed the add_notebook_scoped_packages branch from 19d86fc to 6fa5301 Compare February 11, 2026 22:31

fedemgp commented Feb 11, 2026

I was not able to run the end-to-end tests, to be honest; I will try to do it tomorrow.

Signed-off-by: Federico Manuel Gomez Peter <federico.gomez@payclip.com>
@fedemgp fedemgp marked this pull request as ready for review February 12, 2026 12:52

Development

Successfully merging this pull request may close these issues.

Notebook-scoped libraries feature not enable in dbt-databricks

3 participants