To get started, create a fresh venv or conda environment. Then, update pip
and install the required dev dependencies within the new environment.
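For example, a minimal setup using Python's built-in venv module might look like this (the `.venv` name is just a convention; conda users can create and activate a conda environment instead):

```
python3 -m venv .venv          # create the environment in ./.venv
source .venv/bin/activate      # activate it for the current shell
```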
```
pip install -U pip
pip install -r requirements-dev.txt

# Install the test dependencies to run unit and integration tests.
pip install -r requirements-test.txt
```

We use pyink to lint/format the code. To apply changes to your local
environment, run:
```
pyink .
```

This will ensure that your changes pass CI code linting.
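Since pyink is a fork of Black, Black-style flags should also work if you only want to verify formatting without modifying files (a sketch, assuming your pyink version supports Black's --check and --diff options):

```
pyink --check --diff .   # report files that would be reformatted, without writing changes
```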
We use pytest for testing. To run all tests, simply run pytest. For this
to work, your ambient environment must be configured to correctly resolve
configuration details such as the GCP project, region, etc.
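While iterating, you can also run a subset of tests using pytest's standard selection options; the file path and keyword below are illustrative, not actual test names:

```
pytest tests/test_session.py   # hypothetical file path: run a single test file
pytest -k 'create_session'     # run tests whose names match a keyword expression
```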
To make testing more deterministic, it is instead recommended to specify configuration details on the command line. For example:
```
env \
GOOGLE_CLOUD_PROJECT='project-id' \
GOOGLE_CLOUD_REGION='us-central1' \
DATAPROC_SPARK_CONNECT_SUBNET='subnet-id' \
pytest --tb=auto -v
```

To run tests with magic functionality, install the required dependencies manually:
```
pip install .
pip install IPython sparksql-magic
```

Then run tests as normal. Any magic-related tests will automatically detect and use the available dependencies.
To run tests without the magic dependencies, simply install the base package:
```
pip install .
pytest
```

Tests that require magic functionality will be automatically skipped if the dependencies are not available.
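To see which tests were skipped and why, you can ask pytest to include a short skip summary in its report:

```
pytest -rs   # -r with 's' adds a summary line for each skipped test
```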
The integration tests in particular can take a while to run. To speed up the
testing cycle, you can run them in parallel using the pytest-xdist plugin:
set the -n flag to the number of parallel runners you want, or to auto to
let the plugin pick a number based on the available CPUs. For example:
```
env \
GOOGLE_CLOUD_PROJECT='project-id' \
GOOGLE_CLOUD_REGION='us-central1' \
DATAPROC_SPARK_CONNECT_SUBNET='subnet-id' \
DATAPROC_SPARK_CONNECT_SERVICE_ACCOUNT='service@account.test' \
pytest -n auto --tb=auto -v
```
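If the xdist plugin is not already pulled in by requirements-dev.txt or requirements-test.txt, it can be installed directly:

```
pip install pytest-xdist
```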