- Learn how to provision computing resources for running Big Data analyses using the Infrastructure as Code (IaC) approach.
- Learn how to set up opinionated CI/CD pipelines to deploy cloud infrastructure.
- Learn how to utilize linters for detecting security vulnerabilities in cloud infrastructure.
- Learn how to run Apache Spark code in a distributed way on a Hadoop cluster using
Vertex AI notebooks and Dataproc services on GCP.
- Learn how to use Workload Identity Federation for secure authentication from GitHub Actions
to Google Cloud.


- Redeem a GCP coupon to create a billing account
- Authenticate to GCP to obtain the default credentials used for running the code
# first remove the stored credentials, if any exist
gcloud auth application-default revoke
# log in and obtain new application credentials
gcloud auth application-default login
- Fork this repository to your own GitHub account.
- Export shared environment variables
export TF_VAR_tbd_semester=2025L
# format: 20xx for teachers, student ID number for students
export TF_VAR_user_id=9900
# use your own billing account id
export TF_VAR_billing_account=01F44C-CA9C7E-587C25
# for budget creation
export GOOGLE_BILLING_PROJECT=$(echo "tbd-${TF_VAR_tbd_semester}-${TF_VAR_user_id}" | tr '[:upper:]' '[:lower:]')
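Before moving on, it may help to confirm that the variables above are set and that the derived project ID looks as expected. The following is a minimal sketch; the helper names `tbd_project_id` and `tbd_check_env` are hypothetical and not part of this repository.

```shell
# Hypothetical helper: mirrors the GOOGLE_BILLING_PROJECT expression above.
# $1 = semester (e.g. 2025L), $2 = user id (e.g. 9900)
tbd_project_id() {
  echo "tbd-$1-$2" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: report every required variable that is unset or empty.
# Returns 0 when all variables are present, 1 otherwise.
tbd_check_env() {
  missing=0
  for name in TF_VAR_tbd_semester TF_VAR_user_id TF_VAR_billing_account GOOGLE_BILLING_PROJECT; do
    # Indirect lookup of the variable called $name (names come from the fixed list above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `tbd_check_env && echo "project: $GOOGLE_BILLING_PROJECT"` in the same shell session prints the project ID only when all four variables are defined.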
- Enter the bootstrap folder, then initialize the project and the Terraform state bucket
cd bootstrap
terraform init
terraform apply
cd ..
- CI/CD (GitHub Actions setup using Workload Identity Federation)
- Edit the env/backend.tfvars file and set the bucket variable to the name of the Terraform state bucket
- Edit the env/project.tfvars file and set the project_name and iac_service_account variables using the output from the bootstrap phase, e.g.:

- Edit cicd_bootstrap/conf/github_actions.tfvars to set github_org and github_repo, e.g.:
github_org = "mwiewior"
github_repo = "tbd-workshop-1"
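If you prefer not to type these two values by hand, they can be derived from the URL of your fork. The sketch below assumes a standard GitHub HTTPS or SSH remote URL; `parse_github_remote` is a hypothetical helper, not something shipped with this repository.

```shell
# Hypothetical helper: print github_org / github_repo lines for a GitHub remote URL.
# Accepts https://github.com/ORG/REPO(.git) and git@github.com:ORG/REPO(.git).
parse_github_remote() {
  url=${1%.git}                  # drop a trailing .git, if present
  url=${url#git@github.com:}     # drop the SSH prefix, if present
  url=${url#https://github.com/} # drop the HTTPS prefix, if present
  org=${url%%/*}                 # everything before the first slash
  repo=${url#*/}                 # everything after the first slash
  printf 'github_org = "%s"\ngithub_repo = "%s"\n' "$org" "$repo"
}

# Usage, from inside your fork:
#   parse_github_remote "$(git remote get-url origin)"
```

Review the printed lines before copying them into cicd_bootstrap/conf/github_actions.tfvars, since remotes with other hosts or extra path segments are not handled.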
- Initialize the state file and set environment variables
cd cicd_bootstrap
terraform init -backend-config=../env/backend.tfvars
# authenticate Docker backend with GCP
gcloud auth configure-docker
# create CI/CD integration using Workload Identity
terraform apply -var-file ../env/project.tfvars -var-file conf/github_actions.tfvars -compact-warnings
cd ..
- Use the output variables to configure the GitHub Actions workflow .github/workflows/pull-request.yml, e.g.:
Please do not edit the YAML file to hardcode these values; set them as GitHub Actions secrets instead,
preserving the secret names, i.e. GCP_WORKLOAD_IDENTITY_PROVIDER_NAME and GCP_WORKLOAD_IDENTITY_SA_EMAIL.
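One possible way to create the two secrets from the command line is the GitHub CLI (`gh`), assuming it is installed and authenticated. This is only a sketch: `set_wif_secrets` is a hypothetical helper, and the provider name and service account email must be taken from your own Terraform output. The secrets can equally be set in the repository settings in the browser.

```shell
# Hypothetical helper: store the two Workload Identity values as repository secrets.
# $1 = OWNER/REPO of your fork
# $2 = Workload Identity provider name (from the Terraform output)
# $3 = service account email (from the Terraform output)
# Set DRY_RUN=1 to print the gh commands instead of running them.
set_wif_secrets() {
  run=${DRY_RUN:+echo}
  $run gh secret set GCP_WORKLOAD_IDENTITY_PROVIDER_NAME --repo "$1" --body "$2"
  $run gh secret set GCP_WORKLOAD_IDENTITY_SA_EMAIL --repo "$1" --body "$3"
}
```

Calling the function with `DRY_RUN=1` first lets you check the target repository and secret names before anything is written.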

- Install and configure pre-commit
- Commit your changes, push them to a branch, and open a PR against the main/master branch of YOUR repository.
If you see a warning like this, please enable the workflows:
...and push your changes again!
Once all pull request checks have passed, please merge your PR and wait until your release job finishes.
- IMPORTANT
❗ ❗ ❗ Please remember to destroy all the resources after the workshop:
terraform init -backend-config=env/backend.tfvars
terraform destroy -no-color -var-file env/project.tfvars
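After the destroy finishes, you may want to check that nothing is left in the Terraform state. The sketch below counts the resource addresses printed by `terraform state list`; `count_remaining` is a hypothetical helper, and an empty state does not by itself prove that resources created outside Terraform are gone.

```shell
# Hypothetical helper: count non-empty lines on stdin (one per resource address).
# Prints 0 when the input is empty.
count_remaining() {
  grep -c . || true
}

# Usage, from the repository root after the destroy command above:
#   terraform state list | count_remaining
```

A result other than 0 suggests that some resources are still tracked and the destroy should be reviewed or repeated.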