From 6bbac5ae10bedb04fcdc03c8a6b28086a59af2de Mon Sep 17 00:00:00 2001
From: FAQ Bot
Date: Mon, 23 Mar 2026 19:58:08 +0000
Subject: [PATCH] NEW: How do I use Spark with BigQuery as a data source and
 sink?

---
 ...62_2b59e8e6c1_spark-bigquery-read-write.md | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 _questions/data-engineering-zoomcamp/module-6/062_2b59e8e6c1_spark-bigquery-read-write.md

diff --git a/_questions/data-engineering-zoomcamp/module-6/062_2b59e8e6c1_spark-bigquery-read-write.md b/_questions/data-engineering-zoomcamp/module-6/062_2b59e8e6c1_spark-bigquery-read-write.md
new file mode 100644
index 0000000..2884c2e
--- /dev/null
+++ b/_questions/data-engineering-zoomcamp/module-6/062_2b59e8e6c1_spark-bigquery-read-write.md
@@ -0,0 +1,31 @@
+---
+id: 2b59e8e6c1
+question: How do I use Spark with BigQuery as a data source and sink?
+sort_order: 62
+---
+
+Add the connector package, pinning a released version that matches your Spark build's Scala version: `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:<version>`
+
+Read from BigQuery:
+
+```python
+df = spark.read.format("bigquery") \
+    .option("table", "project.dataset.table") \
+    .load()
+```
+
+Write to BigQuery:
+
+```python
+df.write.format("bigquery") \
+    .option("table", "project.dataset.output_table") \
+    .mode("overwrite") \
+    .save()
+```
+
+Make sure:
+- your GCP credentials are configured
+- the dataset location matches your query location
+- the output dataset exists, and `temporaryGcsBucket` is set (or `writeMethod` is set to `direct`) when writing
+
+This enables distributed processing on top of warehouse data.
\ No newline at end of file
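The examples in the patch assume the connector is already on the classpath. A minimal submit sketch, where the script name `my_job.py` and the connector version are placeholders to replace with your own values:

```shell
# Sketch: ship the spark-bigquery connector with the job via --packages.
# The version shown and my_job.py are placeholders; pick the release
# matching your cluster's Scala build (_2.12 vs _2.13).
spark-submit \
  --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1 \
  my_job.py
```

The same coordinate can instead be set on the session via `spark.jars.packages` if you build the `SparkSession` in code.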