Course
data-engineering-zoomcamp
Question
How do I use Spark with BigQuery as a data source and sink?
Answer
You can use the Spark BigQuery connector to read and write data between Spark and BigQuery.
1. Add the connector package:
```
com.google.cloud.spark:spark-bigquery-with-dependencies_2.12
```
2. Read from BigQuery:
```python
df = spark.read.format("bigquery") \
    .option("table", "project.dataset.table") \
    .load()
```
3. Write to BigQuery:
```python
df.write.format("bigquery") \
    .option("table", "project.dataset.output_table") \
    .mode("overwrite") \
    .save()
```
Make sure:
- your GCP credentials are configured
- the dataset location matches your query location
- the output dataset exists
- for the default (indirect) write method, a temporary GCS bucket is set, e.g. `.option("temporaryGcsBucket", "<bucket>")`
This enables distributed processing on top of warehouse data.
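The package from step 1 can be attached when the Spark session is created. A minimal sketch, assuming PySpark is installed locally; the version suffix `0.36.1` is an illustrative choice, so pick a release that matches your Spark and Scala (here `2.12`) versions:

```python
from pyspark.sql import SparkSession

# spark.jars.packages downloads the connector from Maven Central
# at session start, so the "bigquery" format becomes available.
spark = (
    SparkSession.builder
    .appName("spark-bigquery-example")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1",
    )
    .getOrCreate()
)
```

Alternatively, pass the same coordinate to `spark-submit --packages` instead of setting it in code.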
### Checklist
- [x] I have searched existing FAQs and this question is not already answered
- [x] The answer provides accurate, helpful information
- [x] I have included any relevant code examples or links