[FAQ] How to use Spark with BigQuery as source and destination? #256

@AsherJD-io


### Course

data-engineering-zoomcamp

### Question

How do I use Spark with BigQuery as a data source and sink?

### Answer

You can use the Spark BigQuery connector to read and write data between Spark and BigQuery.

1. Add the connector package:
   `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12`

2. Read from BigQuery:

```python
df = spark.read.format("bigquery") \
    .option("table", "project.dataset.table") \
    .load()
```

3. Write to BigQuery:

```python
# The connector's default (indirect) write method stages data in a GCS
# bucket first, so a staging bucket must be set; the name is a placeholder.
df.write.format("bigquery") \
    .option("table", "project.dataset.output_table") \
    .option("temporaryGcsBucket", "your-staging-bucket") \
    .mode("overwrite") \
    .save()
```


Make sure:

- your GCP credentials are configured
- the dataset location matches your query location
- the output dataset exists
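
For the credentials point, a minimal sketch, assuming you authenticate with a service-account key file (the path is a placeholder; with no key file set, Application Default Credentials from `gcloud auth application-default login` are used instead):

```python
import os

# Assumption: a service-account key file downloaded from the GCP console.
# Spark and the BigQuery connector pick this variable up automatically.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
```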

This enables distributed processing on top of warehouse data.
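
Putting step 1 together with a session: the connector package from step 1 can also be pulled in when the session is created, via Spark's `spark.jars.packages` configuration. A sketch, assuming connector version `0.36.1` (check the spark-bigquery-connector releases for the version matching your Spark and Scala build):

```python
from pyspark.sql import SparkSession

# Assumption: version 0.36.1 of the connector; Spark resolves and
# downloads the package from Maven Central on first use.
spark = (
    SparkSession.builder
    .appName("bigquery-example")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1",
    )
    .getOrCreate()
)
```

Alternatively, pass the same coordinate to `spark-submit --packages` at launch time.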

### Checklist

- [x] I have searched existing FAQs and this question is not already answered
- [x] The answer provides accurate, helpful information
- [x] I have included any relevant code examples or links
