[FAQ] How to use Spark with BigQuery as source and destination? #256

@AsherJD-io


### Course

data-engineering-zoomcamp

### Question

How do I use Spark with BigQuery as a data source and sink?

### Answer

You can use the Spark BigQuery connector to read and write data between Spark and BigQuery.

1. Add the connector package:
   `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12`

2. Read from BigQuery:

```python
df = spark.read.format("bigquery") \
    .option("table", "project.dataset.table") \
    .load()
```

3. Write to BigQuery:

```python
# The connector's default (indirect) write method stages data in a GCS
# bucket first, so a staging bucket must be set; the name is a placeholder.
df.write.format("bigquery") \
    .option("table", "project.dataset.output_table") \
    .option("temporaryGcsBucket", "your-staging-bucket") \
    .mode("overwrite") \
    .save()
```


Make sure:

- your GCP credentials are configured
- the dataset location matches your query location
- the output dataset exists
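
For the credentials point, a minimal sketch, assuming you authenticate with a service-account key file (the path is a placeholder; with no key file set, Application Default Credentials from `gcloud auth application-default login` are used instead):

```python
import os

# Assumption: a service-account key file downloaded from the GCP console.
# Spark and the BigQuery connector pick this variable up automatically.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
```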

This enables distributed processing on top of warehouse data.
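
Putting step 1 together with a session: the connector package from step 1 can also be pulled in when the session is created, via Spark's `spark.jars.packages` configuration. A sketch, assuming connector version `0.36.1` (check the spark-bigquery-connector releases for the version matching your Spark and Scala build):

```python
from pyspark.sql import SparkSession

# Assumption: version 0.36.1 of the connector; Spark resolves and
# downloads the package from Maven Central on first use.
spark = (
    SparkSession.builder
    .appName("bigquery-example")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1",
    )
    .getOrCreate()
)
```

Alternatively, pass the same coordinate to `spark-submit --packages` at launch time.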

### Checklist

- [x] I have searched existing FAQs and this question is not already answered
- [x] The answer provides accurate, helpful information
- [x] I have included any relevant code examples or links
