Skip to content

dgizdevans/ga4-bq-flatten

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GA4 BigQuery Data Profiling and Flattening

Problem

GA4 BigQuery export stores data in nested and repeated structures (event_params, user_properties, items). This is great inside BigQuery, but most downstream tools (pandas, ML pipelines, many BI tools) expect flat tables. Repeated UNNEST logic in every query is tedious and increases error risk.

This repo shows a practical workflow to profile GA4 export data and build a reproducible flattening query using the public ga4_obfuscated_sample_ecommerce dataset.

Dataset

Public BigQuery dataset: Google Merchandise Store : Obfuscated GA360 Data
BigQuery ID: bigquery-public-data.ga4_obfuscated_sample_ecommerce

Period: Nov 2020 to Jan 2021.
Scale: 92 daily tables, ~4.3M events.

Contents

Notebook

  • ga4_bq_profiling_and_flattening.ipynb - end-to-end: connect to BigQuery, profile schema and nested fields, build and validate a flat events table

SQL (BigQuery Console)

The sql/ folder contains the same profiling and flattening workflow as standalone queries for running directly in the BigQuery UI.

Run order:

  1. 01_profile_event_params.sql
  2. 02_profile_user_properties.sql
  3. 03_profile_items.sql
  4. 04_profile_flat_fields.sql
  5. 05_flatten.sql
  6. 06_validate.sql (optional)

Approach

  1. Profile event_params - key coverage, NULL rates, type inconsistencies
  2. Profile user_properties - presence and sparsity
  3. Profile items - presence and sparsity (ecommerce context)
  4. Profile flat event fields - identify fields that are always NULL in this dataset
  5. Build flatten query - pivot selected event_params keys into columns and keep stable fields
  6. Validate - row count parity, date range, event-type distribution

Key findings (for this dataset)

  • 28 event_params keys observed, 4 are always NULL, 24 retained in the flat table
  • user_properties are extremely sparse (present as key-value pairs, not as stable columns)
  • items are excluded from the final flat table to keep an event-level layout simple (can be expanded later)
  • 12 flat fields are always NULL in this dataset and are excluded
  • 1 mixed-type key (session_engaged) handled via COALESCE across value types

Result

Flat table: 4,295,584 rows x 60 columns, 1:1 with source events (event-level granularity).

Requirements

  • Google Cloud project with BigQuery access
  • Python 3
  • Packages: google-cloud-bigquery, pandas
  • Colab or local environment (service account if needed)

Usage

  1. Open the notebook in Colab
  2. Set PROJECT_ID and DEST_DATASET
  3. Run all cells

Write-up

Medium article (BigQuery UI walkthrough): Profiling and Flattening GA4 Data in BigQuery: A Step-by-Step Walkthrough

Dataset

The processed and flattened dataset is available on Hugging Face: GA4 BigQuery Flattened Dataset

About

GA4 BigQuery export profiling and reproducible flattening into an event-level table (public sample ecommerce dataset).

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors