End-to-End GA4 Analytics Pipeline

Scalable, production-grade web analytics pipeline ingesting raw GA4 event data into Databricks via Structured Streaming, with Delta Lake and Medallion Architecture powering real-time BI dashboards.

GA4 Event Ingestion
3 Lakehouse Tiers
Streaming Loads
Delta Lake ACID
Designed and implemented a scalable, production-grade web analytics pipeline that ingests raw Google Analytics 4 (GA4) event data into Databricks using a Structured Streaming pipeline. Delta Lake and a Medallion Architecture (Bronze–Silver–Gold) transform the nested GA4 data into optimized, analytics-ready datasets that power real-time BI dashboards.
System Overview
Bronze · Silver · Gold
Bronze
Raw GA4 event ingestion via Structured Streaming, stored as-is in Delta Lake with ACID transactions & checkpointing
Silver
Nested GA4 schema flattening, schema evolution handling, late-arriving events, partitioning & Z-Ordering optimisation
Gold
BI-ready KPI datasets powering Power BI dashboards & integrated with Databricks Genie
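A minimal sketch of what the Bronze step could look like in PySpark, assuming raw GA4 events land as newline-delimited JSON files in a directory. The function names, paths, table name, and the `_ingested_at` column are hypothetical and not taken from the project itself.

```python
from typing import Any


def checkpoint_path(base_dir: str, table: str) -> str:
    """Give each streaming query its own checkpoint directory (hypothetical layout)."""
    return f"{base_dir.rstrip('/')}/_checkpoints/{table}"


def start_bronze_stream(spark: Any, source_dir: str, table: str, base_dir: str):
    """Append raw GA4 event lines to a Delta table without parsing them."""
    # Imported lazily so this module also loads where PySpark is not installed.
    from pyspark.sql import functions as F

    raw = (
        spark.readStream.format("text")  # one row per line, single `value` column
        .load(source_dir)
        .withColumn("_ingested_at", F.current_timestamp())
    )
    return (
        raw.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint records progress so a restarted query resumes where it stopped.
        .option("checkpointLocation", checkpoint_path(base_dir, table))
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(table)
    )
```

Reading the files as text keeps each event exactly as received, which leaves all parsing and schema decisions to the Silver layer.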
What's Under the Hood
GA4 event ingestion using a Structured Streaming pipeline
🏛 Medallion Architecture implementation — Bronze → Silver → Gold
🔄 Flattening of complex nested GA4 event schema using Databricks notebooks
🧬 Schema evolution & late-arriving event handling for data reliability
🔒 Delta Lake ACID transactions & checkpointing for fault tolerance
⚙️ Performance optimisation using partitioning & Z-Ordering strategies
📊 BI-ready KPI datasets powering real-time Power BI dashboards
🤖 Integration of Silver & Gold layers with Databricks Genie for AI querying
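The schema flattening step can be illustrated in plain Python. This is only a sketch: it assumes events follow the GA4 BigQuery-export layout, where `event_params` is a list of key/value records with typed value slots, and the `flatten_event` function and the `param_` prefix are hypothetical names chosen for illustration.

```python
from typing import Any

# GA4 stores each parameter value in one of these typed slots.
_VALUE_SLOTS = ("string_value", "int_value", "float_value", "double_value")


def flatten_event(event: dict[str, Any]) -> dict[str, Any]:
    """Turn one nested GA4 event into a single flat record."""
    flat = {k: v for k, v in event.items() if k != "event_params"}
    for param in event.get("event_params") or []:
        value = param.get("value") or {}
        # Pick the first slot that is populated; None if all are empty.
        flat[f"param_{param['key']}"] = next(
            (value[s] for s in _VALUE_SLOTS if value.get(s) is not None), None
        )
    return flat


event = {
    "event_name": "page_view",
    "event_timestamp": 1700000000000000,
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/"}},
        {"key": "engagement_time_msec", "value": {"int_value": 1200}},
    ],
}
row = flatten_event(event)
```

A parameter that has not been seen before simply becomes a new `param_` key, which is the situation schema evolution in the Silver table has to absorb.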
Built With