Web Form Data
Ingestion Pipeline

Event-driven, end-to-end web form data ingestion and analytics pipeline combining a serverless Google Apps Script backend, Google Sheets bronze layer, and Databricks incremental processing for analytics-ready datasets.

GAS Serverless API
2 Lakehouse Tiers
Incremental Loads
DTL Pipeline
Implement an end-to-end, event-driven web data ingestion and analytics pipeline designed to capture, validate, store, and analyze form submissions in a scalable and reliable manner — combining a GitHub Pages frontend, Google Apps Script REST API, Google Sheets bronze layer, and a Databricks DTL pipeline with an incrementally refreshed silver layer for analytics-ready output.
System Overview
Ingestion · Bronze · Silver
Ingestion
GitHub Pages frontend submits JSON via Fetch API. Google Apps Script enriches each record with UUID & timestamp, triggers instant email on receipt
Bronze
Google Sheets acts as raw intake store — append-only records ingested incrementally into Databricks via DTL pipeline
Silver
Incrementally refreshed materialized view applying cleansing, normalisation & transformations — no reprocessing of historical data
What's Under the Hood
🌐 Frontend hosted on GitHub Pages with client-side validation before submission
📡 Data submitted via JavaScript Fetch API as JSON POST request
⚙️ Backend implemented using Google Apps Script Web App as a serverless REST API
🔑 Each submission enriched with UUID & timestamp for traceability
🗄 Google Sheets acts as raw intake store — lightweight Bronze layer equivalent
📧 Instant confirmation email triggered on successful submission
🕐 Daily summary email with previous day's metrics via time-based scheduler
Databricks DTL pipeline ingests raw data incrementally from Google Sheets
🔄 Silver layer as incrementally refreshed materialized view — no historical reprocessing
🏗 Event-driven ingestion combined with scheduled processing for cost-efficient architecture
Built With