Web Form Data Ingestion & Analytics Pipeline
Project Objective

This project implements an end-to-end, event-driven web data ingestion and analytics pipeline designed to capture, validate, store, and analyze form submissions in a scalable and reliable manner.

A web form hosted on GitHub Pages collects user input and submits it as a JSON POST request through the JavaScript Fetch API. Client-side validation checks the data before transmission. The backend is a Google Apps Script Web App that processes incoming requests, enriches them with metadata (such as timestamps and unique identifiers), and writes raw records to Google Sheets, which acts as a lightweight bronze layer for ingestion.
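The client-side half of this flow might look like the sketch below. It is illustrative only: the field names (`name`, `email`, `message`), the helpers `validateSubmission` and `submitForm`, and the `endpointUrl` parameter are assumptions, not taken from the project's code.

```javascript
// Hypothetical field names and rules; the real form may differ.
function validateSubmission(data) {
  const errors = [];
  const name = String(data.name ?? "").trim();
  const email = String(data.email ?? "").trim();
  const message = String(data.message ?? "").trim();

  if (name.length === 0) errors.push("name is required");
  // Deliberately loose check: text, "@", text, ".", text, with no whitespace.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) errors.push("email is invalid");
  if (message.length === 0) errors.push("message is required");

  return { valid: errors.length === 0, errors, cleaned: { name, email, message } };
}

// Sends validated data as a JSON body. `endpointUrl` stands in for the
// deployed Apps Script Web App URL, which is not given in this document.
async function submitForm(endpointUrl, data) {
  const result = validateSubmission(data);
  if (!result.valid) {
    return { ok: false, errors: result.errors };
  }
  const response = await fetch(endpointUrl, {
    method: "POST",
    // Assumption: a "simple" content type is used so the browser skips the
    // CORS preflight request; the server still parses the body as JSON.
    headers: { "Content-Type": "text/plain;charset=utf-8" },
    body: JSON.stringify(result.cleaned),
  });
  return { ok: response.ok, errors: [] };
}
```

Keeping validation in its own function means the same rules can be reused for inline field feedback before the request is ever sent.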

After each successful submission, an email notification is sent immediately to confirm receipt. A time-based trigger also sends a daily summary email with the previous day’s submission metrics.
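One way the daily summary could be computed is sketched below. The counting logic is a plain function so it can be checked outside Apps Script; the sheet name `Submissions`, the timestamp column position, the recipient address, the 08:00 trigger hour, and the choice of UTC day boundaries are all assumptions made for illustration.

```javascript
// Counts rows whose timestamp falls on the UTC calendar day before `now`.
// Assumes each row carries an ISO 8601 `timestamp` string (hypothetical name).
function summarizePreviousDay(rows, now) {
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const startOfYesterday = startOfToday - 24 * 60 * 60 * 1000;
  const count = rows.filter((row) => {
    const t = Date.parse(row.timestamp); // NaN for unparseable values, so they are skipped
    return t >= startOfYesterday && t < startOfToday;
  }).length;
  return { date: new Date(startOfYesterday).toISOString().slice(0, 10), count };
}

// Apps Script wiring; this part only runs inside the Apps Script runtime.
function sendDailySummary() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Submissions");
  const values = sheet.getDataRange().getValues().slice(1); // skip the header row
  // Assumed layout: the timestamp lives in the second column.
  const rows = values.map((r) => ({
    timestamp: r[1] instanceof Date ? r[1].toISOString() : String(r[1]),
  }));
  const summary = summarizePreviousDay(rows, new Date());
  MailApp.sendEmail(
    "owner@example.com", // placeholder recipient
    "Daily submissions: " + summary.date,
    summary.count + " submission(s) received on " + summary.date + " (UTC)."
  );
}

// Run once to schedule the summary.
function installDailyTrigger() {
  ScriptApp.newTrigger("sendDailySummary").timeBased().everyDays(1).atHour(8).create();
}
```

Passing `now` into `summarizePreviousDay` rather than reading the clock inside it keeps the day-boundary logic deterministic and easy to verify.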

The raw data is ingested into Databricks, where it is processed through a DLT (Delta Live Tables) pipeline. The bronze layer stores raw, append-only data, while the silver layer is built as an incrementally refreshed materialized view that applies cleansing, normalization, and basic transformations. This design avoids reprocessing historical data and delivers analytics-ready datasets.

The overall architecture combines event-driven ingestion with scheduled processing, resulting in a cost-efficient, low-maintenance pipeline suitable for real-world analytics and reporting use cases.

Architecture Overview

🛠️ Software Toolkits:

Pipeline Highlights

  •  Frontend hosted on GitHub Pages
  •  Data submitted via Fetch API (JSON POST)
  •  Backend implemented using Google Apps Script Web App
  •  Each submission enriched with UUID & timestamp
  •  Google Sheets acts as raw intake store (Bronze equivalent)
  •  Instant email sent on successful submission
  •  Daily summary email triggered via time-based scheduler
  •  Databricks pipeline ingests raw data incrementally
  •  Silver layer provides curated, analytics-ready data
  •  Architecture is event-driven + scheduled
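
The server-side steps in the highlights above (receive the POST, enrich with a UUID and timestamp, append to the sheet, send the instant email) could be sketched roughly as follows. This is not the project's actual handler: the `enrichSubmission` helper, the `id` and `received_at` field names, the sheet name, the column order, and the recipient address are all hypothetical.

```javascript
// Pure enrichment step: returns a new record with a server-side id and
// receipt timestamp. `makeId` and `now` are injected so the function can be
// exercised outside Apps Script. Server-side values overwrite any
// client-supplied fields of the same name.
function enrichSubmission(payload, makeId, now) {
  return { ...payload, id: makeId(), received_at: now.toISOString() };
}

// Apps Script entry point for POST requests; this part only runs inside the
// Apps Script runtime.
function doPost(e) {
  const payload = JSON.parse(e.postData.contents);
  const record = enrichSubmission(payload, () => Utilities.getUuid(), new Date());

  // Append-only write to the raw intake sheet (assumed name and column order).
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Submissions");
  sheet.appendRow([record.id, record.received_at, record.name, record.email, record.message]);

  // Instant notification; the recipient is a placeholder.
  MailApp.sendEmail("owner@example.com", "New form submission", "Received submission " + record.id);

  return ContentService
    .createTextOutput(JSON.stringify({ status: "ok", id: record.id }))
    .setMimeType(ContentService.MimeType.JSON);
}
```

Generating the identifier and timestamp on the server rather than in the browser means every raw record gets them consistently, regardless of what the client sends.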

Connect with me:

LinkedIn GitHub Instagram

✉ Reach me:
jharajnish@outlook.in