
SignalBoard

Data + Full Stack Engineer

Data Engineering · Analytics · ETL · Dashboard · OLAP

SignalBoard is a data-to-insight platform that ingests operational data from heterogeneous sources, validates and transforms it into canonical models, and serves analytics through both API endpoints and a dashboard experience. The system uses ClickHouse as its OLAP engine for time-windowed metric aggregation and high-cardinality analytics queries. Beyond operational metrics, it stores AI evaluation results — precision, recall, latency, and cost per model version — enabling model comparison dashboards and regression tracking across AI systems. The system includes scheduled jobs, anomaly detection, and exportable reports.
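The time-windowed aggregation shape might look like the sketch below, which builds a parameterized ClickHouse query. Table and column names (`metric_events`, `ts`, `value`) are illustrative assumptions, not the real SignalBoard schema:

```python
def windowed_metric_sql(table: str = "metric_events", interval: str = "1 HOUR") -> str:
    """Build a ClickHouse query that rolls a metric up into fixed time windows.

    `metric_events`, `ts`, and `value` are illustrative names; the metric,
    start, and end values are bound as %(...)s parameters at execution time.
    """
    return (
        f"SELECT toStartOfInterval(ts, INTERVAL {interval}) AS window_start,\n"
        f"       avg(value) AS avg_value, count() AS samples\n"
        f"FROM {table}\n"
        f"WHERE metric = %(metric)s\n"
        f"  AND ts BETWEEN %(start)s AND %(end)s\n"
        f"GROUP BY window_start\n"
        f"ORDER BY window_start"
    )
```

`toStartOfInterval` buckets each row into its window, so one query serves both operational metrics and per-model-version evaluation slices.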

Heterogeneous
Data Sources

API and file-based connectors normalized into shared models

Validated
Pipeline Quality

Schema and quality checks gate transformation stages

API + Dashboard
Consumption

Same metrics available to internal tools and human operators

Backfillable
Recovery

Missed windows can be replayed without full pipeline rebuild

The Problem

Organizations often have fragmented operational data across APIs and flat files, making reliable reporting difficult and slowing decision cycles. AI-powered systems add another dimension: model quality metrics need storage and visualization that traditional OLTP databases handle poorly.

The Solution

Built a Python data pipeline with quality validation and transformation stages backed by ClickHouse for columnar analytics. Exposed clean analytics endpoints via FastAPI with a dashboard for KPI exploration, trend drill-down, anomaly review, and report exports. Extended the platform to serve as an AI evaluation warehouse — storing eval run results, model comparison metrics, and quality dashboards that track AI system performance over time.
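A guard on the analytics endpoints can be sketched as below; the 92-day window cap and page size of 500 are illustrative assumptions, and the FastAPI route wiring is omitted:

```python
from datetime import date, timedelta

# Illustrative limits, not SignalBoard's actual values.
MAX_WINDOW = timedelta(days=92)
PAGE_SIZE = 500

def page_bounds(start: date, end: date, page: int = 0) -> tuple[int, int]:
    """Reject oversized or inverted date windows, then map a page number
    to a (limit, offset) pair for the underlying query."""
    if end < start:
        raise ValueError("end date precedes start date")
    if end - start > MAX_WINDOW:
        raise ValueError(f"window larger than {MAX_WINDOW.days} days")
    if page < 0:
        raise ValueError("page must be non-negative")
    return PAGE_SIZE, page * PAGE_SIZE
```

Failing fast on window size keeps a single heavy dashboard query from scanning unbounded history.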

Technical Decisions

Key architecture decisions and their outcomes

Canonical metric contracts before visualization

Context

Dashboard development moved faster when metric contracts were stable and versioned.

Decision

Defined canonical dimensions, facts, and metric formulas in the pipeline layer before UI implementation.

Outcome

Frontend stayed focused on UX while analytics semantics remained consistent.
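A metric contract of this kind might be modeled as a small frozen record; the field names and the `error_rate` example are illustrative, not the production contract layer:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricContract:
    """Canonical, versioned definition of a metric served to the dashboard."""
    name: str
    version: int
    dimensions: tuple[str, ...]  # group-by keys the UI may filter on
    formula: str                 # aggregation expression owned by the pipeline

# Hypothetical contract instance for illustration.
ERROR_RATE = MetricContract(
    name="error_rate",
    version=2,
    dimensions=("source", "day"),
    formula="sum(errors) / sum(requests)",
)
```

Because the contract is frozen and versioned, a formula change ships as a new version rather than silently redefining what the dashboard displays.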

Job metadata as a core data model

Context

Without lineage and run metadata, failed ingestions were difficult to debug and trust.

Decision

Stored run IDs, source status, row counts, and validation results for every ingestion cycle.

Outcome

Data reliability became observable and recoverable via targeted reruns.
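The per-run lineage record could take roughly this shape; the fields and the rerun rule are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class IngestionRun:
    """Lineage metadata stored for every ingestion cycle (illustrative fields)."""
    run_id: str
    source: str
    status: str            # e.g. "succeeded", "failed", "partial"
    rows_ingested: int
    validation_errors: int

    @property
    def needs_rerun(self) -> bool:
        # Eligible for a targeted rerun if the run failed outright
        # or dropped rows during validation.
        return self.status != "succeeded" or self.validation_errors > 0
```

Querying these records answers "which windows are trustworthy?" without re-reading source data.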

Engineering Details

  • Connectors emit standardized ingestion payloads with source-specific parsers
  • Validation layer enforces schema constraints and quality thresholds before load
  • Analytics API supports date-window guards and pagination for heavy slices
  • Dashboard filters map directly to API dimensions for transparent query behavior
  • Backfill commands replay historical windows with deterministic transforms
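The validation gate described above can be sketched as a two-step check: per-row schema enforcement, then a batch-level quality threshold. The required field names and the 2% threshold are assumptions, not SignalBoard's actual rules:

```python
def validate_batch(rows, required=("ts", "metric", "value"), max_bad_ratio=0.02):
    """Schema-check each row, then gate the whole batch on a quality threshold.

    Returns only the rows that passed; raises if too many rows failed.
    """
    good = [r for r in rows if all(k in r for k in required)]
    bad_ratio = 1 - len(good) / len(rows) if rows else 0.0
    if bad_ratio > max_bad_ratio:
        raise ValueError(f"batch rejected: {bad_ratio:.1%} of rows failed schema check")
    return good
```

Rejecting the whole batch (rather than silently dropping rows) is what lets the downstream transform stages trust their input.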

Key Highlights

  • Multi-source ingestion with lineage metadata per run
  • Validation and quality checks before downstream transformation
  • Anomaly detection on key metric deltas and trend shifts
  • ClickHouse OLAP engine for time-windowed metric aggregation and AI evaluation storage
  • Model comparison dashboards tracking precision, recall, latency, and cost across AI model versions
  • FastAPI analytics surface for dashboard and programmatic use
  • Interactive KPI dashboard with filtering and export support
  • Replay and backfill paths for missed ingestion windows
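One common shape for the anomaly detection on metric deltas is a trailing-window z-score; the window size and threshold below are illustrative, not the tuned production values:

```python
from statistics import mean, stdev

def flag_anomalies(series: list[float], window: int = 7, threshold: float = 3.0) -> list[bool]:
    """Flag points whose distance from the trailing-window mean exceeds
    `threshold` standard deviations of that window."""
    flags = []
    for i, x in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to judge
            continue
        mu, sigma = mean(history), stdev(history)
        flags.append(sigma > 0 and abs(x - mu) > threshold * sigma)
    return flags
```

On a steady series with one spike, only the spike is flagged; a flat window (zero variance) is treated as inconclusive rather than anomalous.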

Tech Stack

Python · FastAPI · ClickHouse
