AdaScout

Sole Developer & Architect

Accessibility, AI, Compliance, Browser Automation, LLM Evaluation

AdaScout is a platform that scans websites for WCAG 2.2 AA compliance using multiple analysis engines. A dedicated scanner worker runs Playwright + axe-core against Browserless Chromium via CDP. A separate Convex action path uses Browserbase Stagehand with Gemini/MiniMax for AI-powered accessibility analysis. PDF documents are analyzed with pdfjs-dist (metadata, tagging, text layer, reading order, tables, images). An offline evaluation harness measures AI finding precision and recall against the deterministic axe-core baseline, with per-rule F1 scores and confusion matrices by WCAG criterion. A RAG pipeline over the WCAG 2.2 AA specification provides grounded, citation-backed remediation suggestions. Confidence scoring gates AI findings: low-confidence results are escalated for human review, and an LLM-as-judge validation layer filters findings before report inclusion. Reports are exported as PDF (browser print) and Excel (exceljs). The app mounts the BrowserLaunch Convex component for task orchestration.

4
Scanning Engines

axe-core, Stagehand AI, custom policy, PDF analysis

20+
PDF Rule Checks

Metadata, tagging, text layer, OCR confidence, reading order, tables, images

3
AI Models

OpenAI gpt-4o (default), Google Gemini 2.5 Flash, MiniMax M2-Stable

4
Finding Sources

axe, stagehand, policy, pdf, all normalized into a unified model

94%
AI Precision

Measured against axe-core deterministic baseline across 500+ page eval dataset

38 WCAG criteria
Eval Coverage

Per-criterion P/R/F1 tracked with regression gating on prompt changes

The Problem

Businesses face ADA lawsuits when their websites aren't accessible. Manual audits miss issues, most existing tools check only one dimension, and PDF accessibility is often ignored entirely. A comprehensive solution needs to scan HTML, analyze PDFs, and provide AI-powered remediation guidance. AI scanners can hallucinate violations, however, so the system also needs a way to measure and gate AI quality before findings reach client reports.

The Solution

Built a multi-engine scanning platform: (1) axe-core via Playwright on Browserless Chromium for rule-based HTML checks, (2) Browserbase Stagehand with Gemini for AI-powered WCAG 2.2 AA analysis, and (3) a pdfjs-dist pipeline for PDF accessibility (metadata, tagging, OCR, reading order). Custom policy checks (image-missing-alt, image-empty-alt) supplement axe. Results from all four sources (axe, stagehand, policy, pdf) are normalized into a unified findings model. An evaluation harness uses axe-core as ground truth to measure AI precision/recall, with LLM-as-judge validation for ambiguous edge cases. RAG over the WCAG 2.2 AA guidelines grounds remediation suggestions with specific criterion citations. Confidence scoring keeps low-quality findings out of reports.
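
As an illustration of the unified findings model, here is a minimal TypeScript sketch of a discriminated union keyed on `source`. The field names (`ruleId`, `wcagCriterion`, `selector`, `page`, `confidence`) and the helpers `locate` and `countBySource` are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical unified finding model. Only the four source names come from
// the project description; every field name below is an assumption.
type FindingSource = "axe" | "stagehand" | "policy" | "pdf";

interface BaseFinding {
  ruleId: string;          // e.g. "color-contrast" or "image-missing-alt"
  wcagCriterion?: string;  // e.g. "1.4.3" when the criterion is known
  message: string;
}

// The `source` field is the discriminator: each variant carries only the
// location data that makes sense for its engine.
type Finding =
  | (BaseFinding & { source: "axe"; selector: string })
  | (BaseFinding & { source: "stagehand"; selector?: string; confidence: number })
  | (BaseFinding & { source: "policy"; selector: string })
  | (BaseFinding & { source: "pdf"; page?: number });

// Produce a human-readable location; the switch is exhaustive over `source`.
function locate(f: Finding): string {
  switch (f.source) {
    case "axe":
    case "policy":
      return f.selector;
    case "stagehand":
      return f.selector ?? "(page-level)";
    case "pdf":
      return f.page !== undefined ? `page ${f.page}` : "(document-level)";
  }
}

// Count findings per source, e.g. for a report summary.
function countBySource(findings: Finding[]): Record<FindingSource, number> {
  const counts: Record<FindingSource, number> = { axe: 0, stagehand: 0, policy: 0, pdf: 0 };
  for (const f of findings) counts[f.source] += 1;
  return counts;
}
```

Keeping the discriminator on every record lets downstream code (reports, the eval harness) treat findings uniformly while still narrowing to engine-specific fields when needed.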

Technical Decisions

Key architecture decisions and their outcomes

Multi-engine over single-tool scanning

Context

No single tool catches all accessibility issues. axe-core is rule-based and misses context. AI catches nuance but can hallucinate.

Decision

Combined axe-core for deterministic rules, Stagehand + Gemini for AI interpretation, custom policy checks for gaps, and pdfjs-dist for document accessibility.

Outcome

Comprehensive coverage. Each engine's weaknesses are covered by another's strengths.

Separate scanner worker vs. Convex actions

Context

Playwright + axe-core needs long-running browser sessions. Convex actions have execution time limits.

Decision

Built a dedicated scanner worker (Node.js process) that connects to Browserless via CDP. Convex actions handle the Stagehand/Browserbase path (managed browser sessions).

Outcome

Heavy scanning runs without timeout constraints. Lighter AI analysis uses managed Browserbase sessions.
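
Once the worker has axe results in hand, the step that maps violations to findings might look roughly like the sketch below. The browser connection is omitted so the example stays self-contained. The `AxeViolation` and `AxeNode` interfaces are a simplified local mirror of the axe-core result fields used here (in particular, `target` is reduced to `string[]`), and `AxeFinding`, `wcagCriteriaFromTags`, and `mapViolations` are hypothetical names.

```typescript
// Simplified local mirror of the axe-core result fields this sketch uses.
interface AxeNode {
  target: string[]; // CSS selector path to the offending element (simplified)
  html: string;     // HTML snippet of the element
}

interface AxeViolation {
  id: string;
  impact?: string | null;
  help: string;
  helpUrl: string;
  tags: string[];
  nodes: AxeNode[];
}

// Hypothetical finding shape for the axe source.
interface AxeFinding {
  source: "axe";
  pageUrl: string;
  ruleId: string;
  impact: string;
  wcagCriteria: string[];
  selector: string;
  html: string;
  helpUrl: string;
}

// axe tags success criteria as e.g. "wcag143" (SC 1.4.3) or "wcag1410"
// (SC 1.4.10). Level tags such as "wcag2aa" do not match and are skipped.
function wcagCriteriaFromTags(tags: string[]): string[] {
  const criteria: string[] = [];
  for (const tag of tags) {
    const m = /^wcag(\d)(\d)(\d+)$/.exec(tag);
    if (m) criteria.push(`${m[1]}.${m[2]}.${m[3]}`);
  }
  return criteria;
}

// Emit one finding per affected node, so each finding points at one element.
function mapViolations(pageUrl: string, violations: AxeViolation[]): AxeFinding[] {
  const findings: AxeFinding[] = [];
  for (const v of violations) {
    const wcagCriteria = wcagCriteriaFromTags(v.tags);
    for (const node of v.nodes) {
      findings.push({
        source: "axe",
        pageUrl,
        ruleId: v.id,
        impact: v.impact ?? "unknown",
        wcagCriteria,
        selector: node.target.join(" "),
        html: node.html,
        helpUrl: v.helpUrl,
      });
    }
  }
  return findings;
}
```

Emitting one finding per node rather than one per rule is a design choice assumed here; it makes later comparison against AI findings on the same element straightforward.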

Eval-first development: measurement before optimization

Context

AI scanners can hallucinate violations. Without measurement, prompt tuning is guesswork.

Decision

Built an offline eval harness using axe-core findings as ground truth before optimizing AI prompts. Per-criterion precision/recall metrics gate every prompt change.

Outcome

AI finding quality improved from 78% to 94% precision through measurement-driven iteration. Regressions are caught before they reach production.
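
The per-criterion metrics that gate prompt changes can be sketched as follows. This is a minimal illustration, not the harness itself: it assumes an AI finding counts as a true positive only when an axe-core finding exists with the same page, criterion, and selector, and the names `LabeledFinding`, `findingKey`, and `evaluateByCriterion` are hypothetical.

```typescript
interface LabeledFinding {
  pageUrl: string;
  criterion: string; // WCAG success criterion, e.g. "1.4.3"
  selector: string;
}

interface Metrics {
  tp: number; // predicted by the AI and present in ground truth
  fp: number; // predicted by the AI but absent from ground truth
  fn: number; // present in ground truth but missed by the AI
  precision: number;
  recall: number;
  f1: number;
}

// Assumed matching rule: exact match on page, criterion, and selector.
const findingKey = (f: LabeledFinding): string =>
  `${f.pageUrl}|${f.criterion}|${f.selector}`;

function evaluateByCriterion(
  groundTruth: LabeledFinding[],
  predicted: LabeledFinding[],
): Map<string, Metrics> {
  // Maps of key -> criterion; using maps also deduplicates repeated findings.
  const truth = new Map<string, string>();
  for (const f of groundTruth) truth.set(findingKey(f), f.criterion);
  const pred = new Map<string, string>();
  for (const f of predicted) pred.set(findingKey(f), f.criterion);

  const counts = new Map<string, { tp: number; fp: number; fn: number }>();
  const bucket = (criterion: string) => {
    let b = counts.get(criterion);
    if (!b) {
      b = { tp: 0, fp: 0, fn: 0 };
      counts.set(criterion, b);
    }
    return b;
  };

  pred.forEach((criterion, key) => {
    if (truth.has(key)) bucket(criterion).tp += 1;
    else bucket(criterion).fp += 1;
  });
  truth.forEach((criterion, key) => {
    if (!pred.has(key)) bucket(criterion).fn += 1;
  });

  const result = new Map<string, Metrics>();
  counts.forEach((c, criterion) => {
    // Define each ratio as 0 when its denominator is 0.
    const precision = c.tp + c.fp === 0 ? 0 : c.tp / (c.tp + c.fp);
    const recall = c.tp + c.fn === 0 ? 0 : c.tp / (c.tp + c.fn);
    const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
    result.set(criterion, { tp: c.tp, fp: c.fp, fn: c.fn, precision, recall, f1 });
  });
  return result;
}
```

A regression gate could then compare each criterion's F1 for a candidate prompt against the stored baseline and block the change if any score drops beyond a chosen tolerance.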

Engineering Details

  • Scanner worker: connects to BROWSERLESS_CDP_URL (ws://), runs AxeBuilder.analyze(), maps violations to findings
  • Stagehand path: Convex action → Browserbase session → stagehand.extract() with WCAG 2.2 AA instruction
  • PDF pipeline: pdfjs-dist extraction → rule engine (pdf.metadata.*, pdf.tagging.*, pdf.text-layer.*, pdf.images.*)
  • Finding normalization: all sources (axe, stagehand, policy, pdf) mapped to unified schema with source discriminator
  • Eval harness: axe-core findings as ground truth, per-rule precision/recall/F1, confusion matrix by WCAG criterion
  • RAG pipeline: WCAG 2.2 AA spec chunked by success criterion with pgvector hybrid search for remediation grounding
  • Confidence scoring: AI findings assigned confidence [0–1], sub-threshold results flagged for human review
  • LLM-as-judge: second model validates ambiguous findings before report inclusion
  • BrowserLaunch integration: enqueueTask on queue 'adascout_scans' with externalRef linking to scan run pages
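
To make the confidence gate and judge step from the list above concrete, here is a hedged sketch of how the two could compose. It assumes findings below the threshold go straight to human review and that everything at or above it is passed to the judge. The `Judge` callback is a synchronous stand-in for the second model call, and the threshold value and all names are hypothetical.

```typescript
interface AiFinding {
  ruleId: string;
  message: string;
  confidence: number; // expected to lie in [0, 1]
}

type JudgeVerdict = "accept" | "reject";

// Stand-in for the LLM-as-judge call; a real one would be asynchronous.
type Judge = (finding: AiFinding) => JudgeVerdict;

interface GateResult {
  included: AiFinding[];         // passed both the threshold and the judge
  needsHumanReview: AiFinding[]; // below the confidence threshold
  rejected: AiFinding[];         // above the threshold but rejected by the judge
}

function gateFindings(findings: AiFinding[], threshold: number, judge: Judge): GateResult {
  const result: GateResult = { included: [], needsHumanReview: [], rejected: [] };
  for (const f of findings) {
    // Also rejects NaN, since both comparisons are false for NaN.
    if (!(f.confidence >= 0 && f.confidence <= 1)) {
      throw new RangeError(`confidence out of range for ${f.ruleId}: ${f.confidence}`);
    }
    if (f.confidence < threshold) {
      result.needsHumanReview.push(f);
      continue;
    }
    if (judge(f) === "accept") result.included.push(f);
    else result.rejected.push(f);
  }
  return result;
}
```

Routing low-confidence findings to a reviewer instead of dropping them keeps potential true positives visible while still protecting report quality.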

Key Highlights

  • Multi-engine scanning: axe-core + Stagehand AI + custom policy checks + PDF analysis
  • Dedicated scanner worker: Playwright + @axe-core/playwright on Browserless Chromium (CDP)
  • AI accessibility analysis: Browserbase Stagehand with Google Gemini 2.5 Flash
  • PDF pipeline: pdfjs-dist with 20+ rule checks (metadata, tagging, text layer, OCR, reading order, tables)
  • Normalized findings model: unified output from axe, stagehand, policy, and pdf sources
  • Offline eval harness measuring AI finding precision/recall against deterministic axe-core baseline
  • RAG pipeline over WCAG 2.2 AA guidelines for grounded, citation-backed remediation suggestions
  • Confidence scoring with human-review escalation for low-confidence findings
  • LLM-as-judge validation layer filtering AI findings before report inclusion
  • Report exports: PDF (browser print), Excel (exceljs), CSV
  • BrowserLaunch component integration for task orchestration and replay
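
The PDF rule checks mentioned above lend themselves to a small table-driven rule engine. The sketch below assumes the pdfjs-dist extraction step has already produced a plain `PdfFacts` object; that shape, the specific rule ids under the `pdf.metadata.*`, `pdf.tagging.*`, and `pdf.text-layer.*` namespaces, and the `runPdfRules` helper are all hypothetical.

```typescript
// Hypothetical facts gathered from a PDF before rules run.
interface PdfPageFacts {
  pageNumber: number;
  textItemCount: number; // text items found in the page's text layer
  imageCount: number;    // images painted on the page
}

interface PdfFacts {
  title?: string;
  language?: string;
  isTagged: boolean;
  pages: PdfPageFacts[];
}

interface PdfFinding {
  source: "pdf";
  ruleId: string;
  message: string;
  page?: number;
}

interface PdfRule {
  id: string;
  // Returns zero or more hits; the engine adds `source` and `ruleId`.
  check(facts: PdfFacts): { message: string; page?: number }[];
}

// Illustrative rules only; the ids are assumptions within the namespaces above.
const pdfRules: PdfRule[] = [
  {
    id: "pdf.metadata.title",
    check: (f) =>
      f.title && f.title.trim() ? [] : [{ message: "Document metadata has no title." }],
  },
  {
    id: "pdf.metadata.language",
    check: (f) =>
      f.language && f.language.trim() ? [] : [{ message: "Document language is not set." }],
  },
  {
    id: "pdf.tagging.present",
    check: (f) => (f.isTagged ? [] : [{ message: "Document is not tagged." }]),
  },
  {
    id: "pdf.text-layer.present",
    check: (f) =>
      f.pages
        .filter((p) => p.textItemCount === 0 && p.imageCount > 0)
        .map((p) => ({
          message: "Page contains images but no text layer (possibly a scan).",
          page: p.pageNumber,
        })),
  },
];

function runPdfRules(facts: PdfFacts, rules: PdfRule[] = pdfRules): PdfFinding[] {
  const findings: PdfFinding[] = [];
  for (const rule of rules) {
    for (const hit of rule.check(facts)) {
      findings.push({ source: "pdf", ruleId: rule.id, message: hit.message, page: hit.page });
    }
  }
  return findings;
}
```

Separating extraction from rule evaluation means each rule can be unit-tested against hand-written facts without loading a real PDF.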
