
How I Built a Custom AI for My Portfolio (Part 1): Architecture and Implementation

Part 1 of my portfolio AI series: Convex agent component design, RAG fundamentals, OpenAI vs Google provider testing, streaming UX, and how the frontend chat bubble and /admin/ai tooling live inside one reusable AI plugin.

Apr 16, 2026 · Desmond Tatilian

This is Part 1 of a two-part series on my portfolio AI system.

  • Part 1 (this post): technical architecture and implementation
  • Part 2: monitoring, eval, optimization, and production tuning

Read Part 2 here: How I Built a Custom AI for My Portfolio (Part 2).

Why I built it as a reusable component instead of one-off app code

I did not want AI logic scattered across app-specific files. I wanted one reusable AI layer that I could plug into different products (portfolio, LaunchThatBot, and future apps) without rewriting core behavior each time.

So I structured it as a plugin/component architecture:

  • shared AI backend behaviors in a core package (packages/core/ai)
  • app-specific content and orchestration in apps/portfolio/convex/*
  • shared UI primitives for chat rendering, virtualization, and streaming in packages/core/ai/src/components/*
  • app-level chat shell in apps/portfolio/src/components/chat/*

That split is what lets the portfolio app feel custom while still reusing the same AI foundation as other surfaces in the monorepo.

Core runtime choice: Convex agent component

I built around Convex's agent patterns because I needed:

  • persistent thread/message state
  • server-side actions and scheduling
  • reactive client updates
  • strong typing from backend to frontend
  • streaming-friendly UX hooks

What the agent component includes in practice

At a high level, the agent component gives me:

  • Thread/message lifecycle: create thread, append messages, update status (pending, streaming, done)
  • Tool/action orchestration: structured calls for retrieval and response generation
  • Streaming support path: token deltas can be surfaced live to the UI
  • Shared settings model: provider/model/embedding configuration from admin controls
  • Reusable hooks: clients can subscribe to thread messages and render streaming state safely

This lets me keep business-specific logic (portfolio retrieval and presentation) separate from baseline agent plumbing.
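The thread/message lifecycle above can be sketched as a small status machine. This is a minimal, hypothetical sketch, not the agent component's real API; the type and function names are illustrative:

```typescript
// Illustrative message lifecycle: pending -> streaming -> done.
// Names are hypothetical, not the actual Convex agent component types.
type MessageStatus = "pending" | "streaming" | "done";

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  status: MessageStatus;
}

// Append a streamed delta and advance status pending -> streaming.
function applyDelta(msg: ChatMessage, delta: string): ChatMessage {
  if (msg.status === "done") return msg; // completed messages are immutable
  return { ...msg, text: msg.text + delta, status: "streaming" };
}

// Mark a message complete once the model finishes.
function finalize(msg: ChatMessage): ChatMessage {
  return { ...msg, status: "done" };
}
```

Modeling status transitions as pure functions is what makes it safe for clients to subscribe to thread messages and render streaming state without racing the backend.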

RAG in this system: simple concept, strict implementation

The retrieval concept is standard:

  1. embed the user query
  2. search indexed portfolio content
  3. inject top-ranked context
  4. generate answer with source-grounded prompt instructions
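The four steps can be sketched with the embedder, vector index, and model injected as dependencies, which is also how the app code stays decoupled from any one provider. Everything here is illustrative, not the actual portfolio API:

```typescript
// Hypothetical sketch of the four retrieval steps, with dependencies
// injected so the pipeline is provider-agnostic and testable.
interface Chunk { text: string; sourceType: string; slug: string; score: number }

interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  search: (vector: number[], topK: number) => Promise<Chunk[]>;
  generate: (prompt: string) => Promise<string>;
}

async function answerWithRag(deps: RagDeps, query: string): Promise<string> {
  const vector = await deps.embed(query);       // 1. embed the user query
  const chunks = await deps.search(vector, 5);  // 2. search indexed content
  const context = chunks                        // 3. inject top-ranked context
    .map((c) => `[${c.sourceType}:${c.slug}] ${c.text}`)
    .join("\n");
  return deps.generate(                         // 4. source-grounded generation
    `Answer using only these sources:\n${context}\n\nQuestion: ${query}`
  );
}
```

Note that each chunk carries `sourceType` and `slug` into the prompt, which is what makes attribution possible downstream.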

But in implementation, there are three details that matter:

  • Indexing scope: projects, skills, experience, and blog posts all need to be represented
  • Metadata discipline: each chunk needs source type + slug metadata for attribution and fallback paths
  • Fallback logic: exact entity asks need lexical paths, not just semantic vectors

That last point is why "tell me about traderlaunchpad" can work reliably even when semantic match is ambiguous.

For a deeper retrieval-centric breakdown, see RAG in Practice.

Why this is not "RAG-only" (and why that is normal)

A common question is: if everything is indexed, why not rely only on vector search?

The short answer: pure semantic retrieval is great for broad questions, but weaker for exact entity resolution and conversational follow-ups.

In practice, these are different query shapes:

  • Semantic ask: "How do you approach production AI quality?" -> vector retrieval works well.
  • Canonical entity ask: "Tell me about TraderLaunchpad." -> deterministic slug/title lookup is more reliable.
  • Conversational reference: "What about part 1?" -> requires disambiguation + conversation-aware matching.

So the architecture intentionally uses a hybrid pattern:

  • RAG for semantic grounding and synthesis
  • metadata and lexical matching for exact names/slugs/titles
  • deterministic fallback retrieval paths for high-precision lookups

That is not a workaround. It is a common production design choice for assistants that need both flexibility and correctness.
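The hybrid pattern can be sketched as a deterministic lexical pass before the semantic fallback. This is a simplified sketch under assumed names, not the production matching logic:

```typescript
// Hedged sketch of hybrid retrieval: exact slug/title lookup first,
// semantic vector search as the fallback. Names are illustrative.
interface Doc { slug: string; title: string; text: string }

// High-precision lexical path for "tell me about <entity>" asks.
function exactEntityMatch(docs: Doc[], query: string): Doc | undefined {
  const q = query.toLowerCase();
  return docs.find(
    (d) => q.includes(d.slug.toLowerCase()) || q.includes(d.title.toLowerCase())
  );
}

async function retrieve(
  docs: Doc[],
  query: string,
  semanticSearch: (q: string) => Promise<Doc[]>
): Promise<Doc[]> {
  const exact = exactEntityMatch(docs, query);
  if (exact) return [exact]; // deterministic, no embedding ambiguity
  return semanticSearch(query); // broad questions go to vector retrieval
}
```

The ordering matters: an exact entity mention short-circuits before any embedding is computed, which is why "tell me about traderlaunchpad" stays reliable even when the vectors are ambiguous.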

Why I tested two AI providers (OpenAI and Google)

I explicitly tested both providers because model behavior and embedding behavior differ in production, even when both are "good."

What I wanted to compare:

  • answer quality on architecture-heavy prompts
  • retrieval attribution consistency
  • streaming behavior in the UI
  • operational tradeoffs (latency, cost, and iteration speed)

Practical provider differences I designed around

  • Embedding shape and defaults differ, so I treat embedding dimensions as a contract and re-index when changing strategies.
  • Streaming behavior can differ by route/runtime path, so I validated the end-to-end stream path in the exact agent flow.
  • Prompt sensitivity varies, so I kept regression evals around prompt changes regardless of provider.

The final system is provider-flexible by design: settings drive model/provider selection instead of hardcoded one-provider assumptions.
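Settings-driven selection might look like the sketch below. The model names and dimensions are examples of current provider defaults, not my actual configuration; the key idea is that the embedding dimension is part of the contract, so changing strategies forces a re-index instead of silently drifting:

```typescript
// Illustrative provider-flexible settings; real values live in
// admin-managed settings, not in code.
type Provider = "openai" | "google";

interface AiSettings {
  provider: Provider;
  chatModel: string;
  embeddingModel: string;
  embeddingDimensions: number; // contract: must match the vector index
}

function resolveSettings(provider: Provider): AiSettings {
  const defaults: Record<Provider, AiSettings> = {
    openai: {
      provider: "openai",
      chatModel: "gpt-4o-mini",
      embeddingModel: "text-embedding-3-small",
      embeddingDimensions: 1536,
    },
    google: {
      provider: "google",
      chatModel: "gemini-1.5-flash",
      embeddingModel: "text-embedding-004",
      embeddingDimensions: 768,
    },
  };
  return defaults[provider];
}

// Fail loudly on dimension mismatch instead of returning bad neighbors.
function assertIndexCompatible(settings: AiSettings, indexDimensions: number): void {
  if (settings.embeddingDimensions !== indexDimensions) {
    throw new Error(
      `Embedding dimension mismatch (${settings.embeddingDimensions} vs index ${indexDimensions}); re-index required`
    );
  }
}
```

A mismatched query vector against an index built with a different embedding model does not error on its own; it just returns nonsense neighbors, which is why the explicit guard is worth having.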

Streaming text: UX design, not just a backend feature

Streaming is not "on/off." It has UX implications:

  • users should see immediate response start (fast first token)
  • text should appear smoothly, not in large jarring chunks
  • scroll behavior must stay pinned as content grows
  • virtualization must measure dynamic message height correctly

My streaming pipeline

  • backend streams response deltas through the agent path
  • frontend maps message statuses (pending/streaming) correctly
  • bubble renderer uses smooth text presentation for assistant output
  • virtualized message list follows streaming content without scroll drift

If any one of those layers is off, users perceive streaming as "broken" even if tokens are technically streaming from the model.
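The "smooth text presentation" layer can be sketched as a small buffer that decouples bursty model deltas from what the bubble renders per frame. This is a hypothetical helper, not the actual component code:

```typescript
// Illustrative smoothing buffer: the model may emit bursty deltas, but
// the UI reveals text in small fixed-size steps per animation tick so
// streaming feels continuous instead of arriving in jarring chunks.
class SmoothTextBuffer {
  private target = ""; // full text received from the stream so far
  private shown = "";  // text currently rendered in the bubble

  push(delta: string): void {
    this.target += delta;
  }

  // Called on each animation tick; reveals up to `step` more characters.
  tick(step = 3): string {
    this.shown = this.target.slice(0, this.shown.length + step);
    return this.shown;
  }

  done(): boolean {
    return this.shown.length === this.target.length;
  }
}
```

The render loop keeps ticking until `done()` returns true after the backend marks the message complete, which keeps the last burst of tokens from snapping in all at once.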

Why I added TanStack virtualization

Version 1 of the chat bubble rendered every message directly in the list. It worked fine for short sessions, but I started planning for heavier usage: what happens when someone keeps the thread alive for 100+ messages?

That changed the design target from "looks good now" to "stays smooth under long-running conversations."

V1 behavior (no virtualization)

In the non-virtualized version, long chats created predictable problems:

  • message list DOM grew linearly with conversation length
  • scroll performance degraded as more rich message blocks accumulated
  • dynamic streaming updates forced more frequent layout recalculations
  • auto-scroll behavior became less reliable during long sessions

V2 behavior (TanStack Virtual)

I moved to a virtualized message list using TanStack Virtual so only visible rows (plus overscan) are mounted.

That gave me:

  • stable scrolling even with large histories
  • lower render work during streaming updates
  • better memory behavior on long sessions
  • cleaner path to support very long threads without rewriting UI architecture

The tricky part was dynamic row height while assistant text streams in. I had to make sure row measurement and "stick-to-bottom" logic cooperated, otherwise you get overlap or the viewport stops following new tokens.

This is exactly why I wanted virtualization in the shared AI UI package rather than app-local one-off code: once solved, every chat surface can inherit the same long-thread behavior.
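The stick-to-bottom decision that has to cooperate with dynamic row measurement reduces to one question: is the user already near the bottom? A minimal sketch, with an illustrative threshold:

```typescript
// Sketch of the "stick-to-bottom" decision used alongside a virtualized
// list: only auto-follow streaming tokens when the user is already near
// the bottom, so manual scroll-up is never fought. Threshold is illustrative.
interface ScrollState {
  scrollTop: number;    // current scroll offset
  clientHeight: number; // visible viewport height
  scrollHeight: number; // total (virtualized) content height
}

function shouldFollowStream(s: ScrollState, thresholdPx = 80): boolean {
  const distanceFromBottom = s.scrollHeight - (s.scrollTop + s.clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

When this returns true after a streaming delta re-measures the last row, the list scrolls to the final index (TanStack Virtual exposes `scrollToIndex` with an `align` option for this); when it returns false, the viewport stays where the user put it.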

Frontend chat bubble implementation

The chat bubble is built from reusable AI UI primitives plus portfolio-specific behavior:

  • shared message rendering and virtualized list from the AI package
  • portfolio-specific shell (PortfolioChatBubble) for trigger, drawer, CTA, and project-card behavior
  • mobile/desktop behavior differences (including open-state handling and navigation behavior)

Why this split worked

  • shared components solve hard generic problems once (streaming, virtualization, markdown rendering)
  • app-level component owns UX choices and branding
  • cross-app consistency improves without forcing identical UI

Backend and /admin/ai interface

The /admin/ai surface exists to keep AI behavior tunable without code edits every time:

  • provider/model selection
  • embedding configuration
  • RAG indexing and re-index workflows
  • quality/operational visibility hooks

From an architecture perspective, admin and frontend are two views over the same underlying AI plugin:

  • Frontend chat consumes the runtime behavior
  • Admin configures and inspects that behavior

This is important because AI systems are never "set and forget." You need operational control loops.

How admin and frontend share one AI plugin/component

I think about this as one system with two entry points:

  • User entry point: chat bubble in portfolio UI
  • Operator entry point: admin controls and diagnostics

Both use the same core AI component contracts:

  • thread/message model
  • provider and embedding settings
  • retrieval/indexing actions
  • guardrail and telemetry surfaces
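As a rough shape, the shared contract both entry points consume might look like this. The field and action names are hypothetical, not the real package types:

```typescript
// Illustrative contract shared by the user-facing chat and /admin/ai.
interface ThreadModel {
  threadId: string;
  messages: {
    id: string;
    role: "user" | "assistant";
    status: "pending" | "streaming" | "done";
  }[];
}

interface AiComponentContract {
  settings: { provider: string; model: string; embeddingModel: string };
  actions: {
    // Chat surface calls this; admin inspects its results.
    sendMessage: (threadId: string, text: string) => Promise<void>;
    // Admin surface calls this; chat benefits from the fresh index.
    reindex: (scope: "projects" | "skills" | "experience" | "blog") => Promise<void>;
  };
}
```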

That structure is what makes iteration fast: I can tune behavior in admin and immediately validate user-facing impact in chat.

What Part 1 covers vs Part 2

Part 1 (this post) explains how I assembled the system:

  • component architecture
  • Convex agent runtime
  • RAG and provider strategy
  • streaming and UI integration
  • admin/frontend composition

Part 2 covers how I operated and improved it in production:

  • what metrics I tracked
  • what failed
  • what changed
  • how those changes affected quality, cost, and latency

Continue to Part 2: How I Built a Custom AI for My Portfolio (Part 2).
