
How I Built a Custom AI for My Portfolio (Part 1): Architecture and Implementation

Part 1 of my portfolio AI series: Convex agent component design, RAG fundamentals, OpenAI vs Google provider testing, streaming UX, and how the frontend chat bubble and /admin/ai tooling live inside one reusable AI plugin.

Apr 16, 2026 · Desmond Tatilian

This is Part 1 of a two-part series on my portfolio AI system.

  • Part 1 (this post): technical architecture and implementation
  • Part 2: monitoring, eval, optimization, and production tuning

Read Part 2 here: How I Built a Custom AI for My Portfolio (Part 2).

Why I built it as a reusable component instead of one-off app code

I did not want AI logic scattered across app-specific files. I wanted one reusable AI layer that I could plug into different products (portfolio, LaunchThatBot, and future apps) without rewriting core behavior each time.

So I structured it as a plugin/component architecture:

  • shared AI backend behaviors in a core package (packages/core/ai)
  • app-specific content and orchestration in apps/portfolio/convex/*
  • shared UI primitives for chat rendering, virtualization, and streaming in packages/core/ai/src/components/*
  • app-level chat shell in apps/portfolio/src/components/chat/*

That split is what lets the portfolio app feel custom while still reusing the same AI foundation as other surfaces in the monorepo.

Core runtime choice: Convex agent component

I built around Convex's agent patterns because I needed:

  • persistent thread/message state
  • server-side actions and scheduling
  • reactive client updates
  • strong typing from backend to frontend
  • streaming-friendly UX hooks

What the agent component includes in practice

At a high level, the agent component gives me:

  • Thread/message lifecycle: create thread, append messages, update status (pending, streaming, done)
  • Tool/action orchestration: structured calls for retrieval and response generation
  • Streaming support path: token deltas can be surfaced live to the UI
  • Shared settings model: provider/model/embedding configuration from admin controls
  • Reusable hooks: clients can subscribe to thread messages and render streaming state safely

This lets me keep business-specific logic (portfolio retrieval and presentation) separate from baseline agent plumbing.
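The thread/message lifecycle above can be sketched as a small status machine. This is a minimal, hypothetical sketch, not the agent component's real API; the type and function names are illustrative:

```typescript
// Illustrative message lifecycle: pending -> streaming -> done.
// Names are hypothetical, not the actual Convex agent component types.
type MessageStatus = "pending" | "streaming" | "done";

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
  status: MessageStatus;
}

// Append a streamed delta and advance status pending -> streaming.
function applyDelta(msg: ChatMessage, delta: string): ChatMessage {
  if (msg.status === "done") return msg; // completed messages are immutable
  return { ...msg, text: msg.text + delta, status: "streaming" };
}

// Mark a message complete once the model finishes.
function finalize(msg: ChatMessage): ChatMessage {
  return { ...msg, status: "done" };
}
```

Modeling status transitions as pure functions is what makes it safe for clients to subscribe to thread messages and render streaming state without racing the backend.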

RAG in this system: simple concept, strict implementation

The retrieval concept is standard:

  1. embed the user query
  2. search indexed portfolio content
  3. inject top-ranked context
  4. generate answer with source-grounded prompt instructions
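The four steps can be sketched with the embedder, vector index, and model injected as dependencies, which is also how the app code stays decoupled from any one provider. Everything here is illustrative, not the actual portfolio API:

```typescript
// Hypothetical sketch of the four retrieval steps, with dependencies
// injected so the pipeline is provider-agnostic and testable.
interface Chunk { text: string; sourceType: string; slug: string; score: number }

interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  search: (vector: number[], topK: number) => Promise<Chunk[]>;
  generate: (prompt: string) => Promise<string>;
}

async function answerWithRag(deps: RagDeps, query: string): Promise<string> {
  const vector = await deps.embed(query);       // 1. embed the user query
  const chunks = await deps.search(vector, 5);  // 2. search indexed content
  const context = chunks                        // 3. inject top-ranked context
    .map((c) => `[${c.sourceType}:${c.slug}] ${c.text}`)
    .join("\n");
  return deps.generate(                         // 4. source-grounded generation
    `Answer using only these sources:\n${context}\n\nQuestion: ${query}`
  );
}
```

Note that each chunk carries `sourceType` and `slug` into the prompt, which is what makes attribution possible downstream.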

But in implementation, there are three details that matter:

  • Indexing scope: projects, skills, experience, and blog posts all need to be represented
  • Metadata discipline: each chunk needs source type + slug metadata for attribution and fallback paths
  • Fallback logic: exact entity asks need lexical paths, not just semantic vectors

That last point is why "tell me about traderlaunchpad" can work reliably even when semantic match is ambiguous.

For a deeper retrieval-centric breakdown, see RAG in Practice.

Why this is not "RAG-only" (and why that is normal)

A common question is: if everything is indexed, why not rely only on vector search?

The short answer: pure semantic retrieval is great for broad questions, but weaker for exact entity resolution and conversational follow-ups.

In practice, these are different query shapes:

  • Semantic ask: "How do you approach production AI quality?" -> vector retrieval works well.
  • Canonical entity ask: "Tell me about TraderLaunchpad." -> deterministic slug/title lookup is more reliable.
  • Conversational reference: "What about part 1?" -> requires disambiguation + conversation-aware matching.

So the architecture intentionally uses a hybrid pattern:

  • RAG for semantic grounding and synthesis
  • metadata and lexical matching for exact names/slugs/titles
  • deterministic fallback retrieval paths for high-precision lookups

That is not a workaround. It is a common production design choice for assistants that need both flexibility and correctness.
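The hybrid pattern can be sketched as a deterministic lexical pass before the semantic fallback. This is a simplified sketch under assumed names, not the production matching logic:

```typescript
// Hedged sketch of hybrid retrieval: exact slug/title lookup first,
// semantic vector search as the fallback. Names are illustrative.
interface Doc { slug: string; title: string; text: string }

// High-precision lexical path for "tell me about <entity>" asks.
function exactEntityMatch(docs: Doc[], query: string): Doc | undefined {
  const q = query.toLowerCase();
  return docs.find(
    (d) => q.includes(d.slug.toLowerCase()) || q.includes(d.title.toLowerCase())
  );
}

async function retrieve(
  docs: Doc[],
  query: string,
  semanticSearch: (q: string) => Promise<Doc[]>
): Promise<Doc[]> {
  const exact = exactEntityMatch(docs, query);
  if (exact) return [exact]; // deterministic, no embedding ambiguity
  return semanticSearch(query); // broad questions go to vector retrieval
}
```

The ordering matters: an exact entity mention short-circuits before any embedding is computed, which is why "tell me about traderlaunchpad" stays reliable even when the vectors are ambiguous.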

Why I tested two AI providers (OpenAI and Google)

I explicitly tested both providers because model behavior and embedding behavior differ in production, even when both are "good."

What I wanted to compare:

  • answer quality on architecture-heavy prompts
  • retrieval attribution consistency
  • streaming behavior in the UI
  • operational tradeoffs (latency, cost, and iteration speed)

Practical provider differences I designed around

  • Embedding shape and defaults differ, so I treat embedding dimensions as a contract and re-index when changing strategies.
  • Streaming behavior can differ by route/runtime path, so I validated the end-to-end stream path in the exact agent flow.
  • Prompt sensitivity varies, so I kept regression evals around prompt changes regardless of provider.

The final system is provider-flexible by design: settings drive model/provider selection instead of hardcoded one-provider assumptions.
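Settings-driven selection might look like the sketch below. The model names and dimensions are examples of current provider defaults, not my actual configuration; the key idea is that the embedding dimension is part of the contract, so changing strategies forces a re-index instead of silently drifting:

```typescript
// Illustrative provider-flexible settings; real values live in
// admin-managed settings, not in code.
type Provider = "openai" | "google";

interface AiSettings {
  provider: Provider;
  chatModel: string;
  embeddingModel: string;
  embeddingDimensions: number; // contract: must match the vector index
}

function resolveSettings(provider: Provider): AiSettings {
  const defaults: Record<Provider, AiSettings> = {
    openai: {
      provider: "openai",
      chatModel: "gpt-4o-mini",
      embeddingModel: "text-embedding-3-small",
      embeddingDimensions: 1536,
    },
    google: {
      provider: "google",
      chatModel: "gemini-1.5-flash",
      embeddingModel: "text-embedding-004",
      embeddingDimensions: 768,
    },
  };
  return defaults[provider];
}

// Fail loudly on dimension mismatch instead of returning bad neighbors.
function assertIndexCompatible(settings: AiSettings, indexDimensions: number): void {
  if (settings.embeddingDimensions !== indexDimensions) {
    throw new Error(
      `Embedding dimension mismatch (${settings.embeddingDimensions} vs index ${indexDimensions}); re-index required`
    );
  }
}
```

A mismatched query vector against an index built with a different embedding model does not error on its own; it just returns nonsense neighbors, which is why the explicit guard is worth having.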

Streaming text: UX design, not just a backend feature

Streaming is not "on/off." It has UX implications:

  • users should see immediate response start (fast first token)
  • text should appear smoothly, not in large jarring chunks
  • scroll behavior must stay pinned as content grows
  • virtualization must measure dynamic message height correctly

My streaming pipeline

  • backend streams response deltas through the agent path
  • frontend maps message statuses (pending/streaming) correctly
  • bubble renderer uses smooth text presentation for assistant output
  • virtualized message list follows streaming content without scroll drift

If any one of those layers is off, users perceive streaming as "broken" even if tokens are technically streaming from the model.
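The "smooth text presentation" layer can be sketched as a small buffer that decouples bursty model deltas from what the bubble renders per frame. This is a hypothetical helper, not the actual component code:

```typescript
// Illustrative smoothing buffer: the model may emit bursty deltas, but
// the UI reveals text in small fixed-size steps per animation tick so
// streaming feels continuous instead of arriving in jarring chunks.
class SmoothTextBuffer {
  private target = ""; // full text received from the stream so far
  private shown = "";  // text currently rendered in the bubble

  push(delta: string): void {
    this.target += delta;
  }

  // Called on each animation tick; reveals up to `step` more characters.
  tick(step = 3): string {
    this.shown = this.target.slice(0, this.shown.length + step);
    return this.shown;
  }

  done(): boolean {
    return this.shown.length === this.target.length;
  }
}
```

The render loop keeps ticking until `done()` returns true after the backend marks the message complete, which keeps the last burst of tokens from snapping in all at once.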

Why I added TanStack virtualization

Version 1 of the chat bubble rendered every message directly in the list. It worked fine for short sessions, but I started planning for heavier usage: what happens when someone keeps the thread alive for 100+ messages?

That changed the design target from "looks good now" to "stays smooth under long-running conversations."

V1 behavior (no virtualization)

In the non-virtualized version, long chats created predictable problems:

  • message list DOM grew linearly with conversation length
  • scroll performance degraded as more rich message blocks accumulated
  • dynamic streaming updates forced more frequent layout recalculations
  • auto-scroll behavior became less reliable during long sessions

V2 behavior (TanStack Virtual)

I moved to a virtualized message list using TanStack Virtual so only visible rows (plus overscan) are mounted.

That gave me:

  • stable scrolling even with large histories
  • lower render work during streaming updates
  • better memory behavior on long sessions
  • cleaner path to support very long threads without rewriting UI architecture

The tricky part was dynamic row height while assistant text streams in. I had to make sure row measurement and "stick-to-bottom" logic cooperated, otherwise you get overlap or the viewport stops following new tokens.

This is exactly why I wanted virtualization in the shared AI UI package rather than app-local one-off code: once solved, every chat surface can inherit the same long-thread behavior.
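The stick-to-bottom decision that has to cooperate with dynamic row measurement reduces to one question: is the user already near the bottom? A minimal sketch, with an illustrative threshold:

```typescript
// Sketch of the "stick-to-bottom" decision used alongside a virtualized
// list: only auto-follow streaming tokens when the user is already near
// the bottom, so manual scroll-up is never fought. Threshold is illustrative.
interface ScrollState {
  scrollTop: number;    // current scroll offset
  clientHeight: number; // visible viewport height
  scrollHeight: number; // total (virtualized) content height
}

function shouldFollowStream(s: ScrollState, thresholdPx = 80): boolean {
  const distanceFromBottom = s.scrollHeight - (s.scrollTop + s.clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

When this returns true after a streaming delta re-measures the last row, the list scrolls to the final index (TanStack Virtual exposes `scrollToIndex` with an `align` option for this); when it returns false, the viewport stays where the user put it.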

Frontend chat bubble implementation

The chat bubble is built from reusable AI UI primitives plus portfolio-specific behavior:

  • shared message rendering and virtualized list from the AI package
  • portfolio-specific shell (PortfolioChatBubble) for trigger, drawer, CTA, and project-card behavior
  • mobile/desktop behavior differences (including open-state handling and navigation behavior)

Why this split worked

  • shared components solve hard generic problems once (streaming, virtualization, markdown rendering)
  • app-level component owns UX choices and branding
  • cross-app consistency improves without forcing identical UI

Backend and /admin/ai interface

The /admin/ai surface exists to keep AI behavior tunable without code edits every time:

  • provider/model selection
  • embedding configuration
  • RAG indexing and re-index workflows
  • quality/operational visibility hooks

From an architecture perspective, admin and frontend are two views over the same underlying AI plugin:

  • Frontend chat consumes the runtime behavior
  • Admin configures and inspects that behavior

This is important because AI systems are never "set and forget." You need operational control loops.

How admin and frontend share one AI plugin/component

I think about this as one system with two entry points:

  • User entry point: chat bubble in portfolio UI
  • Operator entry point: admin controls and diagnostics

Both use the same core AI component contracts:

  • thread/message model
  • provider and embedding settings
  • retrieval/indexing actions
  • guardrail and telemetry surfaces
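As a rough shape, the shared contract both entry points consume might look like this. The field and action names are hypothetical, not the real package types:

```typescript
// Illustrative contract shared by the user-facing chat and /admin/ai.
interface ThreadModel {
  threadId: string;
  messages: {
    id: string;
    role: "user" | "assistant";
    status: "pending" | "streaming" | "done";
  }[];
}

interface AiComponentContract {
  settings: { provider: string; model: string; embeddingModel: string };
  actions: {
    // Chat surface calls this; admin inspects its results.
    sendMessage: (threadId: string, text: string) => Promise<void>;
    // Admin surface calls this; chat benefits from the fresh index.
    reindex: (scope: "projects" | "skills" | "experience" | "blog") => Promise<void>;
  };
}
```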

That structure is what makes iteration fast: I can tune behavior in admin and immediately validate user-facing impact in chat.

What Part 1 covers vs Part 2

Part 1 (this post) explains how I assembled the system:

  • component architecture
  • Convex agent runtime
  • RAG and provider strategy
  • streaming and UI integration
  • admin/frontend composition

Part 2 covers how I operated and improved it in production:

  • what metrics I tracked
  • what failed
  • what changed
  • how those changes affected quality, cost, and latency

Continue to Part 2: How I Built a Custom AI for My Portfolio (Part 2).
