Vector Dimensions in Production RAG: 768 vs 1536 vs 3072
A practical guide to embedding dimensions using our Portal support retrieval pipeline and shared support/AI components: how dimension choice changes quality, latency, storage, and migration strategy.
Most teams treat embedding dimension like a model detail. In production RAG, it is a system design decision.
Choosing between 768, 1536, or 3072 dimensions is not just about model quality. It directly changes:
- Retrieval precision and recall
- Index size and storage cost
- Query latency under load
- Reindexing complexity during migrations
For the grounding and citation reliability side of this same system, read RAG in Practice.
In our stack, this tradeoff shows up in two concrete places:
- apps/portal/convex/plugins/support/rag.ts (Portal support retrieval pipeline)
- packages/core/ai/src/convex/component/actions.ts (shared support/AI component used across apps)
Why dimension is a first-class architecture choice
An embedding vector is a compressed representation of meaning. Higher dimension gives the model more room to encode nuance, but every extra dimension has cost:
- more numbers stored per chunk
- more math for every similarity comparison
- larger index working set in memory
At small scale, that cost is easy to ignore. At production support scale, it becomes part of your latency and spend profile.
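The storage side of that cost is easy to estimate up front: raw vector storage is linear in dimension (float32 is 4 bytes per component; index overhead comes on top). A minimal sketch, with a helper name of our own choosing:

```typescript
// Estimate raw vector storage for a corpus: float32 = 4 bytes per dimension.
// Real index structures (graph links, metadata) add overhead on top.
function vectorStorageBytes(chunks: number, dims: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return chunks * dims * BYTES_PER_FLOAT32;
}

// 100k support chunks at each dimension profile:
const chunks = 100_000;
for (const dims of [768, 1536, 3072]) {
  const mb = vectorStorageBytes(chunks, dims) / (1024 * 1024);
  console.log(`${dims} dims -> ${mb.toFixed(0)} MB raw vectors`);
  // 768 -> 293 MB, 1536 -> 586 MB, 3072 -> 1172 MB
}
```

Doubling dimension doubles this baseline, and the same linear factor applies to every similarity comparison at query time.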
Reference system 1: Portal support RAG
In apps/portal/convex/plugins/support/rag.ts, the support pipeline makes the embedding provider explicit and ties it to dimension defaults.
That file demonstrates two important production patterns:
- Provider-aware embedding strategy
  - The OpenAI path and the Google path are handled differently.
  - Google embedding requests pass outputDimensionality directly to Gemini embedding endpoints.
- Model fallback and compatibility hardening
  - The Google path attempts compatible model candidates.
  - This reduces outage risk when model naming or endpoint behavior shifts.
The practical impact is that Portal support can choose a dimension profile intentionally (for example, lower-dimensional vectors for speed-sensitive support retrieval) without rewriting the entire RAG surface.
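The provider-aware pattern can be sketched as a single choke point that maps one intended dimension onto each provider's own knob. This is illustrative, not the actual rag.ts code; the field names reflect how the two APIs expose dimension (Gemini embedding endpoints accept outputDimensionality, OpenAI's text-embedding-3 models accept a dimensions parameter):

```typescript
// Illustrative sketch: one place where dimension intent becomes a
// provider-specific request field.
type EmbeddingProvider = "openai" | "google";

interface EmbeddingRequest {
  model: string;
  input: string;
  dimensions?: number; // OpenAI text-embedding-3-* style knob
  outputDimensionality?: number; // Gemini embedding style knob
}

function buildEmbeddingRequest(
  provider: EmbeddingProvider,
  model: string,
  input: string,
  dims: number,
): EmbeddingRequest {
  if (provider === "google") {
    // Gemini embedding endpoints take outputDimensionality directly.
    return { model, input, outputDimensionality: dims };
  }
  // OpenAI-style path uses a `dimensions` field instead.
  return { model, input, dimensions: dims };
}
```

Because every caller goes through one builder, changing the dimension profile is a configuration change rather than a rewrite of each call site.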
Reference system 2: Shared support/AI component
In packages/core/ai/src/convex/component/actions.ts, embedding behavior is driven by saved AI settings (provider, embeddingModel, embeddingDimension) rather than hardcoded one-offs.
That gives you a clean operator workflow:
- switch provider in admin settings
- set embedding model
- set embedding dimension / output dimensionality
- reindex into the intended namespace
It also lets one component support multiple app contexts with different retrieval constraints.
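The settings-driven shape can be sketched like this. The field names (provider, embeddingModel, embeddingDimension) mirror the post; the resolver and its defaults are our illustration, not the actual actions.ts code:

```typescript
// Illustrative: operator-saved AI settings drive embedding behavior.
interface AiSettings {
  provider?: "openai" | "google";
  embeddingModel?: string;
  embeddingDimension?: number;
}

// Hypothetical defaults for the sketch (1536 as the balanced default
// this post recommends).
const DEFAULTS = {
  provider: "openai" as const,
  embeddingModel: "text-embedding-3-small",
  embeddingDimension: 1536,
};

// One resolver, so switching provider, model, or dimension in admin
// settings changes behavior for every app context at once.
function resolveEmbeddingConfig(saved: AiSettings) {
  return {
    provider: saved.provider ?? DEFAULTS.provider,
    embeddingModel: saved.embeddingModel ?? DEFAULTS.embeddingModel,
    embeddingDimension: saved.embeddingDimension ?? DEFAULTS.embeddingDimension,
  };
}
```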
768 vs 1536 vs 3072 in practice
768 dimensions
Best when:
- you need lower query latency
- storage footprint is a concern
- your retrieval corpus is repetitive (support docs, short structured articles)
Tradeoff:
- slightly less semantic separation on nuanced queries
1536 dimensions
Best when:
- you want a balanced default for quality vs cost
- corpus includes mixed content depth (docs + long-form posts + implementation notes)
Tradeoff:
- roughly double the per-vector storage and per-comparison compute of 768
3072 dimensions
Best when:
- you are optimizing retrieval quality first
- query complexity is high and semantic ambiguity is expensive
Tradeoff:
- highest index and query cost profile
- only worth it when your evals show clear wins
Performance reality: what changes as dimensions increase
Dimension growth impacts both offline and online paths:
- Indexing path: higher memory and throughput pressure during bulk indexing
- Search path: more arithmetic per candidate vector
- Infra profile: larger hot index set, higher cache pressure
In short: if two dimension settings produce similar retrieval metrics on your eval set, pick the smaller one.
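The search-path cost is easy to see concretely: brute-force similarity is one multiply-add per component, so 1536 dims is ~2× the arithmetic of 768 per candidate, and 3072 is ~4×. A minimal sketch:

```typescript
// Per-candidate similarity cost scales linearly with dimension:
// one multiply-add per component of the vector.
function dotProduct(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

ANN indexes reduce how many candidates get scored, but each scored candidate still pays the full per-dimension cost, and the larger vectors also crowd the cache.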
The production rule most teams forget
Dimensions are namespace/index contracts.
If you switch from 768 to 1536 (or make any other dimension change), you cannot safely compare vectors produced under different dimension contracts in the same namespace. Handle the migration as a controlled cutover:
- Create or target a clean namespace for the new dimension
- Reindex full corpus
- Validate retrieval metrics (MRR, Recall@k, attribution quality)
- Switch traffic
- Decommission old namespace
Both reference implementations above are designed to support this pattern.
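A cheap way to enforce the contract is to make dimension part of the namespace record and refuse mismatched writes and queries. A sketch with helper names of our own:

```typescript
// Dimension is part of the namespace contract: never mix vectors
// produced under different dimension settings in one index.
interface VectorNamespace {
  name: string;
  dimension: number;
}

function assertDimensionContract(ns: VectorNamespace, vector: number[]): void {
  if (vector.length !== ns.dimension) {
    throw new Error(
      `vector has ${vector.length} dims but namespace "${ns.name}" expects ` +
        `${ns.dimension}; reindex into a new namespace instead of mixing contracts`,
    );
  }
}
```

Failing loudly here turns a silent relevance regression (comparing incompatible vectors) into an immediate, debuggable error during the cutover.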
What we use as a default
For most production support surfaces, 1536 is a strong default because it balances precision and operational cost. We move down to 768 when latency/cost dominate and eval quality remains acceptable. We move up only when measured retrieval failures justify the added footprint.
This is the key: choose dimensions from eval outcomes, not intuition.
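The two metrics named above are small enough to keep next to the pipeline. A minimal sketch, where `results` holds ranked doc ids per eval query and `relevant` holds the gold set per query:

```typescript
// Fraction of queries with at least one relevant doc in the top k.
function recallAtK(
  results: string[][],
  relevant: Set<string>[],
  k: number,
): number {
  let hits = 0;
  results.forEach((ranked, i) => {
    if (ranked.slice(0, k).some((id) => relevant[i].has(id))) hits++;
  });
  return hits / results.length;
}

// Mean reciprocal rank of the first relevant doc (0 if none found).
function meanReciprocalRank(
  results: string[][],
  relevant: Set<string>[],
): number {
  let total = 0;
  results.forEach((ranked, i) => {
    const idx = ranked.findIndex((id) => relevant[i].has(id));
    if (idx >= 0) total += 1 / (idx + 1);
  });
  return total / results.length;
}
```

Run the same eval set against each candidate dimension's namespace; if the smaller dimension matches the larger one within your tolerance, the smaller one wins.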
Final takeaway
Model choice gets attention, but dimension choice determines operational behavior.
If your RAG system feels expensive, slow, or inconsistent, inspect embedding dimensions before rewriting retrieval logic. In many cases, the fastest reliability gain is not a new framework, but a better dimensionality contract backed by repeatable evals.