Vector Dimensions in Production RAG: 768 vs 1536 vs 3072
A practical guide to embedding dimensions using our Portal support retrieval pipeline and shared support/AI components: how dimension choice changes quality, latency, storage, and migration strategy.
Most teams treat embedding dimension like a model detail. In production RAG, it is a system design decision.
Choosing between 768, 1536, or 3072 dimensions is not just about model quality. It directly changes:
- Retrieval precision and recall
- Index size and storage cost
- Query latency under load
- Reindexing complexity during migrations
For the grounding and citation reliability side of this same system, read RAG in Practice.
In our stack, this tradeoff shows up in two concrete places:
- apps/portal/convex/plugins/support/rag.ts (Portal support retrieval pipeline)
- packages/core/ai/src/convex/component/actions.ts (shared support/AI component used across apps)
Why dimension is a first-class architecture choice
An embedding vector is a compressed representation of meaning. Higher dimension gives the model more room to encode nuance, but every extra dimension has cost:
- more numbers stored per chunk
- more math for every similarity comparison
- larger index working set in memory
At small scale, that cost is easy to ignore. At production support scale, it becomes part of your latency and spend profile.
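The storage side of that cost is easy to estimate up front: raw vector storage is linear in dimension (float32 is 4 bytes per component; index overhead comes on top). A minimal sketch, with a helper name of our own choosing:

```typescript
// Estimate raw vector storage for a corpus: float32 = 4 bytes per dimension.
// Real index structures (graph links, metadata) add overhead on top.
function vectorStorageBytes(chunks: number, dims: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return chunks * dims * BYTES_PER_FLOAT32;
}

// 100k support chunks at each dimension profile:
const chunks = 100_000;
for (const dims of [768, 1536, 3072]) {
  const mb = vectorStorageBytes(chunks, dims) / (1024 * 1024);
  console.log(`${dims} dims -> ${mb.toFixed(0)} MB raw vectors`);
  // 768 -> 293 MB, 1536 -> 586 MB, 3072 -> 1172 MB
}
```

Doubling dimension doubles this baseline, and the same linear factor applies to every similarity comparison at query time.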
Reference system 1: Portal support RAG
In apps/portal/convex/plugins/support/rag.ts, the support pipeline makes the embedding provider explicit and ties it to dimension defaults.
That file demonstrates two important production patterns:
- Provider-aware embedding strategy
  - The OpenAI path and the Google path are handled differently.
  - Google embedding requests pass outputDimensionality directly to Gemini embedding endpoints.
- Model fallback and compatibility hardening
  - The Google path attempts compatible model candidates.
  - This reduces outage risk when model naming or endpoint behavior shifts.
The practical impact is that Portal support can choose a dimension profile intentionally (for example, lower-dimensional vectors for speed-sensitive support retrieval) without rewriting the entire RAG surface.
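The provider-aware pattern can be sketched as a single choke point that maps one intended dimension onto each provider's own knob. This is illustrative, not the actual rag.ts code; the field names reflect how the two APIs expose dimension (Gemini embedding endpoints accept outputDimensionality, OpenAI's text-embedding-3 models accept a dimensions parameter):

```typescript
// Illustrative sketch: one place where dimension intent becomes a
// provider-specific request field.
type EmbeddingProvider = "openai" | "google";

interface EmbeddingRequest {
  model: string;
  input: string;
  dimensions?: number; // OpenAI text-embedding-3-* style knob
  outputDimensionality?: number; // Gemini embedding style knob
}

function buildEmbeddingRequest(
  provider: EmbeddingProvider,
  model: string,
  input: string,
  dims: number,
): EmbeddingRequest {
  if (provider === "google") {
    // Gemini embedding endpoints take outputDimensionality directly.
    return { model, input, outputDimensionality: dims };
  }
  // OpenAI-style path uses a `dimensions` field instead.
  return { model, input, dimensions: dims };
}
```

Because every caller goes through one builder, changing the dimension profile is a configuration change rather than a rewrite of each call site.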
Reference system 2: Shared support/AI component
In packages/core/ai/src/convex/component/actions.ts, embedding behavior is driven by saved AI settings (provider, embeddingModel, embeddingDimension) rather than hardcoded one-offs.
That gives you a clean operator workflow:
- switch provider in admin settings
- set embedding model
- set embedding dimension / output dimensionality
- reindex into the intended namespace
It also lets one component support multiple app contexts with different retrieval constraints.
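The settings-driven shape can be sketched like this. The field names (provider, embeddingModel, embeddingDimension) mirror the post; the resolver and its defaults are our illustration, not the actual actions.ts code:

```typescript
// Illustrative: operator-saved AI settings drive embedding behavior.
interface AiSettings {
  provider?: "openai" | "google";
  embeddingModel?: string;
  embeddingDimension?: number;
}

// Hypothetical defaults for the sketch (1536 as the balanced default
// this post recommends).
const DEFAULTS = {
  provider: "openai" as const,
  embeddingModel: "text-embedding-3-small",
  embeddingDimension: 1536,
};

// One resolver, so switching provider, model, or dimension in admin
// settings changes behavior for every app context at once.
function resolveEmbeddingConfig(saved: AiSettings) {
  return {
    provider: saved.provider ?? DEFAULTS.provider,
    embeddingModel: saved.embeddingModel ?? DEFAULTS.embeddingModel,
    embeddingDimension: saved.embeddingDimension ?? DEFAULTS.embeddingDimension,
  };
}
```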
768 vs 1536 vs 3072 in practice
768 dimensions
Best when:
- you need lower query latency
- storage footprint is a concern
- your retrieval corpus is repetitive (support docs, short structured articles)
Tradeoff:
- slightly less semantic separation on nuanced queries
1536 dimensions
Best when:
- you want a balanced default for quality vs cost
- corpus includes mixed content depth (docs + long-form posts + implementation notes)
Tradeoff:
- roughly double the per-vector storage and per-comparison compute of 768
3072 dimensions
Best when:
- you are optimizing retrieval quality first
- query complexity is high and semantic ambiguity is expensive
Tradeoff:
- highest index and query cost profile
- only worth it when your evals show clear wins
Performance reality: what changes as dimensions increase
Dimension growth impacts both offline and online paths:
- Indexing path: higher memory and throughput pressure during bulk indexing
- Search path: more arithmetic per candidate vector
- Infra profile: larger hot index set, higher cache pressure
In short: if two dimension settings produce similar retrieval metrics on your eval set, pick the smaller one.
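The search-path cost is easy to see concretely: brute-force similarity is one multiply-add per component, so 1536 dims is ~2× the arithmetic of 768 per candidate, and 3072 is ~4×. A minimal sketch:

```typescript
// Per-candidate similarity cost scales linearly with dimension:
// one multiply-add per component of the vector.
function dotProduct(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

ANN indexes reduce how many candidates get scored, but each scored candidate still pays the full per-dimension cost, and the larger vectors also crowd the cache.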
The production rule most teams forget
Dimensions are namespace/index contracts.
If you switch from 768 to 1536 (or make any other dimension change), you cannot safely compare vectors produced under different dimension contracts in the same namespace. Handle the migration as a controlled cutover:
- Create or target a clean namespace for the new dimension
- Reindex full corpus
- Validate retrieval metrics (MRR, Recall@k, attribution quality)
- Switch traffic
- Decommission old namespace
Both reference implementations above are designed to support this pattern.
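A cheap way to enforce the contract is to make dimension part of the namespace record and refuse mismatched writes and queries. A sketch with helper names of our own:

```typescript
// Dimension is part of the namespace contract: never mix vectors
// produced under different dimension settings in one index.
interface VectorNamespace {
  name: string;
  dimension: number;
}

function assertDimensionContract(ns: VectorNamespace, vector: number[]): void {
  if (vector.length !== ns.dimension) {
    throw new Error(
      `vector has ${vector.length} dims but namespace "${ns.name}" expects ` +
        `${ns.dimension}; reindex into a new namespace instead of mixing contracts`,
    );
  }
}
```

Failing loudly here turns a silent relevance regression (comparing incompatible vectors) into an immediate, debuggable error during the cutover.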
What we use as a default
For most production support surfaces, 1536 is a strong default because it balances precision and operational cost. We move down to 768 when latency/cost dominate and eval quality remains acceptable. We move up only when measured retrieval failures justify the added footprint.
This is the key: choose dimensions from eval outcomes, not intuition.
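The two metrics named above are small enough to keep next to the pipeline. A minimal sketch, where `results` holds ranked doc ids per eval query and `relevant` holds the gold set per query:

```typescript
// Fraction of queries with at least one relevant doc in the top k.
function recallAtK(
  results: string[][],
  relevant: Set<string>[],
  k: number,
): number {
  let hits = 0;
  results.forEach((ranked, i) => {
    if (ranked.slice(0, k).some((id) => relevant[i].has(id))) hits++;
  });
  return hits / results.length;
}

// Mean reciprocal rank of the first relevant doc (0 if none found).
function meanReciprocalRank(
  results: string[][],
  relevant: Set<string>[],
): number {
  let total = 0;
  results.forEach((ranked, i) => {
    const idx = ranked.findIndex((id) => relevant[i].has(id));
    if (idx >= 0) total += 1 / (idx + 1);
  });
  return total / results.length;
}
```

Run the same eval set against each candidate dimension's namespace; if the smaller dimension matches the larger one within your tolerance, the smaller one wins.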
Final takeaway
Model choice gets attention, but dimension choice determines operational behavior.
If your RAG system feels expensive, slow, or inconsistent, inspect embedding dimensions before rewriting retrieval logic. In many cases, the fastest reliability gain is not a new framework, but a better dimensionality contract backed by repeatable evals.