The Economy Of Embeddings
Follow one vector from creation to search, put a price on every step, and you find the bill is set long before your first query runs. The embeddings economy, evaluated through pgvector.
Vector search is now part of many applications, but its real cost is easy to get wrong. Teams plan for the embedding model and for storage. Then the database bill arrives, and it is many times bigger than they expected. This surprise can be avoided.
The cost of an embedding is not one number. It is four separate costs, and they are very different in size. Postgres decides how big each one is — by how it stores large values, how it builds its index, and how it answers searches.
This article evaluates the economy of embeddings, using Postgres and pgvector as the worked example. It follows one vector, from the moment it is made to the moment it is searched, and at each step puts a dollar price on it. The prices are round, typical cloud rates for mid-2026. Use your own provider’s prices instead, and the lesson stays the same — because what matters is how the costs compare to each other, not the exact figures.
By the end, one thing should be clear: embeddings are a memory business.
The rate card
Every cloud charges for the same three things. We use round, typical numbers:
| Resource | What it is | Typical price |
|---|---|---|
| RAM | memory on a Postgres server | ≈ $11 per GB per month |
| Disk | fast disk attached to the database | ≈ $0.10 per GB per month |
| Embedding API | the model that turns text into a vector | ≈ $0.02 per million tokens (standard 1536-dim model) |
Look at the gap: one gigabyte of RAM costs about 110 times more than one gigabyte of disk. This single fact explains almost everything below.
An embedding has four separate costs, and Postgres decides how big each one is:
| # | The cost | What it is |
|---|---|---|
| 1 | Make the vector | a small, one-time cost |
| 2 | Store it on disk | a small monthly cost |
| 3 | Keep the index in RAM | the big cost |
| 4 | Run a search | tiny, but only because of #3 |
Cost #1 — Making the vector
You send text to an embedding model, and it gives back numbers. You pay per token. Assume about 500 tokens per document.
| Corpus | Tokens | One-time cost |
|---|---|---|
| 1 million docs | 500M | $10 |
| 10 million docs | 5B | $100 |
What Postgres does: nothing yet — it just receives a vector(1536) and stores it. Making the vector is a cost of the model, not the database. And you pay it only once (unless you switch to a new model later and have to redo all of them).
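The table above is one multiplication. A minimal sketch of it, assuming the article's round figures (500 tokens per document, $0.02 per million tokens); the function name is only illustrative:

```python
def embedding_cost_usd(docs: int,
                       tokens_per_doc: int = 500,
                       usd_per_million_tokens: float = 0.02) -> float:
    """One-time cost of embedding a corpus at a per-token API price."""
    total_tokens = docs * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


for docs in (1_000_000, 10_000_000):
    print(f"{docs:>12,} docs -> ${embedding_cost_usd(docs):,.2f}")
```

Swap in your own token count and price; the result stays a one-time figure unless you re-embed.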
Cost 1 of 4 — Make the vector
$100 once
A one-time charge from the model, not the database — almost never the part that hurts your budget.
Cost #2 — Storing the vector on disk
A vector(1536) takes up 6,152 bytes — an 8-byte header plus 1,536 numbers of 4 bytes each.
What Postgres does: a normal table row is meant to stay under about 2 KB. A 6 KB vector is far bigger than that. So Postgres moves it out of the main table and into a side table called TOAST. (pgvector keeps these uncompressed, because squeezing random numbers wastes CPU for almost no gain.) This is why a vector table looks small on its own but is large in total:
-- heap alone vs. heap + TOAST + indexes
SELECT
    pg_size_pretty(pg_relation_size('docs'))       AS heap_only,
    pg_size_pretty(pg_total_relation_size('docs')) AS total_with_toast_and_index;
The HNSW index adds roughly the same volume again — about 1.2 to 1.3 times the raw vectors in practice (it stores its own copy of every vector, plus the graph links). For ten million that is around 77 GB of index on top of ~61.5 GB of vectors.
| Corpus | Vectors + index on disk | Disk cost / month |
|---|---|---|
| 1 million | ~14 GB | $1.40 |
| 10 million | ~138 GB | $13.80 |
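The disk figures can be rebuilt from the per-vector size. This is a rough sketch, assuming decimal gigabytes and an index ratio of 1.25 (the midpoint of the 1.2 to 1.3 range quoted above); your measured index may differ:

```python
GB = 1_000_000_000  # decimal gigabytes, matching the article's round numbers


def vector_bytes(dims: int) -> int:
    """Size of one pgvector `vector`: 8-byte header + 4 bytes per number."""
    return 8 + 4 * dims


def disk_footprint_gb(rows: int, dims: int = 1536, index_ratio: float = 1.25) -> dict:
    """Raw vectors plus an HNSW index assumed to be index_ratio times their size."""
    vectors_gb = rows * vector_bytes(dims) / GB
    index_gb = vectors_gb * index_ratio
    return {"vectors_gb": vectors_gb, "index_gb": index_gb,
            "total_gb": vectors_gb + index_gb}


def disk_cost_usd_per_month(total_gb: float, usd_per_gb_month: float = 0.10) -> float:
    return total_gb * usd_per_gb_month


footprint = disk_footprint_gb(10_000_000)
print(footprint, disk_cost_usd_per_month(footprint["total_gb"]))
```

For ten million rows this lands near 61.5 GB of vectors, 77 GB of index, and a disk bill of about $14 a month.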
Cost 2 of 4 — Store on disk
$14 / month
Cheap — vectors and index both sit on disk for almost nothing. The catch: the index must also be held in RAM to search fast, and that is Cost #3.
Cost #3 — Keeping the index in RAM (the big cost)
What Postgres does: an HNSW search is a walk across a graph. Each step jumps to a random place in the index and reads a stored vector to measure distance. Random jumps are fast in RAM but slow on disk — once the index no longer fits in memory, those reads can make a search ten to a hundred times slower, stretching a few-millisecond query into tens or hundreds of milliseconds. So for search to stay fast, the index must stay in memory (in shared_buffers or the operating system’s cache). In practice you size the server to keep the whole working set hot — the index plus the vectors it serves, about 138 GB for ten million — and RAM is the most expensive thing on the rate card.
It is fair to ask whether this still holds now that disks are NVMe. It does. NVMe is far quicker than the spinning disks that earned disk its reputation, but RAM is still roughly 100 times quicker again per random access — and a graph search pays that gap on every hop. Each hop has to finish before the next can begin, so there is nothing to prefetch: the high bandwidth NVMe is good at does not help, only latency does. And the disk that managed Postgres actually runs on is usually network-attached, where that latency is higher still.
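A toy model makes the latency argument concrete. The numbers are illustrative assumptions, not measurements: 300 dependent reads per search, about 100 ns per random read from RAM, and a disk read 100 times slower, following the ratio quoted above. Because each hop waits for the previous one, the read time is simply hops multiplied by latency:

```python
def walk_latency_ms(hops: int, latency_per_read_us: float) -> float:
    """Dependent reads cannot overlap, so total time is hops x per-read latency."""
    return hops * latency_per_read_us / 1000.0


HOPS = 300                        # assumed number of dependent reads per search
RAM_READ_US = 0.1                 # assumed: ~100 ns per random read from memory
DISK_READ_US = RAM_READ_US * 100  # assumed: the ~100x gap described above

print(f"RAM : {walk_latency_ms(HOPS, RAM_READ_US):.2f} ms of read time")
print(f"Disk: {walk_latency_ms(HOPS, DISK_READ_US):.2f} ms of read time")
```

The model ignores distance math and caching; it only shows that a fixed per-hop penalty multiplies across the whole walk.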
Match the memory to the data, then price it at $11 per GB per month:
| 10 million rows of… | RAM needed | Server size | Per month |
|---|---|---|---|
| boolean | ~10 MB | 16 GB | $176 |
| int4 | ~40 MB | 16 GB | $176 |
| vector(1536) (float) | ~138 GB | 256 GB | $2,816 |
This is the heart of the cost. The same 10 million rows, stored as a simple true/false flag, run on a $176 server. Stored as embeddings, they need a $2,816 server — about 16 times more — only because the index is large and must stay in RAM.
You can run leaner. Strictly, only the index — about 77 GB — has to stay hot for search; the full vectors are read mainly for re-ranking and writes. Size RAM to the index alone and a 128 GB server (about $1,408 a month) often does the job. The figures here size to the whole working set, which is the safer choice; pick whichever matches how hard you push the box.
The reason goes straight back to the rate card: RAM costs about 110 times more than disk per gigabyte, and embeddings are exactly the kind of data that needs many gigabytes of RAM. So embeddings are a memory problem first, a CPU problem second, and a disk problem barely at all.
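The sizing rule used in this section (round the working set up to the next server size, then multiply by the RAM rate) can be sketched as follows. The list of instance sizes is a hypothetical menu chosen to match the sizes mentioned in the article; real clouds offer different steps:

```python
import bisect

INSTANCE_SIZES_GB = [16, 32, 64, 128, 256, 512]  # hypothetical instance menu
RAM_USD_PER_GB_MONTH = 11.0                       # the rate-card price


def server_for(working_set_gb: float) -> int:
    """Smallest instance whose RAM is at least the working set."""
    i = bisect.bisect_left(INSTANCE_SIZES_GB, working_set_gb)
    if i == len(INSTANCE_SIZES_GB):
        raise ValueError("working set exceeds the largest instance")
    return INSTANCE_SIZES_GB[i]


def monthly_ram_cost(working_set_gb: float) -> float:
    return server_for(working_set_gb) * RAM_USD_PER_GB_MONTH


for label, gb in [("int4", 0.04), ("index only", 77), ("vectors + index", 138)]:
    print(f"{label:>16}: {server_for(gb):>3} GB server, ${monthly_ram_cost(gb):,.0f}/month")
```

The step function is why the bill jumps: 138 GB does not fit in 128 GB, so you pay for 256 GB.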
Cost 3 of 4 — Keep the index in RAM
$2,816 / month
The big cost. The same ten million rows, now embeddings, need sixteen times the server — because the index must live in memory.
Cost #4 — Running a search
What Postgres does: comparing two vectors is not one quick step — it is a loop over every number in the vector.
First, what is a “dimension”? An embedding is just a long list of numbers. A vector(1536) is a list of 1,536 numbers. In this section, N is simply that count — for a vector(1536), N = 1536.
To measure the distance between two vectors, the database walks all N numbers and does a little math at each one. The two common measures, and the work each one costs:
| Distance | Work per number | Total for N = 1536 |
|---|---|---|
| Straight-line (L2) | 3 steps — subtract, square, add | ~4,600 steps |
| Cosine | 6 steps | ~9,200 steps |
So comparing two 1536-number vectors takes roughly 4,600 to 9,200 steps — against a single step to compare two booleans or integers. The work grows in a straight line with the vector’s length: twice the numbers, twice the work, on every comparison.
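To see where the step counts come from, here are the two distances written as plain loops. This is an illustrative sketch, not pgvector's implementation (which is compiled C and can use SIMD); the step counts follow the table's way of counting:

```python
import math


def l2_distance(a, b):
    """Straight-line distance: subtract, square, add for each number."""
    total = 0.0
    for x, y in zip(a, b):
        diff = x - y          # subtract
        total += diff * diff  # square, add
    return math.sqrt(total)


def cosine_distance(a, b):
    """1 - cosine similarity: three multiply-and-add pairs per number."""
    dot = norm_a = norm_b = 0.0
    for x, y in zip(a, b):
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    return 1.0 - dot / math.sqrt(norm_a * norm_b)


def steps(dims: int, steps_per_number: int) -> int:
    """Work grows linearly with the vector's length."""
    return dims * steps_per_number


print(steps(1536, 3), steps(1536, 6))
```

Doubling the dimensions doubles both counts, which is the linear growth described above.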
A plain search repeats this for every row. Over 10 million rows that is 10,000,000 × ~4,600 ≈ 46 billion steps for one search, a second or more on a single CPU core and far too slow to use. The HNSW index avoids most of that work by comparing your search against only a few hundred vectors, often just a few dozen, instead of all ten million. (Postgres uses the index only when the query orders by distance to a fixed search vector and ends with a LIMIT.)
So you do not pay for each search. You pay for the server, and it handles many searches. The real cost of one search is just server cost ÷ number of searches it handles. At a steady 100 searches per second — about 260 million a month — the $2,816 server works out to roughly $0.0000108 per search, about a thousandth of a cent.
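The per-search arithmetic, as a small sketch. It assumes a 30-day month and a server that is busy at a steady rate; both are simplifications:

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assuming a 30-day month


def searches_per_month(searches_per_second: float) -> float:
    return searches_per_second * SECONDS_PER_MONTH


def cost_per_search(server_usd_per_month: float, searches_per_second: float) -> float:
    """Server cost spread evenly over every search it handles."""
    return server_usd_per_month / searches_per_month(searches_per_second)


def brute_force_steps(rows: int, dims: int, steps_per_number: int = 3) -> int:
    """Work for one search without an index: every row, every number."""
    return rows * dims * steps_per_number


print(f"{searches_per_month(100):,.0f} searches per month")
print(f"${cost_per_search(2816, 100):.7f} per search")
print(f"{brute_force_steps(10_000_000, 1536):,} steps without an index")
```

The same server at 10 searches per second costs ten times more per search, so the figure is only as good as your traffic estimate.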
Cost 4 of 4 — Run a search
$0.0000108 / search
A thousandth of a cent — but only because Cost #3 keeps the index in RAM. Query cost and RAM cost are the same cost, seen from two sides.
Before we cut the cost, the four facts that decide everything:
| # | What to keep in mind |
|---|---|
| 1 | An embedding has four costs — and only one is large: the index held in RAM. |
| 2 | RAM costs about 110× more than disk. That one ratio sets the bill. |
| 3 | The index must stay in memory. Let it fall to disk and every search slows to a crawl. |
| 4 | Store one bit per number and re-rank the survivors — the same workload drops about 8×, from $2,816 to $352 a month. |
How to cut the cost: store each number more roughly
We saw that the big cost is RAM, and RAM is needed because the vectors are large. So the way to save money is simple to say: make the vectors smaller. The question is how to make them smaller without ruining the search.
Here is the key idea. Each number is normally stored very precisely: 4 bytes, far more detail than you need to tell two embeddings apart. Store a rougher version and the search still works. There are three levels of detail, each smaller than the last:
| Level | Detail per number | Size of a 1536-vector | The trade-off |
|---|---|---|---|
| Full (vector) | 4 bytes | 6,152 bytes | most exact, most expensive; the default |
| Half (halfvec) | 2 bytes | 3,080 bytes | half the RAM, results barely change |
| Binary (bit) | 1 bit | 192 bytes (+ 8-byte header) | ~32× smaller, but rough |
Binary is the strongest cut. Instead of the number, keep one fact about it (was it positive or negative?) and write a 1 or a 0. A 1536-number vector becomes 1536 bits, just 192 bytes. That tiny size is what lets the index fit on a small, cheap server, and comparing two strings of bits is far faster than looping over a full vector: you just count how many bits differ (this count is called the Hamming distance).
The catch: binary is rough. It finds the right neighborhood, but can get the exact order slightly wrong. And how well it holds up depends on the data — high-dimensional embeddings like OpenAI’s quantize well, but some datasets do not, so measure recall on your own vectors before trusting it.
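The binary step is simple enough to write out. Below is a minimal sketch of sign-based quantization and Hamming distance in plain Python; it mirrors the idea described above (1 for positive, 0 otherwise) and is not pgvector's own code:

```python
def binary_quantize(vec) -> bytes:
    """Keep one bit per number: 1 if positive, else 0, packed 8 per byte."""
    out = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions where the two codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


code_a = binary_quantize([1.0, -2.0, 3.0])
code_b = binary_quantize([1.0, 2.0, 3.0])
print(code_a.hex(), code_b.hex(), hamming_distance(code_a, code_b))
print(len(binary_quantize([0.1] * 1536)), "bytes for 1536 dimensions")
```

XOR plus a bit count replaces thousands of floating-point steps, which is why the shortlist stage is cheap.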
The fix — search in two steps. This is the technique that gives you both the low cost and good accuracy:
- Narrow down with binary. Use the tiny, fast binary vectors to quickly pick the ~200 best-looking rows out of all 10 million. This step is fast and cheap because the vectors are so small.
- Check the survivors carefully. Take only those 200 rows and compare them properly using the full vectors. Comparing 200 rows carefully is fine — comparing all 10 million carefully is exactly what you were trying to avoid.
You get binary’s low cost and the full vector’s accuracy. In SQL, the inner query is the fast binary shortlist, and the outer query re-checks just those rows:
-- step 1 (inner): fast binary shortlist of 200 rows
-- step 2 (outer): careful re-check of just those 200
SELECT id, embedding <=> :query_vec AS distance
FROM (
    SELECT id, embedding
    FROM docs
    ORDER BY emb_bit <~> :query_bit  -- compare bits (fast, rough)
    LIMIT 200
) candidates
ORDER BY distance                    -- compare full vectors (careful)
LIMIT 10;
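The same two-step logic can be tried without a database. The sketch below is a toy, in-memory version on random data: it scans every binary code instead of using an HNSW index, and the corpus size, dimensions, and noise level are arbitrary assumptions chosen only to make it run quickly:

```python
import math
import random


def to_bits(vec) -> int:
    """Sign bits packed into one Python int (1 where the number is positive)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def two_stage_search(query, vectors, codes, shortlist=200, k=10):
    q_bits = to_bits(query)
    # Step 1: rough shortlist by Hamming distance on the 1-bit codes.
    candidates = sorted(range(len(vectors)),
                        key=lambda i: hamming(codes[i], q_bits))[:shortlist]
    # Step 2: careful re-rank of the shortlist only, using the full vectors.
    reranked = sorted(candidates, key=lambda i: cosine_distance(vectors[i], query))
    return reranked[:k]


rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(2000)]
codes = [to_bits(v) for v in vectors]
query = [x + rng.gauss(0, 0.1) for x in vectors[42]]  # noisy copy of row 42

print(two_stage_search(query, vectors, codes, shortlist=200, k=5))
```

Shrinking `shortlist` makes step 2 cheaper but raises the chance that a true neighbor never reaches the re-rank, which is the recall trade-off to measure on your own data.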
What each level costs (10 million documents)
| Storage choice | Size of one vector | RAM to keep hot | Server needed | Server cost / month |
|---|---|---|---|---|
| Full detail (vector) | 6,152 bytes | ~138 GB | 256 GB | $2,816 |
| Half detail (halfvec) | 3,080 bytes | ~69 GB | 128 GB | $1,408 |
| Binary + re-rank (bit) | 192 bytes | ~6 GB | 32 GB | $352 |
Read the table top to bottom: each time you store the numbers more roughly, the vector gets smaller, the index gets smaller, the server you need gets smaller, and the monthly bill gets smaller with it.
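The per-vector sizes follow from the storage formulas in the pgvector documentation: 4 bytes per number plus 8 for `vector`, 2 per number plus 8 for `halfvec`, and one bit per number plus 8 bytes for `bit`. A small sketch (note that the 192 bytes in the table is the bit payload; with the header it comes to 200):

```python
GB = 1_000_000_000  # decimal gigabytes


def bytes_per_vector(dims: int, kind: str) -> int:
    """Stored size of one value, header included, per pgvector's documented formulas."""
    if kind == "vector":
        return 4 * dims + 8
    if kind == "halfvec":
        return 2 * dims + 8
    if kind == "bit":
        return (dims + 7) // 8 + 8
    raise ValueError(f"unknown kind: {kind}")


rows, dims = 10_000_000, 1536
for kind in ("vector", "halfvec", "bit"):
    size = bytes_per_vector(dims, kind)
    print(f"{kind:>8}: {size:>5} bytes each, {rows * size / GB:6.2f} GB raw for {rows:,} rows")
```

These are raw column sizes only; the HNSW index built on each type scales with them, which is what drives the RAM column.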
The fix — store one bit per number
$2,816 → $352 / month
An 8× cut — about $29,600 a year — with most of the search quality intact. Keep one bit per number instead of thirty-two, then re-rank the survivors against the full vectors.
The whole economy on one page (10 million documents)
Every price below is just an amount multiplied by a rate-card rate — nothing more. The “How it’s priced” column shows the exact math, so no number is a mystery.
| Stage | What happens | How it’s priced | Full detail | Binary (+ re-rank) |
|---|---|---|---|---|
| Make | text is turned into vectors, once | tokens ÷ 1M × $0.02 | $100 (one time) | $100 (one time) |
| Store on disk | the full vectors and index sit on disk | GB on disk × $0.10/mo | $13.80 /mo | ~$6.60 /mo † |
| Keep in RAM | the working set is held in memory for fast search | GB in RAM × $11/mo | $2,816 /mo | $352 /mo |
| Search | each search does distance math | server cost ÷ searches handled | $0.0000108 /search | $0.0000014 /search |
| Yearly server cost | RAM + disk, over 12 months | (RAM + disk) × 12 | $33,958 /yr | $4,303 /yr |
The two big rows are the same index at two prices, because disk and memory are priced about 110× apart:
- Store on disk = bytes on disk × the disk rate: ~138 GB (61.5 GB vectors + 77 GB index) × $0.10 = $13.80 a month. (TOAST is just the side table where Postgres keeps the big vectors — where the bytes live, not an extra charge.)
- Keep in RAM = working-set size × the memory rate: ~138 GB rounds up to a 256 GB server × $11/GB = $2,816 a month.
So you pay for the index twice: a little to rest on disk, a lot to stay in RAM. That RAM copy is the real expense, and shrinking it is the whole point of binary quantization. By comparison, 10 million booleans or integers run on a server of about $176 a month, almost nothing next to the embeddings.
† Binary quantization is about RAM, not disk. Re-ranking still needs the full-precision vectors, so all ~61.5 GB of them stay on disk, and the binary column and its small index add a few gigabytes on top. The disk line still falls, to about 66 GB, because the 77 GB full-precision HNSW index is no longer needed, but that saves only about $7 a month. What collapses is the index held in RAM (Cost #3): the binary HNSW index is roughly 32× smaller, so it fits on a far cheaper server. That RAM saving is the whole story.
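The yearly row can be checked in a few lines. This sketch takes the server sizes and disk volumes straight from the table (the 66 GB for the binary column is derived from its $6.60 disk line at $0.10 per GB):

```python
RAM_USD_PER_GB_MONTH = 11.0
DISK_USD_PER_GB_MONTH = 0.10


def yearly_server_cost(server_ram_gb: float, disk_gb: float) -> float:
    """(RAM + disk) per month, over 12 months."""
    monthly = server_ram_gb * RAM_USD_PER_GB_MONTH + disk_gb * DISK_USD_PER_GB_MONTH
    return monthly * 12


full = yearly_server_cost(server_ram_gb=256, disk_gb=138)
binary = yearly_server_cost(server_ram_gb=32, disk_gb=66)
print(f"full detail: ${full:,.0f}/yr")
print(f"binary     : ${binary:,.0f}/yr")
```

Almost all of the difference between the two totals comes from the RAM term; the disk term is under 1% of either.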
Price your own database
Measure your real sizes, then multiply by the rate card. This works on any provider:
-- Your actual footprint
SELECT
    pg_size_pretty(pg_total_relation_size('docs'))      AS data_total,
    pg_size_pretty(pg_relation_size('docs_embedding_hnsw')) AS hnsw_index;
Then:
| Quantity | How to compute it |
|---|---|
| RAM to buy | index size + hot data + spare room, then round up to the next instance size your cloud offers (e.g. 138 GB → a 256 GB server) |
| Monthly RAM cost | RAM_GB × $11 (or your provider’s per-GB RAM rate) |
| Monthly disk cost | total_GB × $0.10 |
| One-time making cost | (docs × tokens_per_doc / 1,000,000) × $0.02 |
The number that decides everything is the first line: how many gigabytes of index must stay in RAM. That single number is your main monthly cost. The rest is small change.
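Once you have byte counts from the size query, the table's formulas can be wired together. The sketch below is one possible arrangement, with assumptions labeled: a 20% headroom factor for "spare room", a hypothetical menu of instance sizes, and the article's rate-card prices as defaults:

```python
GB = 1_000_000_000
INSTANCE_SIZES_GB = (16, 32, 64, 128, 256, 512)  # hypothetical instance menu


def price_database(total_bytes: int, index_bytes: int, hot_data_bytes: int,
                   headroom: float = 1.2,
                   ram_usd_per_gb: float = 11.0,
                   disk_usd_per_gb: float = 0.10) -> dict:
    """Turn measured sizes (in bytes) into a monthly RAM and disk estimate."""
    ram_needed_gb = (index_bytes + hot_data_bytes) / GB * headroom
    server_gb = next((s for s in INSTANCE_SIZES_GB if s >= ram_needed_gb), None)
    if server_gb is None:
        raise ValueError("working set exceeds the largest instance in the menu")
    return {
        "ram_needed_gb": ram_needed_gb,
        "server_gb": server_gb,
        "ram_usd_month": server_gb * ram_usd_per_gb,
        "disk_usd_month": total_bytes / GB * disk_usd_per_gb,
    }


# Example inputs: the article's ten-million-row figures, expressed in bytes.
bill = price_database(total_bytes=138 * GB, index_bytes=77 * GB, hot_data_bytes=61 * GB)
print(bill)
```

Feed it the raw byte values from `pg_total_relation_size` and `pg_relation_size` rather than the pretty-printed strings.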
Summary
Four costs, and only one of them shapes the bill. Making the vectors and resting them on disk are rounding errors. The expense is memory: an HNSW search needs fast random jumps across its graph, so the whole index must stay in RAM, and RAM costs about 110 times more than disk. That single fact is why the same ten million rows need a $176 server as booleans or integers and a $2,816 server as embeddings.
The one lever that moves it is how many bits you keep per number. Store one bit instead of thirty-two, re-rank the survivors against the full vectors, and the bill falls to about $352 a month with most of the search quality intact.
So before you write a query, ask one question: how big is the index, and how much of it must stay in memory? Multiply that by your RAM rate and you have your main monthly cost. Embeddings are a memory business, and sizing the memory is sizing the bill.
References
- pgvector — project repository and README: https://github.com/pgvector/pgvector
- pgvector 0.7.0 — changelog: https://github.com/pgvector/pgvector/blob/master/CHANGELOG.md
- pgvector 0.7.0 — release announcement: https://www.postgresql.org/about/news/pgvector-070-released-2852
- Jonathan Katz — Scalar and binary quantization for pgvector: https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
- Supabase — What’s new in pgvector 0.7.0: https://supabase.com/blog/pgvector-0-7-0
- Crunchy Data — HNSW indexes with Postgres and pgvector (on keeping the index in memory): https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector
- pgvector — HNSW QPS degradation as the index grows beyond memory (issue #700): https://github.com/pgvector/pgvector/issues/700