The Economy Of Embeddings

Follow one vector from creation to search, put a price on every step, and you find the bill is set long before your first query runs. The embeddings economy, evaluated through pgvector.

Vector search is now part of many applications, but its real cost is easy to get wrong. Teams plan for the embedding model and for storage. Then the database bill arrives, and it is many times bigger than they expected. This surprise can be avoided.

The cost of an embedding is not one number. It is four separate costs, and they are very different in size. Postgres decides how big each one is — by how it stores large values, how it builds its index, and how it answers searches.

This article evaluates the economy of embeddings, using Postgres and pgvector as the worked example. It follows one vector, from the moment it is made to the moment it is searched, and at each step puts a dollar price on it. The prices are round, typical cloud rates for mid-2026. Use your own provider’s prices instead, and the lesson stays the same — because what matters is how the costs compare to each other, not the exact figures.

By the end, one thing should be clear: embeddings are a memory business.

The rate card

Every cloud charges for the same three things. We use round, typical numbers:

| Resource | What it is | Typical price |
| --- | --- | --- |
| RAM | memory on a Postgres server | ≈ $11 per GB per month |
| Disk | fast disk attached to the database | ≈ $0.10 per GB per month |
| Embedding API | the model that turns text into a vector | ≈ $0.02 per million tokens (standard 1536-dim model) |

Look at the gap: one gigabyte of RAM costs about 110 times more than one gigabyte of disk. This single fact explains almost everything below.

An embedding has four separate costs, and Postgres decides how big each one is:

| # | The cost | What it is |
| --- | --- | --- |
| 1 | Make the vector | a small, one-time cost |
| 2 | Store it on disk | a small monthly cost |
| 3 | Keep the index in RAM | the big cost |
| 4 | Run a search | tiny, but only because of #3 |

Cost #1 — Making the vector

You send text to an embedding model, and it gives back numbers. You pay per token. Assume about 500 tokens per document.

| Corpus | Tokens | One-time cost |
| --- | --- | --- |
| 1 million docs | 500M | $10 |
| 10 million docs | 5B | $100 |

What Postgres does: nothing yet — it just receives a vector(1536) and stores it. Making the vector is a cost of the model, not the database. And you pay it only once (unless you switch to a new model later and have to redo all of them).
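The arithmetic behind that table can be reproduced in any Postgres session. This is a minimal sketch that assumes the article's round figures (500 tokens per document, $0.02 per million tokens); substitute your own corpus size and model price.

```sql
-- One-time embedding cost = docs × tokens per doc ÷ 1,000,000 × price per million tokens.
-- 500 tokens/doc and $0.02 per million tokens are the article's round figures.
SELECT docs,
       docs * 500                        AS tokens,
       round(docs * 500 / 1e6 * 0.02, 2) AS one_time_usd
FROM (VALUES (1000000::numeric), (10000000::numeric)) AS corpus(docs);
```

With these inputs the two rows come out at 10.00 and 100.00 dollars. The `::numeric` casts keep the token count from overflowing a 4-byte integer.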

Cost 1 of 4 — Make the vector

$100 once

A one-time charge from the model, not the database — almost never the part that hurts your budget.

Cost #2 — Storing the vector on disk

A vector(1536) takes up 6,152 bytes — an 8-byte header plus 1,536 numbers of 4 bytes each.

What Postgres does: a normal table row is meant to stay under about 2 KB. A 6 KB vector is far bigger than that. So Postgres moves it out of the main table and into a side table called TOAST. (pgvector keeps these uncompressed, because squeezing random numbers wastes CPU for almost no gain.) This is why a vector table looks small on its own but is large in total:

SELECT
  pg_size_pretty(pg_relation_size('docs'))        AS heap_only,
  pg_size_pretty(pg_total_relation_size('docs'))  AS total_with_toast_and_index;

The HNSW index adds roughly the same volume again — about 1.2 to 1.3 times the raw vectors in practice (it stores its own copy of every vector, plus the graph links). For ten million that is around 77 GB of index on top of ~61.5 GB of vectors.
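For reference, here is a minimal sketch of the setup the article's queries appear to assume. The names `docs`, `embedding` and `docs_embedding_hnsw` come from the article's own queries; the other columns and the choice of cosine distance are assumptions made for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: only "docs" and "embedding" appear in the article.
CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    body      text,
    embedding vector(1536)
);

-- HNSW index; vector_cosine_ops pairs with the <=> (cosine distance) operator.
CREATE INDEX docs_embedding_hnsw
    ON docs USING hnsw (embedding vector_cosine_ops);

-- How the column is stored: 'e' = external (out of line, uncompressed),
-- 'x' = extended (out of line, compressible), 'p' = plain, 'm' = main.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'docs'::regclass
  AND attname  = 'embedding';
```

The last query is a quick way to check the storage strategy on your own installation rather than taking it on trust.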

| Corpus | Vectors + index on disk | Disk cost / month |
| --- | --- | --- |
| 1 million | ~14 GB | $1.40 |
| 10 million | ~138 GB | $13.80 |
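The disk figures follow from the per-vector size. The sketch below redoes the estimate. The 1.25 index factor is an assumed midpoint of the 1.2 to 1.3 range quoted above, and GB means 10^9 bytes, as in the table.

```sql
-- Bytes per vector(1536) = 8-byte header + 4 bytes × 1536 numbers = 6,152.
-- 1.25 is an assumed midpoint of the article's 1.2–1.3 index overhead.
-- Note: pg_size_pretty() reports in units of 1024, so its "GB" reads slightly smaller.
SELECT n_rows,
       round(n_rows * (8 + 4 * 1536) / 1e9, 1)               AS vectors_gb,
       round(n_rows * (8 + 4 * 1536) * 1.25 / 1e9, 1)        AS index_gb_est,
       round(n_rows * (8 + 4 * 1536) * 2.25 / 1e9 * 0.10, 2) AS disk_usd_month
FROM (VALUES (1000000::numeric), (10000000::numeric)) AS corpus(n_rows);
```

The results land within rounding of the table. The real index size depends on build parameters, so measure it once the index exists.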

Cost 2 of 4 — Store on disk

$14 / month

Cheap — vectors and index both sit on disk for almost nothing. The catch: the index must also be held in RAM to search fast, and that is Cost #3.

Cost #3 — Keeping the index in RAM (the big cost)

What Postgres does: an HNSW search is a walk across a graph. Each step jumps to a random place in the index and reads a stored vector to measure distance. Random jumps are fast in RAM but slow on disk — once the index no longer fits in memory, those reads can make a search ten to a hundred times slower, stretching a few-millisecond query into tens or hundreds of milliseconds. So for search to stay fast, the index must stay in memory (in shared_buffers or the operating system’s cache). In practice you size the server to keep the whole working set hot — the index plus the vectors it serves, about 138 GB for ten million — and RAM is the most expensive thing on the rate card.

It is fair to ask whether this still holds now that disks are NVMe. It does. NVMe is far quicker than the spinning disks that earned disk its reputation, but RAM is still roughly 100 times quicker again per random access — and a graph search pays that gap on every hop. Each hop has to finish before the next can begin, so there is nothing to prefetch: the high bandwidth NVMe is good at does not help, only latency does. And the disk that managed Postgres actually runs on is usually network-attached, where that latency is higher still.
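Whether the index actually sits in memory can be checked rather than assumed. This sketch uses the `pg_prewarm` and `pg_buffercache` contrib extensions and the index name from the article's sizing query. It assumes you have the privileges to install and read them, which some managed services restrict.

```sql
-- Both extensions ship with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Size of the index versus the configured buffer pool.
SELECT pg_size_pretty(pg_relation_size('docs_embedding_hnsw')) AS index_size,
       current_setting('shared_buffers')                       AS shared_buffers;

-- Load the index into shared_buffers; returns the number of blocks read.
SELECT pg_prewarm('docs_embedding_hnsw');

-- How much of the index is in shared_buffers right now.
SELECT pg_size_pretty(count(*) * current_setting('block_size')::bigint) AS index_cached
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('docs_embedding_hnsw')
  AND reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database());
```

`pg_buffercache` sees only Postgres's own buffer pool. Pages held in the operating system's cache do not show up here, so a low number is a prompt to look further, not proof of slow searches.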

Match the memory to the data, then price it at $11 per GB per month:

| 10 million rows of… | RAM needed | Server size | Per month |
| --- | --- | --- | --- |
| boolean | ~10 MB | 16 GB | $176 |
| int4 | ~40 MB | 16 GB | $176 |
| vector(1536) (float) | ~138 GB | 256 GB | $2,816 |

This is the heart of the cost. The same 10 million rows, stored as a simple true/false flag, run on a $176 server. Stored as embeddings, they need a $2,816 server — about 16 times more — only because the index is large and must stay in RAM.
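The sizing step is "round up to the next server size, then multiply by the RAM rate". The sketch below does that, assuming a hypothetical ladder of instance sizes that doubles from 16 GB; your provider's ladder will differ.

```sql
-- Assumed instance ladder (GB of RAM) and the article's $11 per GB per month.
WITH ladder(server_gb) AS (
    VALUES (16), (32), (64), (128), (256), (512)
),
workload(label, need_gb) AS (
    VALUES ('boolean', 0.01), ('int4', 0.04), ('vector(1536)', 138)
)
SELECT w.label,
       w.need_gb,
       min(l.server_gb)      AS server_gb,       -- smallest server that fits
       min(l.server_gb) * 11 AS usd_per_month
FROM workload w
JOIN ladder l ON l.server_gb >= w.need_gb
GROUP BY w.label, w.need_gb
ORDER BY w.need_gb;
```

Under these assumptions the two small workloads land on the 16 GB server at $176, and the embeddings land on the 256 GB server at $2,816.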

You can run leaner. Strictly, only the index — about 77 GB — has to stay hot for search; the full vectors are read mainly for re-ranking and writes. Size RAM to the index alone and a 128 GB server (about $1,408 a month) often does the job. The figures here size to the whole working set, which is the safer choice; pick whichever matches how hard you push the box.

The reason goes straight back to the rate card: RAM costs about 110 times more than disk per gigabyte, and embeddings are exactly the kind of data that needs many gigabytes of RAM. So embeddings are a memory problem first, a CPU problem second, and a disk problem barely at all.

Cost 3 of 4 — Keep the index in RAM

$2,816 / month

The big cost. The same ten million rows, now embeddings, need sixteen times the server — because the index must live in memory.

Cost #4 — Running a search

What Postgres does: comparing two vectors is not one quick step; it is a loop over every number in the vector.

First, what is a “dimension”? An embedding is just a long list of numbers. A vector(1536) is a list of 1,536 numbers. In this section, N is simply that count — for a vector(1536), N = 1536.

To measure the distance between two vectors, the database walks all N numbers and does a little math at each one. The two common measures, and the work each one costs:

| Distance | Work per number | Total for N = 1536 |
| --- | --- | --- |
| Straight-line (L2) | 3 steps: subtract, square, add | ~4,600 steps |
| Cosine | 6 steps | ~9,200 steps |

So comparing two 1536-number vectors takes roughly 4,600 to 9,200 steps — against a single step to compare two booleans or integers. The work grows in a straight line with the vector’s length: twice the numbers, twice the work, on every comparison.
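The per-number loop can be written out by hand on a tiny example. This sketch uses three-number vectors in place of 1,536-number ones, computes the L2 distance step by step in plain SQL, then repeats it with pgvector's operators.

```sql
-- L2 by hand: subtract, square, add for each pair of numbers, then one square root.
SELECT sqrt(sum((a - b) ^ 2)) AS l2_by_hand
FROM unnest(ARRAY[1, 2, 3]::float8[], ARRAY[4, 6, 3]::float8[]) AS t(a, b);

-- The same comparison through pgvector's operators.
SELECT '[1,2,3]'::vector <-> '[4,6,3]'::vector AS l2_distance,      -- straight-line
       '[1,2,3]'::vector <=> '[4,6,3]'::vector AS cosine_distance;  -- 1 - cosine similarity
```

Both L2 results are 5 (the differences are 3, 4 and 0). A real embedding runs the same three steps 1,536 times per comparison.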

A plain search repeats this for every row. Over 10 million rows that is 10,000,000 × ~4,600 ≈ 46 billion steps for one search, about a second on a single CPU core and far too slow to use. The HNSW index avoids most of that work by comparing your search against only a few hundred vectors, often just a few dozen, instead of all ten million. (Postgres uses the index only when the query sorts by distance to a fixed search vector, in ascending order and typically with a LIMIT.)
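Whether a given query takes the index path shows up in its plan. In this sketch the query vector is borrowed from an existing row (the `id = 1` row is an assumption) to avoid writing out a 1,536-number literal. `hnsw.ef_search` is pgvector's setting for how many candidates each search keeps.

```sql
-- Candidate list size per search; pgvector's default is 40.
SET hnsw.ef_search = 40;

-- Look for "Index Scan using docs_embedding_hnsw" in the plan. A "Seq Scan"
-- followed by a Sort means every row was compared.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM docs
ORDER BY embedding <=> (SELECT embedding FROM docs WHERE id = 1)
LIMIT 10;
```

The `BUFFERS` output separates pages found in shared memory ("shared hit") from pages Postgres had to read in ("read"). A search that mostly reads is one whose index is not staying hot.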

So you do not pay for each search. You pay for the server, and it handles many searches. The real cost of one search is just server cost ÷ number of searches it handles. At a steady 100 searches per second — about 260 million a month — the $2,816 server works out to roughly $0.0000108 per search, about a thousandth of a cent.
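The division is short enough to check directly. This sketch assumes a 30-day month and the article's steady 100 searches per second.

```sql
-- Searches per month = rate per second × seconds per day × 30 days.
-- Cost per search = monthly server cost ÷ searches per month.
SELECT 100 * 86400 * 30                          AS searches_per_month,
       round(2816.0 / (100 * 86400 * 30), 8)     AS usd_per_search;
```

That gives 259,200,000 searches and a little under $0.000011 each. Halve the traffic on the same server and the per-search figure doubles, which is why it is a derived number, not a price.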

Cost 4 of 4 — Run a search

$0.0000108 / search

A thousandth of a cent — but only because Cost #3 keeps the index in RAM. Query cost and RAM cost are the same cost, seen from two sides.

Before we cut the cost, the four facts that decide everything:

| # | What to keep in mind |
| --- | --- |
| 1 | An embedding has four costs, and only one is large: the index held in RAM. |
| 2 | RAM costs about 110× more than disk. That one ratio sets the bill. |
| 3 | The index must stay in memory. Let it fall to disk and every search slows to a crawl. |
| 4 | Store one bit per number and re-rank the survivors, and the same workload drops about 8×, from $2,816 to $352 a month. |

How to cut the cost: store each number more roughly

We saw that the big cost is RAM, and RAM is needed because the vectors are large. So the way to save money is simple to say: make the vectors smaller. The question is how to make them smaller without ruining the search.

Here is the key idea. Each number is normally stored very precisely — 4 bytes, far more detail than you need to tell two embeddings apart. Store a rougher version and the search still works. There are three levels of roughness, each smaller than the last:

| Level | Detail per number | Size of a 1536-vector | The trade-off |
| --- | --- | --- | --- |
| Full (vector) | 4 bytes | 6,152 bytes | most exact, most expensive (the default) |
| Half (halfvec) | 2 bytes | 3,080 bytes | half the RAM, results barely change |
| Binary (bit) | 1 bit | 192 bytes | ~32× smaller, but rough |

Binary is the strongest cut. Instead of the number, keep one fact about it (was it positive or negative?) and write a 1 or a 0. A 1536-number vector becomes 1536 bits, just 192 bytes. That tiny size is what lets the index fit on a small, cheap server, and comparing two strings of bits is far faster than looping over a full vector: you just count how many bits differ, a measure called the Hamming distance.

The catch: binary is rough. It finds the right neighborhood, but can get the exact order slightly wrong. And how well it holds up depends on the data — high-dimensional embeddings like OpenAI’s quantize well, but some datasets do not, so measure recall on your own vectors before trusting it.
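On a toy vector the quantization looks like this. The sketch uses pgvector's `binary_quantize` function and its `<~>` Hamming-distance operator; the four-number vector is made up for illustration.

```sql
-- One bit per number: 1 where the value is positive, 0 otherwise.
SELECT binary_quantize('[0.3,-1.2,0.7,-2.5]'::vector) AS bits;   -- 1010

-- Hamming distance: the number of positions where two bit strings differ.
SELECT B'1010' <~> B'1000' AS hamming_distance;                  -- 1
```

Everything about magnitude is discarded: 0.3 and 0.7 both become 1. That loss is the roughness described above.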

The fix — search in two steps. This is the technique that gives you both the low cost and good accuracy:

  1. Narrow down with binary. Use the tiny, fast binary vectors to quickly pick the ~200 best-looking rows out of all 10 million. This step is fast and cheap because the vectors are so small.
  2. Check the survivors carefully. Take only those 200 rows and compare them properly using the full vectors. Comparing 200 rows carefully is fine — comparing all 10 million carefully is exactly what you were trying to avoid.

You get binary’s low cost and the full vector’s accuracy. In SQL, the inner query is the fast binary shortlist, and the outer query re-checks just those rows:

-- step 1 (inner): fast binary shortlist of 200 rows
-- step 2 (outer): careful re-check of just those 200
SELECT id, embedding <=> :query_vec AS distance
FROM (
  SELECT id, embedding
  FROM docs
  ORDER BY emb_bit <~> :query_bit   -- compare bits (fast, rough)
  LIMIT 200
) candidates
ORDER BY distance                    -- compare full vectors (careful)
LIMIT 10;
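The query above presupposes an `emb_bit` column and an index over it. One way to create them is sketched below, assuming a stored generated column; the column definition and the index name `docs_emb_bit_hnsw` are illustrative and not taken from the article. pgvector's documentation also shows an expression index over `binary_quantize(embedding)`, which avoids the extra column.

```sql
-- Assumption: emb_bit is a stored generated column derived from embedding.
-- Adding it rewrites the table, so expect this to take a while on a large corpus.
ALTER TABLE docs
    ADD COLUMN emb_bit bit(1536)
    GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED;

-- Small HNSW index over the bits, searched by Hamming distance.
CREATE INDEX docs_emb_bit_hnsw
    ON docs USING hnsw (emb_bit bit_hamming_ops);

-- The :query_bit parameter is the query vector quantized the same way, for example
--   binary_quantize(:query_vec)::bit(1536)
-- when :query_vec is bound as a vector.
```

The shortlist size (200 in the article) is the knob to tune. Raising it improves recall but adds full-precision comparisons to the second step.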

What each level costs (10 million documents)

| Storage choice | Size of one vector | RAM to keep hot | Server needed | Server cost / month |
| --- | --- | --- | --- | --- |
| Full detail (vector) | 6,152 bytes | ~138 GB | 256 GB | $2,816 |
| Half detail (halfvec) | 3,080 bytes | ~69 GB | 128 GB | $1,408 |
| Binary + re-rank (bit) | 192 bytes | ~6 GB | 32 GB | $352 |

Read the table top to bottom: each time you store the numbers more roughly, the vector gets smaller, the index gets smaller, the server you need gets smaller, and the monthly bill gets smaller with it.
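The middle row can be tried without changing the table. This sketch builds a half-precision index over a cast of the existing column. It assumes a pgvector version that includes `halfvec` (0.7.0 or later), the index name is made up, and the query vector is again borrowed from a hypothetical `id = 1` row.

```sql
-- Half-precision HNSW index built over a cast of the existing column.
CREATE INDEX docs_embedding_half_hnsw
    ON docs USING hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops);

-- The ORDER BY expression must match the indexed expression for the index to be used.
SELECT id
FROM docs
ORDER BY embedding::halfvec(1536)
         <=> (SELECT embedding::halfvec(1536) FROM docs WHERE id = 1)
LIMIT 10;
```

This halves only the index. The table's ~69 GB figure assumes the stored vectors are halved as well, which would mean declaring the column itself as `halfvec(1536)`.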

The fix — store one bit per number

$2,816 → $352 / month

An 8× cut — about $29,600 a year — with most of the search quality intact. Keep one bit per number instead of thirty-two, then re-rank the survivors against the full vectors.

The whole economy on one page (10 million documents)

Every price below is just an amount multiplied by a rate-card rate — nothing more. The “How it’s priced” column shows the exact math, so no number is a mystery.

| Stage | What happens | How it’s priced | Full detail | Binary (+ re-rank) |
| --- | --- | --- | --- | --- |
| Make | text is turned into vectors, once | tokens ÷ 1M × $0.02 | $100 (one time) | $100 (one time) |
| Store on disk | the full vectors and index sit on disk | GB on disk × $0.10/mo | $13.80 /mo | ~$6.60 /mo † |
| Keep in RAM | the working set is held in memory for fast search | GB in RAM × $11/mo | $2,816 /mo | $352 /mo |
| Search | each search does distance math | server cost ÷ searches handled | $0.0000108 /search | $0.0000014 /search |
| Yearly server cost | RAM + disk, over 12 months | (RAM + disk) × 12 | $33,958 /yr | $4,303 /yr |

The two big rows are the same index at two prices, because disk and memory are priced about 110× apart:

  • Store on disk = bytes on disk × the disk rate: ~138 GB (61.5 GB vectors + 77 GB index) × $0.10 = $13.80 a month. (TOAST is just the side table where Postgres keeps the big vectors — where the bytes live, not an extra charge.)
  • Keep in RAM = working-set size × the memory rate: ~138 GB rounds up to a 256 GB server × $11/GB = $2,816 a month.

So you pay for the index twice: a little to rest on disk, a lot to stay in RAM. That RAM copy is the real expense, and shrinking it is the whole point of binary quantization. By comparison, 10 million booleans or integers fit on a server costing about $176 a month, almost nothing next to the embeddings.

† Binary quantization is about RAM, not disk. Re-ranking still needs the full-precision vectors, so all ~61.5 GB of them stay on disk, with the binary column and its small index added on top. The disk line falls to about 66 GB (~$6.60), which implies that the 77 GB full-precision HNSW index is no longer kept. What really collapses is the index held in RAM (Cost #3): the binary HNSW index is roughly 32× smaller, so it fits on a far cheaper server. That RAM saving is the whole story; the disk line changes by only a few dollars a month.
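The yearly row is the only one that combines two other rows, so it is worth checking. This sketch plugs in the monthly RAM and disk figures from the table.

```sql
-- Yearly server cost = (monthly RAM + monthly disk) × 12.
SELECT label,
       round((ram_usd + disk_usd) * 12) AS server_usd_per_year
FROM (VALUES ('full detail',      2816, 13.80),
             ('binary + re-rank',  352,  6.60)) AS c(label, ram_usd, disk_usd);
```

The results are 33,958 and 4,303 dollars, matching the table. Disk contributes well under one percent of either total.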

Price your own database

Measure your real sizes, then multiply by the rate card. This works on any provider:

-- Your actual footprint
SELECT
  pg_size_pretty(pg_total_relation_size('docs'))           AS data_total,
  pg_size_pretty(pg_relation_size('docs_embedding_hnsw'))  AS hnsw_index;

Then:

| Quantity | How to compute it |
| --- | --- |
| RAM to buy | index size + hot data + spare room, then round up to the next instance size your cloud offers (e.g. 138 GB → a 256 GB server) |
| Monthly RAM cost | RAM_GB × $11 (or your provider’s per-GB RAM rate) |
| Monthly disk cost | total_GB × $0.10 |
| One-time making cost | (docs × tokens_per_doc / 1,000,000) × $0.02 |

The number that decides everything is the first line: how many gigabytes of index must stay in RAM. That single number is your main monthly cost. The rest is small change.
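Measurement and pricing can be done in one query. This sketch assumes the table and index names used earlier and the article's round rates. It prices the raw gigabytes only, before spare room and before rounding up to a real instance size.

```sql
-- Rates are the article's round figures; replace them with your provider's.
WITH measured AS (
    SELECT pg_total_relation_size('docs')          / 1e9 AS total_gb,  -- heap + TOAST + indexes
           pg_relation_size('docs_embedding_hnsw') / 1e9 AS index_gb
)
SELECT round(total_gb, 1)        AS total_gb,
       round(index_gb, 1)        AS index_gb,
       round(total_gb * 0.10, 2) AS disk_usd_month,
       round(index_gb * 11, 2)   AS ram_usd_month_index_only,
       round(total_gb * 11, 2)   AS ram_usd_month_working_set
FROM measured;
```

The two RAM columns bracket the choice described earlier: keep only the index hot, or keep the whole working set hot. The bill you actually pay is the next instance size above whichever you pick.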

Summary

Four costs, and only one of them shapes the bill. Making the vectors and resting them on disk are rounding errors. The expense is memory: an HNSW search needs fast random jumps across its graph, so the whole index must stay in RAM — and RAM costs about 110 times more than disk. That single fact is why the same ten million rows cost a $176 server as plain values and a $2,816 server as embeddings.

The one lever that moves it is how many bits you keep per number. Store one bit instead of thirty-two, re-rank the survivors against the full vectors, and the bill falls to about $352 a month with most of the search quality intact.

So before you write a query, ask one question: how big is the index, and how much of it must stay in memory? Multiply that by your RAM rate and you have your main monthly cost. Embeddings are a memory business, and sizing the memory is sizing the bill.
