PostgreSQL Vector Search: pgvector Tricks I Wish I Knew Sooner

📅 April 28, 2026
👁 ... views

Last month, a colleague spent three weeks integrating Pinecone into our API. Three weeks. New API keys, a new SDK, a new service to monitor, a new billing line item. All to store embeddings and run similarity search.

I asked: “We already have PostgreSQL. Have you looked at pgvector?”

He hadn’t. Nobody on the team had.

This is the state of AI infrastructure in 2026: we’re spinning up whole new databases for vector search while sitting on a perfectly capable one.

I’ve been using pgvector in production for six months now. Here’s what I learned — and the tricks I wish someone had told me on day one.

Why pgvector, Why Now

PostgreSQL crossed 55% developer adoption in 2026. It’s the default database for a reason: reliable, ACID-compliant, and the ecosystem around it is unmatched. When the pgvector extension appeared, most people treated it as a neat demo. That was a mistake.

pgvector turns your existing PostgreSQL instance into a vector database. Same connection string, same connection pooling, same backup strategy, same pg_dump. You’re not adding operational complexity — you’re adding a column type and an index.

If you’re already running PostgreSQL, the marginal cost of adding vector search is nearly zero.

Some will say dedicated vector databases (Pinecone, Weaviate, Milvus) offer more features. They’re not wrong — if you need billion-scale indexes with sub-millisecond latency and distributed sharding, those tools earn their keep. But for 90% of applications — RAG pipelines, semantic search, recommendation systems — pgvector is more than enough.

Setup: The 3-Minute Installation

On a standard PostgreSQL 16 installation:

-- Enable the extension (requires postgresql-16-vector package)
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify it's loaded
SELECT * FROM pg_extension WHERE extname = 'vector';

That’s it. No new service. No new credentials. No new infrastructure.

Now create a table with a vector column:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(384), -- 384-dim from all-MiniLM-L6-v2
    created_at TIMESTAMP DEFAULT NOW()
);

The VECTOR(dimensions) type stores dense embeddings. Pick the dimension to match your embedding model:

  • 384all-MiniLM-L6-v2 (sentence-transformers, lightweight)
  • 768text-embedding-3-small (OpenAI)
  • 1024text-embedding-3-large (OpenAI)
  • 1536 → older OpenAI models (ada-002)

Trick #1: The Right Distance Metric Matters More Than You Think

pgvector supports three distance operators. Choosing wrong is the #1 performance mistake I see:

-- 1. Cosine distance (<=>) — best for normalized embeddings
SELECT title, embedding <=> '[0.1, 0.2, ...]' AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;

-- 2. Inner product (<#>) — best when you want magnitude to matter
SELECT title, embedding <#> '[0.1, 0.2, ...]' AS score
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]'
LIMIT 5;

-- 3. Euclidean distance (<->) — best for raw spatial distance
SELECT title, embedding <-> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'
LIMIT 5;

Here’s what actually matters: most embedding models normalize their output, which makes cosine distance equivalent to dot product but with clearer semantics (0 = identical, 2 = opposite). I use <=> (cosine) for every project and haven’t regretted it once.

The inner product operator <#> is trickier — it returns negative values for similarity, so you order by it directly without negation. It’s faster for unnormalized vectors, but honestly, just normalize your embeddings and save yourself the mental overhead.

Trick #2: IVFFLAT Indexes — The Speed/Recall Tradeoff

Without an index, pgvector does a sequential scan. On a few thousand rows, that’s fine. On 500K rows, you’ll be waiting.

-- Create an IVFFLAT index (inverted file with flat lists)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The lists parameter is the key. Rule of thumb: lists = rows / 1000. For 100K rows, use lists = 100. For 1M rows, use lists = 1000.

But here’s what nobody tells you: IVFFLAT indexes sacrifice recall for speed. With the default ef_search (or probes for IVFFLAT), you get ~95% recall at 10x speed. If you need 99.9% recall, increase the probes:

-- Before querying, set the number of probes to check
SET ivfflat.probes = 10;

-- Then run your query — slower but more accurate
SELECT title FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;

I run probes = 5 for search (fast, good enough) and probes = 20 for analytics (slower, precise). The right setting depends on your use case — don’t just set it and forget it.

This is where pgvector shines. You can combine traditional PostgreSQL full-text search with vector similarity in a single query:

-- Full-text search + vector similarity, weighted combination
WITH fts AS (
    SELECT id, title,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', 'database optimization')) AS text_score
    FROM documents
    WHERE to_tsvector('english', content)
          @@ plainto_tsquery('english', 'database optimization')
),
vec AS (
    SELECT id,
           1 - (embedding <=> '[0.1, 0.2, ...]') AS vec_score
    FROM documents
)
SELECT fts.title,
       (0.4 * fts.text_score + 0.6 * vec.vec_score) AS combined_score
FROM fts
JOIN vec ON fts.id = vec.id
ORDER BY combined_score DESC
LIMIT 10;

The weights (0.4 / 0.6) are tunable. I start with 40% text, 60% vector because semantic matching catches synonyms and related concepts that keyword search misses. But for technical documentation with precise terminology, I bump text to 60%.

The counter-argument here is obvious: “Why not just use vector search alone?” Because vector search struggles with exact matches. Search for “PostgreSQL 16 release notes” and a purely semantic model might return results about “database versioning history” instead of the actual release notes document. Full-text anchors the results to the literal terms.

Trick #4: Metadata Filtering That Doesn’t Kill Performance

Pure vector databases often struggle with pre-filtering. PostgreSQL doesn’t:

-- Filter by metadata BEFORE vector search — and it uses composite indexes
SELECT title, embedding <=> '[0.1, 0.2, ...]' AS similarity
FROM documents
WHERE created_at > '2026-01-01'
  AND title ILIKE '%postgresql%'
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;

This is the killer feature. Add any WHERE clause — date ranges, tags, user permissions, tenant IDs — and PostgreSQL’s query planner handles it. No need for separate filter pipelines or post-filtering that breaks your result count.

For high-performance filtering, create a partial index:

CREATE INDEX documents_recent_vectors ON documents
USING ivfflat (embedding vector_cosine_ops)
WHERE created_at > '2026-01-01'
WITH (lists = 50);

PostgreSQL will use this index only when the query predicate matches the partial condition. It’s like having a pre-filtered vector index for free.

Trick #5: Batch Insert Without the ORM Tax

Most ORMs handle vector columns poorly. Insert one row at a time and watch your latency spike. Here’s the fast path:

-- Use COPY for bulk embedding inserts
COPY documents (title, content, embedding) FROM STDIN WITH (FORMAT csv);
-- Then provide CSV rows with embeddings as array strings:
-- "Doc 1","Content here","[0.1,0.2,...,0.384]"
-- "Doc 2","More content","[0.4,0.5,...,0.6]"

In Python with psycopg:

import psycopg
from psycopg.rows import dict_row

conn = psycopg.connect("postgresql://localhost:5432/mydb")

# Batch insert 1000 embeddings in one transaction
docs = [
    ("Title 1", "Content 1", "[0.1, 0.2, ...]"),
    ("Title 2", "Content 2", "[0.3, 0.4, ...]"),
    # ... hundreds more
]

with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s::vector)",
        docs
    )
    conn.commit()

Using %s::vector cast inline is faster than letting the ORM convert. On 10K rows, the difference between row-by-row ORM inserts and batched executemany is about 40x. I measured it.

What pgvector Still Can’t Do

I’m opinionated but not delusional. Here’s where dedicated vector databases still win:

  • Billion-scale indexes — pgvector handles millions well. Billions need sharding and distributed architecture.
  • Managed infrastructure — Pinecone is fully managed. With pgvector, you manage the PostgreSQL instance.
  • Real-time index updates at massive scale — IVFFLAT requires periodic rebuild for optimal performance with streaming data. HNSW (the other index type in pgvector) handles updates better but uses more memory.

If any of those apply to you, fine — use a dedicated vector database. But don’t reach for one because “that’s what the tutorial uses.”

Bottom Line

PostgreSQL with pgvector is the most practical vector database for teams already running PostgreSQL. You get:

  1. One database instead of two — simpler ops, fewer things to break at 2 AM
  2. Hybrid search out of the box — combine full-text, metadata filters, and vector search
  3. SQL everywhere — your team already knows it, no new query language to learn
  4. Zero marginal infrastructure cost — it’s just an extension

I’ve replaced two separate vector database evaluations with pgvector this year. Both times, it handled the workload without issue.

Are you still spinning up separate vector databases for your AI projects? What convinced you — or what would convince you to consolidate?



📚 Continue Reading

Supporting the blog through affiliate links (at no extra cost to you):

💡

Enjoying the content? Here are tools I personally use and recommend:

  • 🌐 Hosting: Bluehost — what this blog runs on
  • 🛒 Tech Gear: My Amazon Store — keyboards, monitors, dev tools I use

Purchases through my links help keep this blog ad-free 💙

Enjoyed this post?

Subscribe to the newsletter or follow on YouTube for more dev content.

🎬 Watch Shorts