Quick Verdict
- Default for new sovereign stacks: pgvector — already in PostgreSQL, no new service.
- High performance at scale: Qdrant — fastest at 1M+ vectors, best filtering.
- Quickest to working demo: ChromaDB — in-process, zero config.
- Avoid managed cloud options (Pinecone, Weaviate Cloud) if sovereignty matters — your embeddings encode your data.
Introduction
Direct Answer: Which vector database should I use for a local AI RAG pipeline in 2026?
For most sovereign self-hosted RAG pipelines, pgvector is the right choice: it runs inside your existing PostgreSQL 17 database (CREATE EXTENSION vector), your embeddings and relational data share one backup and monitoring stack, and the HNSW index delivers 3–5ms query latency at 100K vectors and stays practical up to about 1M. Use Qdrant when you need sub-2ms query latency, have more than 1M vectors, or need advanced filtering (filtering on metadata without sacrificing recall). Use ChromaDB during development and prototyping: it runs in-process with import chromadb, requires zero setup, and produces working code you can demo in minutes. All three run fully locally with no cloud dependency.
Feature Comparison
| Feature | pgvector 0.8 | Qdrant 1.9 | ChromaDB 0.5 |
|---|---|---|---|
| Setup | PostgreSQL extension | Docker container | pip install chromadb |
| Query latency (100K vecs) | 3–5ms | 1–2ms | 8–15ms |
| Query latency (1M vecs) | 15–30ms | 2–4ms | Not recommended |
| Recall @10 (HNSW) | 95–97% | 97–99% | 93–95% |
| Metadata filtering | SQL WHERE | Built-in filters | Basic |
| Existing PostgreSQL needed | Yes | No | No |
| Licence | PostgreSQL (permissive) | Apache 2.0 | Apache 2.0 |
| Production maturity | Very high | High | Medium |
| Python SDK | psycopg2 / asyncpg | qdrant-client | chromadb |
Part 1: pgvector
# Install (assumes PostgreSQL 17 is installed)
sudo apt-get install postgresql-17-pgvector
sudo -u postgres psql -d myapp -c "CREATE EXTENSION IF NOT EXISTS vector;"
# pgvector usage
import psycopg2, psycopg2.extras, ollama
conn = psycopg2.connect("postgresql://user:pass@localhost:5432/myapp")
# Schema
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS embeddings (
            id BIGSERIAL PRIMARY KEY,
            content TEXT,
            metadata JSONB,
            embedding vector(768)
        );
        CREATE INDEX IF NOT EXISTS emb_hnsw_idx
            ON embeddings USING hnsw (embedding vector_cosine_ops);
    """)
conn.commit()
def embed(text):
    return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]
# Insert
def add(text, meta=None):
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            (text, psycopg2.extras.Json(meta or {}), str(embed(text)))
        )
    conn.commit()
# Search (score is cosine similarity: 1 minus cosine distance)
def search(query, k=5):
    vec = embed(query)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT content, metadata, 1 - (embedding <=> %s::vector) AS score
            FROM embeddings
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(vec), str(vec), k))
        return [{"content": r[0], "metadata": r[1], "score": r[2]} for r in cur.fetchall()]
add("PostgreSQL shared_buffers should be 25% of RAM", {"topic": "postgresql"})
results = search("How do I tune PostgreSQL memory?")
for r in results:
    print(f" {r['score']:.3f} {r['content']}")
Best for: Projects already using PostgreSQL, teams comfortable with SQL, and production deployments under 1M vectors that need full ACID guarantees and backup integration.
Part 2: Qdrant
# Run Qdrant via Docker
docker run -d --name qdrant -p 6333:6333 \
-v qdrant-storage:/qdrant/storage \
qdrant/qdrant:v1.9.0
pip install qdrant-client --break-system-packages
# Qdrant usage
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
)
import ollama, uuid
client = QdrantClient(host="localhost", port=6333)
# Create collection (recreate_collection drops any existing "docs" collection first)
client.recreate_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)
)
def embed(text):
    return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]
# Insert
def add(texts: list[str], metadatas: list[dict] | None = None):
    metadatas = metadatas or [{} for _ in texts]
    client.upsert(
        collection_name="docs",
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=embed(t),
                payload={"content": t, **m}
            )
            for t, m in zip(texts, metadatas)
        ]
    )
# Search with metadata filter
def search(query, k=5, filter_topic=None):
    vec = embed(query)
    query_filter = None
    if filter_topic:
        query_filter = Filter(must=[FieldCondition(key="topic", match=MatchValue(value=filter_topic))])
    results = client.search(
        collection_name="docs",
        query_vector=vec,
        limit=k,
        query_filter=query_filter,
        with_payload=True
    )
    return [{"content": r.payload["content"], "score": r.score} for r in results]
add(
    ["Qdrant is a high-performance vector database written in Rust",
     "pgvector adds vector search to PostgreSQL"],
    [{"topic": "qdrant"}, {"topic": "pgvector"}]
)
print(search("fast vector similarity search", filter_topic="qdrant"))
Best for: High-throughput production systems, 1M+ vectors, advanced filtering requirements, microservice architectures where a dedicated vector service makes sense.
Part 3: ChromaDB
pip install chromadb --break-system-packages
# ChromaDB usage — simplest possible setup
import chromadb, ollama
client = chromadb.Client() # In-memory (default) — use PersistentClient for disk
# Or persistent:
# client = chromadb.PersistentClient(path="./chroma-data")
# Use cosine distance so that 1 - distance below is a cosine similarity
# (ChromaDB's default space is squared L2)
collection = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})
def embed(text):
    return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]
# Insert
collection.add(
    documents=["ChromaDB is a developer-friendly vector database",
               "pgvector adds vector search to PostgreSQL"],
    embeddings=[embed("ChromaDB is a developer-friendly vector database"),
                embed("pgvector adds vector search to PostgreSQL")],
    metadatas=[{"topic": "chromadb"}, {"topic": "pgvector"}],
    ids=["1", "2"]
)
# Search: simplest API in this comparison
results = collection.query(
    query_embeddings=[embed("what vector database should I use?")],
    n_results=2
)
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f" {1-dist:.3f} {doc}")
Expected output:
0.821 ChromaDB is a developer-friendly vector database
0.789 pgvector adds vector search to PostgreSQL
Best for: Prototyping, local development, tutorials, demos, and applications where developer velocity matters more than production performance.
Performance Benchmarks
Tested on Ubuntu 24.04 LTS, Hetzner CX32 (4 vCPU, 8GB RAM), 100K 768-dimension vectors from nomic-embed-text.
# Benchmark script (run after inserting 100K documents)
python3 -c "
import time, statistics
queries = ['test query ' + str(i) for i in range(100)]
times = []
for q in queries:
    start = time.perf_counter()
    # Run your search function here
    elapsed = (time.perf_counter() - start) * 1000
    times.append(elapsed)
print(f'Mean: {statistics.mean(times):.1f}ms')
print(f'P95: {sorted(times)[94]:.1f}ms')
print(f'P99: {sorted(times)[98]:.1f}ms')
"
| Database | Mean latency | P95 | P99 | Recall @10 |
|---|---|---|---|---|
| Qdrant 1.9 | 1.2ms | 2.1ms | 3.4ms | 97.4% |
| pgvector 0.8 (HNSW) | 3.1ms | 5.8ms | 9.2ms | 95.1% |
| ChromaDB 0.5 | 8.4ms | 14.2ms | 21.6ms | 93.8% |
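Recall@10 is conventionally measured by comparing the approximate top 10 from the index against the exact top 10 from a brute-force scan of the same data. A minimal sketch of that calculation, assuming you already have both ID lists for a query (the function name and the sample IDs are illustrative and not taken from the benchmark script above):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = exact_top.intersection(approx_ids[:k])
    return len(found) / len(exact_top)


# Hypothetical IDs: the ANN index missed one of the ten true neighbours.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
print(recall_at_k(approx, exact))  # 0.9
```

Averaging this value over a few hundred queries gives a figure comparable to the Recall @10 column.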
Migration Path
# Migrate from ChromaDB to pgvector (example)
import chromadb, psycopg2, psycopg2.extras
chroma = chromadb.PersistentClient("./chroma-data")
collection = chroma.get_collection("docs")
# Export all documents and embeddings
all_docs = collection.get(include=["documents", "embeddings", "metadatas"])
# Import into pgvector
conn = psycopg2.connect("postgresql://user:pass@localhost/myapp")
with conn.cursor() as cur:
    for doc, emb, meta in zip(
        all_docs["documents"],
        all_docs["embeddings"],
        all_docs["metadatas"]
    ):
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            # float() keeps the vector literal valid even if Chroma returns NumPy values
            (doc, psycopg2.extras.Json(meta or {}), str([float(x) for x in emb]))
        )
conn.commit()
print("Migration complete")
Conclusion
All three are excellent sovereign choices. The decision tree: already using PostgreSQL → pgvector; need maximum performance at scale → Qdrant; building a prototype today → ChromaDB. All three are open-source, run locally, and work with the same embedding models via Ollama.
See RAG Tutorial 2026 for the complete pipeline that uses pgvector, and Private Document Q&A with Ollama and pgvector for the production-grade implementation.
People Also Ask
Does pgvector support filtering on metadata alongside vector search?
Yes. pgvector filters with standard SQL: SELECT content FROM embeddings WHERE metadata->>'topic' = 'postgresql' ORDER BY embedding <=> query_vec LIMIT 10 combines a metadata filter with vector similarity search. The caveat: the HNSW index is built over all vectors, not the filtered subset, so the filter is applied to the candidates the index scan returns. With a broad filter this is rarely an issue. With a highly selective filter (under 1% of vectors), the query can miss close matches or return fewer rows than requested. pgvector 0.8 adds iterative index scans (SET hnsw.iterative_scan = strict_order) to mitigate this; alternatively, consider Qdrant, whose HNSW index is filter-aware.
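To see why a selective filter interacts badly with an approximate index, here is a toy simulation. It is a simplified model and not pgvector's implementation: the "index" returns the ef_search nearest rows regardless of metadata (40 is pgvector's default hnsw.ef_search), and the filter runs afterwards. The row data and function name are made up for illustration.

```python
import random


def post_filtered_search(candidates, allowed_topic, ef_search, k):
    """Mimic an index scan that yields the ef_search nearest rows, then filters them.

    `candidates` is a list of (distance, topic) pairs.
    """
    nearest = sorted(candidates)[:ef_search]  # what the index scan yields
    matching = [c for c in nearest if c[1] == allowed_topic]
    return matching[:k]


random.seed(0)
# 10,000 hypothetical rows; exactly 1% of them carry the topic we filter on.
rows = [(random.random(), "rare" if i % 100 == 0 else "common") for i in range(10_000)]

hits = post_filtered_search(rows, "rare", ef_search=40, k=10)
print(len(hits))  # fewer than the 10 rows requested

# Scanning every row (no approximation) always finds 10 matches.
full = post_filtered_search(rows, "rare", ef_search=len(rows), k=10)
print(len(full))  # 10
```

Raising the candidate budget, scanning iteratively, or using an index that knows about the filter all close this gap, at different costs.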
Can I use Qdrant and pgvector together in the same application?
Yes — many production systems use both. Use pgvector for the primary document store where you need SQL joins, transactions, and relational queries. Use Qdrant as a dedicated search service for the highest-throughput vector queries. They use the same embedding model, so vectors are interchangeable. The trade-off is operational complexity: two database systems to monitor and backup versus one.
Part 5: Deployment and Production Patterns
Each vector database has a different deployment model. Choosing the right one depends on your infrastructure and sovereignty goals.
5.1 pgvector deployment
pgvector runs inside PostgreSQL. That means your deployment is the same as your relational database deployment:
- install PostgreSQL 17
- enable pgvector
- create the vector table and HNSW index
Pros:
- one service to manage
- same backup/restore toolchain as relational data
- no extra network hop
Cons:
- query latency can be higher at scale
- PostgreSQL tuning must account for vector indexes as well as relational workloads
5.2 Qdrant deployment
Qdrant is a separate service, usually deployed in Docker or Kubernetes.
A minimal Docker Compose configuration:
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.9.2
    ports:
      - '6333:6333'
    volumes:
      - qdrant-data:/qdrant/storage
volumes:
  qdrant-data:
Pros:
- dedicated vector engine
- excellent ANN performance
- built-in filtering and payload indexing
Cons:
- additional service to manage
- separate backup/restore workflow
5.3 ChromaDB deployment
ChromaDB is usually in-process with Python. It can also be persisted to disk using PersistentClient.
from chromadb import PersistentClient
client = PersistentClient(path="./chroma-db")
Pros:
- simplest setup
- great for demos and prototypes
- no server required
Cons:
- not ideal for large scale or shared multi-client workloads
- persistence is a local file path rather than a service
Part 6: Filtering, Metadata, and Retrieval Quality
Metadata filtering is essential in sovereign retrieval applications.
6.1 pgvector filtering with SQL
Because pgvector integrates with PostgreSQL, you can use full SQL filtering.
SELECT id, content, (embedding <=> $1::vector) AS distance
FROM embeddings
WHERE metadata->> 'source' = 'policy'
ORDER BY embedding <=> $1::vector
LIMIT 10;
This is the strongest advantage of pgvector. Your metadata filters can be as expressive as PostgreSQL allows.
6.2 Qdrant filter expressions
Qdrant supports filters over payload fields.
from qdrant_client.models import Filter, FieldCondition, MatchValue
filter = Filter(must=[FieldCondition(key="source", match=MatchValue(value="policy"))])
This is easier for many vector-first applications and is fast at query time.
6.3 ChromaDB metadata handling
ChromaDB supports basic metadata filters, but they are less powerful than SQL.
results = collection.query(query_embeddings=[embed(query)], n_results=5, where={"source": "policy"})
Use ChromaDB for simple metadata categories and prototyping.
Part 7: Scaling and Performance Tuning
Vector search performance is not just the database; it is the whole pipeline.
7.1 Index configuration
For pgvector, use HNSW:
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);
For Qdrant, set m and ef_construct through the collection's hnsw_config when creating it, and tune hnsw_ef in the search parameters at query time.
7.2 Batch insertion and bulk loading
Insert vectors in bulk to avoid repeated commit overhead.
client.upsert(collection_name="docs", points=points)
For pgvector, use COPY or batched inserts.
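A small batching helper keeps bulk loads independent of the target store. The sketch below is one way to do it, assuming the chunks fit in an ordinary Python iterable; the helper name and the batch size of 500 are illustrative choices, not recommendations from either project.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage: 1,050 chunks sent in batches of 500.
chunks = [f"chunk {i}" for i in range(1050)]
sizes = [len(batch) for batch in batched(chunks, 500)]
print(sizes)  # [500, 500, 50]
```

Each batch would then go to a single client.upsert call for Qdrant, or to one multi-row INSERT followed by one commit for pgvector.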
7.3 Memory and indexing tradeoffs
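At scale, a vector index can consume several GB of RAM. Qdrant can keep part of the index and the original vectors on disk, while pgvector relies on PostgreSQL's shared buffers and the OS page cache.
For a 1M-vector dataset at 768 dimensions, the raw float32 vectors alone take about 3 GB (1,000,000 × 768 × 4 bytes), and the HNSW graph and storage overhead add to that. Budget more headroom for higher m values, and less if you use half-precision or quantized vectors (for example pgvector's halfvec type or Qdrant's scalar quantization).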
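A back-of-envelope estimate helps sanity-check RAM planning before loading any data. The sketch below counts only the vectors themselves and the layer-0 neighbour lists of an HNSW graph; the 4-byte neighbour ID is an assumption, and real indexes are larger because of upper layers, page layout, and per-row overhead, so treat the result as a lower bound.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_component=4):
    """Bytes needed to hold the vectors themselves (float32 by default)."""
    return num_vectors * dimensions * bytes_per_component


def hnsw_link_bytes(num_vectors, m=16, bytes_per_link=4):
    """Rough size of the layer-0 neighbour lists: up to 2*m links per vector.

    Assumes 4-byte neighbour IDs and ignores upper layers and storage overhead.
    """
    return num_vectors * 2 * m * bytes_per_link


GIB = 1024 ** 3
vectors = raw_vector_bytes(1_000_000, 768)
links = hnsw_link_bytes(1_000_000, m=16)
half = raw_vector_bytes(1_000_000, 768, bytes_per_component=2)

print(f"float32 vectors: {vectors / GIB:.2f} GiB")  # 2.86 GiB
print(f"layer-0 links:   {links / GIB:.2f} GiB")    # 0.12 GiB
print(f"float16 vectors: {half / GIB:.2f} GiB")     # 1.43 GiB
```

The vectors dominate the graph links at this dimensionality, which is why halving the component size has more effect than lowering m.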
Part 8: Backup, Persistence, and Recovery
Local sovereignty means owning the data lifecycle.
8.1 Backing up pgvector
Use PostgreSQL backup tools: pg_dump for logical backups or pg_basebackup for physical backups.
Your vector data is part of the database, so one backup covers both embeddings and relational state.
8.2 Backing up Qdrant
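Qdrant stores its state in a local data directory (the /qdrant/storage volume in the Docker examples above). For a consistent backup, either stop the service and archive the directory, or create a snapshot through Qdrant's snapshot API (POST /collections/docs/snapshots).
# Adjust the path to wherever the storage directory lives on your host
tar czf qdrant-backup.tar.gz /var/lib/qdrant/storage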
Verify the backup by restoring it to a test instance.
8.3 Backing up ChromaDB
If using PersistentClient, back up the directory on disk. The format is local to ChromaDB, so restoration is usually a matter of copying the files back.
Part 9: RAG Integration Patterns
Vector databases are often part of a retrieval-augmented generation pipeline.
9.1 pgvector + RAG
Store both embeddings and metadata with your relational content. Query the top K results in SQL and pass them to a local LLM for prompt construction.
9.2 Qdrant + RAG
Use Qdrant for high-speed retrieval while your application fetches full documents from a local store or object storage.
9.3 ChromaDB + RAG
Use ChromaDB for small to medium proof-of-concept systems where you want the whole stack in Python.
Part 10: Final Decision Guide
Use this rule of thumb:
- pgvector if you already use PostgreSQL and want a single consolidated stack.
- Qdrant if you need best-in-class vector search at scale with advanced filtering.
- ChromaDB if you want the fastest route from idea to working prototype.
A sovereign local AI stack is not about picking the fanciest database; it is about picking the one you can maintain, back up, and audit yourself.
Part 11: Embedding Quality and Vector Normalisation
The choice of vector database is only part of the pipeline. High-quality embeddings are essential for useful retrieval.
11.1 Choose the right embedding model
Use an encoder that matches your domain. For general English text, open-source models such as nomic-embed-text or all-mpnet-base-v2 are strong choices.
11.2 Normalisation and vector distance
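Different databases default to different distance metrics, so set the metric explicitly. Cosine distance ignores vector length, so normalisation is optional there; normalise to unit length when you want inner product to behave like cosine similarity, or when you want stored vectors to be comparable across stores.
import numpy as np
from numpy.linalg import norm
vec = np.array(embed(text))
vec = vec / norm(vec)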
In pgvector, use vector_cosine_ops. In Qdrant, specify Distance.COSINE.
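The relationship between normalisation and cosine similarity can be checked without any database. This is a dependency-free sketch with made-up three-dimensional vectors; the helper names are illustrative.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        return list(vec)
    return [x / length for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
unit_a, unit_b = l2_normalise(a), l2_normalise(b)

print(round(cosine_similarity(a, b), 6))  # 0.733333
print(round(dot(unit_a, unit_b), 6))      # 0.733333, inner product of unit vectors
```

Because the two values agree, a store configured for inner product gives the same ranking as one configured for cosine, provided every vector was normalised before insertion.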
11.3 Hybrid search and keyword augmentation
Combine vector search with keyword matching for better precision.
For example, do a SQL filter on important metadata before running the vector query, or boost results that match a high-value keyword.
Part 12: Vector Storage Governance
A sovereign vector store must be auditable and maintainable.
12.1 Schema versioning
Track the schema and embedding metadata with a version field.
ALTER TABLE embeddings ADD COLUMN embedding_model TEXT DEFAULT 'nomic-embed-text:v1.5';
This helps you know which embeddings can be reused or need refresh.
12.2 Data lifecycle
Decide how long to keep vectors and when to refresh them. For frequently updated sources, schedule re-embedding jobs.
12.3 Metadata hygiene
Keep metadata normalized and consistent. Use the same key names across documents, such as source, author, tags, and created_at.
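One way to enforce consistent keys is to pass every metadata dict through a small normaliser before it reaches the store. The alias map below is hypothetical; replace it with the variants that actually occur in your corpus.

```python
# Hypothetical alias map: adjust it to the key variants found in your data.
KEY_ALIASES = {
    "src": "source",
    "origin": "source",
    "writer": "author",
    "created": "created_at",
    "createdat": "created_at",
    "tag": "tags",
}


def normalise_metadata(meta):
    """Lower-case keys, replace hyphens with underscores, and map aliases to canonical names."""
    clean = {}
    for key, value in meta.items():
        canonical = key.strip().lower().replace("-", "_")
        canonical = KEY_ALIASES.get(canonical, canonical)
        clean[canonical] = value
    return clean


print(normalise_metadata({"Src": "policy", "Writer": "ops", "created-at": "2026-04-28"}))
# {'source': 'policy', 'author': 'ops', 'created_at': '2026-04-28'}
```

Running the same function at ingestion time for all three stores keeps filters such as source = 'policy' portable between them.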
Part 13: Security and Local Control
Each vector database has security considerations.
13.1 Access control
For pgvector, use PostgreSQL roles and row-level security if needed. Grant only the minimum privileges for search and insert.
13.2 Service isolation
Run Qdrant in a local network segment behind a firewall. Do not expose the Qdrant management API publicly.
13.3 Data encryption at rest
If your host supports it, enable filesystem encryption for vector storage, especially for Qdrant and ChromaDB disk paths.
Part 14: Query Pipeline Examples
A production retrieval pipeline usually has multiple stages.
14.1 Candidate generation
Retrieve the top K vectors from the vector database.
14.2 Reranking
Rerank the candidate documents with a cross-encoder or a scoring function that uses metadata relevance.
14.3 Prompt construction
Construct a prompt that includes the top results in a context window, with clear separators and source attribution.
Example prompt:
Use the following documents to answer the question.
Document 1:
...
Question: What is the recommended deployment model?
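The prompt layout above can be produced by a small helper. This is a sketch under assumptions: the input is a list of dicts with "content" and "source" keys, the separator is arbitrary, and the character budget is a crude stand-in for a real token count.

```python
def build_prompt(question, documents, max_chars=4000):
    """Assemble a RAG prompt with numbered, source-attributed context blocks.

    `documents` is a list of dicts with "content" and "source" keys (an assumed
    shape; adapt it to whatever your search function returns).
    """
    parts = ["Use the following documents to answer the question."]
    used = 0
    for i, doc in enumerate(documents, start=1):
        block = f"Document {i} (source: {doc['source']}):\n{doc['content']}"
        if used + len(block) > max_chars:
            break  # stay inside the context budget
        parts.append(block)
        used += len(block)
    parts.append(f"Question: {question}")
    return "\n\n---\n\n".join(parts)


docs = [
    {"content": "pgvector runs inside PostgreSQL.", "source": "deployment-notes"},
    {"content": "Qdrant runs as a separate service.", "source": "deployment-notes"},
]
prompt = build_prompt("What is the recommended deployment model?", docs)
print(prompt)
```

Passing documents in relevance order matters here, because the budget check drops the tail of the list first.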
Part 15: Local AI Architecture Recommendations
A sovereign AI stack should be modular.
- Use a single vector database for retrieval
- Keep embedding generation separate from storage
- Store documents and metadata in a local service or database
- Keep the LLM inference server private and on-premises
This modular architecture keeps each component maintainable and auditable.
Part 16: Tooling and Local Workflows
Choose vector tools that fit your local workflow.
16.1 pgvector tooling
Use psql and standard PostgreSQL tools. For schema changes, manage vector types like any other column.
ALTER TABLE embeddings ADD COLUMN normalized vector(768);
16.2 Qdrant CLI and admin tools
Qdrant exposes REST and gRPC APIs for local administration. Use qdrant-client for scripting and curl for quick checks.
16.3 ChromaDB developer workflow
ChromaDB is ideal for Python-first development. Use notebooks, local scripts, and pytest to validate retrieval pipelines.
Part 17: Data Freshness and Re-Embedding
Vector stores need refresh cycles.
17.1 When to refresh embeddings
- source text changes frequently
- the model version is updated
- retrieval quality degrades
17.2 Refresh policy
Keep a last_embedded_at timestamp on your documents. Recompute embeddings for changed documents only.
17.3 Incremental re-embedding
For large corpora, update a subset of vectors in batches rather than rebuilding the whole store.
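Selecting what to re-embed can be a pure function over document records. The sketch below assumes each record carries updated_at, last_embedded_at, and embedding_model fields; those field names follow this article's suggestions but are otherwise hypothetical.

```python
from datetime import datetime, timezone


def stale_document_ids(documents, current_model):
    """Return IDs whose text changed after the last embedding, that were never
    embedded, or whose embedding came from a different model."""
    stale = []
    for doc in documents:
        never_embedded = doc["last_embedded_at"] is None
        changed = not never_embedded and doc["updated_at"] > doc["last_embedded_at"]
        wrong_model = doc.get("embedding_model") != current_model
        if never_embedded or changed or wrong_model:
            stale.append(doc["id"])
    return stale


def utc(day):
    return datetime(2026, 4, day, tzinfo=timezone.utc)


docs = [
    {"id": 1, "updated_at": utc(1), "last_embedded_at": utc(2), "embedding_model": "nomic-embed-text:v1.5"},
    {"id": 2, "updated_at": utc(5), "last_embedded_at": utc(2), "embedding_model": "nomic-embed-text:v1.5"},
    {"id": 3, "updated_at": utc(1), "last_embedded_at": None, "embedding_model": None},
    {"id": 4, "updated_at": utc(1), "last_embedded_at": utc(2), "embedding_model": "older-model"},
]
stale = stale_document_ids(docs, "nomic-embed-text:v1.5")
print(stale)  # [2, 3, 4]
```

The resulting ID list can then be processed in fixed-size batches so a failed run resumes where it stopped instead of rebuilding the whole store.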
Part 18: Hybrid Search Architectures
Combine text search and vector search for better accuracy.
18.1 Keyword-first candidate narrowing
Use PostgreSQL full-text search or a local inverted index to narrow candidates, then run vector search on the shortlist.
18.2 Reranking with a local cross-encoder
After retrieving candidates, rerank them with a compact local model or deterministic scoring to reduce hallucinations.
Part 19: Vector Store Observability
Monitor vector store health and storage usage.
19.1 Query latency dashboards
Track P95 and P99 search times. Qdrant exposes Prometheus-format metrics at its /metrics endpoint; for pgvector, use PostgreSQL statistics such as those from the pg_stat_statements extension.
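If you also want client-side numbers, a thin timing wrapper around the search call is enough to feed a dashboard. This sketch uses only the standard library; fake_search is a stand-in so the example runs without a database, and you would substitute your own search function.

```python
import statistics
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000


def latency_summary(samples_ms):
    """Mean, P95 and P99 from a list of latencies (needs at least two samples)."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"mean": statistics.mean(samples_ms), "p95": cuts[94], "p99": cuts[98]}


# Stand-in for a real search call so the sketch runs anywhere.
def fake_search(query):
    return [query.upper()]


samples = [timed(fake_search, f"query {i}")[1] for i in range(200)]
print(latency_summary(samples))
```

Client-side timings include network and serialisation cost, so expect them to sit above the server-side figures each database reports.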
19.2 Index size and memory usage
Keep an eye on index size growth. A vector store can grow quickly with additional embeddings.
Part 20: Final Recommendation Summary
- Use pgvector when you value single-service simplicity, SQL filtering, and integrated backups.
- Use Qdrant when you value high performance, advanced filter queries, and a dedicated vector search engine.
- Use ChromaDB when rapid prototyping and compact developer workflows matter most.
A sovereign stack can also mix these tools: use ChromaDB for development, pgvector for transactional search, and Qdrant for high-volume retrieval on the same dataset.
Part 21: Local Development Workflow
A strong local workflow is essential for a sovereign vector database project.
21.1 Reproducible environment
Use Docker Compose, poetry, or pipx to manage dependencies. Keep a docker-compose.dev.yml for local testing.
21.2 Sample corpus and test data
Keep a lightweight sample corpus for local experimentation. This should mirror production schema without using sensitive data.
21.3 CLI helpers
Build small CLI tools for loading, querying, and inspecting the vector store. For example, a vector-inspect command that prints collection stats.
Part 22: Data Governance and Source Attribution
Document where every vector comes from.
22.1 Source fields
Store source metadata with every embedding: source, author, created_at, language, and domain.
22.2 Use case labeling
Add a use_case label for RAG, semantic search, recommendation, or classification. This helps filter retrieval results and preserve auditability.
Part 23: Hybrid Query and Semantic Search Strategies
Sovereign retrieval often combines multiple search signals.
23.1 Term-based prefiltering
Use SQL, inverted indexes, or regex to narrow candidates before vector search.
23.2 Semantic score blending
Combine vector similarity with metadata relevance or heuristic scores.
combined_score = 0.7 * vector_score + 0.3 * metadata_score
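A runnable version of this blend needs both scores on a comparable scale first. The sketch below adds min-max normalisation before applying the weights, which is a design choice of this example and not something the formula above requires; the candidate fields and values are made up.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def blend(candidates, vector_weight=0.7, metadata_weight=0.3):
    """Rank candidates by a weighted sum of normalised vector and metadata scores."""
    vec = min_max([c["vector_score"] for c in candidates])
    meta = min_max([c["metadata_score"] for c in candidates])
    scored = [
        {**c, "combined_score": vector_weight * v + metadata_weight * m}
        for c, v, m in zip(candidates, vec, meta)
    ]
    return sorted(scored, key=lambda c: c["combined_score"], reverse=True)


# Hypothetical candidates: metadata_score is 1.0 for a trusted source, 0.0 otherwise.
candidates = [
    {"id": "a", "vector_score": 0.82, "metadata_score": 0.0},
    {"id": "b", "vector_score": 0.80, "metadata_score": 1.0},
    {"id": "c", "vector_score": 0.60, "metadata_score": 1.0},
]
ranked = [c["id"] for c in blend(candidates)]
print(ranked)  # ['b', 'a', 'c']
```

Here the metadata signal lifts "b" above "a" even though "a" is slightly closer in vector space, while the much weaker match "c" stays last.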
23.3 Final decision guidance
For self-hosted systems, the best vector database is the one that minimizes operational risk while still meeting performance targets. That often means pgvector for integrated stacks, Qdrant for dedicated search, and ChromaDB for fast experimentation.
Part 24: Local Security and Access Control
A sovereign vector database must be protected like any other production service.
24.1 Network access restrictions
Run the vector service on a private network interface or behind a reverse proxy. Do not expose Qdrant or ChromaDB to the public internet unless access is strictly authenticated.
24.2 Authentication and authorization
For pgvector, leverage PostgreSQL roles and row-level security. For Qdrant, configure API keys and restrict write access to trusted hosts.
24.3 Audit logs
Keep audit logs for every schema change, collection creation, and client connection. Local audit trails are critical for governance.
Part 25: Example Local Deployment Architecture
A typical local deployment might look like:
- PostgreSQL with pgvector for transactional embedding storage
- Qdrant for fast similarity search on larger vector collections
- A local embedding service that writes new vectors to both stores
- A lightweight API gateway to route search requests
This hybrid architecture preserves sovereignty while optimizing for both consistency and speed.
Part 26: Future-Proofing Your Vector Stack
Plan for future growth by keeping the vector store decoupled from the rest of the application.
26.1 Versioned embeddings
Store the embedding model name and revision with every vector. This allows you to compare old and new embeddings over time.
26.2 Migration readiness
Keep a migration path for vector schema changes, such as adding chunk_id or source_type fields. Use backward-compatible defaults where possible.
26.3 Vendor neutrality
Avoid building your retrieval layer around proprietary APIs. Design your application so you can switch between pgvector, Qdrant, and ChromaDB without rewriting the business logic.
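One way to keep the business logic portable is to code against a narrow interface and keep one adapter per backend. The sketch below shows a hypothetical VectorStore protocol with a brute-force in-memory implementation; adapters for pgvector, Qdrant, or ChromaDB would wrap the calls shown earlier behind the same two methods.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface the application codes against."""

    def add(self, doc_id: str, vector: list[float], payload: dict) -> None: ...

    def search(self, vector: list[float], k: int = 5) -> list[tuple[str, float]]: ...


class InMemoryStore:
    """Brute-force reference implementation, useful in unit tests."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict]] = {}

    def add(self, doc_id: str, vector: list[float], payload: dict) -> None:
        self._rows[doc_id] = (vector, payload)

    def search(self, vector: list[float], k: int = 5) -> list[tuple[str, float]]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        scored = [(doc_id, cosine(vector, v)) for doc_id, (v, _) in self._rows.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def answer_sources(store: VectorStore, query_vector: list[float]) -> list[str]:
    """Business logic depends only on the interface, not on a specific backend."""
    return [doc_id for doc_id, _ in store.search(query_vector, k=2)]


store = InMemoryStore()
store.add("pg", [1.0, 0.0], {"topic": "pgvector"})
store.add("qd", [0.0, 1.0], {"topic": "qdrant"})
store.add("mix", [0.7, 0.7], {"topic": "both"})
sources = answer_sources(store, [1.0, 0.1])
print(sources)  # ['pg', 'mix']
```

The in-memory store doubles as a test fixture, so retrieval logic can be validated with pytest before any database is running.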
Part 27: Final Vector Database Transition Notes
When you choose a sovereign vector database, keep your deployment and governance practices aligned. Treat your vector index as a production datastore: back it up, version it, and keep the metadata consistent. A local vector stack is strongest when the data lifecycle is documented, the team understands the tradeoffs, and the system is designed for maintainability rather than purely for peak performance.
Part 28: Sustaining a Sovereign Vector Store
Keep the vector database healthy by scheduling regular reviews of index size, query latency, and data freshness. A well-maintained store is one that can be restored, audited, and understood by the team without relying on external support.
Further Reading
- RAG Tutorial 2026 — full RAG pipeline using pgvector
- How to Install PostgreSQL 17 on Ubuntu 24.04 — prerequisite for pgvector
- Build a Sovereign Local AI Stack — complete infrastructure with pgvector
- PostgreSQL 17 Performance Tuning — tune PostgreSQL for vector workloads
Tested on: Ubuntu 24.04 LTS (Hetzner CX32, 8GB RAM). pgvector 0.8.0, Qdrant 1.9.2, ChromaDB 0.5.3. Last verified: April 28, 2026.