Vucense

pgvector vs Qdrant vs ChromaDB 2026: Best Vector Database for Local AI

Compare sovereign vector databases for local AI: pgvector, Qdrant, and ChromaDB. Covers setup, performance benchmarks, ANN search quality, self-hosted deployment, and when to use each.


Quick Verdict

  • Default for new sovereign stacks: pgvector — already in PostgreSQL, no new service.
  • High performance at scale: Qdrant — fastest at 1M+ vectors, best filtering.
  • Quickest to working demo: ChromaDB — in-process, zero config.
  • Avoid managed cloud options (Pinecone, Weaviate Cloud) if sovereignty matters — your embeddings encode your data.

Introduction

Direct Answer: Which vector database should I use for a local AI RAG pipeline in 2026?

For most sovereign self-hosted RAG pipelines, pgvector is the right choice: it runs inside your existing PostgreSQL 17 database (CREATE EXTENSION vector), your embeddings and relational data share one backup and monitoring stack, and the HNSW index delivers 3–5ms query latency at 100K vectors (15–30ms at 1M). Use Qdrant when you need sub-2ms query latency, have more than 1M vectors, or need advanced filtering, meaning metadata filters that do not sacrifice recall. Use ChromaDB during development and prototyping: it runs in-process with import chromadb, requires zero setup, and produces working code you can demo in minutes. All three run fully locally with no cloud dependency.


Feature Comparison

| Feature | pgvector 0.8 | Qdrant 1.9 | ChromaDB 0.5 |
|---|---|---|---|
| Setup | PostgreSQL extension | Docker container | pip install chromadb |
| Query latency (100K vecs) | 3–5ms | 1–2ms | 8–15ms |
| Query latency (1M vecs) | 15–30ms | 2–4ms | Not recommended |
| Recall @10 (HNSW) | 95–97% | 97–99% | 93–95% |
| Metadata filtering | SQL WHERE | Built-in filters | Basic |
| Existing PostgreSQL needed | Yes | No | No |
| Licence | PostgreSQL (permissive) | Apache 2.0 | Apache 2.0 |
| Production maturity | Very high | High | Medium |
| Python SDK | psycopg2 / asyncpg | qdrant-client | chromadb |

Part 1: pgvector

# Install (assumes PostgreSQL 17 is installed)
sudo apt-get install postgresql-17-pgvector
sudo -u postgres psql -d myapp -c "CREATE EXTENSION IF NOT EXISTS vector;"

# pgvector usage
import psycopg2
import psycopg2.extras  # must be imported explicitly for psycopg2.extras.Json below
import ollama

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/myapp")

# Schema
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS embeddings (
            id BIGSERIAL PRIMARY KEY,
            content TEXT,
            metadata JSONB,
            embedding vector(768)
        );
        CREATE INDEX IF NOT EXISTS emb_hnsw_idx
            ON embeddings USING hnsw (embedding vector_cosine_ops);
    """)
    conn.commit()

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
def add(text, meta=None):
    # meta defaults to None rather than {} to avoid a shared mutable default
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            (text, psycopg2.extras.Json(meta or {}), str(embed(text)))
        )
        conn.commit()

# Search
def search(query, k=5):
    vec = embed(query)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT content, metadata, 1 - (embedding <=> %s::vector) AS score
            FROM embeddings
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(vec), str(vec), k))
        return [{"content": r[0], "metadata": r[1], "score": r[2]} for r in cur.fetchall()]

add("PostgreSQL shared_buffers should be 25% of RAM", {"topic": "postgresql"})
results = search("How do I tune PostgreSQL memory?")
for r in results:
    print(f"  {r['score']:.3f}  {r['content']}")

Best for: Projects already using PostgreSQL, teams comfortable with SQL, production deployments under 1M vectors, needing full ACID and backup integration.


Part 2: Qdrant

# Run Qdrant via Docker
docker run -d --name qdrant -p 6333:6333 \
  -v qdrant-storage:/qdrant/storage \
  qdrant/qdrant:v1.9.0

pip install qdrant-client --break-system-packages

# Qdrant usage
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import ollama, uuid

client = QdrantClient(host="localhost", port=6333)

# Create the collection only if it is missing.
# recreate_collection would drop any existing data on every run.
if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE)
    )

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
def add(texts: list[str], metadatas: list[dict] = None):
    metadatas = metadatas or [{} for _ in texts]
    client.upsert(
        collection_name="docs",
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=embed(t),
                payload={"content": t, **m}
            )
            for t, m in zip(texts, metadatas)
        ]
    )

# Search with metadata filter
def search(query, k=5, filter_topic=None):
    vec = embed(query)
    query_filter = None
    if filter_topic:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        query_filter = Filter(must=[FieldCondition(key="topic", match=MatchValue(value=filter_topic))])

    results = client.search(
        collection_name="docs",
        query_vector=vec,
        limit=k,
        query_filter=query_filter,
        with_payload=True
    )
    return [{"content": r.payload["content"], "score": r.score} for r in results]

add(
    ["Qdrant is a high-performance vector database written in Rust",
     "pgvector adds vector search to PostgreSQL"],
    [{"topic": "qdrant"}, {"topic": "pgvector"}]
)

print(search("fast vector similarity search", filter_topic="qdrant"))

Best for: High-throughput production systems, 1M+ vectors, advanced filtering requirements, microservice architectures where a dedicated vector service makes sense.


Part 3: ChromaDB

pip install chromadb --break-system-packages
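
# ChromaDB usage: simplest possible setup
import chromadb, ollama

client = chromadb.Client()   # In-memory (default); use PersistentClient for disk

# Or persistent:
# client = chromadb.PersistentClient(path="./chroma-data")

# Use cosine distance so that 1 - distance below is a cosine similarity
# (Chroma's default space is squared L2).
collection = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})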

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
collection.add(
    documents=["ChromaDB is a developer-friendly vector database",
                "pgvector adds vector search to PostgreSQL"],
    embeddings=[embed("ChromaDB is a developer-friendly vector database"),
                embed("pgvector adds vector search to PostgreSQL")],
    metadatas=[{"topic": "chromadb"}, {"topic": "pgvector"}],
    ids=["1", "2"]
)

# Search — simplest API in this comparison
results = collection.query(
    query_embeddings=[embed("what vector database should I use?")],
    n_results=2
)
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"  {1-dist:.3f}  {doc}")

Expected output (exact scores vary with the embedding model and version):

  0.821  ChromaDB is a developer-friendly vector database
  0.789  pgvector adds vector search to PostgreSQL

Best for: Prototyping, local development, tutorials, demos, and applications where developer velocity matters more than production performance.


Performance Benchmarks

Tested on Ubuntu 24.04 LTS, Hetzner CX32 (4 vCPU, 8GB RAM), 100K 768-dimension vectors from nomic-embed-text.

# Benchmark script (run after inserting 100K documents)
python3 -c "
import time, statistics

queries = ['test query ' + str(i) for i in range(100)]
times = []

for q in queries:
    start = time.perf_counter()
    # Run your search function here
    elapsed = (time.perf_counter() - start) * 1000
    times.append(elapsed)

print(f'Mean: {statistics.mean(times):.1f}ms')
print(f'P95:  {sorted(times)[94]:.1f}ms')
print(f'P99:  {sorted(times)[98]:.1f}ms')
"

| Database | Mean latency | P95 | P99 | Recall @10 |
|---|---|---|---|---|
| Qdrant 1.9 | 1.2ms | 2.1ms | 3.4ms | 97.4% |
| pgvector 0.8 (HNSW) | 3.1ms | 5.8ms | 9.2ms | 95.1% |
| ChromaDB 0.5 | 8.4ms | 14.2ms | 21.6ms | 93.8% |
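
The benchmark loop can also be written as a small reusable harness. The following is a minimal sketch, not the script used to produce the numbers in this article: the benchmark function and the fake_search stand-in are hypothetical names, and you would pass in one of the search() helpers from Parts 1 to 3 instead.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(search_fn: Callable[[str], object], queries: Sequence[str]) -> dict:
    """Call search_fn once per query and return latency statistics in milliseconds."""
    times_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)  # the call being measured
        times_ms.append((time.perf_counter() - start) * 1000)

    # quantiles(n=100) returns 99 cut points: index 94 is P95, index 98 is P99.
    cuts = statistics.quantiles(times_ms, n=100, method="inclusive")
    return {"mean": statistics.mean(times_ms), "p95": cuts[94], "p99": cuts[98]}


# Hypothetical stand-in; replace with a real search function.
def fake_search(query: str) -> list:
    return sorted(query)


stats = benchmark(fake_search, [f"test query {i}" for i in range(100)])
print(f"Mean: {stats['mean']:.3f}ms  P95: {stats['p95']:.3f}ms  P99: {stats['p99']:.3f}ms")
```

Run a few warm-up queries before measuring so that index pages are already cached; otherwise the first calls skew the mean.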

Migration Path

# Migrate from ChromaDB to pgvector (example)
import chromadb
import psycopg2
import psycopg2.extras  # must be imported explicitly for psycopg2.extras.Json below

chroma = chromadb.PersistentClient("./chroma-data")
collection = chroma.get_collection("docs")

# Export all documents and embeddings
all_docs = collection.get(include=["documents", "embeddings", "metadatas"])

# Import into pgvector
conn = psycopg2.connect("postgresql://user:pass@localhost/myapp")
with conn.cursor() as cur:
    for doc, emb, meta in zip(
        all_docs["documents"],
        all_docs["embeddings"],
        all_docs["metadatas"]
    ):
        # Build the '[x,y,...]' text form explicitly so that both list and
        # NumPy-array embeddings serialize correctly
        vec = "[" + ",".join(str(float(x)) for x in emb) + "]"
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            (doc, psycopg2.extras.Json(meta or {}), vec)
        )
    conn.commit()
print("Migration complete")

Conclusion

All three are excellent sovereign choices. The decision tree: already using PostgreSQL → pgvector; need maximum performance at scale → Qdrant; building a prototype today → ChromaDB. All three are open-source, run locally, and work with the same embedding models via Ollama.

See RAG Tutorial 2026 for the complete pipeline that uses pgvector, and Private Document Q&A with Ollama and pgvector for the production-grade implementation.


People Also Ask

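Can pgvector filter by metadata during vector search?

Yes. pgvector uses standard SQL for filtering: SELECT content FROM embeddings WHERE metadata->>'topic' = 'postgresql' ORDER BY embedding <=> query_vec LIMIT 10 combines a metadata filter with vector similarity search. The caveat: the HNSW index is built over all vectors, not the filtered subset, so the filter is applied to the candidates the index scan returns. When the filter matches a large share of rows, this is rarely an issue. With high-selectivity filters (fewer than 1% of vectors match), the query can return fewer than LIMIT rows or miss close matches. pgvector 0.8 adds iterative index scans (hnsw.iterative_scan) to mitigate this; alternatively, consider Qdrant, which is designed for filtered search.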

Can I use Qdrant and pgvector together in the same application?

Yes, many production systems use both. Use pgvector for the primary document store where you need SQL joins, transactions, and relational queries. Use Qdrant as a dedicated search service for the highest-throughput vector queries. As long as both stores are fed by the same embedding model, the vectors are interchangeable. The trade-off is operational complexity: two database systems to monitor and back up instead of one.


Part 5: Deployment and Production Patterns

Each vector database has a different deployment model. Choosing the right one depends on your infrastructure and sovereignty goals.

5.1 pgvector deployment

pgvector runs inside PostgreSQL. That means your deployment is the same as your relational database deployment:

  • install PostgreSQL 17
  • enable pgvector
  • create the vector table and HNSW index

Pros:

  • one service to manage
  • same backup/restore toolchain as relational data
  • no extra network hop

Cons:

  • query latency can be higher at scale
  • PostgreSQL tuning must account for vector indexes as well as relational workloads

5.2 Qdrant deployment

Qdrant is a separate service, usually deployed in Docker or Kubernetes.

A minimal Docker Compose configuration:

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.9.2
    ports:
      - '6333:6333'
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:

Pros:

  • dedicated vector engine
  • excellent ANN performance
  • built-in filtering and payload indexing

Cons:

  • additional service to manage
  • separate backup/restore workflow

5.3 ChromaDB deployment

ChromaDB is usually in-process with Python. It can also be persisted to disk using PersistentClient.

from chromadb import PersistentClient
client = PersistentClient(path="./chroma-db")

Pros:

  • simplest setup
  • great for demos and prototypes
  • no server required

Cons:

  • not ideal for large scale or shared multi-client workloads
  • persistence is a local file path rather than a service

Part 6: Filtering, Metadata, and Retrieval Quality

Metadata filtering is essential in sovereign retrieval applications.

6.1 pgvector filtering with SQL

Because pgvector integrates with PostgreSQL, you can use full SQL filtering.

SELECT id, content, (embedding <=> $1::vector) AS distance
FROM embeddings
WHERE metadata->> 'source' = 'policy'
ORDER BY embedding <=> $1::vector
LIMIT 10;

This is the strongest advantage of pgvector. Your metadata filters can be as expressive as PostgreSQL allows.
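
A minimal sketch of issuing such a filtered query from Python, assuming the embeddings table from Part 1. The build_filtered_search helper is a hypothetical name, and it uses JSONB containment (@>) instead of ->> so that several metadata keys can be matched with one parameter. It only builds the statement and parameters; executing it needs a live psycopg2 cursor.

```python
import json


def build_filtered_search(query_vec, filters, k=10):
    """Return (sql, params) for a cosine-distance search restricted by JSONB metadata."""
    # pgvector accepts the text form '[0.1,0.2,...]' cast to vector.
    vec_literal = "[" + ",".join(str(float(x)) for x in query_vec) + "]"
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM embeddings "
        "WHERE metadata @> %s::jsonb "  # every key/value in filters must be present
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (vec_literal, json.dumps(filters), vec_literal, k)


sql, params = build_filtered_search([0.1, 0.2, 0.3], {"source": "policy"}, k=5)
# With a live connection: cur.execute(sql, params)
print(sql)
print(params)
```

Passing values as parameters rather than formatting them into the SQL string keeps user-supplied filter values from being interpreted as SQL.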

6.2 Qdrant filter expressions

Qdrant supports filters over payload fields.

from qdrant_client.models import Filter, FieldCondition, MatchValue
query_filter = Filter(must=[FieldCondition(key="source", match=MatchValue(value="policy"))])  # named query_filter to avoid shadowing the built-in filter()

This is easier for many vector-first applications and is fast at query time.

6.3 ChromaDB metadata handling

ChromaDB supports basic metadata filters, but they are less powerful than SQL.

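# Query with an Ollama embedding so it matches the vectors stored in Part 3;
# query_texts would use Chroma's default embedding function instead.
results = collection.query(query_embeddings=[embed(query)], n_results=5, where={"source": "policy"})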

Use ChromaDB for simple metadata categories and prototyping.

Part 7: Scaling and Performance Tuning

Vector search performance is not just the database; it is the whole pipeline.

7.1 Index configuration

For pgvector, use HNSW:

CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);

For Qdrant, set m and ef_construct in the collection's HNSW configuration when creating it, and tune hnsw_ef at query time.

7.2 Batch insertion and bulk loading

Insert vectors in bulk to avoid repeated commit overhead.

client.upsert(collection_name="docs", points=points)

For pgvector, use COPY or batched inserts.
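
A minimal sketch of batched inserts for pgvector, assuming the embeddings table from Part 1. The batched and insert_in_batches helpers are hypothetical names; execute_values comes from psycopg2.extras and is imported lazily so the chunking helper can be tried without a database.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def insert_in_batches(conn, rows, size=500):
    """Insert (content, metadata_json, vector_literal) tuples, committing once per batch."""
    from psycopg2.extras import execute_values

    with conn.cursor() as cur:
        for batch in batched(rows, size):
            execute_values(
                cur,
                "INSERT INTO embeddings (content, metadata, embedding) VALUES %s",
                batch,
                template="(%s, %s::jsonb, %s::vector)",
                page_size=size,
            )
            conn.commit()


print(list(batched(range(5), 2)))
```

For very large loads, building the HNSW index after the bulk insert rather than before is usually faster than maintaining it row by row.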

7.3 Memory and indexing tradeoffs

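At scale, a vector index can consume several GB of RAM. Qdrant can keep vectors and parts of the index on disk, while pgvector relies on PostgreSQL shared buffers and the OS page cache to keep the index hot.

For a 1M vector dataset at 768 dimensions, the raw float32 vectors alone take about 3 GB (1M × 768 × 4 bytes), and the HNSW graph adds to that. Budget extra headroom for index builds, or reduce the footprint with quantized or half-precision vectors and lower m values.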
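
A back-of-the-envelope estimate can be scripted. The sketch below is a rough lower bound under stated assumptions: estimate_hnsw_bytes is a hypothetical helper, it counts only raw vector storage plus base-layer graph links at an assumed 4 bytes per link, and it ignores upper layers and per-row overhead, which differ between engines.

```python
def estimate_hnsw_bytes(n_vectors, dim, m=16, bytes_per_dim=4):
    """Rough lower bound on index size: raw vectors plus base-layer HNSW links."""
    vector_bytes = n_vectors * dim * bytes_per_dim
    # HNSW keeps up to 2*m neighbour links per node on the base layer.
    # 4 bytes per link is an assumption; real engines differ.
    link_bytes = n_vectors * 2 * m * 4
    return vector_bytes + link_bytes


full = estimate_hnsw_bytes(1_000_000, 768)                   # float32 vectors
half = estimate_hnsw_bytes(1_000_000, 768, bytes_per_dim=2)  # half-precision vectors
print(f"float32: {full / 1e9:.2f} GB, half precision: {half / 1e9:.2f} GB")
```

Treat the result as a floor for capacity planning, and measure the real index size on your own data before committing to hardware.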

Part 8: Backup, Persistence, and Recovery

Local sovereignty means owning the data lifecycle.

8.1 Backing up pgvector

Use PostgreSQL backup tools: pg_dump for logical backups or pg_basebackup for physical backups.

Your vector data is part of the database, so one backup covers both embeddings and relational state.

8.2 Backing up Qdrant

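Qdrant stores its state in a local data directory. Back it up with Qdrant's snapshot API, or stop the service and archive the directory so the copy is consistent. Adjust the path below to wherever your storage volume is mounted; the Docker examples in this article mount it at /qdrant/storage inside the container.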

tar czf qdrant-backup.tar.gz /var/lib/qdrant/storage

Verify the backup by restoring it to a test instance.

8.3 Backing up ChromaDB

If using PersistentClient, back up the directory on disk. The format is local to ChromaDB, so restoration is usually a matter of copying the files back.

Part 9: RAG Integration Patterns

Vector databases are often part of a retrieval-augmented generation pipeline.

9.1 pgvector + RAG

Store both embeddings and metadata with your relational content. Query the top K results in SQL and pass them to a local LLM for prompt construction.

9.2 Qdrant + RAG

Use Qdrant for high-speed retrieval while your application fetches full documents from a local store or object storage.

9.3 ChromaDB + RAG

Use ChromaDB for small to medium proof-of-concept systems where you want the whole stack in Python.

Part 10: Final Decision Guide

Use this rule of thumb:

  • pgvector if you already use PostgreSQL and want a single consolidated stack.
  • Qdrant if you need best-in-class vector search at scale with advanced filtering.
  • ChromaDB if you want the fastest route from idea to working prototype.

A sovereign local AI stack is not about picking the fanciest database; it is about picking the one you can maintain, back up, and audit yourself.

Part 11: Embedding Quality and Vector Normalisation

The choice of vector database is only part of the pipeline. High-quality embeddings are essential for useful retrieval.

11.1 Choose the right embedding model

Use an encoder that matches your domain. For general English text, open-source models such as nomic-embed-text or all-mpnet-base-v2 are strong choices.

11.2 Normalisation and vector distance

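Different databases default to different distance metrics, so set the metric explicitly. Cosine similarity ignores vector length, but normalizing embeddings to unit length lets you use the inner product and get the same ranking.

import numpy as np
from numpy.linalg import norm

vec = np.array(embed(text))
vec = vec / norm(vec)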

In pgvector, use vector_cosine_ops. In Qdrant, specify Distance.COSINE.
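
To see why normalization matters, here is a small standard-library sketch showing that the dot product of unit-length vectors equals their cosine similarity. The helper names are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))                # approximately 0.96
print(dot(l2_normalize(a), l2_normalize(b)))  # approximately 0.96 as well
```

If you store normalized vectors, an inner-product index returns the same ranking as a cosine index.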

11.3 Hybrid search and keyword augmentation

Combine vector search with keyword matching for better precision.

For example, do a SQL filter on important metadata before running the vector query, or boost results that match a high-value keyword.

Part 12: Vector Storage Governance

A sovereign vector store must be auditable and maintainable.

12.1 Schema versioning

Track the schema and embedding metadata with a version field.

ALTER TABLE embeddings ADD COLUMN embedding_model TEXT DEFAULT 'nomic-embed-text:v1.5';

This helps you know which embeddings can be reused or need refresh.

12.2 Data lifecycle

Decide how long to keep vectors and when to refresh them. For frequently updated sources, schedule re-embedding jobs.

12.3 Metadata hygiene

Keep metadata normalized and consistent. Use the same key names across documents, such as source, author, tags, and created_at.
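
One way to enforce consistent keys is to validate metadata before insertion. This is a minimal sketch: the choice of required and allowed keys below is an assumption for illustration, so adjust both sets to your own schema.

```python
# Assumed key sets for illustration; not a fixed standard.
REQUIRED_KEYS = {"source", "created_at"}
ALLOWED_KEYS = REQUIRED_KEYS | {"author", "tags"}


def validate_metadata(meta):
    """Return a list of problems; an empty list means the metadata is acceptable."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(meta)):
        problems.append(f"missing required key: {key}")
    for key in sorted(set(meta) - ALLOWED_KEYS):
        problems.append(f"unexpected key: {key}")
    return problems


print(validate_metadata({"source": "policy", "created_at": "2026-04-28"}))
print(validate_metadata({"Source": "policy"}))
```

Rejecting unexpected keys catches casing drift such as Source versus source before it fragments your filters.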

Part 13: Security and Local Control

Each vector database has security considerations.

13.1 Access control

For pgvector, use PostgreSQL roles and row-level security if needed. Grant only the minimum privileges for search and insert.

13.2 Service isolation

Run Qdrant in a local network segment behind a firewall. Do not expose the Qdrant management API publicly.

13.3 Data encryption at rest

If your host supports it, enable filesystem encryption for vector storage, especially for Qdrant and ChromaDB disk paths.

Part 14: Query Pipeline Examples

A production retrieval pipeline usually has multiple stages.

14.1 Candidate generation

Retrieve the top K vectors from the vector database.

14.2 Reranking

Rerank the candidate documents with a cross-encoder or a scoring function that uses metadata relevance.

14.3 Prompt construction

Construct a prompt that includes the top results in a context window, with clear separators and source attribution.

Example prompt:

Use the following documents to answer the question.

Document 1:
...

Question: What is the recommended deployment model?
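
A prompt in this shape can be assembled from search results with a few lines of Python. The sketch below assumes results shaped like the dictionaries returned by the pgvector search() helper in Part 1, with a hypothetical source key in the metadata; build_prompt is an illustrative name.

```python
def build_prompt(question, results, max_docs=3):
    """Format retrieved documents and the question into a single prompt string."""
    lines = ["Use the following documents to answer the question.", ""]
    for i, r in enumerate(results[:max_docs], start=1):
        # Fall back to "unknown" when a result carries no source metadata.
        source = (r.get("metadata") or {}).get("source", "unknown")
        lines += [f"Document {i} (source: {source}):", r["content"], ""]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


results = [
    {"content": "pgvector runs inside PostgreSQL.", "metadata": {"source": "docs"}},
    {"content": "Qdrant runs as a separate service."},
]
print(build_prompt("What is the recommended deployment model?", results))
```

Capping the number of documents keeps the prompt inside the model's context window.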

Part 15: Local AI Architecture Recommendations

A sovereign AI stack should be modular.

  • Use a single vector database for retrieval
  • Keep embedding generation separate from storage
  • Store documents and metadata in a local service or database
  • Keep the LLM inference server private and on-premises

This modular architecture keeps each component maintainable and auditable.

Part 16: Tooling and Local Workflows

Choose vector tools that fit your local workflow.

16.1 pgvector tooling

Use psql and standard PostgreSQL tools. For schema changes, manage vector types like any other column.

ALTER TABLE embeddings ADD COLUMN normalized vector(768);

16.2 Qdrant CLI and admin tools

Qdrant exposes REST and gRPC APIs, plus a built-in web dashboard for local administration. Use qdrant-client for scripting and curl for quick checks.

16.3 ChromaDB developer workflow

ChromaDB is ideal for Python-first development. Use notebooks, local scripts, and pytest to validate retrieval pipelines.

Part 17: Data Freshness and Re-Embedding

Vector stores need refresh cycles.

17.1 When to refresh embeddings

  • source text changes frequently
  • the model version is updated
  • retrieval quality degrades

17.2 Refresh policy

Keep a last_embedded_at timestamp on your documents. Recompute embeddings for changed documents only.

17.3 Incremental re-embedding

For large corpora, update a subset of vectors in batches rather than rebuilding the whole store.
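
The refresh policy and batching described in this part can be sketched as follows. The updated_at and embedding_model field names are assumptions for illustration (last_embedded_at is the timestamp from 17.2), and the documents are plain dictionaries standing in for database rows.

```python
from datetime import datetime, timezone

CURRENT_MODEL = "nomic-embed-text:v1.5"


def needs_reembedding(doc, current_model=CURRENT_MODEL):
    """True if the doc was never embedded, used another model, or changed since embedding."""
    embedded_at = doc.get("last_embedded_at")
    if embedded_at is None:
        return True
    if doc.get("embedding_model") != current_model:
        return True
    return doc["updated_at"] > embedded_at


def stale_batches(docs, size=100):
    """Yield stale documents in lists of at most `size` for incremental re-embedding."""
    stale = [d for d in docs if needs_reembedding(d)]
    for i in range(0, len(stale), size):
        yield stale[i:i + size]


jan = datetime(2026, 1, 1, tzinfo=timezone.utc)
feb = datetime(2026, 2, 1, tzinfo=timezone.utc)
docs = [
    {"id": 1, "updated_at": jan, "last_embedded_at": feb, "embedding_model": CURRENT_MODEL},
    {"id": 2, "updated_at": feb, "last_embedded_at": jan, "embedding_model": CURRENT_MODEL},
    {"id": 3, "updated_at": jan, "last_embedded_at": None, "embedding_model": None},
]
print([[d["id"] for d in batch] for batch in stale_batches(docs, size=1)])
```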

Part 18: Hybrid Search Architectures

Combine text search and vector search for better accuracy.

18.1 Keyword-first candidate narrowing

Use PostgreSQL full-text search or a local inverted index to narrow candidates, then run vector search on the shortlist.
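
The narrowing step can be illustrated in memory. This is a toy sketch under stated assumptions: a simple word match stands in for PostgreSQL full-text search, vectors are non-zero Python lists, and keyword_then_vector is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def keyword_then_vector(query_terms, query_vec, docs, k=3):
    """Keep docs containing any query term, then rank the shortlist by cosine similarity."""
    terms = {t.lower() for t in query_terms}
    shortlist = [d for d in docs if terms & set(d["content"].lower().split())]
    ranked = sorted(shortlist, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]


docs = [
    {"content": "postgres memory tuning", "embedding": [1.0, 0.0]},
    {"content": "postgres backup", "embedding": [0.0, 1.0]},
    {"content": "qdrant memory", "embedding": [1.0, 0.1]},
]
for d in keyword_then_vector(["postgres"], [1.0, 0.0], docs):
    print(d["content"])
```

The trade-off is recall: a relevant document that uses different wording never reaches the vector stage, so keep the keyword filter broad.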

18.2 Reranking with a local cross-encoder

After retrieving candidates, rerank them with a compact local model or deterministic scoring to reduce hallucinations.

Part 19: Vector Store Observability

Monitor vector store health and storage usage.

19.1 Query latency dashboards

Track P95 and P99 search times. Qdrant exposes Prometheus-format metrics at its /metrics endpoint; for pgvector, use PostgreSQL statistics such as pg_stat_statements.

19.2 Index size and memory usage

Keep an eye on index size growth. A vector store can grow quickly with additional embeddings.

Part 20: Final Recommendation Summary

  • Use pgvector when you value single-service simplicity, SQL filtering, and integrated backups.
  • Use Qdrant when you value high performance, advanced filter queries, and a dedicated vector search engine.
  • Use ChromaDB when rapid prototyping and compact developer workflows matter most.

A sovereign stack can also mix these tools: use ChromaDB for development, pgvector for transactional search, and Qdrant for high-volume retrieval on the same dataset.

Part 21: Local Development Workflow

A strong local workflow is essential for a sovereign vector database project.

21.1 Reproducible environment

Use Docker Compose, poetry, or pipx to manage dependencies. Keep a docker-compose.dev.yml for local testing.

21.2 Sample corpus and test data

Keep a lightweight sample corpus for local experimentation. This should mirror production schema without using sensitive data.

21.3 CLI helpers

Build small CLI tools for loading, querying, and inspecting the vector store. For example, a vector-inspect command that prints collection stats.

Part 22: Data Governance and Source Attribution

Document where every vector comes from.

22.1 Source fields

Store source metadata with every embedding: source, author, created_at, language, and domain.

22.2 Use case labeling

Add a use_case label for RAG, semantic search, recommendation, or classification. This helps filter retrieval results and preserve auditability.

Part 23: Hybrid Query and Semantic Search Strategies

Sovereign retrieval often combines multiple search signals.

23.1 Term-based prefiltering

Use SQL, inverted indexes, or regex to narrow candidates before vector search.

23.2 Semantic score blending

Combine vector similarity with metadata relevance or heuristic scores.

combined_score = 0.7 * vector_score + 0.3 * metadata_score
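
Applied to a candidate list, the blend might look like the sketch below. It assumes both scores are already scaled to the range 0 to 1; the vector_score and metadata_score keys and the blend helper are illustrative names.

```python
def blend(candidates, vector_weight=0.7, metadata_weight=0.3):
    """Sort candidates by the weighted sum of vector and metadata scores, best first."""
    def combined(c):
        return vector_weight * c["vector_score"] + metadata_weight * c["metadata_score"]
    return sorted(candidates, key=combined, reverse=True)


candidates = [
    {"id": "A", "vector_score": 0.9, "metadata_score": 0.0},
    {"id": "B", "vector_score": 0.8, "metadata_score": 1.0},
]
print([c["id"] for c in blend(candidates)])
```

Here B outranks A despite a lower vector score, because its metadata match contributes 0.3 to the combined score.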

23.3 Final decision guidance

For self-hosted systems, the best vector database is the one that minimizes operational risk while still meeting performance targets. That often means pgvector for integrated stacks, Qdrant for dedicated search, and ChromaDB for fast experimentation.

Part 24: Local Security and Access Control

A sovereign vector database must be protected like any other production service.

24.1 Network access restrictions

Run the vector service on a private network interface or behind a reverse proxy. Do not expose Qdrant or ChromaDB to the public internet unless access is strictly authenticated.

24.2 Authentication and authorization

For pgvector, leverage PostgreSQL roles and row-level security. For Qdrant, configure API keys and restrict write access to trusted hosts.

24.3 Audit logs

Keep audit logs for every schema change, collection creation, and client connection. Local audit trails are critical for governance.

Part 25: Example Local Deployment Architecture

A typical local deployment might look like:

  • PostgreSQL with pgvector for transactional embedding storage
  • Qdrant for fast similarity search on larger vector collections
  • A local embedding service that writes new vectors to both stores
  • A lightweight API gateway to route search requests

This hybrid architecture preserves sovereignty while optimizing for both consistency and speed.

Part 26: Future-Proofing Your Vector Stack

Plan for future growth by keeping the vector store decoupled from the rest of the application.

26.1 Versioned embeddings

Store the embedding model name and revision with every vector. This allows you to compare old and new embeddings over time.

26.2 Migration readiness

Keep a migration path for vector schema changes, such as adding chunk_id or source_type fields. Use backward-compatible defaults where possible.

26.3 Vendor neutrality

Avoid building your retrieval layer around proprietary APIs. Design your application so you can switch between pgvector, Qdrant, and ChromaDB without rewriting the business logic.
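
One way to keep the business logic independent of the backend is a small interface. The sketch below is an assumption about how you might structure it, not an API offered by any of the three databases: VectorStore, InMemoryStore, and retrieve are hypothetical names, and the in-memory class is a brute-force test double.

```python
import math
from typing import Protocol


class VectorStore(Protocol):
    """Minimal interface that a pgvector, Qdrant, or ChromaDB adapter would implement."""

    def add(self, doc_id: str, vector: list, payload: dict) -> None: ...

    def search(self, vector: list, k: int) -> list: ...


class InMemoryStore:
    """Brute-force cosine search over non-zero vectors; a test double, not a production backend."""

    def __init__(self):
        self._rows = {}

    def add(self, doc_id, vector, payload):
        self._rows[doc_id] = (vector, payload)

    def search(self, vector, k):
        def cos(a, b):
            return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

        scored = [(doc_id, cos(vector, v)) for doc_id, (v, _) in self._rows.items()]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


def retrieve(store: VectorStore, query_vec, k=5):
    """Business logic depends only on the VectorStore interface."""
    return [doc_id for doc_id, _ in store.search(query_vec, k)]


store = InMemoryStore()
store.add("a", [1.0, 0.0], {"source": "policy"})
store.add("b", [0.0, 1.0], {"source": "faq"})
print(retrieve(store, [1.0, 0.1], k=1))
```

Swapping backends then means writing one new adapter class rather than touching every call site.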

Part 27: Final Vector Database Transition Notes

When you choose a sovereign vector database, keep your deployment and governance practices aligned. Treat your vector index as a production datastore: back it up, version it, and keep the metadata consistent. A local vector stack is strongest when the data lifecycle is documented, the team understands the tradeoffs, and the system is designed for maintainability rather than purely for peak performance.

Part 28: Sustaining a Sovereign Vector Store

Keep the vector database healthy by scheduling regular reviews of index size, query latency, and data freshness. A well-maintained store is one that can be restored, audited, and understood by the team without relying on external support.

Tested on: Ubuntu 24.04 LTS (Hetzner CX32, 8GB RAM). pgvector 0.8.0, Qdrant 1.9.2, ChromaDB 0.5.3. Last verified: April 28, 2026.

Kofi Mensah

About the Author

Inference Economics & Hardware Architect

Electrical Engineer | Hardware Systems Architect | 8+ Years in GPU/AI Optimization | ARM & x86 Specialist

Kofi Mensah is a hardware architect and AI infrastructure specialist focused on optimizing inference costs for on-device and local-first AI deployments. With expertise in CPU/GPU architectures, Kofi analyzes real-world performance trade-offs between commercial cloud AI services and sovereign, self-hosted models running on consumer and enterprise hardware (Apple Silicon, NVIDIA, AMD, custom ARM systems). He quantifies the total cost of ownership for AI infrastructure and evaluates which deployment models (cloud, hybrid, on-device) make economic sense for different workloads and use cases. Kofi's technical analysis covers model quantization, inference optimization techniques (llama.cpp, vLLM), and hardware acceleration for language models, vision models, and multimodal systems. At Vucense, Kofi provides detailed cost analysis and performance benchmarks to help developers understand the real economics of sovereign AI.
