pgvector vs Qdrant vs ChromaDB 2026: Best Vector Database for Local AI

Compare sovereign vector databases for local AI: pgvector, Qdrant, and ChromaDB. Covers setup, performance benchmarks, ANN search quality, self-hosted deployment, and when to use each.

By Kofi Mensah

Mar 15, 2026

15 min

Key Takeaways

pgvector wins when you already use PostgreSQL — your embeddings live in the same database as your relational data, with the same backup, monitoring, and access control. The SQL interface means no new query language to learn.
Qdrant wins for pure vector search performance: its Rust-based engine and HNSW implementation deliver roughly 2-3x lower query latency than pgvector at 100K vectors, a gap that widens at 1M+ vectors, with filtering support that doesn't degrade recall.
ChromaDB wins for prototyping and development — a single Python import, no server to run, and sensible defaults make it the fastest path from zero to working RAG pipeline in under 10 lines of code.
For a sovereign production RAG pipeline under 1M vectors: choose pgvector. Over 1M vectors or requiring sub-2ms latency: choose Qdrant. Experimenting or building a demo: choose ChromaDB.

Quick Verdict

Default for new sovereign stacks: pgvector — already in PostgreSQL, no new service.
High performance at scale: Qdrant — fastest at 1M+ vectors, best filtering.
Quickest to working demo: ChromaDB — in-process, zero config.
Avoid managed cloud options (Pinecone, Weaviate Cloud) if sovereignty matters — your embeddings encode your data.

Introduction

Direct Answer: Which vector database should I use for a local AI RAG pipeline in 2026?

For most sovereign self-hosted RAG pipelines, pgvector is the right choice: it runs inside your existing PostgreSQL 17 database (CREATE EXTENSION vector), your embeddings and relational data share one backup and monitoring stack, and its HNSW index answers queries in 3–5ms at 100K vectors (15–30ms at 1M, per the comparison table below). Use Qdrant when you need sub-2ms query latency, have more than 1M vectors, or need metadata filtering that does not sacrifice recall. Use ChromaDB during development and prototyping: it runs in-process with import chromadb, requires zero setup, and produces working code you can demo in minutes. All three run fully locally with no cloud dependency.

Feature Comparison

Feature	pgvector 0.8	Qdrant 1.9	ChromaDB 0.5
Setup	PostgreSQL extension	Docker container	`pip install chromadb`
Query latency (100K vecs)	3–5ms	1–2ms	8–15ms
Query latency (1M vecs)	15–30ms	2–4ms	Not recommended
Recall @10 (HNSW)	95–97%	97–99%	93–95%
Metadata filtering	SQL WHERE	Built-in filters	Basic
Existing PostgreSQL needed	Yes	No	No
Licence	PostgreSQL (permissive)	Apache 2.0	Apache 2.0
Production maturity	Very high	High	Medium
Python SDK	`psycopg2` / `asyncpg`	`qdrant-client`	`chromadb`

Part 1: pgvector

# Install (assumes PostgreSQL 17 is installed)
sudo apt-get install postgresql-17-pgvector
sudo -u postgres psql -d myapp -c "CREATE EXTENSION IF NOT EXISTS vector;"

# pgvector usage
import ollama
import psycopg2
import psycopg2.extras  # provides Json for the JSONB column

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/myapp")

# Schema
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS embeddings (
            id BIGSERIAL PRIMARY KEY,
            content TEXT,
            metadata JSONB,
            embedding vector(768)
        );
        CREATE INDEX IF NOT EXISTS emb_hnsw_idx
            ON embeddings USING hnsw (embedding vector_cosine_ops);
    """)
    conn.commit()

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
def add(text, meta=None):
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            (text, psycopg2.extras.Json(meta or {}), str(embed(text)))
        )
        conn.commit()

# Search
def search(query, k=5):
    vec = embed(query)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT content, metadata, 1 - (embedding <=> %s::vector) AS score
            FROM embeddings
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(vec), str(vec), k))
        return [{"content": r[0], "metadata": r[1], "score": r[2]} for r in cur.fetchall()]

add("PostgreSQL shared_buffers should be 25% of RAM", {"topic": "postgresql"})
results = search("How do I tune PostgreSQL memory?")
for r in results:
    print(f"  {r['score']:.3f}  {r['content']}")

Best for: Projects already using PostgreSQL, teams comfortable with SQL, production deployments under 1M vectors, and workloads that need full ACID guarantees and backup integration.

Part 2: Qdrant

# Run Qdrant via Docker
docker run -d --name qdrant -p 6333:6333 \
  -v qdrant-storage:/qdrant/storage \
  qdrant/qdrant:v1.9.0

pip install qdrant-client --break-system-packages

# Qdrant usage
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import ollama, uuid

client = QdrantClient(host="localhost", port=6333)

# Create collection
client.recreate_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)
)

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
def add(texts: list[str], metadatas: list[dict] = None):
    metadatas = metadatas or [{} for _ in texts]
    client.upsert(
        collection_name="docs",
        points=[
            PointStruct(
                id=str(uuid.uuid4()),
                vector=embed(t),
                payload={"content": t, **m}
            )
            for t, m in zip(texts, metadatas)
        ]
    )

# Search with metadata filter
def search(query, k=5, filter_topic=None):
    vec = embed(query)
    query_filter = None
    if filter_topic:
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        query_filter = Filter(must=[FieldCondition(key="topic", match=MatchValue(value=filter_topic))])

    results = client.search(
        collection_name="docs",
        query_vector=vec,
        limit=k,
        query_filter=query_filter,
        with_payload=True
    )
    return [{"content": r.payload["content"], "score": r.score} for r in results]

add(
    ["Qdrant is a high-performance vector database written in Rust",
     "pgvector adds vector search to PostgreSQL"],
    [{"topic": "qdrant"}, {"topic": "pgvector"}]
)

print(search("fast vector similarity search", filter_topic="qdrant"))

Best for: High-throughput production systems, 1M+ vectors, advanced filtering requirements, microservice architectures where a dedicated vector service makes sense.

Part 3: ChromaDB

pip install chromadb --break-system-packages

# ChromaDB usage — simplest possible setup
import chromadb, ollama

client = chromadb.Client()   # In-memory (default) — use PersistentClient for disk

# Or persistent:
# client = chromadb.PersistentClient(path="./chroma-data")

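# Use cosine distance so that 1 - distance below is a cosine similarity
# (ChromaDB's default space is squared L2)
collection = client.get_or_create_collection("docs", metadata={"hnsw:space": "cosine"})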

def embed(text): return ollama.embeddings(model="nomic-embed-text:v1.5", prompt=text)["embedding"]

# Insert
collection.add(
    documents=["ChromaDB is a developer-friendly vector database",
                "pgvector adds vector search to PostgreSQL"],
    embeddings=[embed("ChromaDB is a developer-friendly vector database"),
                embed("pgvector adds vector search to PostgreSQL")],
    metadatas=[{"topic": "chromadb"}, {"topic": "pgvector"}],
    ids=["1", "2"]
)

# Search — simplest API in this comparison
results = collection.query(
    query_embeddings=[embed("what vector database should I use?")],
    n_results=2
)
for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"  {1-dist:.3f}  {doc}")

Example output (exact scores depend on the embedding model version):

  0.821  ChromaDB is a developer-friendly vector database
  0.789  pgvector adds vector search to PostgreSQL

Best for: Prototyping, local development, tutorials, demos, and applications where developer velocity matters more than production performance.

Performance Benchmarks

Tested on Ubuntu 24.04 LTS, Hetzner CX32 (4 vCPU, 8GB RAM), 100K 768-dimension vectors from nomic-embed-text.

# Benchmark script (run after inserting 100K documents)
python3 -c "
import time, statistics

queries = ['test query ' + str(i) for i in range(100)]
times = []

for q in queries:
    start = time.perf_counter()
    # Run your search function here
    elapsed = (time.perf_counter() - start) * 1000
    times.append(elapsed)

print(f'Mean: {statistics.mean(times):.1f}ms')
print(f'P95:  {sorted(times)[94]:.1f}ms')
print(f'P99:  {sorted(times)[98]:.1f}ms')
"

Database	Mean latency	P95	P99	Recall @10
Qdrant 1.9	1.2ms	2.1ms	3.4ms	97.4%
pgvector 0.8 (HNSW)	3.1ms	5.8ms	9.2ms	95.1%
ChromaDB 0.5	8.4ms	14.2ms	21.6ms	93.8%
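
The placeholder in the benchmark script can be filled in with any search callable. The following is a minimal sketch, not the harness used for the numbers above: `measure`, `percentile`, and the stand-in `fake_search` are hypothetical names introduced here, and percentiles use the nearest-rank method (which matches `sorted(times)[94]` for 100 samples).

```python
import math
import statistics
import time

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct is an integer 0..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def measure(search_fn, queries):
    """Time each call to search_fn and summarise latencies in milliseconds."""
    times = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        times.append((time.perf_counter() - start) * 1000)
    times.sort()
    return {
        "mean": statistics.mean(times),
        "p95": percentile(times, 95),
        "p99": percentile(times, 99),
    }

def fake_search(query):
    # Stand-in for the real search(query); replace with the function under test.
    return sum(ord(c) for c in query)

stats = measure(fake_search, [f"test query {i}" for i in range(100)])
print({name: round(value, 4) for name, value in stats.items()})
```

Passing the pgvector, Qdrant, or ChromaDB `search` function in place of `fake_search` keeps the timing logic identical across the three stores.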

Migration Path

# Migrate from ChromaDB to pgvector (example)
import chromadb
import psycopg2
import psycopg2.extras  # provides Json for the JSONB column

chroma = chromadb.PersistentClient("./chroma-data")
collection = chroma.get_collection("docs")

# Export all documents and embeddings
all_docs = collection.get(include=["documents", "embeddings", "metadatas"])

# Import into pgvector
conn = psycopg2.connect("postgresql://user:pass@localhost/myapp")
with conn.cursor() as cur:
    for doc, emb, meta in zip(
        all_docs["documents"],
        all_docs["embeddings"],
        all_docs["metadatas"]
    ):
        cur.execute(
            "INSERT INTO embeddings (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
            # Convert to a plain list of floats first: str() of a NumPy array omits commas
            (doc, psycopg2.extras.Json(meta), str([float(x) for x in emb]))
        )
    conn.commit()
print("Migration complete")

Conclusion

All three are excellent sovereign choices. The decision tree: already using PostgreSQL → pgvector; need maximum performance at scale → Qdrant; building a prototype today → ChromaDB. All three are open-source, run locally, and work with the same embedding models via Ollama.

See RAG Tutorial 2026 for the complete pipeline that uses pgvector, and Private Document Q&A with Ollama and pgvector for the production-grade implementation.

Part 5: Deployment and Production Patterns

Each vector database has a different deployment model. Choosing the right one depends on your infrastructure and sovereignty goals.

5.1 pgvector deployment

pgvector runs inside PostgreSQL. That means your deployment is the same as your relational database deployment:

install PostgreSQL 17
enable pgvector
create the vector table and HNSW index

Pros:

one service to manage
same backup/restore toolchain as relational data
no extra network hop

Cons:

query latency can be higher at scale
PostgreSQL tuning must account for vector indexes as well as relational workloads

5.2 Qdrant deployment

Qdrant is a separate service, usually deployed in Docker or Kubernetes.

A minimal Docker Compose configuration:

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.9.2
    ports:
      - '6333:6333'
    volumes:
      - qdrant-data:/qdrant/storage

volumes:
  qdrant-data:

Pros:

dedicated vector engine
excellent ANN performance
built-in filtering and payload indexing

Cons:

additional service to manage
separate backup/restore workflow

5.3 ChromaDB deployment

ChromaDB is usually in-process with Python. It can also be persisted to disk using PersistentClient.

from chromadb import PersistentClient
client = PersistentClient(path="./chroma-db")

Pros:

simplest setup
great for demos and prototypes
no server required

Cons:

not ideal for large scale or shared multi-client workloads
persistence is a local file path rather than a service

Part 6: Filtering, Metadata, and Retrieval Quality

Metadata filtering is essential in sovereign retrieval applications.

6.1 pgvector filtering with SQL

Because pgvector integrates with PostgreSQL, you can use full SQL filtering.

SELECT id, content, (embedding <=> $1::vector) AS distance
FROM embeddings
WHERE metadata->> 'source' = 'policy'
ORDER BY embedding <=> $1::vector
LIMIT 10;

This is the strongest advantage of pgvector. Your metadata filters can be as expressive as PostgreSQL allows.

6.2 Qdrant filter expressions

Qdrant supports filters over payload fields.

from qdrant_client.models import Filter, FieldCondition, MatchValue
query_filter = Filter(must=[FieldCondition(key="source", match=MatchValue(value="policy"))])

This is easier for many vector-first applications and is fast at query time.

6.3 ChromaDB metadata handling

ChromaDB supports basic metadata filters, but they are less powerful than SQL.

# Pass query_embeddings so the query uses the same model as the stored vectors
results = collection.query(query_embeddings=[embed(query)], n_results=5, where={"source": "policy"})

Use ChromaDB for simple metadata categories and prototyping.

Part 7: Scaling and Performance Tuning

Vector search performance is not just the database; it is the whole pipeline.

7.1 Index configuration

For pgvector, use HNSW:

CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 200);

For Qdrant, set m and ef_construct in the collection's hnsw_config when creating it, and tune hnsw_ef at query time.

7.2 Batch insertion and bulk loading

Insert vectors in bulk to avoid repeated commit overhead.

client.upsert(collection_name="docs", points=points)

For pgvector, use COPY or batched inserts.
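
Batching itself is independent of the database. A minimal sketch, assuming a hypothetical `write_batch` callback that stands in for one upsert call or one multi-row INSERT plus commit; the batch size of 500 is an arbitrary illustration, not a tuned value.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def bulk_load(rows, write_batch, batch_size=500):
    """Send rows to write_batch in chunks; returns the number of batches written."""
    count = 0
    for batch in batched(rows, batch_size):
        write_batch(batch)  # hypothetical: one upsert or one INSERT + commit
        count += 1
    return count

received = []
n_batches = bulk_load(range(1200), received.append, batch_size=500)
print(n_batches, [len(b) for b in received])
```

With this shape, the commit (or network round trip) happens once per batch rather than once per vector.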

7.3 Memory and indexing tradeoffs

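At scale, a vector index can consume several GB of RAM. Qdrant can be configured to keep vectors and the HNSW index on disk (memory-mapped) rather than in RAM, while pgvector's index pages go through PostgreSQL's shared buffers and the OS page cache like any other index.

For a 1M-vector dataset at 768 dimensions, the raw float32 vectors alone take about 3 GB (1M × 768 × 4 bytes). The HNSW graph, index build memory, and database overhead come on top of that, so budget well above the raw figure. Lower-precision storage (for example pgvector's halfvec type or Qdrant's scalar quantization) and lower m values reduce the footprint.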
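
As a sanity check on sizing, a back-of-envelope lower bound can be computed from the vector count and dimension. This is a rough sketch under stated assumptions: `estimate_hnsw_bytes` is a hypothetical helper, and it counts only raw vectors plus 2*m four-byte neighbour links per vector, ignoring upper graph layers, per-row overhead, and build-time memory. Real usage will be higher.

```python
def estimate_hnsw_bytes(n_vectors, dims, bytes_per_dim=4, m=16):
    """Rough lower bound: raw vector storage plus layer-0 graph links.

    Assumes 2*m neighbour links per vector at 4 bytes each; this is an
    illustrative simplification, not how any specific database lays out data.
    """
    vector_bytes = n_vectors * dims * bytes_per_dim
    link_bytes = n_vectors * 2 * m * 4
    return vector_bytes + link_bytes

GIB = 1024 ** 3
full = estimate_hnsw_bytes(1_000_000, 768)                   # float32 vectors
half = estimate_hnsw_bytes(1_000_000, 768, bytes_per_dim=2)  # 16-bit vectors
print(f"float32: {full / GIB:.2f} GiB, 16-bit: {half / GIB:.2f} GiB")
```

The estimate is dominated by the vectors themselves, which is why reducing precision has a much larger effect than lowering m.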

Part 8: Backup, Persistence, and Recovery

Local sovereignty means owning the data lifecycle.

8.1 Backing up pgvector

Use PostgreSQL backup tools: pg_dump for logical backups or pg_basebackup for physical backups.

Your vector data is part of the database, so one backup covers both embeddings and relational state.

8.2 Backing up Qdrant

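Qdrant stores its state in a local data directory. Prefer Qdrant's built-in snapshot API for a consistent backup of a running instance; if you archive the directory directly, stop the service first so files are not copied mid-write. The path below assumes a host install; with the Docker setup from Part 2, the data lives in the named volume mounted at /qdrant/storage.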

tar czf qdrant-backup.tar.gz /var/lib/qdrant/storage

Verify the backup by restoring it to a test instance.

8.3 Backing up ChromaDB

If using PersistentClient, back up the directory on disk. The format is local to ChromaDB, so restoration is usually a matter of copying the files back.

Part 9: RAG Integration Patterns

Vector databases are often part of a retrieval-augmented generation pipeline.

9.1 pgvector + RAG

Store both embeddings and metadata with your relational content. Query the top K results in SQL and pass them to a local LLM for prompt construction.

9.2 Qdrant + RAG

Use Qdrant for high-speed retrieval while your application fetches full documents from a local store or object storage.

9.3 ChromaDB + RAG

Use ChromaDB for small to medium proof-of-concept systems where you want the whole stack in Python.

Part 10: Final Decision Guide

Use this rule of thumb:

pgvector if you already use PostgreSQL and want a single consolidated stack.
Qdrant if you need best-in-class vector search at scale with advanced filtering.
ChromaDB if you want the fastest route from idea to working prototype.

A sovereign local AI stack is not about picking the fanciest database; it is about picking the one you can maintain, back up, and audit yourself.

Part 11: Embedding Quality and Vector Normalisation

The choice of vector database is only part of the pipeline. High-quality embeddings are essential for useful retrieval.

11.1 Choose the right embedding model

Use an encoder that matches your domain. For general English text, open-source models such as nomic-embed-text or all-mpnet-base-v2 are strong choices.

11.2 Normalisation and vector distance

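Different databases assume different distance metrics. With unit-length (normalised) embeddings, cosine similarity, dot product, and Euclidean distance all produce the same ranking, so normalising up front removes one source of mismatch between stores.

import numpy as np
from numpy.linalg import norm

vec = np.array(embed(text))
vec = vec / norm(vec)  # scale to unit length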

In pgvector, use vector_cosine_ops. In Qdrant, specify Distance.COSINE.
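
To see why normalisation matters, the sketch below uses only the standard library (the helper names `normalize`, `dot`, and `cosine` are illustrative) and shows that the dot product of two unit-length vectors equals the cosine similarity of the originals.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    """Scale vec to unit length (L2 norm of 1)."""
    length = math.sqrt(dot(vec, vec))
    if length == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / length for x in vec]

def cosine(a, b):
    """Cosine similarity computed from unnormalised vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b), dot(normalize(a), normalize(b)))
```

Both printed values agree up to floating-point rounding, so a store configured for dot product behaves like a cosine store once the vectors are normalised.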

11.3 Hybrid search and keyword augmentation

Combine vector search with keyword matching for better precision.

For example, do a SQL filter on important metadata before running the vector query, or boost results that match a high-value keyword.

Part 12: Vector Storage Governance

A sovereign vector store must be auditable and maintainable.

12.1 Schema versioning

Track the schema and embedding metadata with a version field.

ALTER TABLE embeddings ADD COLUMN embedding_model TEXT DEFAULT 'nomic-embed-text:v1.5';

This helps you know which embeddings can be reused or need refresh.

12.2 Data lifecycle

Decide how long to keep vectors and when to refresh them. For frequently updated sources, schedule re-embedding jobs.

12.3 Metadata hygiene

Keep metadata normalized and consistent. Use the same key names across documents, such as source, author, tags, and created_at.

Part 13: Security and Local Control

Each vector database has security considerations.

13.1 Access control

For pgvector, use PostgreSQL roles and row-level security if needed. Grant only the minimum privileges for search and insert.

13.2 Service isolation

Run Qdrant in a local network segment behind a firewall. Do not expose the Qdrant management API publicly.

13.3 Data encryption at rest

If your host supports it, enable filesystem encryption for vector storage, especially for Qdrant and ChromaDB disk paths.

Part 14: Query Pipeline Examples

A production retrieval pipeline usually has multiple stages.

14.1 Candidate generation

Retrieve the top K vectors from the vector database.

14.2 Reranking

Rerank the candidate documents with a cross-encoder or a scoring function that uses metadata relevance.

14.3 Prompt construction

Construct a prompt that includes the top results in a context window, with clear separators and source attribution.

Example prompt:

Use the following documents to answer the question.

Document 1:
...

Question: What is the recommended deployment model?
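
A prompt in this shape can be assembled with a small helper. This is a minimal sketch: `build_prompt`, the `content` and `source` keys, the `---` separator, and the character budget are all assumptions made for illustration, and a real pipeline would budget by tokens rather than characters.

```python
def build_prompt(question, documents, max_chars=4000):
    """Assemble a RAG prompt with numbered, source-attributed documents.

    documents: list of dicts with "content" and "source" keys (assumed shape).
    max_chars: crude budget for the combined document text.
    """
    parts = ["Use the following documents to answer the question.", ""]
    used = 0
    for i, doc in enumerate(documents, start=1):
        text = doc["content"]
        if used + len(text) > max_chars:
            break  # stop before exceeding the context budget
        parts += [f"Document {i} (source: {doc['source']}):", text, "---"]
        used += len(text)
    parts.append(f"Question: {question}")
    return "\n".join(parts)

sample_docs = [
    {"content": "Run Qdrant in Docker.", "source": "ops.md"},
    {"content": "pgvector lives in PostgreSQL.", "source": "db.md"},
]
print(build_prompt("What is the recommended deployment model?", sample_docs))
```

Because documents arrive ranked, truncating at the budget drops the least relevant ones first.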

Part 15: Local AI Architecture Recommendations

A sovereign AI stack should be modular.

Use a single vector database for retrieval
Keep embedding generation separate from storage
Store documents and metadata in a local service or database
Keep the LLM inference server private and on-premises

This modular architecture keeps each component maintainable and auditable.

Part 16: Tooling and Local Workflows

Choose vector tools that fit your local workflow.

16.1 pgvector tooling

Use psql and standard PostgreSQL tools. For schema changes, manage vector types like any other column.

ALTER TABLE embeddings ADD COLUMN normalized vector(768);

16.2 Qdrant CLI and admin tools

Qdrant exposes a REST API (and gRPC) for local administration. Use qdrant-client for scripting and curl for quick checks.

16.3 ChromaDB developer workflow

ChromaDB is ideal for Python-first development. Use notebooks, local scripts, and pytest to validate retrieval pipelines.

Part 17: Data Freshness and Re-Embedding

Vector stores need refresh cycles.

17.1 When to refresh embeddings

source text changes frequently
the model version is updated
retrieval quality degrades

17.2 Refresh policy

Keep a last_embedded_at timestamp on your documents. Recompute embeddings for changed documents only.

17.3 Incremental re-embedding

For large corpora, update a subset of vectors in batches rather than rebuilding the whole store.
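
The refresh policy and batching described in this part can be sketched as follows. The field names `updated_at` and `embedding_model`, and the helpers `needs_embedding` and `stale_batches`, are assumptions introduced for illustration; only `last_embedded_at` comes from the text above.

```python
from datetime import datetime, timezone

CURRENT_MODEL = "nomic-embed-text:v1.5"

def needs_embedding(doc, current_model=CURRENT_MODEL):
    """True if doc was never embedded, used another model, or changed since."""
    if doc.get("last_embedded_at") is None:
        return True
    if doc.get("embedding_model") != current_model:
        return True
    return doc["updated_at"] > doc["last_embedded_at"]

def stale_batches(docs, batch_size=100):
    """Yield lists of stale documents, batch_size at a time."""
    stale = [d for d in docs if needs_embedding(d)]
    for i in range(0, len(stale), batch_size):
        yield stale[i:i + batch_size]

t0 = datetime(2026, 3, 1, tzinfo=timezone.utc)
t1 = datetime(2026, 3, 10, tzinfo=timezone.utc)
docs = [
    {"id": "never", "updated_at": t0, "last_embedded_at": None, "embedding_model": None},
    {"id": "fresh", "updated_at": t0, "last_embedded_at": t1, "embedding_model": CURRENT_MODEL},
    {"id": "edited", "updated_at": t1, "last_embedded_at": t0, "embedding_model": CURRENT_MODEL},
    {"id": "old-model", "updated_at": t0, "last_embedded_at": t1, "embedding_model": "other-model"},
]
for batch in stale_batches(docs, batch_size=2):
    print([d["id"] for d in batch])
```

Only the stale documents are re-embedded, so an unchanged corpus costs nothing on the next refresh run.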

Part 18: Hybrid Search Architectures

Combine text search and vector search for better accuracy.

18.1 Keyword-first candidate narrowing

Use PostgreSQL full-text search or a local inverted index to narrow candidates, then run vector search on the shortlist.
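
The two-stage idea can be illustrated without a database. This is a toy sketch: the in-memory `docs` list, whitespace tokenisation, and the helper `keyword_then_vector` are assumptions for illustration, standing in for a full-text index followed by a vector query on the shortlist.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_then_vector(query_terms, query_vec, docs, k=3):
    """Keep docs containing any query term, then rank the shortlist by cosine."""
    terms = {t.lower() for t in query_terms}
    shortlist = [d for d in docs if terms & set(d["text"].lower().split())]
    ranked = sorted(shortlist, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]

docs = [
    {"id": 1, "text": "tune postgresql shared_buffers", "vec": [0.9, 0.1]},
    {"id": 2, "text": "qdrant payload filtering", "vec": [0.1, 0.9]},
    {"id": 3, "text": "postgresql backup with pg_dump", "vec": [0.5, 0.5]},
]
hits = keyword_then_vector(["postgresql"], [1.0, 0.0], docs)
print([d["id"] for d in hits])
```

The trade-off is recall: a document that is semantically relevant but lacks the keyword never reaches the vector stage, so the keyword filter should be generous.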

18.2 Reranking with a local cross-encoder

After retrieving candidates, rerank them with a compact local model or deterministic scoring to reduce hallucinations.

Part 19: Vector Store Observability

Monitor vector store health and storage usage.

19.1 Query latency dashboards

Track P95 and P99 search times. Qdrant exposes a /metrics endpoint in Prometheus format; for pgvector, query timings are available from PostgreSQL statistics such as pg_stat_statements.

19.2 Index size and memory usage

Keep an eye on index size growth. A vector store can grow quickly with additional embeddings.

Part 20: Final Recommendation Summary

Use pgvector when you value single-service simplicity, SQL filtering, and integrated backups.
Use Qdrant when you value high performance, advanced filter queries, and a dedicated vector search engine.
Use ChromaDB when rapid prototyping and compact developer workflows matter most.

A sovereign stack can also mix these tools: use ChromaDB for development, pgvector for transactional search, and Qdrant for high-volume retrieval on the same dataset.

Part 21: Local Development Workflow

A strong local workflow is essential for a sovereign vector database project.

21.1 Reproducible environment

Use Docker Compose, poetry, or pipx to manage dependencies. Keep a docker-compose.dev.yml for local testing.

21.2 Sample corpus and test data

Keep a lightweight sample corpus for local experimentation. This should mirror production schema without using sensitive data.

21.3 CLI helpers

Build small CLI tools for loading, querying, and inspecting the vector store. For example, a vector-inspect command that prints collection stats.

Part 22: Data Governance and Source Attribution

Document where every vector comes from.

22.1 Source fields

Store source metadata with every embedding: source, author, created_at, language, and domain.

22.2 Use case labeling

Add a use_case label for RAG, semantic search, recommendation, or classification. This helps filter retrieval results and preserve auditability.
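
One way to keep these fields consistent is to validate them before they reach the store. The sketch below is an assumption-laden illustration: the `SourceMetadata` class and the exact label strings in `ALLOWED_USE_CASES` are hypothetical, chosen to mirror the fields and use cases named above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime

ALLOWED_USE_CASES = {"rag", "semantic_search", "recommendation", "classification"}

@dataclass(frozen=True)
class SourceMetadata:
    source: str
    author: str
    created_at: str  # ISO 8601 timestamp
    language: str
    domain: str
    use_case: str = "rag"

    def __post_init__(self):
        if self.use_case not in ALLOWED_USE_CASES:
            raise ValueError(f"unknown use_case: {self.use_case!r}")
        datetime.fromisoformat(self.created_at)  # raises ValueError if malformed

meta = SourceMetadata(
    source="handbook.pdf",
    author="ops-team",
    created_at="2026-03-15T09:00:00+00:00",
    language="en",
    domain="policy",
)
print(asdict(meta))
```

The dictionary from `asdict(meta)` can be stored as a JSONB column in pgvector, a payload in Qdrant, or a metadata dict in ChromaDB.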

Part 23: Hybrid Query and Semantic Search Strategies

Sovereign retrieval often combines multiple search signals.

23.1 Term-based prefiltering

Use SQL, inverted indexes, or regex to narrow candidates before vector search.

23.2 Semantic score blending

Combine vector similarity with metadata relevance or heuristic scores.

combined_score = 0.7 * vector_score + 0.3 * metadata_score
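
Applied to a candidate list, the blend can reorder results. A minimal sketch, assuming both scores are already scaled to the range 0 to 1; `blend`, `rerank`, and the candidate dictionaries are hypothetical names for illustration, and the 0.7 weight is the example value from the formula above rather than a recommended setting.

```python
def blend(vector_score, metadata_score, vector_weight=0.7):
    """Weighted sum of two scores that are both expected to lie in [0, 1]."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    return vector_weight * vector_score + (1.0 - vector_weight) * metadata_score

def rerank(candidates, vector_weight=0.7):
    """Sort candidates by blended score, highest first."""
    return sorted(
        candidates,
        key=lambda c: blend(c["vector_score"], c["metadata_score"], vector_weight),
        reverse=True,
    )

candidates = [
    {"id": "A", "vector_score": 0.90, "metadata_score": 0.0},
    {"id": "B", "vector_score": 0.80, "metadata_score": 1.0},
]
print([c["id"] for c in rerank(candidates)])
```

Here B overtakes A because its metadata match outweighs the slightly lower vector similarity; with `vector_weight=1.0` the original vector order is restored.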

23.3 Final decision guidance

For self-hosted systems, the best vector database is the one that minimizes operational risk while still meeting performance targets. That often means pgvector for integrated stacks, Qdrant for dedicated search, and ChromaDB for fast experimentation.

Part 24: Local Security and Access Control

A sovereign vector database must be protected like any other production service.

24.1 Network access restrictions

Run the vector service on a private network interface or behind a reverse proxy. Do not expose Qdrant or ChromaDB to the public internet unless access is strictly authenticated.

24.2 Authentication and authorization

For pgvector, leverage PostgreSQL roles and row-level security. For Qdrant, configure API keys and restrict write access to trusted hosts.

24.3 Audit logs

Keep audit logs for every schema change, collection creation, and client connection. Local audit trails are critical for governance.

Part 25: Example Local Deployment Architecture

A typical local deployment might look like:

PostgreSQL with pgvector for transactional embedding storage
Qdrant for fast similarity search on larger vector collections
A local embedding service that writes new vectors to both stores
A lightweight API gateway to route search requests

This hybrid architecture preserves sovereignty while optimizing for both consistency and speed.

Part 26: Future-Proofing Your Vector Stack

Plan for future growth by keeping the vector store decoupled from the rest of the application.

26.1 Versioned embeddings

Store the embedding model name and revision with every vector. This allows you to compare old and new embeddings over time.

26.2 Migration readiness

Keep a migration path for vector schema changes, such as adding chunk_id or source_type fields. Use backward-compatible defaults where possible.

26.3 Vendor neutrality

Avoid building your retrieval layer around proprietary APIs. Design your application so you can switch between pgvector, Qdrant, and ChromaDB without rewriting the business logic.
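
One way to keep the business logic independent of the backend is to code against a small interface. The sketch below is an illustration under assumptions: the `VectorStore` protocol, the brute-force `InMemoryStore`, and `retrieve` are hypothetical names, and a real adapter would wrap psycopg2, qdrant-client, or chromadb behind the same two methods.

```python
import math
from typing import Protocol

class VectorStore(Protocol):
    """Minimal interface the application depends on (hypothetical)."""
    def add(self, doc_id: str, vector: list, payload: dict) -> None: ...
    def search(self, vector: list, k: int = 5) -> list: ...

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class InMemoryStore:
    """Brute-force cosine search; a test double, not a production backend."""
    def __init__(self):
        self._rows = {}

    def add(self, doc_id, vector, payload):
        self._rows[doc_id] = (vector, payload)

    def search(self, vector, k=5):
        scored = [(doc_id, _cosine(vector, v)) for doc_id, (v, _) in self._rows.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def retrieve(store: VectorStore, query_vec, k=3):
    """Business logic that only relies on the VectorStore interface."""
    return [doc_id for doc_id, _ in store.search(query_vec, k)]

store = InMemoryStore()
store.add("a", [1.0, 0.0], {"topic": "pgvector"})
store.add("b", [0.0, 1.0], {"topic": "qdrant"})
print(retrieve(store, [0.9, 0.1], k=1))
```

Swapping backends then means writing one new adapter class, while `retrieve` and everything above it stay unchanged.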

Part 27: Final Vector Database Transition Notes

When you choose a sovereign vector database, keep your deployment and governance practices aligned. Treat your vector index as a production datastore: back it up, version it, and keep the metadata consistent. A local vector stack is strongest when the data lifecycle is documented, the team understands the tradeoffs, and the system is designed for maintainability rather than purely for peak performance.

Part 28: Sustaining a Sovereign Vector Store

Keep the vector database healthy by scheduling regular reviews of index size, query latency, and data freshness. A well-maintained store is one that can be restored, audited, and understood by the team without relying on external support.
