RAG vs Fine-Tuning vs Prompt Engineering 2026: Which Should You Use?

Decide between RAG, fine-tuning, and prompt engineering for LLM customisation. Covers decision framework, cost comparison, data requirements, latency, and when each approach wins in production.

By Kofi Mensah

Mar 5, 2026

14 min

Key Takeaways

Prompt engineering is always the correct first step — optimising your system prompt, adding few-shot examples, and using chain-of-thought solves 70-80% of LLM quality problems without any data collection, training cost, or model change.
RAG is the right choice when your application needs access to specific, frequently-updated, or sensitive knowledge — it retrieves current information at query time and the LLM answers from the retrieved context rather than training memory.
Fine-tuning is the right choice when you need to change HOW the model responds (style, format, tone, domain vocabulary) rather than WHAT it knows — it works best with 200-5000 high-quality, consistent examples of your desired output format.
The combinations are additive: a fine-tuned model with RAG (for knowledge) and optimised system prompts (for behaviour) outperforms any single approach — start with prompt engineering, add RAG for knowledge, add fine-tuning only if style remains inconsistent.

Key Takeaways

Try prompt engineering first, always. It’s free, instant, and solves most problems.
RAG for knowledge problems: “The model doesn’t know our products” → RAG. “The model doesn’t know recent events” → RAG. “The model can’t access our docs” → RAG.
Fine-tuning for behaviour problems: “The model won’t format output correctly” → fine-tuning. “The model doesn’t match our brand voice” → fine-tuning. “The model uses wrong domain terminology” → fine-tuning.
They combine: Fine-tuned model + RAG + optimised prompts = maximum capability.

Introduction

Direct Answer: When should I use RAG vs fine-tuning vs prompt engineering for customising an LLM in 2026?

Start with prompt engineering: a well-designed system prompt with few-shot examples and format constraints solves most problems with no training cost and no data collection. Use RAG when the model needs access to specific documents, up-to-date information, or proprietary knowledge that isn’t in the model’s training data. RAG retrieves relevant context at query time without modifying the model. Use fine-tuning when prompt engineering doesn’t achieve the required output style, format, or domain-specific behaviour. Fine-tuning trains a new model version on 200–5,000 examples of your desired output. In practice: start with prompt engineering, add RAG if knowledge gaps are the problem, and add fine-tuning only if consistent style or format is still the issue after optimising prompts. All three approaches can be served from local Ollama models with no per-query API fees, although you still pay for hardware and power.

The Decision Framework

START HERE: What is the actual problem?
│
├─► "The model gives wrong/hallucinated answers"
│   └─► Is the information in the model's training data?
│       ├─► YES → Better prompt + chain-of-thought → Prompt Engineering
│       └─► NO  → The model doesn't have this knowledge → RAG
│
├─► "The model knows the information but formats it wrong"
│   └─► Prompt Engineering: specify format explicitly + few-shot examples
│       └─► Still wrong after 10 iterations? → Fine-Tuning
│
├─► "The model is too slow for my use case"
│   └─► Prompt Engineering: shorter prompts, smaller model, cached responses
│
├─► "The model uses wrong terminology / brand voice"
│   └─► Try system prompt first → if inconsistent → Fine-Tuning
│
└─► "I need the model to access real-time / private data"
    └─► RAG (retrieval-augmented generation)

Detailed Comparison

Dimension	Prompt Engineering	RAG	Fine-Tuning
Setup time	Minutes	Hours–Days	Days–Weeks
Data required	None	Documents	200–5000 labelled examples
Infrastructure	Just the LLM	LLM + vector DB + embedding model	GPU + training pipeline
Cost (one-time)	$0	$0–$50 (storage)	$0 (local GPU) or $20–500 (cloud)
Cost (per query)	LLM inference only	LLM + retrieval overhead	LLM inference only
Knowledge freshness	Static (training cutoff)	Real-time (update the docs)	Static (training cutoff)
Can cite sources?	No	Yes (retrieved chunks)	No
Changes model weights?	No	No	Yes
Reversible?	Yes (edit prompt)	Yes (update/delete docs)	Requires retraining
Best for	Format, tone, behaviour	Knowledge, Q&A, grounding	Style, domain vocabulary, format

Part 1: Prompt Engineering First

Before anything else, optimise the prompt:

import ollama

# ❌ Vague prompt — inconsistent results
bad = ollama.chat(model="qwen3:14b", messages=[
    {"role": "user", "content": "Tell me about our return policy"}
])

# ✅ Specific prompt with role, constraints, and format
good = ollama.chat(model="qwen3:14b", messages=[
    {"role": "system", "content": """You are a customer support agent for Acme Corp.
Answer questions about our 30-day return policy.
Rules:
- Return must be within 30 days of purchase
- Item must be unused and in original packaging
- Digital products are non-refundable
Format your answer in 2-3 sentences maximum.
If you don't know the answer, say: 'Please contact [email protected]'"""},
    {"role": "user", "content": "Tell me about our return policy"}
])

print("Bad:", bad["message"]["content"])
print("Good:", good["message"]["content"])

Example output (exact wording varies between runs and models):

Bad: Our return policy allows customers to return most items within a reasonable timeframe...
Good: You can return unused items in original packaging within 30 days of purchase. Digital products are non-refundable. For assistance, contact [email protected].

Prompt engineering resolves most quality problems. Try around 10 prompt variations before moving to RAG or fine-tuning.

Part 2: When to Add RAG

Add RAG when the model lacks the necessary knowledge:

# Without RAG — model doesn't know your product catalogue
r = ollama.chat(model="qwen3:14b", messages=[
    {"role": "system", "content": "You are a support agent for Acme Corp."},
    {"role": "user", "content": "What are the specs for the ProMax 4000?"}
])
print("Without RAG:", r["message"]["content"])
# Output: "I don't have specific information about the ProMax 4000..."

# With RAG — retrieve from product database
from your_rag_module import retrieve_context   # Your RAG implementation

context = retrieve_context("ProMax 4000 specifications")
r = ollama.chat(model="qwen3:14b", messages=[
    {"role": "system", "content": f"You are a support agent. Use ONLY this context:\n{context}"},
    {"role": "user", "content": "What are the specs for the ProMax 4000?"}
])
print("With RAG:", r["message"]["content"])
# Output: "The ProMax 4000 has 16GB RAM, 512GB SSD, Intel Core i9-14900K..."

RAG use cases:

Product catalogues and documentation
Company policy and procedure documents
Research papers and knowledge bases
Recent news and events (updated docs)
Customer-specific data (their orders, history)

Part 3: When Fine-Tuning Is Warranted

Fine-tune when prompt engineering doesn’t achieve consistent behaviour:

# Problem: Model won't reliably output structured support tickets
# Even with detailed system prompt, 20% of outputs are wrong format

# Solution: Fine-tune on 500 examples of correct ticket format
# Training data: one dict per example, written out as JSONL (one JSON object per line)
training_examples = [
    {
        "instruction": "Convert this support email to a ticket",
        "input": "Hi, my payment failed twice today",
        "output": '{"priority": "high", "category": "billing", "title": "Payment failure", "description": "Customer reports payment failed twice on same day"}'
    },
    # ... 499 more examples
]

# Illustrative target after fine-tuning: 98%+ correct format
# Before: 80% correct, 20% needed manual correction

Fine-tuning use cases:

Consistent output format (JSON schema, specific structure)
Domain-specific vocabulary (medical, legal, proprietary)
Style and tone matching (brand voice, writing style)
Instruction following for specific task types

Part 4: Combining All Three

The highest-quality production setup uses all three:

LAYER 1 — Fine-tuned model:
  Model trained to always output JSON, use our terminology, match our tone
  COST: One-time training (hours on GPU, $0 local)

LAYER 2 — RAG retrieval:
  Each query retrieves relevant product docs, policies, customer data
  COST: Embedding + vector search (usually small next to generation time)

LAYER 3 — Optimised system prompt:
  Role, constraints, output format, edge case handling
  COST: Included in inference

# Full stack: fine-tuned model + RAG + prompt
from your_rag_module import retrieve_context

def answer_query(user_query: str) -> str:
    # Layer 2: Retrieve relevant context
    context = retrieve_context(user_query)

    # Layer 3: Optimised system prompt
    system = f"""You are AcmeBot, Acme Corp's support agent.
[Relevant context from our knowledge base]
{context}

Rules:
- Answer only from the provided context
- Format: JSON with keys: answer, confidence (0-1), needs_human (bool)
- If context is insufficient, set needs_human: true"""

    # Layer 1: Fine-tuned model (knows our format, terminology, tone)
    r = ollama.chat(
        model="acmebot:v2",  # Your fine-tuned Ollama model
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_query}
        ],
        format="json"
    )
    return r["message"]["content"]

Cost and Timeline Summary

SCENARIO: Build a customer support chatbot (illustrative estimates)

Option A — Prompt Engineering only:
  Timeline: 1 day
  Data needed: None
  Cost: $0
  Quality: 70-80% correct responses

Option B — Prompt Engineering + RAG:
  Timeline: 3-5 days
  Data needed: Company docs and FAQs
  Cost: $0-$20 (storage, pgvector)
  Quality: 85-92% correct, grounded responses

Option C — All three (Prompt + RAG + Fine-tuning):
  Timeline: 2-3 weeks
  Data needed: 500+ labelled examples + company docs
  Cost: $0 (local GPU) or $50-200 (cloud training)
  Quality: 93-98% correct, consistent format

Recommendation: Start with Option B. Fine-tune (Option C) only
if format consistency remains a problem after 2 weeks of prompt iteration.

Conclusion

Prompt engineering solves most problems. RAG solves knowledge gaps. Fine-tuning solves inconsistent behaviour. The three are complementary, not competitive: the best production systems use all three in layers. The key is sequencing: prompt first, RAG second, fine-tune last.

See RAG Tutorial 2026 for the RAG implementation and Fine-Tune Llama 4 with QLoRA and Unsloth for the fine-tuning implementation.

Part 6: Retrieval Architectures for RAG

RAG is not a single pattern — it consists of several architectural choices.

6.1 Hybrid retrieval: sparse + dense

Combine keyword search with vector search. Use a sparse search engine such as Elasticsearch or PostgreSQL full-text search to filter candidate documents, then rerank them with dense embeddings.

SELECT id, title, text
FROM docs
WHERE to_tsvector('english', text) @@ plainto_tsquery('english', $1)
ORDER BY ts_rank_cd(to_tsvector('english', text), plainto_tsquery('english', $1)) DESC
LIMIT 100;

Then compute dense embeddings for the top 100 documents and rerank by cosine similarity.
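The reranking step can be sketched in a few lines. This is a minimal illustration, assuming the embeddings for the query and the candidate documents have already been computed by whatever embedding model you use; `cosine_similarity`, `rerank`, and the toy vectors are hypothetical names and data, not part of any library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates, top_k=10):
    """Sort (doc_id, embedding) pairs by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, emb)) for doc_id, emb in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional vectors standing in for real embeddings (assumption).
query = [1.0, 0.0]
candidates = [
    ("doc-a", [0.0, 1.0]),
    ("doc-b", [1.0, 0.1]),
    ("doc-c", [0.5, 0.5]),
]
ranked = rerank(query, candidates, top_k=2)
print([doc_id for doc_id, _ in ranked])
```

In a real pipeline the candidate list would be the rows returned by the keyword query, and a vectorised library would replace the pure-Python loops for speed.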

6.2 Local vector stores

For sovereignty, keep the vector store on-premises:

pgvector inside PostgreSQL
Qdrant in a private Docker container
ChromaDB on local disk

This avoids sending embeddings or queries to any external vendor.

6.3 Document chunking and context windows

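Split documents into chunks that fit your LLM context window. The window depends on how the model is served, not on its parameter count. For a model running with a 4,096-token window, 300–500 token chunks work well because several chunks plus the system prompt and the answer still fit.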

Use overlapping chunks to preserve continuity:

chunk 1: tokens 0–400
chunk 2: tokens 320–720
chunk 3: tokens 640–1040

The 80-token overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.
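A minimal sketch of overlapping chunking that reproduces the 400-token chunks with an 80-token overlap shown above. It assumes the document has already been tokenised into a list; `chunk_tokens` is a hypothetical helper, and the placeholder strings stand in for real tokens from your tokenizer.

```python
def chunk_tokens(tokens, chunk_size=400, overlap=80):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each chunk start advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


# Placeholder tokens (assumption): a real pipeline would use the model's tokenizer.
tokens = [f"tok{i}" for i in range(1040)]
chunks = chunk_tokens(tokens)
print([chunk[0] for chunk in chunks])
```

The chunk starts land at tokens 0, 320, and 640, matching the layout above.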

6.4 Prompting the retriever output

When assembling the prompt, provide the LLM with only the top K chunks plus an explicit instruction:

You are a knowledge assistant. Use ONLY the provided sources below when answering. If the answer is not in the sources, say "I don't know." Output in complete sentences.

Source 1:
<chunk text>

Source 2:
<chunk text>

This reduces hallucination and keeps the model grounded.
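One way to assemble that prompt from retrieved chunks is sketched below. `build_grounded_prompt` and `SYSTEM_INSTRUCTION` are hypothetical names; the function assumes the chunks arrive already sorted by relevance, best first.

```python
SYSTEM_INSTRUCTION = (
    "You are a knowledge assistant. Use ONLY the provided sources below when "
    "answering. If the answer is not in the sources, say \"I don't know.\" "
    "Output in complete sentences."
)


def build_grounded_prompt(chunks, top_k=3):
    """Return the instruction followed by the top_k chunks as numbered sources."""
    parts = [SYSTEM_INSTRUCTION, ""]
    for number, chunk in enumerate(chunks[:top_k], start=1):
        parts.append(f"Source {number}:")
        parts.append(chunk.strip())
        parts.append("")
    return "\n".join(parts).rstrip()


# Made-up chunk text for illustration only.
example_chunks = [
    "Returns are accepted within 30 days of purchase.",
    "Digital products are non-refundable.",
    "Items must be unused and in original packaging.",
    "Shipping fees are not refunded.",
]
prompt = build_grounded_prompt(example_chunks, top_k=2)
print(prompt)
```

The resulting string can be passed as the system message, with the user's question sent as a separate user message.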

Part 7: Evaluating RAG Quality

Measure retrieval quality, not just final answer quality.

7.1 Retrieval precision and recall

Precision: percentage of retrieved chunks that are actually relevant. Recall: percentage of relevant chunks included in the top results.

As a rough target, aim for recall above 90% within the top 10 chunks, measured on a labelled set of queries with known relevant chunks.
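Both metrics are easy to compute once you have, for each test query, the retrieved chunk IDs in rank order and a hand-labelled set of relevant chunk IDs. The sketch below assumes that data exists; `precision_recall_at_k` and the chunk IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall over the first k retrieved chunk IDs."""
    top = retrieved[:k]
    hits = sum(1 for chunk_id in top if chunk_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up labels for one query.
retrieved = ["c1", "c7", "c3", "c9", "c4"]
relevant = {"c1", "c3", "c5"}
precision, recall = precision_recall_at_k(retrieved, relevant, k=4)
print(f"precision@4={precision:.2f} recall@4={recall:.2f}")
```

Here two of the four retrieved chunks are relevant (precision 0.50) and two of the three relevant chunks were found (recall 0.67). Averaging these values over the whole query set gives the figure to track between releases.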

7.2 Answer fidelity

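Compare the model’s answer to the source documents. If the answer contains claims that do not appear in the retrieved sources, check retrieval first (were the right chunks returned?) and then the prompt (was the model told to answer only from the sources?).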

7.3 Human-in-the-loop evaluation

Use domain experts to rate answers on correctness, completeness, and hallucination. Track the human score over time as you iterate.

7.4 Explainability

Store the source references and chunk IDs with each answer. Use them for audits and debugging.

Part 8: Fine-Tuning Data Strategy

Fine-tuning works best with clean, consistent examples.

8.1 Data quality over quantity

A dataset of 200 high-quality examples often beats 2,000 noisy ones. Each example should demonstrate the exact output pattern you want the model to learn.

8.2 Input/output pair design

For instruction fine-tuning, use a structure like:

{
  "instruction": "Summarize this paragraph in two sentences.",
  "input": "...",
  "output": "..."
}

For style tuning, the instruction can be generic (for example, "Rewrite this in our brand voice"), and the output should be written directly in the target style.

8.3 Validation and held-out prompts

Keep a validation set of prompts that the model has not seen during training. Use it to check whether the fine-tuned model generalises or simply memorises.
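A simple way to keep the held-out set stable across dataset revisions is to assign each example to train or validation by hashing its content rather than by shuffling. The sketch below is one possible approach under that assumption; `is_validation`, `split_examples`, and the generated examples are hypothetical.

```python
import hashlib
import json


def is_validation(example, fraction=0.1):
    """Deterministically place roughly `fraction` of examples in validation."""
    key = json.dumps(example, sort_keys=True).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 1000
    return bucket < fraction * 1000


def split_examples(examples, fraction=0.1):
    """Return (train, validation) lists; the same example always lands in the same split."""
    train, validation = [], []
    for example in examples:
        (validation if is_validation(example, fraction) else train).append(example)
    return train, validation


# Generated placeholder data for illustration only.
examples = [
    {"instruction": "Summarize this paragraph in two sentences.",
     "input": f"paragraph {i}", "output": f"summary {i}"}
    for i in range(200)
]
train, validation = split_examples(examples)
print(len(train), len(validation))
```

Because the assignment depends only on the example's content, adding new examples later never moves an existing validation example into the training set.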

8.4 Iterating on the dataset

After the first round of fine-tuning, review outputs and add examples for failure cases. Focus on where the model still misses the required format or tone.

Part 9: Cost and Maintenance Comparison

A sovereign system should document not only the initial implementation but the ongoing cost.

9.1 Maintenance burden

Prompt engineering: low maintenance, easy to update
RAG: moderate maintenance, requires document updates and index rebuilds
Fine-tuning: higher maintenance, requires retraining when requirements change

9.2 Infrastructure cost

Prompt engineering: only inference cost
RAG: inference + storage + retrieval compute
Fine-tuning: training compute + model version management

9.3 Operational risk

RAG adds retrieval complexity and a second data store. Fine-tuning adds model version drift and validation risk. Prompt engineering adds the least infrastructure risk.

9.4 Governance checklist

Prompt templates are version controlled
Retrieval documents are audited and timestamped
Fine-tuned models are tracked with version metadata
Answer provenance is stored with each response
Fallback behaviours are documented

Part 10: Practical Rule of Thumb

For a sovereign local AI deployment:

Start with prompt engineering.
Add RAG when the model needs current or proprietary knowledge.
Fine-tune only if output style or format still fails after prompt and retrieval iteration.

This sequence keeps the system manageable and avoids unnecessary training cycles.

Part 11: Embedding and Vector Quality

The quality of your RAG system depends heavily on the embeddings.

11.1 Embedding model selection

Choose an embedding model that matches your data type. For text, use an embedding model trained on semantic similarity. For code, use code-specific embeddings. For multilingual data, use a multilingual text embedding model.

11.2 Embedding caching and storage

Generate embeddings once and store them locally. Each document should have a metadata record with the embedding model version and a timestamp.

11.3 Vector index tuning

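The performance of HNSW indexes depends on three parameters: M (the maximum number of links per node), ef_construction (the candidate list size while building the index), and ef_search (the candidate list size at query time). A reasonable starting configuration is: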

M = 16
ef_construction = 200
ef_search = 200

These settings balance build time and query quality. For lower RAM systems, use smaller values and test recall.

11.4 Document chunk scoring

When you retrieve multiple chunks, score them not only by similarity but also by relevance heuristics such as document freshness, source trust level, and user intent match. A simple blended score can improve final answer accuracy.
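A blended score can be as simple as a weighted sum. The sketch below combines similarity, freshness, and source trust; the weights, the 180-day half-life, and the `Candidate` fields are illustrative assumptions to tune against your own evaluation set. User intent match is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    similarity: float  # cosine similarity to the query, 0..1
    age_days: float    # days since the source document was last updated
    trust: float       # hand-assigned source trust level, 0..1


def blended_score(c, w_sim=0.7, w_fresh=0.2, w_trust=0.1, half_life_days=180.0):
    """Weighted sum of similarity, exponentially decayed freshness, and trust."""
    freshness = 0.5 ** (c.age_days / half_life_days)  # 1.0 when new, halves every half-life
    return w_sim * c.similarity + w_fresh * freshness + w_trust * c.trust


# Made-up candidates for illustration.
old = Candidate("old", similarity=0.80, age_days=720, trust=0.5)
new = Candidate("new", similarity=0.75, age_days=0, trust=0.9)
ranked = sorted([old, new], key=blended_score, reverse=True)
print([c.chunk_id for c in ranked])
```

In this example the newer, more trusted chunk outranks the older one even though its raw similarity is slightly lower.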

Part 12: Prompt Templates and Guardrails

Use prompt templates for every RAG query.

You are an expert assistant. Use ONLY the following retrieved sources to answer the user's question.
If the answer is not contained in the sources, say "I don't know." Do not hallucinate.

Sources:
{sources}

Question: {question}

Answer:

Keep the template stable and use placeholders for the user query and source text. This makes your system predictable.

Part 13: Evaluating Fine-Tuned Outputs

After fine-tuning, measure quality with a consistent set of metrics.

13.1 Exact match and similarity

For structured outputs, use exact match or normalized string comparison. For free-form outputs, use semantic similarity against reference answers.
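For JSON outputs, normalised comparison means parsing both strings and comparing the parsed values, so key order and whitespace do not count as errors. A minimal sketch, with `normalise_json` and `exact_match_rate` as hypothetical helper names:

```python
import json


def normalise_json(text):
    """Parse JSON and re-serialise with sorted keys; None if the text is not valid JSON."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


def exact_match_rate(outputs, references):
    """Fraction of outputs whose normalised JSON equals the reference."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not outputs:
        return 0.0
    matches = 0
    for output, reference in zip(outputs, references):
        normalised = normalise_json(output)
        if normalised is not None and normalised == normalise_json(reference):
            matches += 1
    return matches / len(outputs)


# Made-up model outputs and references.
reference = '{"category": "billing", "priority": "high"}'
outputs = [
    '{"priority": "high", "category": "billing"}',  # same content, different key order
    '{"category":"billing","priority":"low"}',      # wrong value
    "Sure! Here is your ticket...",                 # not JSON at all
]
rate = exact_match_rate(outputs, [reference] * 3)
print(f"exact match: {rate:.2f}")
```

Only the first output counts as a match, so the rate is one in three.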

13.2 Human evaluation

Use a checklist for human review:

Does the output follow the required format?
Is the tone appropriate?
Does it use the correct terminology?
Does it avoid hallucinations?

13.3 Regression testing

Keep a regression suite of prompts that previously failed. Re-run it after every new fine-tuning iteration.

Part 14: Operationalising your knowledge stack

A production RAG/fine-tuned system should have clear operational boundaries.

14.1 Document update workflows

When documents change, update the vector index and re-run retrieval tests. Keep a changelog of document refreshes.

14.2 Model versioning

Track fine-tuned model versions with metadata: training date, dataset hash, prompt template version, and evaluation scores.
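The metadata record can be a plain dictionary written next to the model files. The sketch below shows one possible shape; the field names, the `acmebot:v2` tag reused from earlier, the template version, the date, and the score are illustrative assumptions.

```python
import hashlib
import json


def dataset_hash(examples):
    """SHA-256 over the training examples, serialised with sorted keys."""
    digest = hashlib.sha256()
    for example in examples:
        digest.update(json.dumps(example, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_model_record(name, examples, prompt_template_version, eval_scores, training_date):
    """Collect the metadata that identifies one fine-tuned model version."""
    return {
        "model": name,
        "training_date": training_date,
        "dataset_hash": dataset_hash(examples),
        "dataset_size": len(examples),
        "prompt_template_version": prompt_template_version,
        "eval_scores": eval_scores,
    }


# Placeholder training data and scores for illustration only.
examples = [{"instruction": "Convert this support email to a ticket",
             "input": "Hi, my payment failed twice today",
             "output": '{"priority": "high", "category": "billing"}'}]
record = build_model_record(
    "acmebot:v2", examples,
    prompt_template_version="v3",
    eval_scores={"format_exact_match": 0.98},
    training_date="2026-03-05",
)
print(json.dumps(record, indent=2))
```

Any change to the training examples changes the hash, which makes it easy to tell whether two model versions were trained on the same data.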

14.3 Rollback procedures

If a fine-tuned model performs worse in production, roll back to the previous version. Keep a stable version as the default and a candidate version for canary testing.

Part 15: Cost-saving patterns

Even when everything runs locally, compute costs matter.

15.1 Reduce document corpus size

Use document filtering to keep the vector store focused on relevant documents only. More data is not always better.

15.2 Use smaller models for retrieval-only tasks

For retrieval and reranking, a smaller model or embedder may be sufficient. Reserve the larger generative model for final answer composition.

15.3 Prune old vectors

If documents are stale or no longer relevant, remove them from the index instead of leaving them to pollute results.

Part 16: Governance and Auditability

Sovereignty means you can explain and audit every decision.

16.1 Answer provenance

Store the chunk IDs and sources that contributed to each answer. This creates a traceable path from question to response.
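A provenance record can be appended to a JSON Lines log each time an answer is produced. The sketch below writes to an in-memory buffer so it is self-contained; in practice the stream would be a file or database table. `ProvenanceRecord`, `append_record`, and the sample values are hypothetical.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    question: str
    answer: str
    chunk_ids: list
    sources: list
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(stream, record):
    """Write one record as a single JSON line."""
    stream.write(json.dumps(asdict(record)) + "\n")


# In-memory stand-in for a log file (assumption).
log = io.StringIO()
append_record(log, ProvenanceRecord(
    question="What is the return window?",
    answer="You can return unused items within 30 days of purchase.",
    chunk_ids=["policy-0012", "policy-0013"],
    sources=["returns-policy.md"],
))
first = json.loads(log.getvalue().splitlines()[0])
print(first["chunk_ids"])
```

Reading the log back line by line gives the full path from question to answer to the chunks that supported it.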

16.2 Feedback loops

If users flag an answer as incorrect, store the feedback and use it to improve retrieval, prompt templates, or fine-tuning examples.

16.3 Local runbooks

Keep runbooks for:

updating the document corpus
refreshing the embedding index
retraining or rolling back fine-tuned models
handling hallucination incidents

Part 17: Practical Templates

Use templated prompts for the three approaches:

Prompt Engineering template

You are a helpful assistant.
Answer the user's question concisely.
Use a polite tone and avoid speculation.

User: {question}

Assistant:

RAG template

You are an expert assistant. Use ONLY the sources provided below.
If the answer cannot be found, say "I don't know."

Sources:
{sources}

Question: {question}

Answer:

Fine-tuning instruction template

Instruction: {instruction}
Input: {input}
Output:
{output}

A clean template makes training examples easier to author and review.

Part 18: Practical Risk Mitigation

Every AI system has risks. A sovereign AI system must be designed to mitigate them.

18.1 Hallucination containment

Use explicit instructions and grounding. If the model cannot answer from the provided sources, it should say so. Do not let it guess.

18.2 Sensitive data isolation

For RAG, keep sensitive documents in a separate index and only retrieve them when the user is authorised.
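One way to enforce this is to decide which indexes a query may touch before any retrieval happens. The sketch below assumes each index carries a set of roles allowed to read it, with an empty set meaning public; the index names, roles, and `allowed_indexes` are hypothetical.

```python
def allowed_indexes(user_roles, index_acl):
    """Return the names of indexes this user may query, in sorted order."""
    return sorted(
        name
        for name, required_roles in index_acl.items()
        if not required_roles or required_roles & user_roles
    )


# Made-up access control list: empty set = readable by everyone.
index_acl = {
    "public_docs": set(),
    "hr_records": {"hr"},
    "finance_reports": {"finance", "admin"},
}
print(allowed_indexes({"hr"}, index_acl))
print(allowed_indexes(set(), index_acl))
```

Filtering at the index level means chunks from a restricted index never reach the prompt for an unauthorised user, so the model cannot leak them.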

18.3 Versioned prompts and templates

Keep prompt templates under version control. When you update a template, record the change and test the system to ensure answer quality does not regress.

Part 19: Low-Risk Testing Strategies

Validate the system in a staging environment that mirrors production.

19.1 Canary prompts

Create a set of representative prompts and run them against every new model or retrieval pipeline change. Compare the outputs to a baseline.

19.2 Regression prompts

Keep a set of prompts that previously exposed issues and rerun them after each change.

19.3 Data drift monitoring

Track the distribution of query types and retrieved sources. If the query mix changes, adjust the retrieval and prompt strategy accordingly.

Part 20: Local Tooling and Developer Experience

A self-hosted AI project must be easy for developers to work with.

20.1 Local development stack

Use local Docker Compose or local services to run the vector store, the embedding service, and the LLM. Developers should be able to spin up the full stack with one command.

20.2 Sample data and fixtures

Keep a small sample corpus and a test dataset in the repo. This enables quick local experiments without needing the full production data.

20.3 Reproducible experiments

Use an experiments/ folder for prompt templates, model versions, and result summaries. This creates a local research log.

Part 21: Final Production Readiness Checklist

retrieval index is regularly refreshed
answer provenance is stored with every result
prompt templates are version controlled
fine-tuned models are clearly tagged and tracked
fallback behaviour is defined for unknown queries
audit logs exist for retrieval and generation
performance metrics are monitored for drift
security boundaries are defined for sensitive content
the system can be restored from backup quickly

A production-ready RAG/fine-tuning system is not just about better answers; it is about making the whole stack auditable, maintainable, and resilient.

Part 22: Handling Domain Drift

Domain drift occurs when the topic or terminology of the user’s queries changes over time.

22.1 Monitoring topic drift

Log the top topics and extract keywords from incoming queries. If the query distribution shifts, plan an index refresh or prompt update.
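A lightweight drift check compares the keyword distribution of recent queries against a baseline period. The sketch below uses total variation distance, which is 0 for identical distributions and 1 for completely disjoint ones. The stopword list, the sample queries, and any alert threshold you choose are assumptions to adapt.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "my", "what", "how", "do", "i", "for", "to"}


def keyword_distribution(queries, stopwords=STOPWORDS):
    """Relative frequency of each non-stopword keyword across the queries."""
    counts = Counter()
    for query in queries:
        for word in re.findall(r"[a-z0-9]+", query.lower()):
            if word not in stopwords:
                counts[word] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up query logs for two periods.
baseline = keyword_distribution(["return policy", "refund status"])
recent = keyword_distribution(["api rate limit", "api key"])
drift = total_variation(baseline, recent)
print(f"drift={drift:.2f}")
```

In this toy example the two periods share no keywords, so the drift is at its maximum. On real logs, a sustained rise in this value is a signal to review the index and prompts.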

22.2 Adaptive retrieval

For drifting domains, use a tiered retrieval architecture. Keep a stable base index and a smaller fresh index for recent or rapidly changing content. Query both and merge the top results.
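Merging the two result lists can be sketched as follows. This assumes both indexes use the same embedding model, so their similarity scores are directly comparable; `merge_results` and the sample scores are hypothetical.

```python
def merge_results(base_hits, fresh_hits, top_k=5):
    """Combine (chunk_id, score) lists, keeping each chunk's best score."""
    best = {}
    for chunk_id, score in list(base_hits) + list(fresh_hits):
        if chunk_id not in best or score > best[chunk_id]:
            best[chunk_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Made-up hits from the stable base index and the smaller fresh index.
base_hits = [("a", 0.9), ("b", 0.6)]
fresh_hits = [("b", 0.8), ("c", 0.7)]
merged = merge_results(base_hits, fresh_hits, top_k=3)
print(merged)
```

Chunk "b" appears in both indexes and keeps its higher score from the fresh index. If the indexes use different embedding models, normalise or rerank the scores before merging.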

22.3 Feedback-driven retraining

If users repeatedly mark answers as wrong or incomplete, surface those cases into a feedback dataset. Use that dataset to improve fine-tuning or to refine the prompt instructions.

Part 23: Explainability in Production

For sovereign systems, explainability is a key trust feature.

23.1 Source attribution

Always attach the retrieved sources to the final answer. Make it simple to map each statement back to the document chunk that produced it.

23.2 Transparent scoring

Record the similarity scores and the retrieval rationale. If a user asks why an answer was chosen, you can show which documents contributed and how.

23.3 Audit reports

Generate periodic audit reports that show the most common queries, the highest-ranked sources, and any hallucination incidents. This provides governance evidence for internal stakeholders.

Part 24: Continuous Prompt Calibration

Prompt calibration should be continuous, especially when your retrieval or fine-tuning data evolves.

24.1 A/B test prompt variants

Run A/B tests of different prompt formulations against the same retrieval results. Compare accuracy, hallucination rate, and user satisfaction.

24.2 Keep a prompt change log

Every prompt update should be logged with the rationale and the observed effect. This log is essential for teams to understand why one prompt variant replaced another.

