Key Takeaways
- Four patterns, infinite combinations: Reflection + Tool Use + Planning + Multi-Agent are the primitives. Every production agent is a composition of these four. See CrewAI Multi-Agent Systems for orchestration beyond individual patterns.
- Reflection is cheap and effective: Two LLM calls (generate + critique) consistently outperform one long system prompt. Add it to any generation task. For more on quality improvement, see Prompt Engineering Guide 2026.
- Tools are the bridge to the real world: File read/write + web search + code execution covers 80% of agentic tasks. Standardise tools with MCP Protocol: Build an AI Tool Server.
- Multi-agent for long tasks: When a task requires more than 20 steps, split it between specialised agents to avoid context accumulation degrading quality. See LangChain and LangGraph Local Agents for graph-based state management.
Introduction
Direct Answer: What are the main AI agent design patterns and how do I implement them in Python in 2026?
The four agentic design patterns defined by Andrew Ng (DeepLearning.AI, 2024) are: Reflection — the agent critiques its own output and revises it; Tool Use — the agent calls external functions (search, code execution, file access); Planning — the agent decomposes a complex goal into subtasks and executes them in order; Multi-Agent — multiple specialised agents collaborate, each handling a distinct role. In Python with Ollama: implement Reflection with two sequential ollama.chat() calls (generate then critique); Tool Use with @tool-decorated functions and llm.bind_tools(tools) in LangChain; Planning with a prompt that asks the model to list steps before executing; Multi-Agent with separate agent instances passing results between them. All four patterns work with local Qwen3 14B or Llama 4 Scout via Ollama — no cloud API required.
Pattern 1: Reflection
The agent generates a response, then critiques and improves it. Two LLM calls produce better results than one.
```python
# pattern1_reflection.py
import ollama

MODEL = "qwen3:14b"

def reflect_and_revise(task: str, max_iterations: int = 2) -> str:
    """Generate a response, critique it, then revise. Returns final output."""
    # Step 1: Initial generation
    initial = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an expert developer. Complete the task thoroughly."},
            {"role": "user", "content": task}
        ]
    )["message"]["content"]
    print(f"[Initial output preview]\n{initial[:200]}...\n")

    current = initial
    for i in range(max_iterations):
        # Step 2: Critique
        critique = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": """You are a critical code reviewer.
Identify specific problems with the code: bugs, missing edge cases, security issues, poor practices.
Be concrete — list exactly what to fix. If the code is already good, say 'No issues found.'"""},
                {"role": "user", "content": f"Original task: {task}\n\nCode to review:\n{current}"}
            ]
        )["message"]["content"]

        if "No issues found" in critique:
            print(f"[Iteration {i+1}] Critique: No issues — stopping early.")
            break
        print(f"[Iteration {i+1}] Critique found issues — revising...")

        # Step 3: Revise based on critique
        current = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": "You are an expert developer. Revise the code based on the critique."},
                {"role": "user", "content": f"""Original task: {task}

Current code:
{current}

Critique:
{critique}

Produce the revised, improved code only. No explanation."""}
            ]
        )["message"]["content"]
    return current

# Example: write a function that will get critiqued and improved
result = reflect_and_revise(
    "Write a Python function to read a CSV file and return rows where a numeric column exceeds a threshold."
)
print("\n[Final output]")
print(result)
```
Expected output:
```
[Initial output preview]
def filter_csv(filepath: str, column: str, threshold: float) -> list[dict]:
    import csv
    with open(filepath) as f:
        return [row for row in csv.DictReader(f) if float(row[column]) > threshold]...

[Iteration 1] Critique found issues — revising...

[Final output]
import csv
from pathlib import Path

def filter_csv(filepath: str | Path, column: str, threshold: float) -> list[dict]:
    """Return rows where the numeric column value exceeds the threshold.

    Args:
        filepath: Path to the CSV file
        column: Name of the numeric column to filter on
        threshold: Minimum value (exclusive) to include

    Returns:
        List of row dicts where column > threshold

    Raises:
        FileNotFoundError: If the CSV file doesn't exist
        KeyError: If the column name is not in the CSV
        ValueError: If a row's column value cannot be converted to float
    """
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {filepath}")
    results = []
    with path.open(newline='') as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column '{column}' not found. Available: {reader.fieldnames}")
        for row_num, row in enumerate(reader, start=2):
            try:
                if float(row[column]) > threshold:
                    results.append(dict(row))
            except ValueError:
                raise ValueError(f"Row {row_num}: cannot convert '{row[column]}' to float")
    return results
```
The reflection pass added type hints, docstring, error handling, and edge case coverage that the initial version missed.
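A quick way to sanity-check a revised artefact like the one above is to exercise exactly the edge cases the critique flagged. This harness uses a condensed copy of the revised `filter_csv` so the snippet is self-contained; the file name and test data are illustrative, not part of the article's code:

```python
import csv
import tempfile
from pathlib import Path

# Condensed copy of the revised filter_csv from the expected output above.
def filter_csv(filepath, column, threshold):
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {filepath}")
    results = []
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column '{column}' not found")
        for row in reader:
            if float(row[column]) > threshold:
                results.append(dict(row))
    return results

# Exercise the edge cases the critique targeted.
tmp = Path(tempfile.mkdtemp()) / "scores.csv"
tmp.write_text("name,score\nalice,1.5\nbob,3.0\n")

print(filter_csv(tmp, "score", 2.0))      # only rows with score > 2.0
try:
    filter_csv(tmp, "missing", 0.0)
except KeyError:
    print("KeyError raised for unknown column")
try:
    filter_csv("no_such_file.csv", "score", 0.0)
except FileNotFoundError:
    print("FileNotFoundError raised for missing file")
```

Running the critique's own checklist as tests is a cheap way to confirm the reflection pass actually fixed what it claimed to fix.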
Pattern 2: Tool Use
The agent calls external functions to act on the world — the most common production pattern.
```python
# pattern2_tool_use.py
import ollama
import subprocess
from pathlib import Path

# ── Define tools ──────────────────────────────────────────────────────────
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return stdout",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    }
]

def execute_tool(name: str, args: dict) -> str:
    if name == "read_file":
        try:
            return Path(args["path"]).read_text()
        except Exception as e:
            return f"Error: {e}"
    elif name == "write_file":
        try:
            Path(args["path"]).write_text(args["content"])
            return f"Written: {args['path']}"
        except Exception as e:
            return f"Error: {e}"
    elif name == "run_python":
        try:
            result = subprocess.run(
                ["python3", "-c", args["code"]],
                capture_output=True, text=True, timeout=10
            )
            return result.stdout or result.stderr
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 10 seconds"
    return f"Unknown tool: {name}"

def run_agent(goal: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful coding assistant. Use tools to complete tasks."},
        {"role": "user", "content": goal}
    ]
    while True:
        response = ollama.chat(
            model="llama4:scout",  # Best tool-calling model
            messages=messages,
            tools=TOOLS
        )
        msg = response["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):
            return msg["content"]  # No more tool calls — done
        # Execute each tool call
        for tc in msg["tool_calls"]:
            tool_name = tc["function"]["name"]
            tool_args = tc["function"]["arguments"]
            result = execute_tool(tool_name, tool_args)
            print(f"  [Tool] {tool_name}({list(tool_args.keys())}) → {result[:80]}")
            messages.append({
                "role": "tool",
                "content": result,
                "tool_call_id": tc.get("id", "0")
            })

result = run_agent(
    "Create a Python file called 'hello.py' that prints 'Hello, sovereign world!', "
    "then run it and show me the output."
)
print("\n[Agent result]:", result)
```
Expected output:
```
  [Tool] write_file(['path', 'content']) → Written: hello.py
  [Tool] run_python(['code']) → Hello, sovereign world!

[Agent result]: I created hello.py and ran it. The output is:
Hello, sovereign world!
```
Pattern 3: Planning
The agent decomposes a complex goal into steps before executing them, which reduces hallucination on multi-step tasks.
```python
# pattern3_planning.py
import ollama
import json

MODEL = "qwen3:14b"

def plan_and_execute(goal: str) -> dict:
    # Step 1: Generate a plan
    plan_response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": """You are a planning agent.
Given a goal, create a concrete execution plan as JSON:
{"steps": [{"id": 1, "action": "description", "depends_on": []}, ...]}
Return ONLY the JSON."""},
            {"role": "user", "content": f"Goal: {goal}"}
        ],
        format="json"
    )
    plan = json.loads(plan_response["message"]["content"])
    print(f"[Plan] {len(plan['steps'])} steps")
    for step in plan["steps"]:
        print(f"  {step['id']}. {step['action']}")

    # Step 2: Execute each step in order
    results = {}
    for step in plan["steps"]:
        context = "\n".join(
            f"Step {dep} result: {results[dep]}"
            for dep in step.get("depends_on", [])
            if dep in results
        )
        exec_response = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Execute this step concisely. Return only the output."},
                {"role": "user", "content": f"Goal: {goal}\n\n{f'Context: {context}' if context else ''}\n\nExecute: {step['action']}"}
            ]
        )
        results[step["id"]] = exec_response["message"]["content"]
        print(f"  [Step {step['id']} done] {results[step['id']][:80]}...")

    # Return the final step's result
    final_id = plan["steps"][-1]["id"]
    return {"plan": plan, "final_result": results[final_id]}

output = plan_and_execute(
    "Research the three main PostgreSQL connection pooling solutions, "
    "compare them, and recommend the best one for a FastAPI app."
)
print("\n[Final result]")
print(output["final_result"])
```
Expected output:
```
[Plan] 3 steps
  1. List the three main PostgreSQL connection poolers
  2. Compare their performance, features, and deployment complexity
  3. Recommend the best option for a FastAPI application with reasoning
  [Step 1 done] The three main PostgreSQL connection poolers are: PgBouncer...
  [Step 2 done] PgBouncer: lightweight, C-based, 10k+ connections → 20 server...
  [Step 3 done] Recommendation: PgBouncer in transaction mode...

[Final result]
Recommendation: PgBouncer in transaction mode is the best choice for a FastAPI
application. It supports 500+ concurrent connections → 20 server connections,
has the lowest overhead, and is the most battle-tested...
```
Pattern 4: Multi-Agent
Specialised agents collaborate — each starts with a fresh context, reducing token accumulation and hallucination.
```python
# pattern4_multi_agent.py
import ollama

def make_agent(name: str, role: str, model: str = "qwen3:14b"):
    """Create a named agent with a specific role."""
    def run(task: str, context: str = "") -> str:
        messages = [
            {"role": "system", "content": f"You are {name}, a {role}. Be concise and focused."},
        ]
        if context:
            messages.append({"role": "user", "content": f"Context from previous agents:\n{context}"})
        messages.append({"role": "user", "content": task})
        return ollama.chat(model=model, messages=messages)["message"]["content"]
    return run

# Define specialised agents
researcher = make_agent("Research", "technical researcher who finds facts and summarises")
architect = make_agent("Architect", "software architect who designs systems")
critic = make_agent("Critic", "senior engineer who finds problems and risks")
writer = make_agent("Writer", "technical writer who produces clear documentation")

def pipeline(goal: str) -> str:
    """Run a multi-agent pipeline: research → design → critique → document."""
    print(f"Goal: {goal}\n")
    research = researcher(f"Research: {goal}")
    print(f"[Researcher]: {research[:120]}...\n")
    design = architect(f"Design a solution for: {goal}", context=research)
    print(f"[Architect]: {design[:120]}...\n")
    critique = critic(f"Critique this design for: {goal}", context=f"{research}\n\n{design}")
    print(f"[Critic]: {critique[:120]}...\n")
    doc = writer(
        f"Write a concise technical summary for: {goal}",
        context=f"Research: {research}\n\nDesign: {design}\n\nCritique: {critique}"
    )
    return doc

result = pipeline("Design a caching strategy for a high-traffic FastAPI application")
print("[Final document]")
print(result)
```
Expected output:
```
Goal: Design a caching strategy for a high-traffic FastAPI application

[Researcher]: Key caching options for FastAPI: Redis (in-memory, pub/sub, TTL),
Memcached (simpler, multi-threaded), CDN caching (Cloudflare), HTTP response...

[Architect]: Recommended layered strategy: (1) Redis for session/computed data...

[Critic]: Risks: Redis single point of failure without sentinel/cluster; cache...

[Final document]
## FastAPI Caching Strategy
**Recommended Stack:** Redis + HTTP cache headers + optional CDN
**Implementation:**
1. Install `redis[hiredis]` and `fastapi-cache2`
2. Cache expensive endpoints with `@cache(expire=300)`
3. Configure Redis sentinel for high availability
...
```
Each agent starts fresh — no context accumulation, lower hallucination on long tasks.
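The fresh-context claim can be made concrete by counting what each call re-sends. A toy comparison, with illustrative message sizes rather than real model output:

```python
# Single agent: history grows, and every chat call re-sends the whole history.
single_history = []
resent_chars = 0
for step in range(8):
    single_history.append({"role": "assistant", "content": "x" * 500})
    resent_chars += sum(len(m["content"]) for m in single_history)

# Multi-agent pipeline: each agent receives only the previous agent's handoff.
handoff = "x" * 500
fresh_chars = 0
for agent in range(8):
    fresh_chars += len(handoff)  # one handoff message per agent

print(resent_chars)  # 18000 chars re-sent: quadratic growth over 8 steps
print(fresh_chars)   # 4000 chars re-sent: linear growth
```

The single agent re-sends 500 × (1 + 2 + … + 8) = 18,000 characters over eight steps, while the pipeline re-sends 8 × 500 = 4,000; the gap widens quadratically as tasks get longer.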
Combining Patterns
Most production agents combine these patterns rather than using one in isolation:
- Reflection + Tool Use: the agent uses tools, then reflects on whether the output is correct
- Planning + Tool Use: the agent plans the steps, then executes each one with tools
- Multi-Agent + Reflection: each agent reflects before passing its output to the next
- Planning + Multi-Agent: a planner creates the plan and assigns steps to specialised agents
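As a sketch of how two patterns compose, here is a Planning + Tool Use skeleton with the LLM and tool layers stubbed out so only the control flow remains. The stub functions and the fixed step list are illustrative, not the article's code:

```python
# Sketch of Planning + Tool Use: plan first, then run each step through
# the tool layer. LLM and tools are stubbed to expose the composition.

def stub_plan(goal: str) -> list[dict]:
    """Stand-in for the planning LLM call: returns fixed steps."""
    return [
        {"id": 1, "action": f"gather inputs for: {goal}", "tool": "read_file"},
        {"id": 2, "action": "process the inputs", "tool": "run_python"},
        {"id": 3, "action": "write the result", "tool": "write_file"},
    ]

def stub_tool(name: str, action: str) -> str:
    """Stand-in for execute_tool: echoes what a real tool would do."""
    return f"{name} completed: {action}"

def plan_then_act(goal: str) -> dict[int, str]:
    results = {}
    for step in stub_plan(goal):          # Planning pattern: decompose first
        results[step["id"]] = stub_tool(  # Tool Use pattern: act per step
            step["tool"], step["action"]
        )
    return results

for step_id, outcome in plan_then_act("summarise a log file").items():
    print(step_id, outcome)
```

Swapping `stub_plan` for the `plan_and_execute` planner and `stub_tool` for `execute_tool` from the sections above yields the real combined agent.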
Conclusion
The four agentic design patterns — Reflection, Tool Use, Planning, Multi-Agent — are the building blocks of every production AI agent in 2026. Reflection adds quality; Tool Use adds capability; Planning adds reliability on complex tasks; Multi-Agent adds scalability. All four patterns run locally against Qwen3 14B or Llama 4 Scout via Ollama with zero cloud cost.
See LangChain and LangGraph Local Agents 2026 for implementing these patterns with the graph-based LangGraph framework, and MCP Protocol: Build a Server in Python for standardising the Tool Use pattern’s tools as MCP servers.
People Also Ask
Which design pattern should I start with?
Start with Reflection — it’s the simplest to implement (just two ollama.chat() calls) and consistently improves output quality for code generation, writing, and analysis. After Reflection, add Tool Use when you need the agent to interact with external systems. Add Planning when tasks regularly require more than 3–4 sequential steps. Use Multi-Agent when a single agent accumulates so much context over a task that output quality degrades.
How many LLM calls does a typical agent make?
A simple Reflection pattern uses 2 calls (generate + revise). A Tool Use loop for a medium-complexity task typically uses 5–15 calls. A Planning + execution pipeline for a research task might use 10–30 calls. Multi-agent pipelines multiply this by the number of agents. Each call costs time (1–5 seconds for a local 14B model) but zero money — the economics of local inference make call-intensive agent patterns practical that would be expensive with cloud APIs.
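The call counts above translate into wall-clock time with simple arithmetic. A quick sketch, assuming roughly 3 seconds per call (the midpoint of the 1-5 second range quoted above; the function name is illustrative):

```python
SECONDS_PER_CALL = 3  # rough midpoint for a local 14B model (1-5 s per call)

def pipeline_seconds(calls_per_agent: int, agents: int = 1) -> int:
    """Total sequential latency: calls run one after another."""
    return calls_per_agent * agents * SECONDS_PER_CALL

print(pipeline_seconds(2))       # Reflection (generate + revise): 6 s
print(pipeline_seconds(15))      # upper end of a Tool Use loop: 45 s
print(pipeline_seconds(10, 4))   # 4-agent pipeline, 10 calls each: 120 s
```

Even the 4-agent case finishes in about two minutes of compute that costs nothing, which is why call-heavy patterns are practical locally.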
Related Vucense Guides
- CrewAI Multi-Agent Orchestration with Ollama — extend these patterns to hierarchical crews and delegation
- Best Open-Weight AI Models 2026 — choose the right local LLM for agent reasoning tasks
- Sovereign AI Agents Hub 2026 — complete guide to building autonomous agent systems locally
Further Reading
Vucense Articles
- LangChain and LangGraph Local Agents 2026 — implement these patterns with LangGraph
- CrewAI Multi-Agent Orchestration with Local Ollama — extend these patterns to CrewAI hierarchical workflows
- CrewAI Multi-Agent Orchestration 2026 — high-level framework for coordinating agent teams
- Best Open-Weight AI Models 2026: Llama 4, Qwen3, Gemma3 Compared — choose the right LLM for your agents
- MCP Protocol: Build an AI Tool Server in Python — standardise your Tool Use tools
- Prompt Engineering Guide 2026 — write better prompts for agent system messages
- Build a REST API with Node.js and Fastify 2026 — build tool backends for agents
Official Documentation
- Ollama Official Documentation — local LLM runtime; pull models with `ollama pull qwen3:14b`
- Python Official Documentation — Python 3.12+ reference
- LangChain Documentation — LLM app framework with agent abstractions
- DeepLearning.AI: Agentic AI Short Course — Andrew Ng’s foundational course on these patterns
Tool & Library References
- Ollama Model Library — browse and pull quantised models (Qwen3, Llama 4 Scout, Gemma3, Mistral, Phi-4)
- LangChain Tool Documentation — structure tools for agent use
- OpenAI Function Calling Guide — reference for tool-calling protocol (similar in Ollama)
Related Concepts
- Retrieval Augmented Generation (RAG) — extend Tool Use with vector search
- Prompt Engineering Best Practices — improve reflection system prompts
- Agent Benchmarks: ARC, HumanEval, MMLU-Pro — evaluate agent LLM capabilities
Tested on: Ubuntu 24.04 LTS (RTX 4090). Ollama 0.5.12, Qwen3 14B, Llama 4 Scout. Last verified: May 16, 2026.