Key Takeaways
- Four patterns, infinite combinations: Reflection + Tool Use + Planning + Multi-Agent are the primitives. Every production agent is a composition of these four. See CrewAI Multi-Agent Systems for orchestration beyond individual patterns.
- Reflection is cheap and effective: Two LLM calls (generate + critique) consistently outperform one long system prompt. Add it to any generation task. For more on quality improvement, see Prompt Engineering Guide 2026.
- Tools are the bridge to the real world: File read/write + web search + code execution covers 80% of agentic tasks. Standardise tools with MCP Protocol: Build an AI Tool Server.
- Multi-agent for long tasks: When a task requires more than 20 steps, split it between specialised agents to avoid context accumulation degrading quality. See LangChain and LangGraph Local Agents for graph-based state management.
Introduction
Direct Answer: What are the main AI agent design patterns and how do I implement them in Python in 2026?
The four agentic design patterns defined by Andrew Ng (DeepLearning.AI, 2024) are: Reflection — the agent critiques its own output and revises it; Tool Use — the agent calls external functions (search, code execution, file access); Planning — the agent decomposes a complex goal into subtasks and executes them in order; Multi-Agent — multiple specialised agents collaborate, each handling a distinct role. In Python with Ollama: implement Reflection with two sequential ollama.chat() calls (generate then critique); Tool Use with @tool-decorated functions and llm.bind_tools(tools) in LangChain; Planning with a prompt that asks the model to list steps before executing; Multi-Agent with separate agent instances passing results between them. All four patterns work with local Qwen3 14B or Llama 4 Scout via Ollama — no cloud API required.
Pattern 1: Reflection
The agent generates a response, then critiques and improves it. Two LLM calls produce better results than one.
```python
# pattern1_reflection.py
import ollama

MODEL = "qwen3:14b"

def reflect_and_revise(task: str, max_iterations: int = 2) -> str:
    """Generate a response, critique it, then revise. Returns final output."""
    # Step 1: Initial generation
    initial = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are an expert developer. Complete the task thoroughly."},
            {"role": "user", "content": task}
        ]
    )["message"]["content"]
    print(f"[Initial output preview]\n{initial[:200]}...\n")

    current = initial
    for i in range(max_iterations):
        # Step 2: Critique
        critique = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": """You are a critical code reviewer.
Identify specific problems with the code: bugs, missing edge cases, security issues, poor practices.
Be concrete — list exactly what to fix. If the code is already good, say 'No issues found.'"""},
                {"role": "user", "content": f"Original task: {task}\n\nCode to review:\n{current}"}
            ]
        )["message"]["content"]

        if "No issues found" in critique:
            print(f"[Iteration {i+1}] Critique: No issues — stopping early.")
            break
        print(f"[Iteration {i+1}] Critique found issues — revising...")

        # Step 3: Revise based on critique
        current = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": "You are an expert developer. Revise the code based on the critique."},
                {"role": "user", "content": f"""Original task: {task}

Current code:
{current}

Critique:
{critique}

Produce the revised, improved code only. No explanation."""}
            ]
        )["message"]["content"]
    return current

# Example: write a function that will get critiqued and improved
result = reflect_and_revise(
    "Write a Python function to read a CSV file and return rows where a numeric column exceeds a threshold."
)
print("\n[Final output]")
print(result)
```
Expected output:
```
[Initial output preview]
def filter_csv(filepath: str, column: str, threshold: float) -> list[dict]:
    import csv
    with open(filepath) as f:
        return [row for row in csv.DictReader(f) if float(row[column]) > threshold]...

[Iteration 1] Critique found issues — revising...

[Final output]
import csv
from pathlib import Path

def filter_csv(filepath: str | Path, column: str, threshold: float) -> list[dict]:
    """Return rows where the numeric column value exceeds the threshold.

    Args:
        filepath: Path to the CSV file
        column: Name of the numeric column to filter on
        threshold: Minimum value (exclusive) to include

    Returns:
        List of row dicts where column > threshold

    Raises:
        FileNotFoundError: If the CSV file doesn't exist
        KeyError: If the column name is not in the CSV
        ValueError: If a row's column value cannot be converted to float
    """
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {filepath}")
    results = []
    with path.open(newline='') as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column '{column}' not found. Available: {reader.fieldnames}")
        for row_num, row in enumerate(reader, start=2):
            try:
                if float(row[column]) > threshold:
                    results.append(dict(row))
            except ValueError:
                raise ValueError(f"Row {row_num}: cannot convert '{row[column]}' to float")
    return results
```
The reflection pass added type hints, docstring, error handling, and edge case coverage that the initial version missed.
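A quick way to sanity-check a revised artefact like the one above is to exercise exactly the edge cases the critique flagged. This harness uses a condensed copy of the revised `filter_csv` so the snippet is self-contained; the file name and test data are illustrative, not part of the article's code:

```python
import csv
import tempfile
from pathlib import Path

# Condensed copy of the revised filter_csv from the expected output above.
def filter_csv(filepath, column, threshold):
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"CSV file not found: {filepath}")
    results = []
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column '{column}' not found")
        for row in reader:
            if float(row[column]) > threshold:
                results.append(dict(row))
    return results

# Exercise the edge cases the critique targeted.
tmp = Path(tempfile.mkdtemp()) / "scores.csv"
tmp.write_text("name,score\nalice,1.5\nbob,3.0\n")

print(filter_csv(tmp, "score", 2.0))      # only rows with score > 2.0
try:
    filter_csv(tmp, "missing", 0.0)
except KeyError:
    print("KeyError raised for unknown column")
try:
    filter_csv("no_such_file.csv", "score", 0.0)
except FileNotFoundError:
    print("FileNotFoundError raised for missing file")
```

Running the critique's own checklist as tests is a cheap way to confirm the reflection pass actually fixed what it claimed to fix.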
Pattern 2: Tool Use
The agent calls external functions to act on the world — the most common production pattern.
```python
# pattern2_tool_use.py
import ollama
import subprocess
from pathlib import Path

# ── Define tools ──────────────────────────────────────────────────────────
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return stdout",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    }
]

def execute_tool(name: str, args: dict) -> str:
    if name == "read_file":
        try:
            return Path(args["path"]).read_text()
        except Exception as e:
            return f"Error: {e}"
    elif name == "write_file":
        try:
            Path(args["path"]).write_text(args["content"])
            return f"Written: {args['path']}"
        except Exception as e:
            return f"Error: {e}"
    elif name == "run_python":
        try:
            result = subprocess.run(
                ["python3", "-c", args["code"]],
                capture_output=True, text=True, timeout=10
            )
            return result.stdout or result.stderr
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 10 seconds"
    return f"Unknown tool: {name}"

def run_agent(goal: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful coding assistant. Use tools to complete tasks."},
        {"role": "user", "content": goal}
    ]
    while True:
        response = ollama.chat(
            model="llama4:scout",  # Best tool-calling model
            messages=messages,
            tools=TOOLS
        )
        msg = response["message"]
        messages.append(msg)
        if not msg.get("tool_calls"):
            return msg["content"]  # No more tool calls — done
        # Execute each tool call
        for tc in msg["tool_calls"]:
            tool_name = tc["function"]["name"]
            tool_args = tc["function"]["arguments"]
            result = execute_tool(tool_name, tool_args)
            print(f"  [Tool] {tool_name}({list(tool_args.keys())}) → {result[:80]}")
            messages.append({
                "role": "tool",
                "content": result,
                "tool_call_id": tc.get("id", "0")
            })

result = run_agent(
    "Create a Python file called 'hello.py' that prints 'Hello, sovereign world!', "
    "then run it and show me the output."
)
print("\n[Agent result]:", result)
```
Expected output:
```
  [Tool] write_file(['path', 'content']) → Written: hello.py
  [Tool] run_python(['code']) → Hello, sovereign world!

[Agent result]: I created hello.py and ran it. The output is:
Hello, sovereign world!
```
Pattern 3: Planning
The agent decomposes a complex goal into steps before executing them, which reduces hallucination on multi-step tasks.
```python
# pattern3_planning.py
import ollama
import json

MODEL = "qwen3:14b"

def plan_and_execute(goal: str) -> dict:
    # Step 1: Generate a plan
    plan_response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": """You are a planning agent.
Given a goal, create a concrete execution plan as JSON:
{"steps": [{"id": 1, "action": "description", "depends_on": []}, ...]}
Return ONLY the JSON."""},
            {"role": "user", "content": f"Goal: {goal}"}
        ],
        format="json"
    )
    plan = json.loads(plan_response["message"]["content"])
    print(f"[Plan] {len(plan['steps'])} steps")
    for step in plan["steps"]:
        print(f"  {step['id']}. {step['action']}")

    # Step 2: Execute each step in order
    results = {}
    for step in plan["steps"]:
        context = "\n".join(
            f"Step {dep} result: {results[dep]}"
            for dep in step.get("depends_on", [])
            if dep in results
        )
        exec_response = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": "Execute this step concisely. Return only the output."},
                {"role": "user", "content": f"Goal: {goal}\n\n{f'Context: {context}' if context else ''}\n\nExecute: {step['action']}"}
            ]
        )
        results[step["id"]] = exec_response["message"]["content"]
        print(f"  [Step {step['id']} done] {results[step['id']][:80]}...")

    # Return the final step's result
    final_id = plan["steps"][-1]["id"]
    return {"plan": plan, "final_result": results[final_id]}

output = plan_and_execute(
    "Research the three main PostgreSQL connection pooling solutions, "
    "compare them, and recommend the best one for a FastAPI app."
)
print("\n[Final result]")
print(output["final_result"])
```
Expected output:
```
[Plan] 3 steps
  1. List the three main PostgreSQL connection poolers
  2. Compare their performance, features, and deployment complexity
  3. Recommend the best option for a FastAPI application with reasoning
  [Step 1 done] The three main PostgreSQL connection poolers are: PgBouncer...
  [Step 2 done] PgBouncer: lightweight, C-based, 10k+ connections → 20 server...
  [Step 3 done] Recommendation: PgBouncer in transaction mode...

[Final result]
Recommendation: PgBouncer in transaction mode is the best choice for a FastAPI
application. It supports 500+ concurrent connections → 20 server connections,
has the lowest overhead, and is the most battle-tested...
```
Pattern 4: Multi-Agent
Specialised agents collaborate — each starts with a fresh context, reducing token accumulation and hallucination.
```python
# pattern4_multi_agent.py
import ollama

def make_agent(name: str, role: str, model: str = "qwen3:14b"):
    """Create a named agent with a specific role."""
    def run(task: str, context: str = "") -> str:
        messages = [
            {"role": "system", "content": f"You are {name}, a {role}. Be concise and focused."},
        ]
        if context:
            messages.append({"role": "user", "content": f"Context from previous agents:\n{context}"})
        messages.append({"role": "user", "content": task})
        return ollama.chat(model=model, messages=messages)["message"]["content"]
    return run

# Define specialised agents
researcher = make_agent("Research", "technical researcher who finds facts and summarises")
architect = make_agent("Architect", "software architect who designs systems")
critic = make_agent("Critic", "senior engineer who finds problems and risks")
writer = make_agent("Writer", "technical writer who produces clear documentation")

def pipeline(goal: str) -> str:
    """Run a multi-agent pipeline: research → design → critique → document."""
    print(f"Goal: {goal}\n")
    research = researcher(f"Research: {goal}")
    print(f"[Researcher]: {research[:120]}...\n")
    design = architect(f"Design a solution for: {goal}", context=research)
    print(f"[Architect]: {design[:120]}...\n")
    critique = critic(f"Critique this design for: {goal}", context=f"{research}\n\n{design}")
    print(f"[Critic]: {critique[:120]}...\n")
    doc = writer(
        f"Write a concise technical summary for: {goal}",
        context=f"Research: {research}\n\nDesign: {design}\n\nCritique: {critique}"
    )
    return doc

result = pipeline("Design a caching strategy for a high-traffic FastAPI application")
print("[Final document]")
print(result)
```
Expected output:
```
Goal: Design a caching strategy for a high-traffic FastAPI application

[Researcher]: Key caching options for FastAPI: Redis (in-memory, pub/sub, TTL),
Memcached (simpler, multi-threaded), CDN caching (Cloudflare), HTTP response...

[Architect]: Recommended layered strategy: (1) Redis for session/computed data...

[Critic]: Risks: Redis single point of failure without sentinel/cluster; cache...

[Final document]
## FastAPI Caching Strategy
**Recommended Stack:** Redis + HTTP cache headers + optional CDN
**Implementation:**
1. Install `redis[hiredis]` and `fastapi-cache2`
2. Cache expensive endpoints with `@cache(expire=300)`
3. Configure Redis sentinel for high availability
...
```
Each agent starts fresh — no context accumulation, lower hallucination on long tasks.
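The fresh-context claim can be made concrete by counting what each call re-sends. A toy comparison, with illustrative message sizes rather than real model output:

```python
# Single agent: history grows, and every chat call re-sends the whole history.
single_history = []
resent_chars = 0
for step in range(8):
    single_history.append({"role": "assistant", "content": "x" * 500})
    resent_chars += sum(len(m["content"]) for m in single_history)

# Multi-agent pipeline: each agent receives only the previous agent's handoff.
handoff = "x" * 500
fresh_chars = 0
for agent in range(8):
    fresh_chars += len(handoff)  # one handoff message per agent

print(resent_chars)  # 18000 chars re-sent: quadratic growth over 8 steps
print(fresh_chars)   # 4000 chars re-sent: linear growth
```

The single agent re-sends 500 × (1 + 2 + … + 8) = 18,000 characters over eight steps, while the pipeline re-sends 8 × 500 = 4,000; the gap widens quadratically as tasks get longer.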
Combining Patterns
Most production agents combine these patterns rather than using one in isolation:
- Reflection + Tool Use: the agent uses tools, then reflects on whether the output is correct
- Planning + Tool Use: the agent plans the steps, then executes each one with tools
- Multi-Agent + Reflection: each agent reflects before passing its output to the next
- Planning + Multi-Agent: a planner creates the plan and assigns steps to specialised agents
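As a sketch of how two patterns compose, here is a Planning + Tool Use skeleton with the LLM and tool layers stubbed out so only the control flow remains. The stub functions and the fixed step list are illustrative, not the article's code:

```python
# Sketch of Planning + Tool Use: plan first, then run each step through
# the tool layer. LLM and tools are stubbed to expose the composition.

def stub_plan(goal: str) -> list[dict]:
    """Stand-in for the planning LLM call: returns fixed steps."""
    return [
        {"id": 1, "action": f"gather inputs for: {goal}", "tool": "read_file"},
        {"id": 2, "action": "process the inputs", "tool": "run_python"},
        {"id": 3, "action": "write the result", "tool": "write_file"},
    ]

def stub_tool(name: str, action: str) -> str:
    """Stand-in for execute_tool: echoes what a real tool would do."""
    return f"{name} completed: {action}"

def plan_then_act(goal: str) -> dict[int, str]:
    results = {}
    for step in stub_plan(goal):          # Planning pattern: decompose first
        results[step["id"]] = stub_tool(  # Tool Use pattern: act per step
            step["tool"], step["action"]
        )
    return results

for step_id, outcome in plan_then_act("summarise a log file").items():
    print(step_id, outcome)
```

Swapping `stub_plan` for the `plan_and_execute` planner and `stub_tool` for `execute_tool` from the sections above yields the real combined agent.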
Conclusion
The four agentic design patterns — Reflection, Tool Use, Planning, Multi-Agent — are the building blocks of every production AI agent in 2026. Reflection adds quality; Tool Use adds capability; Planning adds reliability on complex tasks; Multi-Agent adds scalability. All four patterns run locally against Qwen3 14B or Llama 4 Scout via Ollama with zero cloud cost.
See LangChain and LangGraph Local Agents 2026 for implementing these patterns with the graph-based LangGraph framework, and MCP Protocol: Build a Server in Python for standardising the Tool Use pattern’s tools as MCP servers.
People Also Ask
Which design pattern should I start with?
Start with Reflection — it’s the simplest to implement (just two ollama.chat() calls) and consistently improves output quality for code generation, writing, and analysis. After Reflection, add Tool Use when you need the agent to interact with external systems. Add Planning when tasks regularly require more than 3–4 sequential steps. Use Multi-Agent when a single agent accumulates so much context over a task that output quality degrades.
How many LLM calls does a typical agent make?
A simple Reflection pattern uses 2 calls (generate + revise). A Tool Use loop for a medium-complexity task typically uses 5–15 calls. A Planning + execution pipeline for a research task might use 10–30 calls. Multi-agent pipelines multiply this by the number of agents. Each call costs time (1–5 seconds for a local 14B model) but zero money — the economics of local inference make call-intensive agent patterns practical that would be expensive with cloud APIs.
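The call counts above translate into wall-clock time with simple arithmetic. A quick sketch, assuming roughly 3 seconds per call (the midpoint of the 1-5 second range quoted above; the function name is illustrative):

```python
SECONDS_PER_CALL = 3  # rough midpoint for a local 14B model (1-5 s per call)

def pipeline_seconds(calls_per_agent: int, agents: int = 1) -> int:
    """Total sequential latency: calls run one after another."""
    return calls_per_agent * agents * SECONDS_PER_CALL

print(pipeline_seconds(2))       # Reflection (generate + revise): 6 s
print(pipeline_seconds(15))      # upper end of a Tool Use loop: 45 s
print(pipeline_seconds(10, 4))   # 4-agent pipeline, 10 calls each: 120 s
```

Even the 4-agent case finishes in about two minutes of compute that costs nothing, which is why call-heavy patterns are practical locally.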
Related Vucense Guides
- CrewAI Multi-Agent Orchestration with Ollama — extend these patterns to hierarchical crews and delegation
- Best Open-Weight AI Models 2026 — choose the right local LLM for agent reasoning tasks
- Sovereign AI Agents Hub 2026 — complete guide to building autonomous agent systems locally
Further Reading
Vucense Articles
- LangChain and LangGraph Local Agents 2026 — implement these patterns with LangGraph
- CrewAI Multi-Agent Orchestration with Local Ollama — extend these patterns to CrewAI hierarchical workflows
- CrewAI Multi-Agent Orchestration 2026 — high-level framework for coordinating agent teams
- Best Open-Weight AI Models 2026: Llama 4, Qwen3, Gemma3 Compared — choose the right LLM for your agents
- MCP Protocol: Build an AI Tool Server in Python — standardise your Tool Use tools
- Prompt Engineering Guide 2026 — write better prompts for agent system messages
- Build a REST API with Node.js and Fastify 2026 — build tool backends for agents
Official Documentation
- Ollama Official Documentation — local LLM runtime; pull models with `ollama pull qwen3:14b`
- Python Official Documentation — Python 3.12+ reference
- LangChain Documentation — LLM app framework with agent abstractions
- DeepLearning.AI: Agentic AI Short Course — Andrew Ng’s foundational course on these patterns
Tool & Library References
- Ollama Model Library — browse and pull quantised models (Qwen3, Llama 4 Scout, Gemma3, Mistral, Phi-4)
- LangChain Tool Documentation — structure tools for agent use
- OpenAI Function Calling Guide — reference for tool-calling protocol (similar in Ollama)
Related Concepts
- Retrieval Augmented Generation (RAG) — extend Tool Use with vector search
- Prompt Engineering Best Practices — improve reflection system prompts
- Agent Benchmarks: ARC, HumanEval, MMLU-Pro — evaluate agent LLM capabilities
Tested on: Ubuntu 24.04 LTS (RTX 4090). Ollama 0.5.12, Qwen3 14B, Llama 4 Scout. Last verified: May 16, 2026.