What's the difference between CrewAI and LangGraph?

CrewAI is higher-level with built-in agents/tasks/crews abstraction. LangGraph is lower-level graph-based control flow. Choose CrewAI for multi-agent coordination, LangGraph for custom workflows requiring fine-grained control.

Can I use GPT-4 or Claude instead of Ollama?

Yes, CrewAI supports any LLM. Replace Ollama with OpenAI key: LLM(model='gpt-4', api_key=os.getenv('OPENAI_API_KEY')). Trade-off: cloud APIs are faster and smarter but cost money and require internet.

How much does it cost to run a 10-agent crew?

With Ollama (local): $0 after initial setup. With GPT-4: ~$0.03-0.10 per run depending on input tokens. For comparison, CrewAI with gpt-3.5-turbo: ~$0.005 per run. Local Ollama is free but slower.

What happens if an agent task times out?

CrewAI raises AgentMaxIterationsExceededException after max_iterations (default: 15). Set task timeout: timeout_in_seconds, implement retry logic with exponential backoff, or use shorter prompts to reduce iterations.

Can I run multiple CrewAI instances on the same Ollama server?

Yes, Ollama handles concurrent requests. One instance per model at a time (queued), or use multiple models loaded simultaneously if VRAM allows. For 7B model: ~6GB VRAM per instance.

How do I prevent agents from hallucinating?

3 strategies: 1) Ground agents in tools (research via APIs), 2) Use lower temperature (0.3 vs 0.7), 3) Implement fact-checking task. CrewAI prevents hallucination better than raw LLMs because agents must justify decisions.

How do I debug why an agent is stuck in a loop?

Enable verbose=True on crew, check agent.max_iterations (increase if needed), review task description for clarity, test tools independently. Common cause: ambiguous task description or tool returning unexpected format.

Dev Corner Engineering CrewAI & AutoGen

CrewAI Tutorial 2026: Multi-Agent Systems with Local Ollama

🟡Intermediate

Build sovereign multi-agent crews with CrewAI and local Ollama models. Covers role-based agents, task delegation, crew orchestration, tool integration, and production deployment with zero cloud APIs.

Author

Kofi Mensah

Inference Economics & Hardware Architect

Published

May 15, 2026

Duration

Reading

24 min

Build

30 min

CrewAI Tutorial 2026: Multi-Agent Systems with Local Ollama

Article Roadmap

Key Takeaways

CrewAI organises AI agents into crews — each Agent has a role, goal, and backstory that shapes how it approaches tasks, and a Crew coordinates multiple agents to complete a Process (sequential or hierarchical) with task dependencies and output passing.
Local Ollama integration uses CrewAI's LLM class: 'LLM(model="ollama/llama4:scout", base_url="http://localhost:11434")' — this is identical to the OpenAI integration but points at your local Ollama server for zero per-query cost.
CrewAI tools extend agent capabilities — the built-in SerperDevTool (web search), FileReadTool, and CodeInterpreterTool give agents real-world access, and custom tools built with the @tool decorator add domain-specific capabilities.
The Process.hierarchical setting enables a manager agent that delegates tasks to worker agents — the manager LLM plans which agent handles which task based on their roles, enabling more flexible task routing than the fixed sequential approach.

Key Takeaways

Agents have personas: Role + goal + backstory shapes how the LLM approaches each task — a “Senior Security Researcher” agent frames its analysis differently from a “Technical Writer” agent. See AI Agent Design Patterns 2026 for the underlying principles.
Tasks have outputs: Each Task defines expected_output which sets the quality bar and format, and output_file to save results to disk automatically. Combine with Docker Volumes to persist agent outputs in production.
Sequential vs hierarchical: Sequential processes run tasks in order, passing outputs between agents. Hierarchical adds a manager LLM that dynamically assigns tasks — more flexible, uses more inference calls. Compare with LangChain and LangGraph Local Agents for graph-based state management.
Local models work identically: LLM(model="ollama/llama4:scout") replaces LLM(model="gpt-4o") with no other code changes — same crew, same tasks, zero API cost.

Introduction

Direct Answer: How do I build a multi-agent CrewAI system with local Ollama models in 2026?

Install CrewAI, create agents (Researcher, Writer, Reviewer) with local Ollama LLM configured, define tasks with expected outputs, assemble into a Crew with hierarchical process, and execute with crew.kickoff(). All agent reasoning, tool execution, and task delegation runs locally on Ollama with zero cloud API calls. Requires Llama 4 Scout or Qwen3 14B for reliable agent behavior.

Part 1: Environment Setup

Before building agents, ensure Ollama is running locally with a capable model loaded. CrewAI communicates with Ollama via HTTP on port 11434.

pip install crewai --break-system-packages

# Verify
python3 -c "import crewai; print('CrewAI:', crewai.__version__)"

# Ensure capable model is available
ollama list | grep -E "llama4|qwen3:14b"

Expected output:

CrewAI: 0.80.4
NAME              SIZE
llama4:scout      12 GB

Part 2: Building Agents with Local LLM

Agents are the workers in a multi-agent crew. Each agent has a role, goal, and backstory that shapes how it approaches tasks. The backstory is critical — it frames the agent’s expertise and decision-making.

# simple_crew.py
from crewai import Agent, Task, Crew, Process, LLM

# Local Ollama model — identical API to OpenAI
local_llm = LLM(
    model="ollama/llama4:scout",
    base_url="http://localhost:11434",
    temperature=0.3,
)

# ── Define Agents ─────────────────────────────────────────────────────────
researcher = Agent(
    role="Senior Technical Researcher",
    goal="Find accurate, specific, and actionable technical information",
    backstory="""You are an expert researcher with 10 years of experience in
    software engineering and infrastructure. You always provide specific version
    numbers, exact commands, and verified facts. You never guess.""",
    llm=local_llm,
    verbose=False,
    max_iter=5,         # Max reasoning iterations before forced answer
)

writer = Agent(
    role="Technical Documentation Writer",
    goal="Transform research into clear, structured technical documentation",
    backstory="""You write developer documentation that is precise, concise,
    and immediately actionable. You use bullet points, code examples, and
    clear headings. You never use marketing language.""",
    llm=local_llm,
    verbose=False,
)

# ── Define Tasks ──────────────────────────────────────────────────────────
research_task = Task(
    description="""Research the topic: {topic}
    Find: what it is, why it matters, and 3 specific use cases with examples.""",
    expected_output="""A factual research summary with:
    - Clear definition
    - 3 concrete use cases with specific examples
    - Key technical specifications or version information""",
    agent=researcher,
)

writing_task = Task(
    description="""Write a developer quickstart guide for: {topic}
    Use the research findings provided. Include working code examples.""",
    expected_output="""A structured quickstart guide with:
    - 1-paragraph introduction
    - Prerequisites (numbered list)
    - 3-5 step setup guide with code
    - One working code example
    - Common pitfall to avoid""",
    agent=writer,
    context=[research_task],    # Receives researcher's output as context
)

# ── Assemble Crew ─────────────────────────────────────────────────────────
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=False,
)

# ── Run ───────────────────────────────────────────────────────────────────
result = crew.kickoff(inputs={"topic": "Redis pub/sub for real-time notifications in Python"})
print(result.raw)

Expected output:

## Redis Pub/Sub Quickstart for Python Developers

Redis Pub/Sub enables real-time message broadcasting between services using a 
publisher/subscriber pattern where publishers send to channels and subscribers 
receive without direct coupling.

**Prerequisites:**
1. Redis 7.4+ running on localhost:6379
2. Python 3.12+ with redis-py: `pip install redis`

**Setup:**

**Step 1 — Publisher:**
```python
import redis
r = redis.Redis()
r.publish('notifications', '{"user_id": 42, "event": "new_message"}')

…


---

## Part 3: Hierarchical Crew with Manager Agent

Hierarchical crews use a manager agent to dynamically assign tasks to worker agents. This approach is more flexible than sequential task chains but requires more LLM inference calls.

```python
# hierarchical_crew.py — manager agent dynamically assigns tasks
from crewai import Agent, Task, Crew, Process, LLM
from crewai.tools import tool
import subprocess

local_llm = LLM(model="ollama/llama4:scout", base_url="http://localhost:11434")

# Tools available to agents
@tool("Server Health Check")
def check_server_health(host: str) -> str:
    """Check if a server is reachable and return its HTTP status."""
    result = subprocess.run(
        ["curl", "-sI", "--max-time", "3", f"http://{host}"],
        capture_output=True, text=True
    )
    return result.stdout.split('\n')[0] if result.returncode == 0 else f"Unreachable: {result.stderr}"

@tool("List Running Services")
def list_services() -> str:
    """List all active systemd services."""
    result = subprocess.run(
        ["systemctl", "list-units", "--type=service", "--state=running", "--no-pager", "--plain"],
        capture_output=True, text=True
    )
    return result.stdout[:1000]  # Truncate long output

# Specialised agents
infra_agent = Agent(
    role="Infrastructure Engineer",
    goal="Assess server health and identify operational issues",
    backstory="Expert in Linux systems, networking, and service management. Uses tools to gather actual system data.",
    tools=[check_server_health, list_services],
    llm=local_llm,
)

security_agent = Agent(
    role="Security Analyst",
    goal="Identify security vulnerabilities and misconfigurations",
    backstory="Specialises in server hardening, CVE analysis, and security best practices for Ubuntu servers.",
    llm=local_llm,
)

report_agent = Agent(
    role="Report Writer",
    goal="Produce clear, actionable technical reports from findings",
    backstory="Transforms technical findings into structured reports with prioritised action items.",
    llm=local_llm,
)

# Tasks
health_task = Task(
    description="Check the health of localhost and list running services",
    expected_output="Server health status with list of running services",
    agent=infra_agent,
)

security_task = Task(
    description="Analyse the server configuration for security issues. Focus on: exposed ports, unnecessary services, and hardening gaps",
    expected_output="Security findings with severity levels (Critical/High/Medium/Low) and specific fixes",
    agent=security_agent,
    context=[health_task],
)

report_task = Task(
    description="Write an executive summary of the server assessment",
    expected_output="2-page technical report: Summary, Findings (prioritised), Action Items (numbered)",
    agent=report_agent,
    context=[health_task, security_task],
    output_file="server-assessment-report.md",   # Auto-saves to file
)

crew = Crew(
    agents=[infra_agent, security_agent, report_agent],
    tasks=[health_task, security_task, report_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print("\nReport saved to: server-assessment-report.md")

Part 2.5: Complete Hierarchical Crew with Manager Agent

A manager agent makes dynamic decisions about which worker agent handles each task:

# hierarchical_with_manager.py — Complete Hierarchical Crew with Manager Agent Decision-Making
# Manager agent evaluates each task and assigns it to the most suitable worker agent
# Use case: complex multi-step projects where task type determines which specialist handles it

from crewai import Agent, Task, Crew, Process, LLM

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Local LLM Configuration — Ollama Integration
# ══════════════════════════════════════════════════════════════════════════════════════════════

# Local Ollama model on http://localhost:11434
# temperature=0.3: Low randomness, deterministic output (good for structured tasks)
# Higher temperature (>0.7): More creative, varied responses (good for brainstorming)
local_llm = LLM(
    model="ollama/llama4:scout",
    base_url="http://localhost:11434",
    temperature=0.3  # Conservative: focus on accuracy, not creativity
)

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Manager Agent — Orchestrates Team Task Assignment
# ══════════════════════════════════════════════════════════════════════════════════════════════

# Manager role: evaluates task requirements and routes to appropriate specialist
# Key difference from sequential crew: manager can make dynamic decisions per task
# Instead of pre-assigning tasks, manager reviews task description and assigns based on expertise
manager = Agent(
    role="Project Manager",
    goal="Coordinate a team to deliver high-quality technical documentation and code",
    
    # Backstory provides context for decision-making; influences how manager evaluates tasks
    # "Experienced project manager" persona helps with task prioritization
    backstory="""You are an experienced project manager overseeing a distributed team of 
    specialists. You decide which team member is best suited for each task based on:
    - Task complexity and type (code, documentation, review, testing)
    - Individual agent expertise and past performance
    - Delivery timeline and quality requirements
    You prioritise quality, accuracy, and timely delivery over speed.""",
    
    llm=local_llm,
    verbose=True,  # Log all manager decisions and reasoning
)

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Worker Agents — Specialised Teams for Different Tasks
# ══════════════════════════════════════════════════════════════════════════════════════════════

# ── Code Expert Agent ──────────────────────────────────────────────────────────────────────
# Assigned to: tasks requiring code implementation, architectural decisions, optimization
code_expert = Agent(
    role="Senior Software Engineer",
    goal="Provide expert-level code examples and technical deep dives with best practices",
    
    # Backstory emphasises: years of experience, production-quality standards, security mindset
    backstory="""You are a principal engineer with 15+ years of experience in distributed systems
    and scalable architecture. You write production-grade code with comprehensive error handling,
    security considerations, and performance optimizations. Your code includes detailed comments
    and follows SOLID principles.""",
    
    llm=local_llm,
)

# ── Technical Writer Agent ────────────────────────────────────────────────────────────────
# Assigned to: explaining concepts, creating tutorials, simplifying complex topics
writer = Agent(
    role="Technical Writer",
    goal="Translate complex technical concepts into clear, accessible documentation",
    
    # Backstory emphasises: accessibility, junior developer mindset, concrete examples
    backstory="""You specialise in making advanced technical topics accessible to junior developers.
    You use real-world analogies, step-by-step instructions, code snippets, and concrete examples.
    You anticipate common questions and explain the 'why' behind technical decisions.""",
    
    llm=local_llm,
)

# ── Code Reviewer Agent ────────────────────────────────────────────────────────────────
# Assigned to: code review, quality assurance, best practices validation, edge case testing
reviewer = Agent(
    role="QA Engineer",
    goal="Ensure technical accuracy and catch edge cases",
    backstory="""You test everything. You find edge cases others miss. You ensure all 
    code examples actually work and all claims are technically sound.""",
    llm=local_llm,
)

# ── Tasks (no specific agent yet — manager will assign) ────────────────────
task1 = Task(
    description="""Write production-grade Python code for a rate limiter that uses 
    token buckets and works with async/await. Include comprehensive error handling.""",
    expected_output="Complete, tested Python code with docstrings and type hints",
    agent=code_expert,   # Manager can reassign, but this is the default
)

task2 = Task(
    description="""Explain the token bucket algorithm in simple terms. Use a water 
    bucket metaphor. Then explain why it's better than fixed-window rate limiting.""",
    expected_output="2-paragraph explanation with metaphor and comparison",
    agent=writer,
)

task3 = Task(
    description="""Review the code from task1 and the explanation from task2. Check for:
    1. Code correctness and edge cases
    2. Explanation accuracy  
    3. Any security issues
    Return a list of findings and fixes.""",
    expected_output="QA report with findings list and specific fixes",
    agent=reviewer,
    context=[task1, task2],  # Depends on other tasks' outputs
)

# ── Hierarchical Crew (manager coordinates workers) ────────────────────────
crew = Crew(
    agents=[code_expert, writer, reviewer],  # Workers
    tasks=[task1, task2, task3],
    manager_agent=manager,                   # Manager makes assignments
    process=Process.hierarchical,            # Manager delegates dynamically
    verbose=True,
)

# ── Run with manager coordination ──────────────────────────────────────────
result = crew.kickoff(
    inputs={
        "topic": "Rate limiting for REST APIs",
        "audience": "intermediate Python developers"
    }
)

print("\n" + "="*80)
print("HIERARCHICAL CREW OUTPUT")
print("="*80)
print(result.raw)

Key differences from sequential:

process=Process.hierarchical enables manager decision-making
manager_agent=manager specifies who makes assignments
Tasks don’t pre-assign agents — manager chooses based on task description and agent roles
Manager can reassign tasks if it thinks a different agent is better suited

Expected workflow:

[Manager]
  "I need code — assigning to code_expert"
  → [Code Expert] produces code
  "I need explanation — assigning to writer"
  → [Writer] produces explanation
  "I need review of both — assigning to reviewer"
  → [Reviewer] produces QA report

[Final Output]
Complete documentation with all three perspectives

When to use hierarchical:

Task requirements are complex and need human-like delegation decisions
Agent roles can handle overlapping domains (manager must choose)
Long task sequences where mid-course corrections needed
Multi-stage workflows (research → design → implementation → review)

Part 3: Custom Tool Integration

# custom_tools.py — add domain-specific tools to your agents
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
import requests, json

class WebSearchInput(BaseModel):
    query: str = Field(description="The search query")
    max_results: int = Field(default=5, description="Number of results to return")

class SovereignWebSearch(BaseTool):
    name: str = "Sovereign Web Search"
    description: str = "Search the web using a privacy-respecting search engine (SearXNG)"
    args_schema: type[BaseModel] = WebSearchInput

    def _run(self, query: str, max_results: int = 5) -> str:
        # SearXNG is self-hostable — see https://searxng.github.io/searxng/
        # For demo, using public instance
        try:
            response = requests.get(
                "https://searx.be/search",
                params={"q": query, "format": "json", "categories": "general"},
                timeout=5,
                headers={"User-Agent": "SovereignBot/1.0"}
            )
            results = response.json().get("results", [])[:max_results]
            return "\n\n".join(
                f"Title: {r['title']}\nURL: {r['url']}\nSnippet: {r.get('content', '')[:200]}"
                for r in results
            )
        except Exception as e:
            return f"Search failed: {e}"

# Use in an agent
search_tool = SovereignWebSearch()

research_agent = Agent(
    role="Web Researcher",
    goal="Research current information using web search",
    backstory="Expert researcher who finds accurate, current information from the web.",
    tools=[search_tool],
    llm=LLM(model="ollama/llama4:scout", base_url="http://localhost:11434"),
)

Part 4: Sovereignty Audit

python3 - << 'EOF'
import subprocess, threading, time
import crewai
from crewai import Agent, Task, Crew, LLM

external_connections = []

def monitor():
    for _ in range(20):
        r = subprocess.run(['ss','-tnp','state','established'],
                           capture_output=True, text=True)
        for line in r.stdout.splitlines():
            if 'python' in line and '127.0.0.1' not in line and '::1' not in line:
                external_connections.append(line)
        time.sleep(0.5)

t = threading.Thread(target=monitor, daemon=True)
t.start()

llm = LLM(model="ollama/qwen3:14b", base_url="http://localhost:11434")
agent = Agent(role="Test", goal="Test", backstory="Test agent", llm=llm)
task = Task(description="Say 'hello'", expected_output="The word hello", agent=agent)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()

t.join(timeout=3)
if external_connections:
    print(f"External connections found: {external_connections}")
else:
    print("✓ Zero external connections — CrewAI + Ollama is fully sovereign")
EOF

Expected output:

✓ Zero external connections — CrewAI + Ollama is fully sovereign

Troubleshooting

`Agent stopped due to iteration limit`

Cause: max_iter reached — model is looping or can’t complete the task. Fix: Increase max_iter=10 in the Agent, or simplify the task description. With smaller models (7B), reduce task complexity.

Agents produce inconsistent output format

Cause: Smaller models don’t reliably follow the expected_output structure. Fix: Switch to llama4:scout or qwen3:14b. Add explicit format instructions to expected_output: “Return ONLY a JSON object with keys: findings, severity, recommendation”.

`Connection refused to Ollama`

Fix: Ensure Ollama is running: ollama serve. Check: curl http://localhost:11434/api/version.

Conclusion

CrewAI with local Ollama delivers multi-agent orchestration with zero cloud cost and full data sovereignty. The crew pattern — specialised agents with distinct roles, defined tasks with context passing, and tool integration — handles complex tasks that would overwhelm a single-agent approach.

See AI Agent Design Patterns 2026 for the underlying patterns CrewAI implements, and LangGraph Tutorial 2026 for a more code-level alternative to CrewAI’s higher-level abstraction.

Troubleshooting & Common Issues

Issue: `ConnectionError: Failed to connect to Ollama server`

Cause: Ollama not running or listening on wrong address.

# Fix: Start Ollama and verify
ollama serve                    # Start Ollama in one terminal
# In another terminal:
curl http://localhost:11434/api/tags   # Should return model list

Issue: `RuntimeError: No model specified in LLM config`

Cause: Model name doesn’t exist or wrong format.

# Fix: Check available models and use correct name
import subprocess
result = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
print(result.stdout)  # Shows available models

# Correct format:
llm = LLM(model="ollama/llama2:7b", base_url="http://localhost:11434")

Issue: `Task took too long to execute (>300s timeout)`

Cause: LLM too slow or task too complex.

# Fix: Increase timeout or simplify task
task = Task(
    description="Simpler description with less context needed",
    expected_output="Concise output (1 paragraph, not 10)",
    timeout=600  # 10 minutes instead of 5
)

Issue: `Tool execution failed: Command not found`

Cause: Tool uses bash command not available on system.

# Fix: Check command availability
import subprocess
result = subprocess.run(['which', 'curl'], capture_output=True)
if result.returncode != 0:
    # Install: apt-get install curl
    print("curl not found")

# Or write cross-platform tool:
@tool("Get URL")
def fetch_url(url: str) -> str:
    import requests  # Pure Python, platform-agnostic
    return requests.get(url).text

Issue: `Agent stuck in loop repeating same task`

Cause: Tool feedback doesn’t change agent’s reasoning.

# Fix: Make tools return clear pass/fail signals
@tool("Check if done")
def is_complete(status: str) -> str:
    """Returns 'SUCCESS' or 'FAILURE: reason', not ambiguous output."""
    if status == "complete":
        return "SUCCESS: Task completed as requested"
    return "FAILURE: Not done yet. Try next step."

Quick Reference: CrewAI vs Sequential vs Hierarchical

Process Type	When to Use	Pros	Cons
Sequential	Tasks have clear order (research → write → review)	Simple, easy to understand	Slow (tasks must wait for prior completion)
Hierarchical	Complex projects with manager deciding task assignment	Flexible, manager adapts to results	More LLM calls, higher cost
Parallel	Multiple independent tasks (translate to 3 languages)	Fast, all agents work simultaneously	More complex output merging

Crew Architecture Decision Tree

What type of problem are you solving?
├─ Multi-step workflow with clear order
│  └─ Use Process.sequential (research → write → publish)
├─ Dynamic task assignment based on content
│  └─ Use Process.hierarchical (manager decides which agent)
├─ Multiple independent similar tasks
│  └─ Use Process.parallel (agents work simultaneously)
└─ Complex dependencies + custom logic
   └─ Use LangGraph instead (more control)

Common Agent Patterns

Pattern 1: Researcher → Writer → Reviewer (Sequential)

# Step 1: Research agent finds information
research_task = Task(description="Research topic X")

# Step 2: Writer creates content (depends on research)
write_task = Task(
    description="Write article from research",
    context=[research_task]  # Waits for research to complete
)

# Step 3: Reviewer checks quality (depends on writing)
review_task = Task(
    description="Review article for quality",
    context=[write_task]
)

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, write_task, review_task],
    process=Process.sequential
)

Pattern 2: Manager Assigns Tasks (Hierarchical)

# Manager decides who handles each task
manager_task = Task(
    description="Coordinate team to produce report",
    agent=manager  # Manager assigned, routes to team
)

# Worker tasks DON'T have specific agents pre-assigned
# Manager will assign based on task type
task1 = Task(description="Code the API module")
task2 = Task(description="Write API documentation")
task3 = Task(description="Test API endpoints")

crew = Crew(
    agents=[manager, dev, writer, tester],
    tasks=[task1, task2, task3],
    process=Process.hierarchical,
    manager_agent=manager
)

Pattern 3: Tool Feedback Loop (Agentic)

@tool("Execute code")
def run_code(code: str) -> str:
    # Return clear success/failure, not just output
    try:
        result = exec(code)
        return f"SUCCESS: Code executed. Output: {result}"
    except Exception as e:
        return f"FAILURE: {type(e).__name__}: {str(e)}"

# Agent sees clear signal, not confused output

Performance Optimization Tips

Tip 1: Reduce LLM Input Size

# ❌ Bad: Full 100,000-token context
task = Task(
    description=f"Analyze this data: {entire_dataset}"
)

# ✅ Good: Summarized context
task = Task(
    description="Analyze top 10 data points by importance"
)

Tip 2: Use Faster Models

# ❌ Slow: 70B parameter model
llm = LLM(model="ollama/llama2:70b")  # 5+ seconds per response

# ✅ Fast: Smaller model for simple tasks
llm = LLM(model="ollama/mistral:7b")  # 500ms per response

Tip 3: Cache Tool Results

tool_cache = {}

@tool("Get weather")
def get_weather(city: str) -> str:
    if city in tool_cache:
        return tool_cache[city]
    result = requests.get(...).json()
    tool_cache[city] = result
    return result

Frequently Asked Questions (FAQ)

Q: How much does it cost to run CrewAI vs ChatGPT API?

A: Local Ollama: ~$0 (runs on your hardware). OpenAI API: ~$0.10–$1 per task (depends on tokens used). For 1000 tasks: Ollama = free; OpenAI = $100–$1000.

Q: Can I use CrewAI with GPT-4 or Claude?

A: Yes. Replace the LLM:

llm = LLM(
    model="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key=os.getenv("OPENAI_API_KEY")
)

Q: How do I debug why an agent isn’t using a tool?

A: Check verbose logs:

agent = Agent(verbose=True)  # Prints all reasoning steps
crew = Crew(..., verbose=True)  # Prints full execution trace

Q: Can agents collaborate across multiple machines?

A: CrewAI doesn’t have built-in distributed support. Use message queues (RabbitMQ, Redis) to coordinate agents across machines, or deploy entire crew on one machine and scale horizontally with multiple crew instances.

Q: What’s the maximum number of agents in a crew?

A: Technically unlimited, but practically:

3–5 agents: optimal (fast, manageable)
10+ agents: slow (context window fills, LLM confusion)
50+ agents: likely to fail (token limits, complexity)

Q: How do I prevent an agent from hallucinating?

A: Use tool-grounded workflow:

# ❌ Bad: LLM makes up data
task = Task(description="What is the weather in NYC?")

# ✅ Good: Agent must use weather tool
task = Task(
    description="Use weather tool to find NYC temperature",
    tools=[get_weather]  # Only available tool
)

Q: Can CrewAI handle real-time data streams?

A: Not natively. For streaming data, use async execution:

while True:
    new_data = fetch_latest_data()
    result = await crew.kickoff_async(inputs={"data": new_data})
    process_result(result)
    await asyncio.sleep(60)  # Check every minute

Q: What happens if a tool fails? Does the crew retry?

A: By default, agents see the error and adapt their next step. For automatic retry:

@tool("Unreliable API")
def call_api_with_retry(url: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            return requests.get(url).json()
        except:
            if attempt == retries - 1:
                return "ERROR: API unavailable after 3 retries"
            time.sleep(2 ** attempt)  # Exponential backoff

AI Agent Design Patterns 2026 — understand reflection, tool use, and planning patterns that CrewAI orchestrates
Best Open-Weight AI Models 2026 — choose the best local LLM for your crew agents
LangChain & LangGraph Local Agents — alternative multi-agent framework with similar sovereignty advantages

Private Document Q&A with pgvector: 100% Local RAG Pipeline 2026

>_ 17 Apr | 18 min | Dev Corner

🟡Intermediate

Build a fully local RAG pipeline in Python 2026. Ollama embeddings, pgvector 0.8 HNSW search, and Llama 4 Scout for document Q&A. No OpenAI. No cloud. Zero data leaves your machine.

By Marcus Thorne

Sovereign AI Agents 2026: A Complete Guide to Local, Offline Agentic Systems

>_ 17 May | 35 min | Dev Corner

🟡Intermediate

Learn to build, deploy, and orchestrate autonomous AI agents entirely on your infrastructure. Covers design patterns, multi-agent orchestration, model selection, and zero-cloud-dependency deployment for compliance-sensitive and privacy-focused teams.

By Kofi Mensah

AI Agent Design Patterns 2026: Reflection, Tool Use, Planning & Multi-Agent

>_ 13 May | 18 min | Dev Corner

🟡Intermediate

Build sovereign AI agents from first principles. Covers the four agentic design patterns — reflection, tool use, planning, and multi-agent — with Python code examples using local Ollama models.

By Kofi Mensah

#crewai #multi-agent #ollama #local-ai #python #ai-agents #dev-corner #2026

Key Takeaways

Introduction

Part 1: Environment Setup

Part 2: Building Agents with Local LLM

Part 2.5: Complete Hierarchical Crew with Manager Agent

Part 3: Custom Tool Integration

Part 4: Sovereignty Audit

Troubleshooting

Agent stopped due to iteration limit

Agents produce inconsistent output format

Connection refused to Ollama

Conclusion

People Also Ask

What is the difference between CrewAI and LangGraph?

Does CrewAI support async execution?

Troubleshooting & Common Issues

Issue: ConnectionError: Failed to connect to Ollama server

Issue: RuntimeError: No model specified in LLM config

Issue: Task took too long to execute (>300s timeout)

Issue: Tool execution failed: Command not found

Issue: Agent stuck in loop repeating same task

Quick Reference: CrewAI vs Sequential vs Hierarchical

Crew Architecture Decision Tree

Common Agent Patterns

Pattern 1: Researcher → Writer → Reviewer (Sequential)

Pattern 2: Manager Assigns Tasks (Hierarchical)

Pattern 3: Tool Feedback Loop (Agentic)

Performance Optimization Tips

Tip 1: Reduce LLM Input Size

Tip 2: Use Faster Models

Tip 3: Cache Tool Results

Frequently Asked Questions (FAQ)

Q: How much does it cost to run CrewAI vs ChatGPT API?

Q: Can I use CrewAI with GPT-4 or Claude?

Q: How do I debug why an agent isn’t using a tool?

Q: Can agents collaborate across multiple machines?

Q: What’s the maximum number of agents in a crew?

Q: How do I prevent an agent from hallucinating?

Q: Can CrewAI handle real-time data streams?

Q: What happens if a tool fails? Does the crew retry?

Related Vucense Guides

Further Reading

Further Reading

Private Document Q&A with pgvector: 100% Local RAG Pipeline 2026

Sovereign AI Agents 2026: A Complete Guide to Local, Offline Agentic Systems

AI Agent Design Patterns 2026: Reflection, Tool Use, Planning & Multi-Agent

The Sovereign Brief

You're in!

Comments

Recently Visited

`Agent stopped due to iteration limit`

`Connection refused to Ollama`

Issue: `ConnectionError: Failed to connect to Ollama server`

Issue: `RuntimeError: No model specified in LLM config`

Issue: `Task took too long to execute (>300s timeout)`

Issue: `Tool execution failed: Command not found`

Issue: `Agent stuck in loop repeating same task`