
Computer Vision with YOLOv11 2026: Local Object Detection Pipeline

🟡 Intermediate

Build sovereign computer vision pipelines with YOLOv11: dataset preparation, training, inference, OpenCV integration, and local CV deployment with zero cloud inference dependency.

Author: Kofi Mensah, Inference Economics & Hardware Architect

Reading time: 22 min | Build time: 40 min (training: 2-4 hrs)


Key Takeaways

  • Pre-trained COCO weights for quick start: YOLO("yolo11n.pt") detects 80 common object classes out of the box — no training needed for people, cars, animals, etc. For advanced computer vision, see Embedding Models 2026.
  • YOLO format for custom training: One .txt label file per image, one line per bounding box: class_id cx cy w h (all normalised 0–1). Use annotation tools (Roboflow, LabelImg, CVAT) to prepare your dataset.
  • model.train() handles everything: Augmentation, validation splits, learning rate scheduling, and checkpoint saving are automatic.
  • Export for deployment: model.export(format="onnx") or format="tflite" converts trained weights for edge deployment. See GGUF Quantization Explained for optimization techniques.

Introduction

Direct Answer: How do I use YOLOv11 for object detection on my own hardware in 2026?

Install with pip install ultralytics, load pre-trained weights with YOLO("yolo11n.pt"), and run inference on images or video. For custom training, prepare a dataset in YOLO format (images + normalized bounding box labels), create data.yaml, and run model.train(data="data.yaml", epochs=50). All training and inference runs locally without cloud APIs.

Can YOLOv11 run 100% offline?

Yes. Ultralytics YOLOv11 supports local CPU/GPU/MPS inference with zero network calls. Unlike AWS Rekognition or Google Vision, local YOLOv11 processes frames entirely on your hardware. No image data, no metadata, and no telemetry leave your machine.
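As a quick sanity check, here is a minimal sketch (assuming PyTorch is installed alongside Ultralytics, which it is by default) that picks the best available local device; once the weights are cached from the first download, inference needs no network access. The image path is a placeholder:

# offline_check.py - pick the best available local device (sketch)
import torch
from ultralytics import YOLO

# Prefer CUDA (NVIDIA), then MPS (Apple Silicon), else fall back to CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model = YOLO("yolo11n.pt")   # cached locally after the first download
results = model("bus.jpg", device=device, verbose=False)   # "bus.jpg": placeholder local image
print(f"device={device}, objects detected={len(results[0].boxes)}")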

The Vucense 2026 Computer Vision Sovereignty Index

Inference Method | Data Retention | Latency | Auditability | Sovereignty Score
Cloud Vision API | 🔴 Logged & Retained | ⚠️ 200-800 ms | ❌ Black-box | 12/100
Hybrid (Local + Cloud Fallback) | 🟡 Partial | 🟢 50-150 ms | ⚠️ Partial | 48/100
Local YOLOv11 (CPU/GPU) | 🟢 100% On-Device | 🟢 15-45 ms | ✅ Open weights | 89/100
Air-Gapped Edge (Jetson/RPi) | 🟢 Physically Isolated | 🟢 30-60 ms | ✅ Verifiable pipeline | 94/100

Why Local Computer Vision Matters

Cloud vision APIs (AWS Rekognition, Google Vision, Azure Computer Vision) offer convenience but require sending images to third-party servers. For sovereign deployments, running YOLOv11 locally is critical:

Use Case | Cloud API Risk | Local YOLO Benefit
Home security camera | Video streams to vendor cloud | Footage never leaves your network
Industrial inspection | Proprietary models + data retention | Full control over model + data
Medical imaging | HIPAA compliance complexity | Air-gapped inference possible
Privacy-sensitive analysis | Metadata harvesting, facial recognition | Detection results stay local

Key principle: If the image contains people, locations, or sensitive assets, local inference is the only way to guarantee the data never leaves your control.


Part 1: Installation and Setup

YOLOv11 is available via Ultralytics. The nano model (yolo11n.pt) is best for CPU inference; larger variants effectively require a GPU for real-time use.

pip install ultralytics --break-system-packages
python3 -c "from ultralytics import YOLO; print('Ultralytics version:', YOLO.__module__.split('.')[0])"
# 01_quick_start.py — run pre-trained YOLO on an image
from ultralytics import YOLO
from pathlib import Path

# Load pre-trained YOLOv11 (downloads ~6MB on first run)
# Variants: yolo11n (nano), yolo11s (small), yolo11m (medium), yolo11l (large), yolo11x (xlarge)
model = YOLO("yolo11n.pt")   # Nano: fastest, smallest

# Inference on a single image
results = model("https://ultralytics.com/images/bus.jpg")   # or local path

# Parse results
for result in results:
    for box in result.boxes:
        cls = model.names[int(box.cls)]
        conf = float(box.conf)
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        print(f"  {cls}: {conf:.2f}  at [{x1},{y1},{x2},{y2}]")

# Save annotated image
results[0].save(filename="detected.jpg")
print("Saved: detected.jpg")

Expected output:

  person: 0.94  at [54, 192, 244, 756]
  person: 0.87  at [310, 155, 501, 750]
  bus:    0.82  at [5, 228, 640, 756]
  person: 0.72  at [478, 208, 564, 748]
Saved: detected.jpg

Part 2: Batch Inference

Calling the model one image at a time pays per-call overhead for every file. Passing a list of paths lets Ultralytics batch frames through the GPU, which is substantially faster when processing whole directories.

# 02_batch_inference.py — process multiple images efficiently
from ultralytics import YOLO
from pathlib import Path
import numpy as np
import time

model = YOLO("yolo11n.pt")

# Warmup on a dummy frame (first inference is slower due to CUDA initialisation)
model(np.zeros((640, 640, 3), dtype=np.uint8), verbose=False)

# Batch process all images in a directory
image_dir = Path("/tmp/images")
images = list(image_dir.glob("*.jpg")) + list(image_dir.glob("*.png"))

start = time.perf_counter()
results = model(
    [str(img) for img in images],
    batch=16,        # Process 16 images simultaneously
    device="cuda",
    conf=0.5,        # Minimum confidence threshold
    iou=0.45,        # IoU threshold for NMS
    verbose=False
)
elapsed = time.perf_counter() - start

print(f"Processed {len(images)} images in {elapsed:.2f}s ({len(images)/elapsed:.0f} img/s)")

# Aggregate results
detection_counts = {}
for result in results:
    for box in result.boxes:
        cls = model.names[int(box.cls)]
        detection_counts[cls] = detection_counts.get(cls, 0) + 1

print("\nDetection summary:")
for cls, count in sorted(detection_counts.items(), key=lambda x: -x[1])[:10]:
    print(f"  {cls}: {count}")

Expected output (RTX 4090):

Processed 500 images in 8.3s (60 img/s)

Detection summary:
  person: 847
  car: 312
  bicycle: 89
  truck: 67

Part 2.5: Annotation Tools for Custom Datasets

Before training on custom data, you need to annotate (label) images with bounding boxes. Here’s a comparison of the best tools:

Tool | Cost | Ease of Use | YOLO Format Export | Team Support | Speed per Image | Best For
Roboflow | Free / $50/mo | ⭐⭐⭐⭐⭐ | ✓ Direct | Teams | ~2-3 min | Small teams, quick iteration
LabelImg | Free | ⭐⭐⭐ | ✓ Manual convert | Solo | ~3-5 min | Single annotator, low budget
CVAT (Computer Vision Annotation Tool) | Free (self-host) | ⭐⭐⭐⭐ | ✓ Plugin | Teams | ~2 min | Enterprise, self-hosted
Makesense.ai | Free | ⭐⭐⭐⭐ | ✓ Yes | Solo/Small | ~2 min | Quick browser-based labeling
Supervisely | Freemium | ⭐⭐⭐⭐ | ✓ Yes | Teams | ~1-2 min | Professional CV teams
Label Studio | Free (self-host) | ⭐⭐⭐⭐ | ✓ Yes | Teams | ~2 min | Custom workflows, privacy

Recommended workflow for 100-500 images:

Option 1: Roboflow (Fastest for small projects)

# 1. Upload images to Roboflow Web UI
# 2. Draw bounding boxes in browser
# 3. Export in YOLO format
# 4. Roboflow automatically creates train/val split and applies augmentation

# Download and extract
unzip roboflow-dataset.zip
cd roboflow-dataset

# Ready to train immediately with the Ultralytics CLI
yolo detect train data=data.yaml model=yolo11n.pt epochs=50

Pros: Zero setup, browser-based, automatic data augmentation
Cons: Cloud-dependent, free tier limited to 3 models/month

Option 2: LabelImg (Free, self-contained)

# Install
pip install labelimg

# Run GUI (the console script name is case-sensitive on Linux)
labelImg custom_dataset/images/train

# Draw boxes and save — creates .xml files
# Convert XML to YOLO format
python3 - << 'EOF'
import xml.etree.ElementTree as ET
from pathlib import Path

# Map Pascal VOC class names to YOLO class ids (edit for your classes)
CLASS_MAP = {"cat": 0, "dog": 1, "bird": 2}

def xml_to_yolo(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # Image dimensions are stored in the VOC <size> element
    size = root.find('size')
    img_width = float(size.find('width').text)
    img_height = float(size.find('height').text)

    yolo_lines = []
    for obj in root.findall('object'):
        class_id = CLASS_MAP[obj.find('name').text]

        bbox = obj.find('bndbox')
        xmin = float(bbox.find('xmin').text)
        ymin = float(bbox.find('ymin').text)
        xmax = float(bbox.find('xmax').text)
        ymax = float(bbox.find('ymax').text)

        # Convert corner coordinates to normalised centre/width/height
        cx = (xmin + xmax) / 2 / img_width
        cy = (ymin + ymax) / 2 / img_height
        w = (xmax - xmin) / img_width
        h = (ymax - ymin) / img_height

        yolo_lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")

    Path(xml_file).with_suffix('.txt').write_text('\n'.join(yolo_lines))

# Convert all XMLs (dimensions are read from each file, not hardcoded)
for xml in Path('custom_dataset/images/train').glob('*.xml'):
    xml_to_yolo(xml)
EOF

Pros: Completely offline, no account needed
Cons: Manual XML→YOLO conversion, no augmentation

Option 3: CVAT (Best for teams, self-hosted privacy)

# Self-hosted setup via the official Docker Compose files
git clone https://github.com/cvat-ai/cvat
cd cvat
docker compose up -d

# Create an admin account
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'

# Access at http://localhost:8080
# Create project, upload images, annotate in browser
# Export as YOLO format directly

Pros: Enterprise-grade, full control, team collaboration
Cons: Higher setup complexity, requires Docker

Recommendation:

  • 100 images, quick prototype: Roboflow (fastest)
  • 500+ images, privacy critical: CVAT self-hosted (full control)
  • Solo, zero cost: LabelImg (works offline)

Part 3: Custom Dataset Training

Transfer learning from COCO-pretrained weights significantly accelerates training on custom data. The model already understands object shapes, textures, and spatial relationships. You only need to fine-tune for your specific classes and domain.

# Directory structure for custom training
mkdir -p custom_dataset/{images,labels}/{train,val}

# data.yaml — dataset configuration
cat > custom_dataset/data.yaml << 'EOF'
path: /home/user/custom_dataset   # Absolute path to dataset root
train: images/train
val:   images/val

nc: 3   # Number of classes
names: ["cat", "dog", "bird"]
EOF
# Create YOLO format labels from your annotations
# YOLO format: class_id cx cy w h (all normalised to [0, 1])

def convert_bbox_to_yolo(img_width: int, img_height: int,
                          x1: int, y1: int, x2: int, y2: int,
                          class_id: int) -> str:
    """Convert pixel bounding box to YOLO normalised format."""
    cx = ((x1 + x2) / 2) / img_width
    cy = ((y1 + y2) / 2) / img_height
    w = (x2 - x1) / img_width
    h = (y2 - y1) / img_height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: cat at pixels (100, 50, 300, 250) in 640x480 image
line = convert_bbox_to_yolo(640, 480, 100, 50, 300, 250, class_id=0)
print(line)   # → 0 0.312500 0.312500 0.312500 0.416667
# 03_training.py — Train YOLOv11 on Custom Dataset using Transfer Learning
# Transfer learning: fine-tune pre-trained weights instead of training from scratch
# Benefit: converges 10× faster, requires fewer images (100+ instead of 1000+)

from ultralytics import YOLO

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Load Pre-trained Model (Transfer Learning)
# ══════════════════════════════════════════════════════════════════════════════════════════════

# YOLO("yolo11n.pt"): load YOLOv11 Nano with weights from Coco pre-training
# "yolo11n": nano size — lightweight (smallest accuracy, fastest inference)
# Alternatives: yolo11s (small), yolo11m (medium), yolo11l (large), yolo11x (extra-large)
# Trade-off: Nano=18M params, fastest; Large=93M params, most accurate
model = YOLO("yolo11n.pt")

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Training Configuration — Hyperparameters Optimized for Custom Data
# ══════════════════════════════════════════════════════════════════════════════════════════════

results = model.train(
    # ── Dataset Configuration ──────────────────────────────────────────────────────────────
    data="custom_dataset/data.yaml",  # Path to data.yaml (train/val paths, class names)
    
    # ── Training Duration ──────────────────────────────────────────────────────────────────
    epochs=50,              # Number of times to iterate over entire dataset
    # Each epoch: load all images, compute loss, backprop, update weights
    # 50 epochs typical for 100-500 images; increase to 100+ for larger datasets
    
    # ── Image Preprocessing ────────────────────────────────────────────────────────────────
    imgsz=640,              # Input image size for training (640×640 pixels)
    # Must be a multiple of the model stride (32): 640 gives a 20×20 grid at the coarsest scale
    # Larger (832): more detail, slower; smaller (416): faster, less detail
    
    batch=16,               # Images per batch (batch size)
    # VRAM use scales with batch size, image size, and model variant
    # batch=16 at 640px fits comfortably on a 24 GB card for the nano model
    # Recommended: use the largest batch that fits in GPU memory; halve it if you hit CUDA OOM
    
    device="cuda",          # Device to train on: "cuda" (NVIDIA), "mps" (Apple Silicon), "cpu"
    # GPU training 50–100× faster than CPU; CUDA availability: python -c "import torch; print(torch.cuda.is_available())"
    
    # ── Learning Rate Schedule ────────────────────────────────────────────────────────────
    lr0=1e-3,               # Initial learning rate (0.001)
    # Learning rate controls step size for weight updates
    # Too high: training diverges (loss explodes)
    # Too low: training stalls (loss plateaus)
    # Typical range: 1e-4 to 1e-2
    
    lrf=1e-2,               # Final learning rate multiplier (0.01)
    # End learning rate = lr0 × lrf = 0.001 × 0.01 = 0.00001
    # The learning rate decays smoothly over training (linear by default; cosine with cos_lr=True)
    # Helps the model settle into a good minimum
    
    # ── Regularization (Prevent Overfitting) ────────────────────────────────────────────
    weight_decay=5e-4,      # L2 regularization penalty (0.0005)
    # Penalizes large weights; encourages simpler, more generalizable model
    # Typical values: 1e-5 to 1e-3
    
    augment=True,           # Enable data augmentation
    # Augmentation techniques applied each epoch:
    # - Mosaic: combine 4 images into one (increases effective dataset size)
    # - Mixup: blend two images together (smoother transitions, better robustness)
    # - Rotations, flips, color jitter (invariance to lighting, orientation changes)
    
    # ── Early Stopping (Prevent Overfitting) ────────────────────────────────────────────
    patience=20,            # Stop training if validation mAP doesn't improve for 20 epochs
    # Prevents overfitting: training loss decreases, but validation loss increases
    # Example: best mAP at epoch 30, no improvement by epoch 50 → stop at epoch 50
    
    # ── Checkpointing ──────────────────────────────────────────────────────────────────
    save_period=10,         # Save checkpoint every 10 epochs
    # Allows resuming if interrupted: yolo train resume=runs/custom_yolo11/weights/last.pt
    # Best model automatically saved as best.pt
    
    # ── Output Directory ───────────────────────────────────────────────────────────────
    project="runs",         # Root directory for output
    name="custom_yolo11"    # Subdirectory: runs/custom_yolo11/
    # Final structure: runs/custom_yolo11/weights/{best,last}.pt, runs/custom_yolo11/results.csv
)

# ══════════════════════════════════════════════════════════════════════════════════════════════
# Training Results Analysis
# ══════════════════════════════════════════════════════════════════════════════════════════════

# Extract performance metrics from training results
# mAP50(B): mean Average Precision at 0.5 IoU threshold
# IoU (Intersection over Union): overlap between predicted and ground-truth boxes
# mAP50 > 0.85 is good; > 0.90 is excellent for custom datasets
print(f"Best mAP50: {results.results_dict['metrics/mAP50(B)']:.4f}")

# Path to best model (lowest validation loss)
# Use this for inference, not the last checkpoint (which may be overfit)
print(f"Best model: runs/custom_yolo11/weights/best.pt")

Expected output (during training):

Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  1/50     21.4G     1.2341     2.3847     1.1234         47        640
 10/50     21.4G     0.8231     1.2347     0.9123         52        640
 50/50     21.4G     0.5123     0.7234     0.7891         61        640
                 Class     Images  Instances      Box(P          R      mAP50
                   all       200        847      0.891      0.843      0.872
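After training, a short sketch for reloading the best checkpoint and re-running validation; the paths assume the project/name settings from the training script above:

# 03b_validate.py - reload the best checkpoint and validate (sketch)
from ultralytics import YOLO

best = YOLO("runs/custom_yolo11/weights/best.pt")

# Re-run validation against the val split declared in data.yaml
metrics = best.val(data="custom_dataset/data.yaml")
print(f"mAP50:    {metrics.box.map50:.4f}")   # mean AP at IoU 0.5
print(f"mAP50-95: {metrics.box.map:.4f}")     # mean AP averaged over IoU 0.5-0.95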

Part 4: Real-Time Video Detection

# 04_webcam_detection.py — live object detection from webcam
import cv2
from ultralytics import YOLO
import time

# Load your trained model (or use pre-trained)
model = YOLO("yolo11n.pt")

cap = cv2.VideoCapture(0)   # 0 = default webcam; or video file path
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

fps_tracker = time.time()
frame_count = 0

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Inference
    results = model(frame, conf=0.5, verbose=False)

    # Annotate frame with bounding boxes
    annotated = results[0].plot()

    # Add FPS counter
    frame_count += 1
    if frame_count % 30 == 0:
        fps = 30 / (time.time() - fps_tracker)
        fps_tracker = time.time()
        print(f"FPS: {fps:.1f}")
    cv2.putText(annotated, f"YOLOv11 | Local GPU", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow("YOLOv11 Detection", annotated)

    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

Expected output (terminal):

FPS: 62.3
FPS: 64.1
FPS: 61.8

62+ FPS on RTX 4090 with YOLOv11n — real-time detection with headroom.
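The same loop applies to recorded footage. Here is a hedged sketch that writes annotated frames back out with OpenCV's VideoWriter; the input and output filenames are placeholders:

# 04b_video_file.py - annotate a video file and save the result (sketch)
import cv2
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture("input.mp4")            # placeholder input path

fps = cap.get(cv2.CAP_PROP_FPS) or 30
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("annotated.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    results = model(frame, conf=0.5, verbose=False)
    out.write(results[0].plot())               # plot() returns an annotated BGR frame

cap.release()
out.release()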


Part 5: Export for Edge Deployment

# 05_export.py — export to various formats
from ultralytics import YOLO

model = YOLO("runs/custom_yolo11/weights/best.pt")

# ONNX (runs on any hardware via ONNX Runtime)
model.export(format="onnx", imgsz=640, simplify=True)

# TensorRT (fastest on NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True)   # FP16

# TFLite (for Raspberry Pi / mobile)
model.export(format="tflite", imgsz=320)   # Smaller resolution for edge

# Run ONNX inference
onnx_model = YOLO("runs/custom_yolo11/weights/best.onnx")
results = onnx_model("test.jpg")
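To sanity-check an export, a rough timing sketch comparing the PyTorch and ONNX weights on the same image; it assumes the exports above have run, test.jpg is a placeholder, and absolute numbers depend entirely on your hardware:

# 05b_compare_latency.py - rough PyTorch vs ONNX latency check (sketch)
import time
from ultralytics import YOLO

for weights in ["runs/custom_yolo11/weights/best.pt",
                "runs/custom_yolo11/weights/best.onnx"]:
    model = YOLO(weights)
    model("test.jpg", verbose=False)           # warmup (first call is slower)
    start = time.perf_counter()
    for _ in range(20):
        model("test.jpg", verbose=False)
    ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{weights}: {ms:.1f} ms/image")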

Sovereign Deployment: Local Inference Endpoint

For production computer vision workloads, expose YOLOv11 inference through a private API endpoint that runs 100% on your infrastructure:

# local_inference.py
from fastapi import FastAPI, UploadFile, File
from ultralytics import YOLO
import cv2
import numpy as np

app = FastAPI(title="Sovereign CV Endpoint", version="1.0.0")
model = YOLO("yolo11n.pt", task="detect")  # Loads once at startup, stays in memory

@app.post("/api/detect")
async def detect(file: UploadFile = File(...)):
    """Run YOLO inference on uploaded image. Results never leave this server."""
    # Read image from upload
    contents = await file.read()
    img_array = np.frombuffer(contents, np.uint8)
    img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
    
    # Inference on CPU or GPU (auto-detect)
    results = model(img, verbose=False, device="cpu")  # device="mps" or "cuda" if available
    
    # Extract detections
    detections = []
    for box in results[0].boxes:
        detections.append({
            "class": model.names[int(box.cls.item())],
            "confidence": float(box.conf.item()),
            "bbox": box.xyxy[0].tolist()
        })
    
    return {"detections": detections, "num_objects": len(detections)}

@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "ok", "model": "yolo11n", "inference_device": "cpu"}

# Run locally only (no network exposure by default)
if __name__ == "__main__":
    import uvicorn
    # Bind to localhost only — production access via Tailscale/VPN
    uvicorn.run(app, host="127.0.0.1", port=8000)

Usage (local machine only):

pip install fastapi uvicorn
python local_inference.py

# In another terminal
curl -X POST "http://127.0.0.1:8000/api/detect" -F "file=@test.jpg"
# Returns: {"detections": [...], "num_objects": 3}

Sovereign Design: The endpoint binds to 127.0.0.1 (not 0.0.0.0), so it’s inaccessible from the network. For remote access, use Tailscale VPN (see Docker Networking 2026) to securely route inference requests without exposing the API to the public internet.


PII Protection: Frame Sanitization Before Storage

If your CV pipeline logs, archives, or trains on frames, you must strip identifiable data before writing to disk. This is often a legal requirement under GDPR/CCPA and always a sovereignty best practice.

# sanitize_frames.py
import cv2
from pathlib import Path
from ultralytics import YOLO

def blur_people_and_plates(frame, results, blur_strength=51):
    """
    Blur detected people and likely license-plate regions before storage.
    Sovereignty rule: If the image contains people, locations, or vehicles,
    redact before archival.
    """
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        
        # Class 0 = person (in COCO; adjust for your custom dataset)
        if int(box.cls.item()) == 0:
            roi = frame[y1:y2, x1:x2]
            frame[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (blur_strength, blur_strength), 30)
        
        # Class 2 = car: blur the lower 15% of the box, a crude heuristic for
        # where the plate usually sits; blur the whole vehicle if in doubt
        if int(box.cls.item()) == 2:
            roi = frame[y1:y2, x1:x2]
            plate_height = max(1, int((y2 - y1) * 0.15))
            frame[y2 - plate_height:y2, x1:x2] = cv2.GaussianBlur(
                roi[-plate_height:], (99, 99), 30
            )
    
    return frame

# Example: process video and save sanitized frames
Path("archive").mkdir(exist_ok=True)   # output dir must exist before cv2.imwrite
model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture("security_camera.mp4")

frame_count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    results = model(frame, verbose=False)
    sanitized = blur_people_and_plates(frame, results)
    
    # Save sanitized frame to archive (safe for logs, training, etc.)
    cv2.imwrite(f"archive/sanitized_frame_{frame_count}.jpg", sanitized)
    frame_count += 1
    
    if frame_count % 100 == 0:
        print(f"Processed {frame_count} frames")

cap.release()
print(f"Total frames archived (sanitized): {frame_count}")

Key principle: Blur faces, license plates, and location identifiers before any storage, logging, or training. This ensures your CV pipeline respects user privacy and complies with regulations. For security cameras on your property, this is still best practice — it protects visitors and guests.


YOLOv11 provides the full computer vision pipeline: pre-trained COCO detection in 3 lines, custom dataset training with automatic augmentation, real-time webcam inference at 60+ FPS, and export to ONNX/TensorRT for production. Everything runs locally — no cloud inference API, no per-image cost.

Sovereign Deployment Tip: Combine YOLOv11 with Ollama for multimodal reasoning: “Detect objects with YOLO, then ask a local LLM to explain what it sees — all on-device.” See CrewAI + Ollama 2026 for orchestrating local AI agents.
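A hedged sketch of that pattern using the ollama Python package; the model name (llama3.2) and image path are placeholders for whatever you have pulled locally:

# yolo_plus_llm.py - describe YOLO detections with a local LLM (sketch)
import ollama                  # pip install ollama; assumes the Ollama server is running locally
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
results = model("scene.jpg", verbose=False)      # "scene.jpg": placeholder image path

# Summarise detections as plain text for the LLM prompt
objects = [model.names[int(box.cls)] for box in results[0].boxes]
summary = ", ".join(objects) if objects else "nothing"

response = ollama.chat(
    model="llama3.2",          # placeholder: any model you have pulled locally
    messages=[{
        "role": "user",
        "content": f"A camera frame contains: {summary}. Describe the likely scene in one sentence.",
    }],
)
print(response["message"]["content"])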

See Embedding Models 2026 for training custom neural network architectures from scratch, and Edge Computing Guide 2026 for hardware selection.


People Also Ask

What is the difference between YOLOv11 nano, small, medium, large, and xlarge?

The variants trade accuracy for speed and memory: YOLOv11n (nano, 2.6M params) runs at 80+ FPS on an RTX 4090 and is best for real-time applications with modest accuracy needs. YOLOv11s (small) and YOLOv11m (medium) are the balanced choices for most production applications. YOLOv11l (large) and YOLOv11x (xlarge) maximize accuracy at the cost of speed. Start with YOLOv11n for real-time requirements, YOLOv11s for balanced performance, and YOLOv11m if accuracy is the priority.
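If you are unsure which variant your hardware sustains, a quick benchmark sketch; test.jpg is a placeholder, and each .pt file downloads on first use:

# benchmark_variants.py - rough FPS per YOLOv11 variant (sketch)
import time
from ultralytics import YOLO

for name in ["yolo11n.pt", "yolo11s.pt", "yolo11m.pt"]:
    model = YOLO(name)
    model("test.jpg", verbose=False)          # warmup
    start = time.perf_counter()
    for _ in range(30):
        model("test.jpg", verbose=False)
    fps = 30 / (time.perf_counter() - start)
    print(f"{name}: {fps:.1f} FPS")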

How many images do I need to train a custom YOLOv11 model?

For transfer learning (fine-tuning from COCO pre-trained weights): 100–500 images per class is typically sufficient for common-looking objects. For unusual or highly specific objects: 500–2000 images per class. YOLOv11’s data augmentation (mosaic, mixup, flips, colour jitter) effectively multiplies your dataset 10–20×, reducing the data requirement significantly. Start with 100 images per class, train, and evaluate — add more data only if accuracy is insufficient.


Troubleshooting & Common Issues

Issue: CUDA out of memory during training

Cause: Batch size too large for GPU VRAM.

# Fix: Reduce batch size
# In code: batch=8 instead of batch=16
model.train(data="data.yaml", batch=8, imgsz=640)

# Or reduce image size
model.train(data="data.yaml", batch=16, imgsz=416)

Issue: FileNotFoundError: data.yaml not found

Cause: data.yaml path incorrect or missing.

# Fix: Verify file exists and path is correct
ls -la custom_dataset/data.yaml
# Should show the file; if not, create it:
cat > custom_dataset/data.yaml << 'EOF'
path: /full/path/to/custom_dataset
train: images/train
val: images/val
nc: 3
names: ["cat", "dog", "bird"]
EOF

Issue: mAP stuck at 0.0 after 50 epochs

Cause: Dataset format wrong or labels invalid.

# Fix: Validate dataset format
# YOLO format: .txt files with: class_id cx cy w h (0-1 normalized)
# Labels live in labels/train/ with the same basename as the image in images/train/
ls images/train/*.jpg | wc -l  # Should match:
ls labels/train/*.txt | wc -l

# Check label format:
head -1 labels/train/*.txt  # Should show lines like: 0 0.5 0.5 0.3 0.4
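Beyond file counts, a small validation sketch that catches the most common label faults (wrong field count, unnormalised coordinates, class ids out of range); NUM_CLASSES is an assumption you should match to nc in your data.yaml:

# validate_labels.py - sanity-check YOLO label files (sketch)
from pathlib import Path

NUM_CLASSES = 3                                 # assumption: must match nc in data.yaml

for txt in Path("labels/train").glob("*.txt"):
    for i, line in enumerate(txt.read_text().splitlines(), 1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{txt}:{i}: expected 5 fields, got {len(parts)}")
            continue
        cls, cx, cy, w, h = int(parts[0]), *map(float, parts[1:])
        if not 0 <= cls < NUM_CLASSES:
            print(f"{txt}:{i}: class id {cls} out of range")
        if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
            print(f"{txt}:{i}: coordinates not normalised to [0, 1]")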

Issue: Low mAP (0.4–0.5) even with 1000 images

Cause: Images too blurry, inconsistent classes, or poor labeling.

# Fix: Inspect dataset quality
# 1. Check images are clear (not blurry, right resolution)
# 2. Verify labels are correct (spot-check 10 images)
# 3. Check for class imbalance (some classes rare)

# For class imbalance: Ultralytics has no class-weight argument for train();
# oversample the rare classes (duplicate their image/label pairs) or collect more examples.

Issue: Inference slow (1 FPS) on edge device

Cause: Model too large for the device, or no capable GPU available.

# Fix: Use smaller model variant
model = YOLO("yolo11n.pt")  # Nano: 80+ FPS on RTX 4090

# Or quantize model
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
model.export(format="onnx", half=True)  # FP16 precision: 2× faster, minimal accuracy loss

Issue: Model overfitting: training loss decreases but validation mAP plateaus

Cause: Not enough training data or regularization.

# Fix: Increase regularization
model.train(
    data="data.yaml",
    epochs=50,
    weight_decay=1e-3,  # Increase from default 5e-4
    augment=True,
    mosaic=1.0,  # Enable mosaic augmentation
    mixup=0.1    # Enable mixup
)

Model Selection Decision Tree

What's your primary goal?
├─ Real-time inference (>30 FPS required)
│  └─ Use YOLOv11n (nano)
├─ Balanced speed + accuracy
│  └─ Use YOLOv11s or YOLOv11m (small/medium)
├─ Maximum accuracy, speed not critical
│  └─ Use YOLOv11l or YOLOv11x (large/xlarge)
└─ Edge device (mobile, Raspberry Pi)
   └─ Use YOLOv11n with quantization (INT8)
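For the edge branch of the tree, a hedged INT8 export sketch; Ultralytics' int8 option performs post-training quantisation and uses your dataset for calibration, so pass the data.yaml:

# int8_export.py - INT8 TFLite export for edge devices (sketch)
from ultralytics import YOLO

model = YOLO("runs/custom_yolo11/weights/best.pt")

# int8=True enables post-training quantisation; data supplies calibration images
model.export(format="tflite", imgsz=320, int8=True, data="custom_dataset/data.yaml")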

Quick Reference: Dataset Preparation Checklist

  • ✅ Images in images/train and images/val folders
  • ✅ Labels in labels/train and labels/val folders (matching filenames)
  • ✅ Each image has a corresponding .txt label file
  • ✅ Label format: class_id cx cy w h (all 0-1 normalized)
  • ✅ data.yaml with correct paths and class names
  • ✅ 70–80% images in train, 20–30% in val (see the split sketch below)
  • ✅ No images in both train and val (prevents data leakage)
  • ✅ All classes represented in train set (no missing classes)
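A split sketch matching the checklist; it assumes images and labels start in hypothetical staging/ directories and copies matched pairs into an 80/20 split:

# split_dataset.py - 80/20 train/val split of matched pairs (sketch)
import random
import shutil
from pathlib import Path

random.seed(0)                                   # reproducible split
images = sorted(Path("staging/images").glob("*.jpg"))  # hypothetical staging dir
random.shuffle(images)
cut = int(len(images) * 0.8)

for split, subset in [("train", images[:cut]), ("val", images[cut:])]:
    Path(f"images/{split}").mkdir(parents=True, exist_ok=True)
    Path(f"labels/{split}").mkdir(parents=True, exist_ok=True)
    for img in subset:
        lbl = Path("staging/labels") / img.with_suffix(".txt").name
        shutil.copy(img, f"images/{split}/{img.name}")
        shutil.copy(lbl, f"labels/{split}/{lbl.name}")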

Annotation Workflow Comparison

Workflow | Speed | Cost | Quality | Best For
Roboflow (cloud) | Fast (~2 min/img) | Free/Paid | Good (auto-augment) | Small teams, quick iteration
LabelImg (desktop) | Medium (3-5 min/img) | Free | Good | Solo developers, privacy
CVAT (self-hosted) | Medium (2-3 min/img) | Free | Excellent | Teams, enterprise
Freelancers (MTurk) | Slow | Medium-High ($0.10-0.50/img) | Varies | Large datasets (1000+ images)

Frequently Asked Questions (FAQ)

Q: What’s the difference between YOLOv11 and YOLOv10?

A: YOLOv11 (2026) is 2–5% more accurate and slightly faster than YOLOv10 (2024). Use YOLOv11 for new projects. Upgrade from v10 only if accuracy improvement matters.

Q: Can I train YOLOv11 on CPU?

A: Yes, but expect 10–50× slower training. On CPU: ~1 epoch/minute. On RTX 4090: ~1 epoch/second. Use CPU only for testing; switch to GPU for production training.

Q: How do I export YOLOv11 to run on phone/browser?

A: Three options:

# ONNX (runs on CPU, cross-platform)
model.export(format="onnx")

# TensorFlow Lite (mobile)
model.export(format="tflite")

# NCNN (fast inference, mobile/edge)
model.export(format="ncnn")

Q: Can I train on video directly instead of images?

A: YOLOv11 trains on images, not video. Extract frames from video first:

import cv2
from pathlib import Path

Path("frames").mkdir(exist_ok=True)   # output dir must exist before cv2.imwrite
cap = cv2.VideoCapture("video.mp4")
frame_id = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret: break
    cv2.imwrite(f"frames/frame_{frame_id:06d}.jpg", frame)
    frame_id += 1
cap.release()

Q: How do I handle class imbalance (some classes rare)?

A: Ultralytics has no class-weight training argument. The practical fix is to oversample the rare class (duplicate its image/label pairs so it appears more often per epoch) or to collect more examples; the oversampling sketch in the Troubleshooting section above applies here too.

Q: What’s the minimum number of classes I can train?

A: 1 class is possible (detect “cat” anywhere in image). YOLOv11 handles single-class well. Multi-class (5+) yields better results due to learned feature diversity.

Q: Can I detect small objects (<50 pixels)?

A: YOLOv11 struggles with very small objects. Options:

  1. Use larger input: imgsz=1280 instead of 640 (4× more computation)
  2. Tile images: Split large images into 640×640 tiles, detect, merge results (see the tiling sketch below)
  3. Use a different model: two-stage detectors such as Faster R-CNN often handle small objects better
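Option 2 in code: a simplified tiling sketch that detects per tile and offsets boxes back into full-image coordinates (a production version would also merge duplicates with cross-tile NMS; large_image.jpg is a placeholder):

# tile_detect.py - tiled detection for small objects (sketch)
import cv2
from ultralytics import YOLO

TILE = 640
model = YOLO("yolo11n.pt")
img = cv2.imread("large_image.jpg")              # placeholder path
h, w = img.shape[:2]

detections = []
for y in range(0, h, TILE):
    for x in range(0, w, TILE):
        tile = img[y:y + TILE, x:x + TILE]
        results = model(tile, verbose=False)
        for box in results[0].boxes:
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            # Offset tile-local coordinates back into the full image
            detections.append((model.names[int(box.cls)], float(box.conf),
                               x1 + x, y1 + y, x2 + x, y2 + y))

print(f"{len(detections)} detections across tiles")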

Q: How do I use YOLOv11 for video surveillance 24/7?

A: Run capture and inference in a background thread so the main program stays responsive:

import cv2, threading
from ultralytics import YOLO

model = YOLO("best.pt")
cap = cv2.VideoCapture("rtsp://camera_url")

def process_frames():
    while True:
        ret, frame = cap.read()
        if not ret: break
        results = model(frame, verbose=False)   # inference runs off the main thread
        # Handle results here (log, alert, write to disk); cv2.imshow is not thread-safe
        print(f"{len(results[0].boxes)} objects in frame")

thread = threading.Thread(target=process_frames, daemon=True)
thread.start()
thread.join()   # keep the process alive for 24/7 operation

Q: What’s the cost to train a custom YOLOv11 vs cloud services?

A: Local training: ~$1 electricity (on RTX 4090). Cloud (AWS/Google): $5–50/hour GPU rental. 50-epoch training on cloud: ~$10–50. Local is cheaper if you already own GPU.

Q: How do I evaluate model performance beyond mAP?

A: Inspect the confusion matrix computed during validation:

results = model.val()  # Validation
print(results.confusion_matrix.matrix)  # Raw matrix of predicted vs true classes
# Example: "dog" confused as "cat" → may need more dog examples



Tested on: Ubuntu 24.04 LTS (RTX 4090, CUDA 12.4). Ultralytics 8.3.42, PyTorch 2.5.1. Last verified: May 16, 2026.
