Key Takeaways
- Pre-trained COCO weights for quick start: `YOLO("yolo11n.pt")` detects 80 common object classes out of the box — no training needed for people, cars, animals, etc. For advanced computer vision, see Embedding Models 2026.
- YOLO format for custom training: one `.txt` label file per image, one line per bounding box: `class_id cx cy w h` (all normalised 0–1). Use annotation tools (Roboflow, LabelImg, CVAT) to prepare your dataset.
- `model.train()` handles everything: augmentation, validation splits, learning rate scheduling, and checkpoint saving are automatic.
- Export for deployment: `model.export(format="onnx")` or `format="tflite"` converts trained weights for edge deployment. See GGUF Quantization Explained for optimization techniques.
Introduction
Direct Answer: How do I use YOLOv11 for object detection on my own hardware in 2026?
Install with `pip install ultralytics`, load pre-trained weights with `YOLO("yolo11n.pt")`, and run inference on images or video. For custom training, prepare your dataset in YOLO format (images + normalized bounding-box labels), create `data.yaml`, and run `model.train(data="data.yaml", epochs=50)`. All training and inference runs locally without cloud APIs.
Can YOLOv11 run 100% offline?
Yes. Ultralytics YOLOv11 supports local CPU/GPU/MPS inference with zero network calls. Unlike AWS Rekognition or Google Vision, local YOLOv11 processes frames entirely on your hardware. No image data, no metadata, and no telemetry leave your machine.
The Vucense 2026 Computer Vision Sovereignty Index
| Inference Method | Data Retention | Latency | Auditability | Sovereignty Score |
|---|---|---|---|---|
| Cloud Vision API | 🔴 Logged & Retained | ⚠️ 200-800ms | ❌ Black-box | 12/100 |
| Hybrid (Local + Cloud Fallback) | 🟡 Partial | 🟢 50-150ms | ⚠️ Partial | 48/100 |
| Local YOLOv11 (CPU/GPU) | 🟢 100% On-Device | 🟢 15-45ms | ✅ Open weights | 89/100 |
| Air-Gapped Edge (Jetson/RPi) | 🟢 Physically Isolated | 🟢 30-60ms | ✅ Verifiable pipeline | 94/100 |
Why Local Computer Vision Matters
Cloud vision APIs (AWS Rekognition, Google Vision, Azure Computer Vision) offer convenience but require sending images to third-party servers. For sovereign deployments, running YOLOv11 locally is critical:
| Use Case | Cloud API Risk | Local YOLO Benefit |
|---|---|---|
| Home security camera | Video streams to vendor cloud | Footage never leaves your network |
| Industrial inspection | Proprietary models + data retention | Full control over model + data |
| Medical imaging | HIPAA compliance complexity | Air-gapped inference possible |
| Privacy-sensitive analysis | Metadata harvesting, facial recognition | Detection results stay local |
Key principle: If the image contains people, locations, or sensitive assets, local inference is the only way to guarantee the data never leaves your control.
Part 1: Installation and Setup
YOLOv11 is available via the Ultralytics package. The nano model (yolo11n.pt) is best for CPU inference; the larger variants are only practical with a GPU.
pip install ultralytics --break-system-packages
python3 -c "import ultralytics; print('Ultralytics version:', ultralytics.__version__)"
# 01_quick_start.py — run pre-trained YOLO on an image
from ultralytics import YOLO
from pathlib import Path
# Load pre-trained YOLOv11 (downloads ~6MB on first run)
# Variants: yolo11n (nano), yolo11s (small), yolo11m (medium), yolo11l (large), yolo11x (xlarge)
model = YOLO("yolo11n.pt") # Nano: fastest, smallest
# Inference on a single image
results = model("https://ultralytics.com/images/bus.jpg") # or local path
# Parse results
for result in results:
for box in result.boxes:
cls = model.names[int(box.cls)]
conf = float(box.conf)
x1, y1, x2, y2 = map(int, box.xyxy[0])
print(f" {cls}: {conf:.2f} at [{x1},{y1},{x2},{y2}]")
# Save annotated image
results[0].save(filename="detected.jpg")
print("Saved: detected.jpg")
Expected output:
person: 0.94 at [54, 192, 244, 756]
person: 0.87 at [310, 155, 501, 750]
bus: 0.82 at [5, 228, 640, 756]
person: 0.72 at [478, 208, 564, 748]
Saved: detected.jpg
Part 2: Batch Inference
Passing a list of images with a `batch` size amortises model setup and data transfer across many frames, giving far higher throughput than calling the model one image at a time.
# 02_batch_inference.py — process multiple images efficiently
from ultralytics import YOLO
from pathlib import Path
import numpy as np
import time
model = YOLO("yolo11n.pt")
# Warmup on a blank frame (first inference is slower due to CUDA initialisation)
model(np.zeros((640, 640, 3), dtype=np.uint8), verbose=False)
# Batch process all images in a directory
image_dir = Path("/tmp/images")
images = list(image_dir.glob("*.jpg")) + list(image_dir.glob("*.png"))
start = time.perf_counter()
results = model(
[str(img) for img in images],
batch=16, # Process 16 images simultaneously
device="cuda",
conf=0.5, # Minimum confidence threshold
iou=0.45, # IoU threshold for NMS
verbose=False
)
elapsed = time.perf_counter() - start
print(f"Processed {len(images)} images in {elapsed:.2f}s ({len(images)/elapsed:.0f} img/s)")
# Aggregate results
detection_counts = {}
for result in results:
for box in result.boxes:
cls = model.names[int(box.cls)]
detection_counts[cls] = detection_counts.get(cls, 0) + 1
print("\nDetection summary:")
for cls, count in sorted(detection_counts.items(), key=lambda x: -x[1])[:10]:
print(f" {cls}: {count}")
Expected output (RTX 4090):
Processed 500 images in 8.3s (60 img/s)
Detection summary:
person: 847
car: 312
bicycle: 89
truck: 67
Part 2.5: Annotation Tools for Custom Datasets
Before training on custom data, you need to annotate (label) images with bounding boxes. Here’s a comparison of the best tools:
| Tool | Cost | Ease of Use | YOLO Format Export | Team Support | Speed per Image | Best For |
|---|---|---|---|---|---|---|
| Roboflow | Free / $50/mo | ⭐⭐⭐⭐⭐ | ✓ Direct | Teams | ~2-3 min | Small teams, quick iteration |
| LabelImg | Free | ⭐⭐⭐ | ✓ Manual convert | Solo | ~3-5 min | Single annotator, low budget |
| CVAT (Computer Vision Annotation Tool) | Free (self-host) | ⭐⭐⭐⭐ | ✓ Plugin | Teams | ~2 min | Enterprise, self-hosted |
| Makesense.ai | Free | ⭐⭐⭐⭐ | ✓ Yes | Solo/Small | ~2 min | Quick browser-based labeling |
| Supervisely | Freemium | ⭐⭐⭐⭐ | ✓ Yes | Teams | ~1-2 min | Professional CV teams |
| Label Studio | Free (self-host) | ⭐⭐⭐⭐ | ✓ Yes | Teams | ~2 min | Custom workflows, privacy |
Recommended workflow for 100-500 images:
Option 1: Roboflow (Fastest for small projects)
# 1. Upload images to Roboflow Web UI
# 2. Draw bounding boxes in browser
# 3. Export in YOLO format
# 4. Roboflow automatically creates train/val split and applies augmentation
# Download and extract
unzip roboflow-dataset.zip
cd roboflow-dataset
# Ready to train immediately
yolo detect train data=data.yaml model=yolo11n.pt epochs=50
Pros: Zero setup, browser-based, automatic data augmentation
Cons: Cloud-dependent, free tier limited to 3 models/month
Option 2: LabelImg (Free, self-contained)
# Install
pip install labelImg
# Run GUI
labelImg custom_dataset/images/train
# Draw boxes and save — Pascal VOC mode creates .xml files
# (LabelImg can also save YOLO .txt directly; if you have XML, convert:)
python3 - << 'EOF'
import xml.etree.ElementTree as ET
from pathlib import Path
CLASSES = ["cat", "dog", "bird"]  # must match the names order in data.yaml
def xml_to_yolo(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    # Pascal VOC XML stores the image dimensions — read them instead of hardcoding
    size = root.find('size')
    img_width = float(size.find('width').text)
    img_height = float(size.find('height').text)
    yolo_lines = []
    for obj in root.findall('object'):
        class_id = CLASSES.index(obj.find('name').text)
        bbox = obj.find('bndbox')
        xmin = float(bbox.find('xmin').text)
        ymin = float(bbox.find('ymin').text)
        xmax = float(bbox.find('xmax').text)
        ymax = float(bbox.find('ymax').text)
        cx = (xmin + xmax) / 2 / img_width
        cy = (ymin + ymax) / 2 / img_height
        w = (xmax - xmin) / img_width
        h = (ymax - ymin) / img_height
        yolo_lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    Path(xml_file).with_suffix('.txt').write_text('\n'.join(yolo_lines))
# Convert all XMLs
for xml in Path('custom_dataset/images/train').glob('*.xml'):
    xml_to_yolo(xml)
EOF
Pros: Completely offline, no account needed
Cons: Manual XML→YOLO conversion, no augmentation
Option 3: CVAT (Best for teams, self-hosted privacy)
# CVAT ships a docker compose configuration — clone the repo and start it
git clone https://github.com/cvat-ai/cvat
cd cvat
docker compose up -d
# Access at http://localhost:8080
# Create project, upload images, annotate in browser
# Export as YOLO format directly
Pros: Enterprise-grade, full control, team collaboration
Cons: Higher setup complexity, requires Docker
Recommendation:
- 100 images, quick prototype: Roboflow (fastest)
- 500+ images, privacy critical: CVAT self-hosted (full control)
- Solo, zero cost: LabelImg (works offline)
Part 3: Custom Dataset Training
# Directory structure for custom training
mkdir -p custom_dataset/{images,labels}/{train,val}
# data.yaml — dataset configuration
cat > custom_dataset/data.yaml << 'EOF'
path: /home/user/custom_dataset # Absolute path to dataset root
train: images/train
val: images/val
nc: 3 # Number of classes
names: ["cat", "dog", "bird"]
EOF
# Create YOLO format labels from your annotations
# YOLO format: class_id cx cy w h (all normalised to [0, 1])
def convert_bbox_to_yolo(img_width: int, img_height: int,
x1: int, y1: int, x2: int, y2: int,
class_id: int) -> str:
"""Convert pixel bounding box to YOLO normalised format."""
cx = ((x1 + x2) / 2) / img_width
cy = ((y1 + y2) / 2) / img_height
w = (x2 - x1) / img_width
h = (y2 - y1) / img_height
return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
# Example: cat at pixels (100, 50, 300, 250) in 640x480 image
line = convert_bbox_to_yolo(640, 480, 100, 50, 300, 250, class_id=0)
print(line) # → 0 0.312500 0.312500 0.312500 0.416667
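For visual spot-checks of labels it helps to have the inverse conversion back to pixel corners. A minimal sketch (the function name `yolo_to_pixel` is mine):

```python
def yolo_to_pixel(img_width: int, img_height: int, line: str):
    """Convert one YOLO label line back to (class_id, (x1, y1, x2, y2)) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = float(cx), float(cy), float(w), float(h)
    x1 = round((cx - w / 2) * img_width)
    y1 = round((cy - h / 2) * img_height)
    x2 = round((cx + w / 2) * img_width)
    y2 = round((cy + h / 2) * img_height)
    return int(class_id), (x1, y1, x2, y2)

# Round-trips the example above
print(yolo_to_pixel(640, 480, "0 0.312500 0.312500 0.312500 0.416667"))
# → (0, (100, 50, 300, 250))
```

Draw the recovered box on the image with `cv2.rectangle` to confirm your labels line up with the objects.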
# 03_training.py — Train YOLOv11 on Custom Dataset using Transfer Learning
# Transfer learning: fine-tune pre-trained weights instead of training from scratch
# Benefit: converges 10× faster, requires fewer images (100+ instead of 1000+)
from ultralytics import YOLO
# ══════════════════════════════════════════════════════════════════════════════════════════════
# Load Pre-trained Model (Transfer Learning)
# ══════════════════════════════════════════════════════════════════════════════════════════════
# YOLO("yolo11n.pt"): load YOLOv11 Nano with weights from COCO pre-training
# "yolo11n": nano size — lightweight (lowest accuracy, fastest inference)
# Alternatives: yolo11s (small), yolo11m (medium), yolo11l (large), yolo11x (extra-large)
# Trade-off: Nano ≈ 2.6M params, fastest; X-Large ≈ 56.9M params, most accurate
model = YOLO("yolo11n.pt")
# ══════════════════════════════════════════════════════════════════════════════════════════════
# Training Configuration — Hyperparameters Optimized for Custom Data
# ══════════════════════════════════════════════════════════════════════════════════════════════
results = model.train(
# ── Dataset Configuration ──────────────────────────────────────────────────────────────
data="custom_dataset/data.yaml", # Path to data.yaml (train/val paths, class names)
# ── Training Duration ──────────────────────────────────────────────────────────────────
epochs=50, # Number of times to iterate over entire dataset
# Each epoch: load all images, compute loss, backprop, update weights
# 50 epochs typical for 100-500 images; increase to 100+ for larger datasets
# ── Image Preprocessing ────────────────────────────────────────────────────────────────
imgsz=640, # Input image size for training (640×640 pixels)
# Must be a multiple of the model's 32-pixel stride: 640 gives a 20×20 grid at the coarsest scale
# Larger (832): more detail, slower; smaller (416): faster, less detail
batch=16, # Images per batch (batch size)
# VRAM use scales with model size, imgsz, and batch; yolo11n at 640 with batch=16
# fits comfortably in 8 GB — halve the batch if you hit CUDA out-of-memory
# Recommended: largest batch that fits in GPU memory (batch=-1 auto-tunes it)
device="cuda", # Device to train on: "cuda" (NVIDIA), "mps" (Apple Silicon), "cpu"
# GPU training 50–100× faster than CPU; CUDA availability: python -c "import torch; print(torch.cuda.is_available())"
# ── Learning Rate Schedule ────────────────────────────────────────────────────────────
lr0=1e-3, # Initial learning rate (0.001)
# Learning rate controls step size for weight updates
# Too high: training diverges (loss explodes)
# Too low: training stalls (loss plateaus)
# Typical range: 1e-4 to 1e-2
lrf=1e-2, # Final learning rate multiplier (0.01)
# End learning rate = lr0 × lrf = 0.001 × 0.01 = 0.00001
# Cosine annealing: learning rate decreases smoothly over epochs
# Helps the model settle smoothly into a good minimum
# ── Regularization (Prevent Overfitting) ────────────────────────────────────────────
weight_decay=5e-4, # L2 regularization penalty (0.0005)
# Penalizes large weights; encourages simpler, more generalizable model
# Typical values: 1e-5 to 1e-3
augment=True, # Enable data augmentation
# Augmentation techniques applied each epoch:
# - Mosaic: combine 4 images into one (increases effective dataset size)
# - Mixup: blend two images together (smoother transitions, better robustness)
# - Rotations, flips, color jitter (invariance to lighting, orientation changes)
# ── Early Stopping (Prevent Overfitting) ────────────────────────────────────────────
patience=20, # Stop training if validation mAP doesn't improve for 20 epochs
# Prevents overfitting: training loss decreases, but validation loss increases
# Example: best mAP at epoch 30, no improvement by epoch 50 → stop at epoch 50
# ── Checkpointing ──────────────────────────────────────────────────────────────────
save_period=10, # Save checkpoint every 10 epochs
# Allows resuming if interrupted: yolo train resume model=runs/custom_yolo11/weights/last.pt
# Best model automatically saved as best.pt
# ── Output Directory ───────────────────────────────────────────────────────────────
project="runs", # Root directory for output
name="custom_yolo11" # Subdirectory: runs/custom_yolo11/
# Final structure: runs/custom_yolo11/weights/{best,last}.pt, runs/custom_yolo11/results.csv
)
# ══════════════════════════════════════════════════════════════════════════════════════════════
# Training Results Analysis
# ══════════════════════════════════════════════════════════════════════════════════════════════
# Extract performance metrics from training results
# mAP50(B): mean Average Precision at 0.5 IoU threshold
# IoU (Intersection over Union): overlap between predicted and ground-truth boxes
# mAP50 > 0.85 is good; > 0.90 is excellent for custom datasets
print(f"Best mAP50: {results.results_dict['metrics/mAP50(B)']:.4f}")
# Path to best model (lowest validation loss)
# Use this for inference, not the last checkpoint (which may be overfit)
print(f"Best model: runs/custom_yolo11/weights/best.pt")
Expected output (during training):
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/50 21.4G 1.2341 2.3847 1.1234 47 640
10/50 21.4G 0.8231 1.2347 0.9123 52 640
50/50 21.4G 0.5123 0.7234 0.7891 61 640
Class     Images  Instances      Box(P          R      mAP50)
  all        200        847      0.891      0.843      0.872
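Training also writes per-epoch metrics to `runs/custom_yolo11/results.csv`. A sketch for picking the best epoch from that file using only the standard library — the column names here follow recent Ultralytics versions, and some versions pad headers with spaces, so verify (and strip) against your own CSV:

```python
import csv
import io

def best_epoch(csv_text, metric="metrics/mAP50(B)"):
    """Return (epoch, value) for the row with the highest metric."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = max(rows, key=lambda r: float(r[metric].strip()))
    return int(best["epoch"].strip()), float(best[metric].strip())

# Synthetic example in the results.csv layout (real files have many more columns)
sample = "epoch,metrics/mAP50(B)\n1,0.412\n2,0.655\n3,0.701\n4,0.698\n"
print(best_epoch(sample))  # → (3, 0.701)

# On a real run: best_epoch(open("runs/custom_yolo11/results.csv").read())
```

This is handy for confirming that `best.pt` really corresponds to the epoch you expect.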
Part 4: Real-Time Video Detection
# 04_webcam_detection.py — live object detection from webcam
import cv2
from ultralytics import YOLO
import time
# Load your trained model (or use pre-trained)
model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture(0) # 0 = default webcam; or video file path
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
fps_tracker = time.time()
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
# Inference
results = model(frame, conf=0.5, verbose=False)
# Annotate frame with bounding boxes
annotated = results[0].plot()
# Add FPS counter
frame_count += 1
if frame_count % 30 == 0:
fps = 30 / (time.time() - fps_tracker)
fps_tracker = time.time()
print(f"FPS: {fps:.1f}")
cv2.putText(annotated, f"YOLOv11 | Local GPU", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imshow("YOLOv11 Detection", annotated)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
Expected output (terminal):
FPS: 62.3
FPS: 64.1
FPS: 61.8
62+ FPS on RTX 4090 with YOLOv11n — real-time detection with headroom.
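The modulo counter above reports FPS in 30-frame bursts; an exponentially smoothed meter gives a steadier readout. A small sketch (the `FPSMeter` class is mine, driven here by synthetic timestamps):

```python
class FPSMeter:
    """Smooths frame rate with an exponential moving average of frame intervals."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha   # higher alpha = reacts faster, jitters more
        self.avg_dt = None
        self.last = None

    def tick(self, now):
        """Feed a timestamp (e.g. time.perf_counter()); returns smoothed FPS or None."""
        if self.last is not None:
            dt = now - self.last
            self.avg_dt = dt if self.avg_dt is None else (
                self.alpha * dt + (1 - self.alpha) * self.avg_dt)
        self.last = now
        return None if self.avg_dt is None else 1.0 / self.avg_dt

meter = FPSMeter()
fps = None
for i in range(61):                # simulate 60 frames, one every 1/60 s
    fps = meter.tick(i / 60.0)
print(f"{fps:.1f} FPS")            # → 60.0 FPS
```

In the webcam loop you would call `meter.tick(time.perf_counter())` once per frame and overlay the returned value.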
Part 5: Export for Edge Deployment
# 05_export.py — export to various formats
from ultralytics import YOLO
model = YOLO("runs/custom_yolo11/weights/best.pt")
# ONNX (runs on any hardware via ONNX Runtime)
model.export(format="onnx", imgsz=640, simplify=True)
# TensorRT (fastest on NVIDIA GPUs)
model.export(format="engine", imgsz=640, half=True) # FP16
# TFLite (for Raspberry Pi / mobile)
model.export(format="tflite", imgsz=320) # Smaller resolution for edge
# Run ONNX inference
from ultralytics import YOLO
onnx_model = YOLO("runs/custom_yolo11/weights/best.onnx")
results = onnx_model("test.jpg")
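When choosing between export formats, measure median latency on your own hardware rather than trusting published numbers. A generic timing helper (the function name is mine; wrap whichever model object you loaded):

```python
import time

def benchmark(fn, n=50, warmup=5):
    """Median wall-clock latency in seconds of fn() over n runs, after warmup calls."""
    for _ in range(warmup):
        fn()                       # first calls are slower (CUDA init, caches, JIT)
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]  # median is robust to scheduler spikes

# Usage sketch against the models loaded above:
# pt_s   = benchmark(lambda: model("test.jpg", verbose=False))
# onnx_s = benchmark(lambda: onnx_model("test.jpg", verbose=False))
print(f"{benchmark(lambda: sum(range(10_000))) * 1e6:.0f} µs")
```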
Sovereign Deployment: Local Inference Endpoint
For production computer vision workloads, expose YOLOv11 inference through a private API endpoint that runs 100% on your infrastructure:
# local_inference.py
from fastapi import FastAPI, UploadFile, File
from ultralytics import YOLO
import cv2
import numpy as np
import io
from PIL import Image
app = FastAPI(title="Sovereign CV Endpoint", version="1.0.0")
model = YOLO("yolo11n.pt", task="detect") # Loads once at startup, stays in memory
@app.post("/api/detect")
async def detect(file: UploadFile = File(...)):
"""Run YOLO inference on uploaded image. Results never leave this server."""
# Read image from upload
contents = await file.read()
img_array = np.frombuffer(contents, np.uint8)
img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
# Inference on CPU or GPU (auto-detect)
results = model(img, verbose=False, device="cpu") # device="mps" or "cuda" if available
# Extract detections
detections = []
for box in results[0].boxes:
detections.append({
"class": model.names[int(box.cls.item())],
"confidence": float(box.conf.item()),
"bbox": box.xyxy[0].tolist()
})
return {"detections": detections, "num_objects": len(detections)}
@app.get("/health")
async def health():
"""Health check endpoint."""
return {"status": "ok", "model": "yolo11n", "inference_device": "cpu"}
# Run locally only (no network exposure by default)
if __name__ == "__main__":
import uvicorn
# Bind to localhost only — production access via Tailscale/VPN
uvicorn.run(app, host="127.0.0.1", port=8000)
Usage (local machine only):
pip install fastapi uvicorn
python local_inference.py
# In another terminal
curl -X POST "http://127.0.0.1:8000/api/detect" -F "[email protected]"
# Returns: {"detections": [...], "num_objects": 3}
Sovereign Design: The endpoint binds to `127.0.0.1` (not `0.0.0.0`), so it's inaccessible from the network. For remote access, use Tailscale VPN (see Docker Networking 2026) to securely route inference requests without exposing the API to the public internet.
PII Protection: Frame Sanitization Before Storage
If your CV pipeline logs, archives, or trains on frames, you must strip identifiable data before write. This is a legal requirement for GDPR/CCPA compliance and a sovereignty best practice.
# sanitize_frames.py
import cv2
import numpy as np
from ultralytics import YOLO
def blur_faces_and_licenses(frame, results, blur_strength=51):
"""
Blur detected faces and license plates before storage.
Sovereignty rule: If the image contains people, locations, or vehicles,
redact before archival.
"""
for box in results[0].boxes:
x1, y1, x2, y2 = map(int, box.xyxy[0])
# Class 0 = person (in COCO, adjust for your custom dataset)
if int(box.cls.item()) == 0:
roi = frame[y1:y2, x1:x2]
frame[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (blur_strength, blur_strength), 30)
# Class 2 = car (license plate inside, blur entire vehicle if desired)
if int(box.cls.item()) == 2:
roi = frame[y1:y2, x1:x2]
# Blur only upper 15% (where license plate likely is)
plate_height = int((y2 - y1) * 0.15)
frame[y1:y1+plate_height, x1:x2] = cv2.GaussianBlur(
roi[:plate_height], (99, 99), 30
)
return frame
# Example: process video and save sanitized frames
from pathlib import Path
model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture("security_camera.mp4")
Path("archive").mkdir(exist_ok=True)  # output directory must exist before imwrite
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
results = model(frame, verbose=False)
sanitized = blur_faces_and_licenses(frame, results)
# Save sanitized frame to archive (safe for logs, training, etc.)
cv2.imwrite(f"archive/sanitized_frame_{frame_count}.jpg", sanitized)
frame_count += 1
if frame_count % 100 == 0:
print(f"Processed {frame_count} frames")
cap.release()
print(f"Total frames archived (sanitized): {frame_count}")
Key principle: Blur faces, license plates, and location identifiers before any storage, logging, or training. This ensures your CV pipeline respects user privacy and complies with regulations. For security cameras on your property, this is still best practice — it protects visitors and guests.
YOLOv11 provides the full computer vision pipeline: pre-trained COCO detection in 3 lines, custom dataset training with automatic augmentation, real-time webcam inference at 60+ FPS, and export to ONNX/TensorRT for production. Everything runs locally — no cloud inference API, no per-image cost.
Sovereign Deployment Tip: Combine YOLOv11 with Ollama for multimodal reasoning: “Detect objects with YOLO, then ask a local LLM to explain what it sees — all on-device.” See CrewAI + Ollama 2026 for orchestrating local AI agents.
See Embedding Models 2026 for training custom neural network architectures from scratch, and Edge Computing Guide 2026 for hardware selection.
People Also Ask
What is the difference between YOLOv11 nano, small, medium, large, and xlarge?
The variants trade accuracy for speed and memory: YOLOv11n (nano, 2.6M params) runs at 80+ FPS on an RTX 4090 and is best for real-time applications with modest accuracy needs. YOLOv11s (small) and YOLOv11m (medium) are the balanced choices for most production applications. YOLOv11l (large) and YOLOv11x (xlarge) maximize accuracy at the cost of speed. Start with YOLOv11n for real-time requirements, YOLOv11s for balanced performance, and YOLOv11m if accuracy is the priority.
How many images do I need to train a custom YOLOv11 model?
For transfer learning (fine-tuning from COCO pre-trained weights): 100–500 images per class is typically sufficient for common-looking objects. For unusual or highly specific objects: 500–2000 images per class. YOLOv11’s data augmentation (mosaic, mixup, flips, colour jitter) effectively multiplies your dataset 10–20×, reducing the data requirement significantly. Start with 100 images per class, train, and evaluate — add more data only if accuracy is insufficient.
Troubleshooting & Common Issues
Issue: CUDA out of memory during training
Cause: Batch size too large for GPU VRAM.
# Fix: Reduce batch size
# In code: batch=8 instead of batch=16
model.train(data="data.yaml", batch=8, imgsz=640)
# Or reduce image size
model.train(data="data.yaml", batch=16, imgsz=416)
Issue: FileNotFoundError: data.yaml not found
Cause: data.yaml path incorrect or missing.
# Fix: Verify file exists and path is correct
ls -la custom_dataset/data.yaml
# Should show the file; if not, create it:
cat > custom_dataset/data.yaml << 'EOF'
path: /full/path/to/custom_dataset
train: images/train
val: images/val
nc: 3
names: ["cat", "dog", "bird"]
EOF
Issue: mAP stuck at 0.0 after 50 epochs
Cause: Dataset format wrong or labels invalid.
# Fix: Validate dataset format
# YOLO format: .txt files with: class_id cx cy w h (0-1 normalized)
# Verify: images/train/img1.jpg has images/train/img1.txt with same name
ls images/train/*.jpg | wc -l # Should match:
ls images/train/*.txt | wc -l
# Check label format:
head -1 images/train/*.txt # Should show: 0 0.5 0.5 0.3 0.4
Issue: Low mAP (0.4–0.5) even with 1000 images
Cause: Images too blurry, inconsistent classes, or poor labeling.
# Fix: Inspect dataset quality
# 1. Check images are clear (not blurry, right resolution)
# 2. Verify labels are correct (spot-check 10 images)
# 3. Check for class imbalance (some classes rare)
# Ultralytics train() has no per-class loss weight option — fix imbalance in the
# data itself: oversample rare-class images or collect more examples of them
Issue: Inference slow (1 FPS) on edge device
Cause: Model too large or device has low GPU.
# Fix: Use smaller model variant
model = YOLO("yolo11n.pt") # Nano: 80+ FPS on RTX 4090
# Or export at reduced precision
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
model.export(format="onnx", half=True) # FP16: roughly 2× faster on supported hardware, minimal accuracy loss
Issue: Model overfitting: training loss decreases but validation mAP plateaus
Cause: Not enough training data or regularization.
# Fix: Increase regularization
model.train(
data="data.yaml",
epochs=50,
weight_decay=1e-3, # Increase from default 5e-4
augment=True,
mosaic=1.0, # Enable mosaic augmentation
mixup=0.1 # Enable mixup
)
Model Selection Decision Tree
What's your primary goal?
├─ Real-time inference (>30 FPS required)
│ └─ Use YOLOv11n (nano)
├─ Balanced speed + accuracy
│ └─ Use YOLOv11s or YOLOv11m (small/medium)
├─ Maximum accuracy, speed not critical
│ └─ Use YOLOv11l or YOLOv11x (large/xlarge)
└─ Edge device (mobile, Raspberry Pi)
└─ Use YOLOv11n with quantization (INT8)
Quick Reference: Dataset Preparation Checklist
- ✅ Images in `images/train` and `images/val` folders
- ✅ Labels in `labels/train` and `labels/val` folders (matching filenames)
- ✅ Each image has a corresponding `.txt` label file
- ✅ Label format: `class_id cx cy w h` (all 0–1 normalized)
- ✅ `data.yaml` with correct paths and class names
- ✅ 70–80% of images in train, 20–30% in val
- ✅ No images in both train and val (prevents data leakage)
- ✅ All classes represented in train set (no missing classes)
Annotation Workflow Comparison
| Workflow | Speed | Cost | Quality | Best For |
|---|---|---|---|---|
| Roboflow (cloud) | Fast (2 min/img) | Free/Paid | Good (auto-augment) | Small teams, quick iteration |
| LabelImg (desktop) | Medium (3–5 min/img) | Free | Good | Solo developers, privacy |
| CVAT (self-hosted) | Medium (2–3 min/img) | Free | Excellent | Teams, enterprise |
| Freelancers (MTurk) | Slow (cost: $0.10–0.50/img) | Medium–High | Varies | Large datasets (1000+ images) |
Frequently Asked Questions (FAQ)
Q: What’s the difference between YOLOv11 and YOLOv10?
A: YOLOv11 (2026) is 2–5% more accurate and slightly faster than YOLOv10 (2024). Use YOLOv11 for new projects. Upgrade from v10 only if accuracy improvement matters.
Q: Can I train YOLOv11 on CPU?
A: Yes, but expect 10–50× slower training. On CPU: ~1 epoch/minute. On RTX 4090: ~1 epoch/second. Use CPU only for testing; switch to GPU for production training.
Q: How do I export YOLOv11 to run on phone/browser?
A: Three options:
# ONNX (runs on CPU, cross-platform)
model.export(format="onnx")
# TensorFlow Lite (mobile)
model.export(format="tflite")
# NCNN (fast inference, mobile/edge)
model.export(format="ncnn")
Q: Can I train on video directly instead of images?
A: YOLOv11 trains on images, not video. Extract frames from video first:
import cv2
cap = cv2.VideoCapture("video.mp4")
frame_id = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret: break
cv2.imwrite(f"frames/frame_{frame_id:06d}.jpg", frame)
frame_id += 1
Q: How do I handle class imbalance (some classes rare)?
A: Ultralytics train() does not expose per-class loss weights, so balance the dataset instead:
# If class 0 is rare (50 images) vs class 1 (500 images):
# duplicate or augment rare-class images until class counts are comparable,
# or collect more real examples of the rare class before training
Q: What’s the minimum number of classes I can train?
A: 1 class is possible (detect “cat” anywhere in image). YOLOv11 handles single-class well. Multi-class (5+) yields better results due to learned feature diversity.
Q: Can I detect small objects (<50 pixels)?
A: YOLOv11 struggles with very small objects. Options:
- Use larger input: `imgsz=1280` instead of 640 (4× more computation)
- Tile images: split large images into 640×640 tiles, detect, merge results
- Use a different model: two-stage detectors such as Faster R-CNN handle small objects better
Q: How do I use YOLOv11 for video surveillance 24/7?
A: Use the async inference pattern:
import cv2, threading
from ultralytics import YOLO
model = YOLO("best.pt")
cap = cv2.VideoCapture("rtsp://camera_url")
def process_frames():
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        results = model(frame, verbose=False)
        # Handle detections here: log them, raise alerts, archive sanitized frames
        print(len(results[0].boxes), "objects")
thread = threading.Thread(target=process_frames, daemon=True)
thread.start()
thread.join()  # or keep the main thread busy with other work
Q: What’s the cost to train a custom YOLOv11 vs cloud services?
A: Local training: ~$1 electricity (on RTX 4090). Cloud (AWS/Google): $5–50/hour GPU rental. 50-epoch training on cloud: ~$10–50. Local is cheaper if you already own GPU.
Q: How do I evaluate model performance beyond mAP?
A: Use confusion matrix:
metrics = model.val() # Run validation on the val split
metrics.confusion_matrix.print() # Predicted-vs-true counts per class
# Example: "dog" often counted under "cat" → may need more dog examples
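Per-class precision and recall fall straight out of a confusion matrix. A sketch with plain nested lists, assuming rows are predictions and columns are ground truth (check which convention your Ultralytics version uses):

```python
def precision_recall(matrix):
    """Per-class (precision, recall) from a square confusion matrix.

    Assumes matrix[pred][true] counts: rows = predicted class, cols = true class.
    """
    n = len(matrix)
    stats = []
    for c in range(n):
        tp = matrix[c][c]
        predicted_c = sum(matrix[c])                    # everything predicted as c
        actual_c = sum(matrix[r][c] for r in range(n))  # everything truly c
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        stats.append((precision, recall))
    return stats

# Toy 2-class example: class 0 = cat, class 1 = dog
m = [[8, 2],   # predicted cat: 8 correct, 2 were actually dogs
     [1, 9]]   # predicted dog: 1 was actually a cat, 9 correct
for cls, (p, r) in enumerate(precision_recall(m)):
    print(f"class {cls}: precision {p:.2f}, recall {r:.2f}")
# → class 0: precision 0.80, recall 0.89
# → class 1: precision 0.90, recall 0.82
```

Low precision means false positives (the model over-predicts that class); low recall means missed detections.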
Related Vucense Guides
- YOLOv11 on Raspberry Pi & Jetson Nano 2026 — edge deployment for real-time inference
- Sovereign AI Agents Hub 2026 — use YOLOv11 as a tool in multi-agent systems
- CrewAI + Ollama: Local Multi-Agent Orchestration — multimodal reasoning with local models
- Best Open-Weight AI Models 2026 — optimize model selection for vision tasks
Further Reading
Vucense Guides
- PyTorch Deep Learning Guide 2026 — train custom architectures from scratch
- Machine Learning Fundamentals 2026 — traditional ML for structured data
- Edge Computing Guide 2026 — hardware options for model deployment
- MLOps Guide 2026 — annotation and labeling workflows for training datasets
Official Documentation & Frameworks
- Ultralytics YOLOv11 Documentation — official YOLOv11 framework and models
- YOLOv11 GitHub Repository — source code and pre-trained weights
- PyTorch Official Documentation — deep learning framework; v2.5.x
- CUDA Toolkit Documentation — GPU acceleration; CUDA 12.4
- OpenCV Documentation — image processing library for preprocessing/post-processing
- Hugging Face Model Hub — pre-trained YOLOv11 and custom models
Annotation & Labeling Tools
- Roboflow Annotation Platform — browser-based labeling and dataset management
- LabelImg GitHub — open-source desktop annotation tool
- CVAT (Computer Vision Annotation Tool) — self-hosted annotation platform
- Makesense.ai — browser-based annotation for images
- Supervisely Platform — collaborative annotation and training
- Label Studio — multi-modal annotation platform
Benchmarks & Deployment
- COCO Dataset — benchmark dataset for object detection
- OpenImages Dataset — large-scale labeled image dataset
- MLflow Model Registry — track and manage model versions
- TensorRT Inference — optimise YOLOv11 for production inference
Tested on: Ubuntu 24.04 LTS (RTX 4090, CUDA 12.4). Ultralytics 8.3.42, PyTorch 2.5.1. Last verified: May 16, 2026.