Local AI & On-Device Inference
Build, deploy, and optimise AI that runs entirely on local hardware. This category covers sovereign model deployment, inference performance, and edge-friendly AI stacks.
Subtopics
Ollama
Run open-weight LLMs locally with Ollama 5.x: model pulling, Modelfile customisation, REST API, GPU acceleration, multi-model management, and zero-cloud inference verification.
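As a minimal sketch of the REST API side, the snippet below assumes the Ollama daemon is running on its default port (11434) and that a model named "llama3" has already been pulled; the model name and prompt are illustrative assumptions, not part of the topic above.

```python
import requests

# Assumes the Ollama server is listening on its default port (11434)
# and that a model named "llama3" has already been pulled locally.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str, model: str = "llama3") -> str:
    """Send a single non-streaming generation request to the local Ollama API."""
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    # The non-streaming endpoint returns one JSON object with a "response" field.
    return response.json()["response"]

if __name__ == "__main__":
    print(generate("Explain GGUF quantisation in one sentence."))
```

Because the request never leaves localhost, verifying zero-cloud inference is as simple as watching that no outbound traffic occurs while this runs.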
llama.cpp & GGUF
Maximum sovereignty with llama.cpp: compiling from source, the GGUF model format, quantisation levels (Q4_K_M, Q8_0), CLI inference, server mode, and performance tuning for CPU and GPU.
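A sketch of loading a quantised GGUF file through the llama-cpp-python bindings rather than the raw CLI; the model path, context size, and layer-offload count below are assumptions to adapt to your own build and hardware.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a Q4_K_M quantised GGUF file; point this at your own download.
MODEL_PATH = "models/mistral-7b-instruct.Q4_K_M.gguf"

# n_gpu_layers=-1 offloads every layer to the GPU when a GPU-enabled build is
# installed; set it to 0 for pure CPU inference.
llm = Llama(model_path=MODEL_PATH, n_ctx=4096, n_gpu_layers=-1)

output = llm(
    "Q: What does Q4_K_M quantisation trade off? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```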
On-Device Inference
Run AI models entirely on local hardware: Apple Silicon with the MLX framework, NVIDIA CUDA with TensorRT, and AMD ROCm. Covers memory requirements, throughput benchmarks, and chip selection.
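The memory-requirement question is largely arithmetic: the weights alone take roughly parameter count times bits per weight divided by 8 bytes, plus headroom for the KV cache and runtime. The sketch below encodes that rule of thumb; the 20% overhead factor is an assumption, not a measured constant.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Rough memory needed just to hold the quantised weights, in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

def fits_in_memory(params_billion: float, bits_per_weight: float,
                   available_gib: float, overhead: float = 1.2) -> bool:
    """Apply an assumed ~20% overhead for KV cache and runtime buffers."""
    return weight_memory_gib(params_billion, bits_per_weight) * overhead <= available_gib

# Example: a 7B model at 4-bit quantisation on a 16 GB Apple Silicon machine.
print(f"{weight_memory_gib(7, 4):.1f} GiB of weights")    # ~3.3 GiB
print(fits_in_memory(7, 4, available_gib=16))              # True
print(fits_in_memory(70, 4, available_gib=16))             # False: needs a bigger chip
```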
Local AI Stack Builds
End-to-end sovereign AI stack builds: Ollama + Open WebUI + pgvector + LangChain running 100% on-device. Full companion repo, tested code, and sovereignty audit included.
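A minimal sketch of one link in that chain: generating an embedding with the local Ollama embeddings endpoint and storing it in a pgvector table. The connection string, table layout, embedding model (nomic-embed-text), and its 768-dimension output are assumptions for illustration; the full build layers Open WebUI and LangChain on top.

```python
import requests
import psycopg2  # pip install psycopg2-binary; assumes the pgvector extension is installed

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Get an embedding from the local Ollama embeddings endpoint (no cloud calls)."""
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["embedding"]

# Hypothetical connection string and table; adjust to your local Postgres setup.
conn = psycopg2.connect("dbname=localai user=localai host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id serial PRIMARY KEY, body text, embedding vector(768))"
    )
    text = "Sovereign AI stacks keep every byte on local hardware."
    # pgvector accepts the bracketed text format, so serialise the list and cast.
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in embed(text)) + "]"
    cur.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        (text, vec_literal),
    )
```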
Benchmarking Local Models
Benchmark sovereign local LLM performance: tokens/second, memory usage, first-token latency, and quality metrics across Ollama models on Apple Silicon, NVIDIA, and AMD hardware.
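As a sketch of how two of those numbers can be collected, the script below streams a single generation from the local Ollama API, timing the first streamed token on the client side and reading the eval counters Ollama reports in its final chunk; the model name and prompt are assumptions.

```python
import json
import time
import requests

def benchmark(model: str = "llama3", prompt: str = "Summarise GGUF in 100 words.") -> dict:
    """Measure first-token latency and decode throughput for one streamed generation."""
    start = time.perf_counter()
    first_token_s = None
    stats = {}
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if first_token_s is None and chunk.get("response"):
                first_token_s = time.perf_counter() - start
            if chunk.get("done"):
                # Ollama reports eval durations in nanoseconds in the final chunk.
                eval_count = chunk.get("eval_count", 0)
                eval_duration_s = chunk.get("eval_duration", 0) / 1e9
                stats = {
                    "first_token_s": first_token_s,
                    "tokens_generated": eval_count,
                    "tokens_per_s": eval_count / eval_duration_s if eval_duration_s else 0.0,
                }
    return stats

if __name__ == "__main__":
    print(benchmark())
```

Running the same script against the same prompt on different chips gives a like-for-like tokens/second and first-token-latency comparison; memory usage and quality metrics need separate tooling.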