Vucense

llama.cpp & GGUF

Maximum sovereignty with llama.cpp: compiling from source, the GGUF model format, quantisation levels (Q4_K_M, Q8_0), CLI inference, server mode, and performance tuning for CPU and GPU.
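As a quick taste of the workflows this topic covers, here is a minimal sketch of the two most common llama.cpp entry points: one-shot CLI inference and OpenAI-compatible server mode. The model path is a placeholder; `-ngl` offloads layers to the GPU and can be dropped on CPU-only builds.

```shell
# One-shot prompt with the llama-cli binary (built from source or a release).
# model.gguf is a placeholder path to any GGUF file, e.g. a Q4_K_M quant.
llama-cli -m ./models/model.gguf \
  -p "Explain GGUF in one sentence." \
  -n 128 \
  -ngl 99   # offload up to 99 layers to the GPU; omit for pure CPU

# Serve the same model over an OpenAI-compatible HTTP API on port 8080.
llama-server -m ./models/model.gguf --port 8080
```

The server can then be queried at `http://localhost:8080/v1/chat/completions` with any OpenAI-style client; the articles below walk through the flags in detail.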

Total articles

3

Featured build

llama.cpp Tutorial 2026: Run GGUF Models Locally on CPU and GPU


All articles
