Marcus Thorne

Local-First AI Infrastructure Engineer

Marcus Thorne is an AI infrastructure engineer focused on optimizing large language models and multimodal AI for on-device deployment without cloud dependencies. With an MSc in machine learning and 7+ years of architecting production inference pipelines, Marcus specializes in quantization techniques, ONNX Runtime optimization, and efficient model serving on commodity hardware. His expertise spans Llama, Gemma, and other open models, with deep knowledge of techniques such as 4-bit quantization, low-rank adaptation (LoRA), and FlashAttention. Marcus has optimized inference performance across CPU, GPU, and NPU targets, making privacy-first AI accessible on edge devices. At Vucense, Marcus writes about practical on-device AI deployment, inference optimization, and building private AI applications that never send data to external servers.

17 Articles

Dirty Frag CVE-2026-43284: Linux Privilege Escalation Vulnerability Actively Exploited

12 May | 19 min read | guides-security

Dirty Frag CVE-2026-43284 and CVE-2026-43500: Linux kernel LPE exploiting memory fragmentation. Active exploitation, affected distributions, mitigation patches, and defense strategies.

By Marcus Thorne

Mythos vs. Cyber: Security Model Restrictions & Vendor

30 Apr | 8 min read | guides-security

Mythos and Cyber are advanced vulnerability-detection AI security models. Anthropic and OpenAI restrict access claiming responsible AI, but gatekeeping…

By Marcus Thorne

Fine-Tuning LLMs with QLoRA and Unsloth 2026: Local Training Guide

>_ 22 Apr | 22 min | Dev Corner

🔴Advanced

Fine-tune large language models locally with QLoRA and Unsloth on Ubuntu 24.04 in 2026. Covers dataset preparation, LoRA configuration, training on RTX 4090, evaluation, GGUF export, and Ollama deployment.

By Marcus Thorne

Private Document Q&A with pgvector: 100% Local RAG Pipeline 2026

>_ 17 Apr | 18 min | Dev Corner

🟡Intermediate

Build a fully local RAG pipeline in Python 2026. Ollama embeddings, pgvector 0.8 HNSW search, and Llama 4 Scout for document Q&A. No OpenAI. No cloud. Zero data leaves your machine.

By Marcus Thorne

llama.cpp Tutorial 2026: Run GGUF Models Locally on CPU and GPU

>_ 17 Apr | 17 min | Dev Corner

🟡Intermediate

Complete llama.cpp tutorial for 2026. Install, compile with CUDA/Metal, run GGUF models, tune all inference flags, use the API server, speculative decoding, and benchmark your hardware.

By Marcus Thorne

How to Install Ollama and Run LLMs Locally: Complete 2026 Guide

>_ 17 Apr | 16 min | Dev Corner

🟢Beginner

Install Ollama 5.x on Ubuntu, macOS, and Windows. Pull and run Llama 4, Qwen3, Gemma 3, and Mistral locally. REST API setup, GPU acceleration, Open WebUI, and sovereign model management.

By Marcus Thorne

Speculative Decoding Explained: 2x Faster Local LLMs with Ollama and llama.cpp 2026

>_ 16 Apr | 17 min | Dev Corner

🟡Intermediate

Speculative decoding doubles local LLM inference speed with zero quality loss. How it works, how to enable it in Ollama and llama.cpp today, and which model pairs give the best speedup.

By Marcus Thorne

GGUF Quantization Explained: Q4_K_M vs Q8_0 vs F16 — Which to Use in 2026

>_ 16 Apr | 16 min | Dev Corner

🟡Intermediate

Master GGUF quantization formats for local LLMs in 2026. Q2_K, Q4_K_M, Q5_K_S, Q8_0, F16 explained with benchmarks, VRAM tables, and exact Ollama and llama.cpp commands.

By Marcus Thorne

The Shatner Standoff: How AI 'Fake News' Bots Forced Meta

4 Apr | 5 min read | privacy-sovereignty

William Shatner's battle with AI-generated 'death' rumors has exposed a dark side of platform risk.

By Marcus Thorne

PS5 Price Hike 2026: Why Sony's Console Now Costs 30% More

2 Apr | 6 min read | reviews-hardware

Sony has raised PlayStation 5 prices again, citing a global helium shortage and regional conflict.

By Marcus Thorne

Ollama Hits 52 Million Monthly Downloads

29 Mar | 6 min read | ai-intelligence

With 52 million monthly downloads and 135,000 local models on HuggingFace, Ollama and local AI inference have officially moved from niche hobby to…

By Marcus Thorne

NVIDIA Agent Toolkit: 80% of Governments Using AI by 2028

27 Mar | 6 min read | ai-intelligence

Gartner predicts 80% of governments will deploy AI agents by 2028. We analyse NVIDIA's Agent Toolkit, NemoClaw, and what autonomous governance means for…

By Marcus Thorne

Meta-AMD $60B AI Deal: Breaking NVIDIA's Monopoly (2026)

26 Mar | 5 min read | ai-intelligence

Meta and AMD's $60B AI chip deal and 6-gigawatt GPU rollout signal a major shift in compute sovereignty.

By Marcus Thorne

US AI Policy Is a Democracy Crisis: 2026 Deep Analysis

25 Mar | 14 min read | privacy-sovereignty

A major analysis argues US AI governance is breaking down. We contrast governance risk with capacity building and explain why AI sovereignty is the…

By Marcus Thorne

Project Maven & Claude: Inside the AI Target Factory 2026

24 Mar | 6 min read | privacy-sovereignty

How the Pentagon's Maven Smart System and Claude are creating an AI targeting engine that doubles strike tempos in the 2026 US–Iran conflict — and the…

By Marcus Thorne

NVIDIA Free AI API: Building Sovereign LLM Apps with GLM-4.7

>_ 18 Mar | 10 min read | Dev Corner

Get a free NVIDIA API key, access GLM-4.7, and build a working AI app in Python. No cloud lock-in. Full sovereign dev setup with Streamlit.

By Marcus Thorne

12 Best Free Open-Source Tools for Creators in 2026

4 Dec | 11 min read | comparisons-alternatives

Ditch expensive subscriptions. The 12 best free, open-source creative tools in 2026 — ranked for sovereignty, privacy, and professional-grade output.

By Marcus Thorne