Key Takeaways
- Pull model: Prometheus scrapes (pulls) metrics from exporters. Unlike push-based systems, the agents don’t need to know your monitoring server’s address. To monitor a new host, add its address to prometheus.yml and reload; Prometheus starts scraping it on the next interval.
- Node Exporter is the foundation: Install it on every server you want to monitor. One binary, one port (9100), 800+ metrics out of the box. It requires zero configuration.
- Grafana is the display layer: Prometheus stores metrics; Grafana queries and visualises them. Import dashboard ID 1860 (Node Exporter Full) from grafana.com/dashboards for a complete server overview in 30 seconds.
- Alertmanager closes the loop: Prometheus evaluates alerting rules; Alertmanager routes the firing alerts. The two are separate processes — both need to be running for end-to-end alerts.
Introduction
Direct Answer: How do I set up Prometheus and Grafana for server monitoring on Ubuntu 24.04 in 2026?
The fastest path is Docker Compose. Create a docker-compose.yml with three services: prometheus (image: prom/prometheus:v2.51.0), grafana (image: grafana/grafana:10.4.0), and node-exporter (image: prom/node-exporter:v1.8.0). Mount a prometheus.yml config file that scrapes node-exporter:9100 every 15 seconds. Run docker compose up -d. Access Grafana at http://localhost:3000 (admin/admin on first login), add Prometheus (http://prometheus:9090) as a data source, then import dashboard ID 1860 to see a complete server metrics overview. For a bare-metal install (no Docker), download the Prometheus binary from github.com/prometheus/prometheus/releases, extract to /opt/prometheus, create a systemd service, and follow the same config pattern. The full stack runs comfortably on a Hetzner CX22 (2 vCPU, 4GB RAM) monitoring 5–10 servers.
Architecture
┌────────────────────────────────────────────────────────────────────┐
│                         MONITORING SERVER                          │
│                                                                    │
│  ┌──────────────┐   scrapes every 15s    ┌───────────────────┐     │
│  │ Prometheus   │◄───────────────────────│ Node Exporter     │     │
│  │ :9090        │                        │ :9100             │     │
│  │ Stores TSDB  │                        │ (on each server)  │     │
│  │ Eval rules   │                        └───────────────────┘     │
│  │              │     also scrapes       ┌───────────────────┐     │
│  │              │◄───────────────────────│ cAdvisor :8080    │     │
│  │              │                        │ (Docker metrics)  │     │
│  └──────┬───────┘                        └───────────────────┘     │
│         │ fires alerts                                             │
│  ┌──────▼───────┐                                                  │
│  │ Alertmanager │─── Slack / email ──►                             │
│  │ :9093        │                                                  │
│  └──────────────┘                                                  │
│                                                                    │
│  ┌──────────────┐                                                  │
│  │ Grafana      │─── queries PromQL ──► Prometheus :9090           │
│  │ :3000        │                                                  │
│  └──────────────┘                                                  │
└────────────────────────────────────────────────────────────────────┘
Part 1: Docker Compose Stack
mkdir -p ~/monitoring/{prometheus,grafana/dashboards,alertmanager}
cd ~/monitoring
Prometheus configuration:
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s      # Scrape every 15 seconds
  evaluation_interval: 15s  # Evaluate alerting rules every 15 seconds

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Alerting rules files
rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Host metrics via Node Exporter
  - job_name: 'node'
    static_configs:
      - targets:
          - 'node-exporter:9100'   # The monitoring server itself
          # Add more servers here:
          # - '10.0.0.2:9100'      # web server
          # - '10.0.0.3:9100'      # database server

  # Docker container metrics via cAdvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
EOF
Alerting rules:
mkdir -p prometheus/rules
cat > prometheus/rules/alerts.yml << 'EOF'
groups:
  - name: server_alerts
    interval: 30s
    rules:
      # CPU usage over 90% for 5 minutes
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanize }}% for more than 5 minutes."

      # Memory usage over 85%
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanize }}%."

      # Disk usage over 80%
      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{fstype!="tmpfs",mountpoint="/"} / node_filesystem_size_bytes{fstype!="tmpfs",mountpoint="/"}) * 100 > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is {{ $value | humanize }}% on /."

      # Server down (scrapes failing for 2 minutes)
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for more than 2 minutes."
EOF
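The HighCPUUsage expression can be hard to read at first. The following is a minimal sketch of the same arithmetic in Python, assuming two hypothetical snapshots of the per-core idle counter taken 300 seconds apart. The function name and sample values are illustrative, and the real rate() function also handles counter resets and extrapolation, which this sketch ignores.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle[window])) * 100 for one instance.

    idle_start / idle_end hold the idle-seconds counter of each core at the
    start and end of the window (hypothetical sample values).
    """
    # Per-core rate: idle seconds accumulated per wall-clock second (0.0 to 1.0)
    rates = [(end - start) / window_seconds
             for start, end in zip(idle_start, idle_end)]
    avg_idle = sum(rates) / len(rates)
    return 100 - avg_idle * 100


# Two cores over a 5-minute (300 s) window:
# core 0 was idle for 30 s, core 1 for 15 s
usage = cpu_usage_percent([1000.0, 2000.0], [1030.0, 2015.0], 300)
print(f"{usage:.1f}%")  # 92.5%, which is above the 90% threshold
```

The alert only fires if this value stays above 90 for the full `for: 5m` duration, so a short spike does not page anyone.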
Alertmanager configuration:
cat > alertmanager/alertmanager.yml << 'EOF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s       # Wait 30s before sending first alert (groups related alerts)
  group_interval: 5m    # How long to wait before sending additional alerts in the same group
  repeat_interval: 4h   # Re-send unresolved alerts every 4 hours
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'   # Replace with your Slack webhook
        channel: '#alerts'
        title: '{{ template "slack.default.title" . }}'
        text: '{{ template "slack.default.text" . }}'
        send_resolved: true

  # Email alternative: uncomment and configure if using email
  # (the addresses below are placeholders)
  # - name: 'email'
  #   email_configs:
  #     - to: 'oncall@example.com'
  #       from: 'alertmanager@example.com'
  #       smarthost: 'smtp.gmail.com:587'
  #       auth_username: 'alertmanager@example.com'
  #       auth_password: 'app-specific-password'
EOF
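To see what group_by does in the route above, here is a small sketch that buckets alerts by label values. It is a simplification for illustration, not Alertmanager's actual implementation; the function name and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        # Alerts that share every group_by label value land in the same group
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HighCPUUsage", "instance": "10.0.0.2:9100"}},
    {"labels": {"alertname": "HighCPUUsage", "instance": "10.0.0.3:9100"}},
    {"labels": {"alertname": "DiskSpaceLow", "instance": "10.0.0.2:9100"}},
]

per_host = group_alerts(alerts, ["alertname", "instance"])
per_alert = group_alerts(alerts, ["alertname"])
print(len(per_host), len(per_alert))
```

With `['alertname', 'instance']` every host and alert pair becomes its own notification. Grouping by `alertname` alone would merge the two HighCPUUsage alerts into a single message, which may be preferable on larger fleets.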
Docker Compose file:
cat > docker-compose.yml << 'EOF'
name: monitoring

services:
  prometheus:
    image: prom/prometheus:v2.51.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "127.0.0.1:9090:9090"
    extra_hosts:
      # node-exporter runs with host networking, so it has no DNS name on the
      # Compose network; map the name to the Docker host instead.
      # If UFW is enabled, port 9100 must be reachable from the Docker bridge.
      - "node-exporter:host-gateway"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of metrics
      - '--web.enable-lifecycle'              # Allow hot-reload via HTTP

  grafana:
    image: grafana/grafana:10.4.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-changeme_on_first_login}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_DOMAIN=monitoring.example.com
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:v1.8.0
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host   # Access host network metrics directly
    pid: host            # Access host process info
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'   # Report the host's filesystems, not the container's
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true   # Required for container metrics access

  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "127.0.0.1:9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager

volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:
EOF
cat > .env << 'EOF'
GRAFANA_PASSWORD=set_a_strong_password_here
EOF
docker compose up -d
sleep 10
docker compose ps
Expected output:
NAME IMAGE STATUS
alertmanager prom/alertmanager:v0.27.0 Up 8 seconds
cadvisor gcr.io/cadvisor/cadvisor:v... Up 8 seconds
grafana grafana/grafana:10.4.0 Up 9 seconds
node-exporter prom/node-exporter:v1.8.0 Up 9 seconds
prometheus prom/prometheus:v2.51.0 Up 10 seconds
Part 2: Verify Prometheus is Scraping
# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
for t in data['data']['activeTargets']:
    print(f\"  {t['labels']['job']:15s} {t['health']:6s} {t['labels'].get('instance','')}\")
"
Expected output:
prometheus up localhost:9090
node up node-exporter:9100
cadvisor up cadvisor:8080
# Query a metric to confirm data is flowing
curl -sG http://localhost:9090/api/v1/query \
--data-urlencode 'query=100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)' | \
python3 -c "
import json, sys
data = json.load(sys.stdin)
val = data['data']['result'][0]['value'][1]
print(f'Current CPU usage: {float(val):.1f}%')
"
Expected output:
Current CPU usage: 12.3%
Part 3: Grafana Dashboard Setup
# Load GRAFANA_PASSWORD from .env into the current shell (run from ~/monitoring);
# without this the commands below fall back to the placeholder password
set -a; . ./.env; set +a
# Add Prometheus as a data source via Grafana API
curl -s -X POST http://localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD:-changeme_on_first_login}" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true
  }' | python3 -m json.tool | grep '"message"'
Expected output:
"message": "Datasource added",
# Import the Node Exporter Full dashboard (ID 1860, most popular Grafana dashboard)
# The import API needs the dashboard JSON itself, so download it first,
# wrap it in an import payload, then POST the payload to Grafana
curl -s https://grafana.com/api/dashboards/1860/revisions/37/download | \
python3 -c "
import json, sys
payload = {
    'dashboard': json.load(sys.stdin),
    'overwrite': True,
    'folderId': 0,
    'inputs': [{'name': 'DS_PROMETHEUS', 'type': 'datasource', 'pluginId': 'prometheus', 'value': 'Prometheus'}],
}
print(json.dumps(payload))
" | \
curl -s -X POST http://localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD:-changeme_on_first_login}" \
  -d @- | \
python3 -c "import json,sys; d=json.load(sys.stdin); print('Dashboard imported:', d.get('slug',''))" 2>/dev/null || \
echo "Import via UI: Dashboards → Import → Enter ID 1860 → Load"
Manual import via UI (more reliable):
- Open http://localhost:3000 in a browser
- Log in: admin / your password
- Left menu → Dashboards → New → Import
- Enter 1860 in the “Import via grafana.com” box → Load
- Select Prometheus as the data source → Import
You now have a complete server monitoring dashboard with CPU, memory, disk, network, and systemd service status.
Part 4: Monitor Additional Servers
# ── ON EACH ADDITIONAL SERVER TO MONITOR ─────────────────────────────────
# Install and start Node Exporter
# Create a dedicated system user
sudo useradd --system --no-create-home --shell /bin/false prometheus
# Download Node Exporter
NODE_EXPORTER_VERSION="1.8.0"
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar xzf node_exporter-*.tar.gz
sudo cp node_exporter-*/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter --no-pager | grep "Active:"
Expected output:
Active: active (running)
# ── ON THE MONITORING SERVER — add the new server to prometheus.yml ───────
# Edit prometheus/prometheus.yml, add target under the 'node' job:
# - '10.0.0.2:9100'
# Hot-reload Prometheus (no restart needed)
curl -s -X POST http://localhost:9090/-/reload
echo "Prometheus config reloaded"
# Verify new target appears
sleep 5
curl -sG http://localhost:9090/api/v1/targets | python3 -c "
import json, sys
for t in json.load(sys.stdin)['data']['activeTargets']:
    if t['labels']['job'] == 'node':
        print(f\"  {t['labels']['instance']:25s} {t['health']}\")
"
Expected output:
node-exporter:9100 up
10.0.0.2:9100 up
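Before adding a host to prometheus.yml, it can help to confirm from the monitoring server that its exporter actually answers. The sketch below is one way to do that with the Python standard library, assuming the exporter serves the usual text format on /metrics. The helper names are hypothetical, and the sample text stands in for a real response so the parsing step can be shown without a network call.

```python
from urllib.request import urlopen


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value)
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(target, timeout=5):
    """Fetch http://<target>/metrics and return the metric names it exposes."""
    with urlopen(f"http://{target}/metrics", timeout=timeout) as resp:
        return metric_names(resp.read().decode())


# Stand-in for a real Node Exporter response
sample = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""
print(sorted(metric_names(sample)))  # ['node_cpu_seconds_total', 'node_load1']
```

On the monitoring server, calling `fetch_metric_names("10.0.0.2:9100")` should return a set containing `node_load1`; a timeout usually points at the firewall rather than the exporter.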
Part 5: Useful PromQL Queries
# CPU usage (%) per host
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Disk usage (%) for root partition
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
# Network receive rate (Mbps)
rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 / 1e6
# Container CPU usage (per container)
rate(container_cpu_usage_seconds_total{image!=""}[5m]) * 100
# Container memory usage (MB)
container_memory_usage_bytes{image!=""} / 1e6
# System load average
node_load1
# Disk I/O (reads/sec)
rate(node_disk_reads_completed_total[5m])
# HTTP requests rate (if using nginx-prometheus-exporter)
rate(nginx_http_requests_total[5m])
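Any of the queries above can also be run from a script through the same /api/v1/query endpoint used in Part 2. This is a minimal sketch using only the standard library; the function names are hypothetical, and the canned payload mimics the shape of an instant-query response so the parsing can be shown without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the instant-query URL, URL-encoding the PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_result(payload):
    """Map each series' instance label to its value as a float."""
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def instant_query(base_url, promql):
    """Run an instant query against a Prometheus server (needs network access)."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_instant_result(json.load(resp))


# Canned response in the shape Prometheus returns for an instant query
canned = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "node-exporter:9100"}, "value": [1714300000, "0.42"]},
        ],
    },
}
print(build_query_url("http://localhost:9090", "node_load1"))
print(parse_instant_result(canned))
```

On the monitoring server, `instant_query("http://localhost:9090", "node_load1")` would return one entry per scraped host.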
Troubleshooting
Prometheus target shows state: down
Cause: Node Exporter is not running on the target host, or UFW is blocking port 9100. Fix:
# On target server
sudo systemctl status node_exporter
# Open port 9100 only to monitoring server IP (not public):
sudo ufw allow from MONITORING_SERVER_IP to any port 9100
Grafana shows “No data” on dashboards
Cause: Prometheus data source URL is wrong, or time range mismatch. Fix:
# Test Prometheus is reachable from Grafana container
docker exec grafana wget -qO- 'http://prometheus:9090/api/v1/query?query=up' | head -c 100
# Should return JSON. If it fails, check container networking.
Alertmanager not sending Slack alerts
Cause: Webhook URL wrong, or Alertmanager config syntax error. Fix:
# Check Alertmanager config is valid
docker exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml
# Send a test alert (Alertmanager 0.27 removed API v1, so use the v2 endpoint)
curl -s -X POST http://localhost:9093/api/v2/alerts \
-H "Content-Type: application/json" \
-d '[{"labels":{"alertname":"TestAlert","severity":"info"},"annotations":{"summary":"Test"}}]'
Conclusion
The full Prometheus + Grafana + Alertmanager stack is running: Node Exporter exposes host metrics that Prometheus scrapes every 15 seconds, cAdvisor provides per-container metrics, Grafana displays them on imported dashboards, and Alertmanager sends Slack notifications when Prometheus fires an alert for a crossed CPU, memory, or disk threshold or an unreachable instance. All data stays on your infrastructure: no Datadog, no New Relic, no SaaS monitoring bill.
See Docker Compose Tutorial 2026 for the deployment pattern used here, and Python for DevOps Automation for scripting automated responses to Prometheus alerts.
People Also Ask
What is the difference between Prometheus and Grafana?
Prometheus is a metrics collection and storage system — it scrapes metric endpoints, stores the time-series data in its local database, and evaluates alerting rules. Grafana is a visualisation and dashboarding tool — it queries data sources (including Prometheus) and renders graphs, tables, and gauges. Prometheus doesn’t have a polished UI for browsing metrics; Grafana doesn’t collect or store data. They are complementary: Prometheus handles data ingestion and alerting logic; Grafana handles display and exploration.
Can Prometheus monitor application metrics, not just server metrics?
Yes — any application can expose a /metrics endpoint in Prometheus format, and Prometheus will scrape it. Most major frameworks have Prometheus client libraries: prometheus_client for Python, prom-client for Node.js, prometheus crate for Rust. A minimal FastAPI app can expose request counts, latency histograms, and error rates in under 20 lines of code. The same Grafana dashboard can then show both server metrics (from Node Exporter) and application metrics (from your service) side by side.
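To show what such an endpoint returns, here is a hand-rolled sketch that serves a single counter in the Prometheus text format using only the Python standard library. It is for illustration only: in a real service the prometheus_client library mentioned above would normally do this work. The metric name, paths, and port are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNTS = {"/": 0, "/health": 0}  # hypothetical in-memory counters


def render_metrics(counts):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
    ]
    for path, count in sorted(counts.items()):
        lines.append(f'app_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics(REQUEST_COUNTS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
            return
        if self.path in REQUEST_COUNTS:
            REQUEST_COUNTS[self.path] += 1  # count application requests
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")


def serve(port=8000):
    """Start the demo server; call this manually, it blocks forever."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()


print(render_metrics({"/": 3, "/health": 1}), end="")
```

After calling `serve()`, adding the host and port as a target under a new job in prometheus.yml would make Prometheus scrape `app_requests_total` alongside the Node Exporter metrics.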
How long does Prometheus store data by default?
Prometheus defaults to 15 days of retention. In the Docker Compose config above, --storage.tsdb.retention.time=30d extends this to 30 days. For longer retention, either increase the retention time (at the cost of more disk space, approximately 1–2 bytes per sample) or add a long-term storage layer: Thanos uploads Prometheus blocks to S3-compatible object storage, where they can be kept indefinitely, while VictoriaMetrics accepts Prometheus remote write and stores the data on its own disks.
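The bytes-per-sample figure makes it possible to estimate disk needs before changing retention. The sketch below applies the usual sizing arithmetic (retention seconds × samples ingested per second × bytes per sample). The series count of 1,000 per host is an assumption for illustration, not a measured value, and the function name is hypothetical.

```python
def estimate_disk_bytes(series, scrape_interval_s, retention_days, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumption: 10 hosts with about 1,000 active series each, scraped every 15 s,
# kept for 30 days at the upper bound of 2 bytes per sample
estimate = estimate_disk_bytes(series=10 * 1000, scrape_interval_s=15, retention_days=30)
print(f"{estimate / 1e9:.2f} GB")  # 3.46 GB
```

Under these assumptions the 30-day setting needs only a few gigabytes, so disk is rarely the limiting factor at this scale.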
Further Reading
- Docker Compose Tutorial 2026 — the deployment pattern for this monitoring stack
- Python for DevOps Automation — automate responses to Prometheus alerts with Python
- Linux Networking Basics 2026 — understanding the network metrics Prometheus collects
- Ubuntu 24.04 LTS Server Setup Checklist — the servers this stack monitors
Tested on: Ubuntu 24.04 LTS (Hetzner CX22). Prometheus 2.51.0, Grafana 10.4.0, Node Exporter 1.8.0, cAdvisor 0.49.1. Last verified: April 28, 2026.