Vucense

Prometheus and Grafana on Ubuntu 24.04: Monitoring Setup 2026

🟡Intermediate

Set up Prometheus and Grafana for server monitoring on Ubuntu 24.04 LTS in 2026. Covers Node Exporter, Docker metrics, custom dashboards, alerting with Alertmanager, and Docker Compose deployment.

Key Takeaways

  • Pull model: Prometheus scrapes (pulls) metrics from exporters. Unlike push-based systems, agents don’t need to know your monitoring server’s address. To add a host, list it under a scrape job in prometheus.yml and reload; Prometheus starts scraping it on the next interval.
  • Node Exporter is the foundation: Install it on every server you want to monitor. One binary, one port (9100), 800+ instant metrics. It requires zero configuration.
  • Grafana is the display layer: Prometheus stores metrics; Grafana queries and visualises them. Import dashboard ID 1860 (Node Exporter Full) from grafana.com/dashboards for a complete server overview in 30 seconds.
  • Alertmanager closes the loop: Prometheus evaluates alerting rules; Alertmanager routes the firing alerts. The two are separate processes — both need to be running for end-to-end alerts.

Introduction

Direct Answer: How do I set up Prometheus and Grafana for server monitoring on Ubuntu 24.04 in 2026?

The fastest path is Docker Compose. Create a docker-compose.yml with three core services: prometheus (image: prom/prometheus:v2.51.0), grafana (image: grafana/grafana:10.4.0), and node-exporter (image: prom/node-exporter:v1.8.0). Mount a prometheus.yml config file that scrapes node-exporter:9100 every 15 seconds, then run docker compose up -d. Open Grafana at http://localhost:3000 and log in as admin (the password is admin by default; the Compose file in Part 1 sets it from GRAFANA_PASSWORD instead). Add Prometheus (http://prometheus:9090) as a data source, then import dashboard ID 1860 to see a complete server metrics overview. For a bare-metal install (no Docker), download the Prometheus binary from github.com/prometheus/prometheus/releases, extract to /opt/prometheus, create a systemd service, and follow the same config pattern. The full stack, including the cAdvisor and Alertmanager services added in Part 1, runs comfortably on a Hetzner CX22 (2 vCPU, 4GB RAM) monitoring 5–10 servers.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│  MONITORING SERVER                                                │
│                                                                   │
│  ┌─────────────┐    scrapes every 15s    ┌───────────────────┐  │
│  │  Prometheus  │◄───────────────────────│  Node Exporter    │  │
│  │  :9090       │                        │  :9100            │  │
│  │  Stores TSDB │◄─── also scrapes ──────│  (on each server) │  │
│  │  Eval rules  │                        └───────────────────┘  │
│  └──────┬───────┘                                               │
│         │ fires alerts                   ┌───────────────────┐  │
│  ┌──────▼───────┐                        │  cAdvisor :8080   │  │
│  │ Alertmanager │─── Slack / email ──►   │  (Docker metrics) │  │
│  │  :9093       │                        └───────────────────┘  │
│  └──────────────┘                                               │
│         │                                                        │
│  ┌──────▼───────┐                                               │
│  │   Grafana    │◄─── queries PromQL ── Prometheus              │
│  │   :3000      │                                               │
│  └──────────────┘                                               │
└──────────────────────────────────────────────────────────────────┘

Part 1: Docker Compose Stack

mkdir -p ~/monitoring/{prometheus,grafana/dashboards,alertmanager}
cd ~/monitoring

Prometheus configuration:

cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s        # Scrape every 15 seconds
  evaluation_interval: 15s    # Evaluate alerting rules every 15 seconds

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Alerting rules files
rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Host metrics via Node Exporter
  - job_name: 'node'
    static_configs:
      - targets:
          - 'node-exporter:9100'    # The monitoring server itself
          # Add more servers here:
          # - '10.0.0.2:9100'       # web server
          # - '10.0.0.3:9100'       # database server

  # Docker container metrics via cAdvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
EOF

Alerting rules:

mkdir -p prometheus/rules
cat > prometheus/rules/alerts.yml << 'EOF'
groups:
  - name: server_alerts
    interval: 30s
    rules:
      # CPU usage over 90% for 5 minutes
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanize }}% for more than 5 minutes."

      # Memory usage over 85%
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanize }}%."

      # Disk usage over 80%
      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{fstype!="tmpfs",mountpoint="/"} / node_filesystem_size_bytes{fstype!="tmpfs",mountpoint="/"}) * 100 > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is {{ $value | humanize }}% on /."

      # Server down (no scrape for 2 minutes)
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for more than 2 minutes."
EOF
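
To make the HighCPUUsage expression above concrete, here is a minimal Python sketch of the arithmetic behind it. The function name cpu_usage_percent and the sample counter values are illustrative assumptions, not Prometheus internals; the real rate() also handles counter resets and extrapolates to the window edges, which this sketch ignores.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle[window])) * 100 for one instance.

    idle_start / idle_end hold one node_cpu_seconds_total{mode="idle"}
    counter value per core at the start and end of the window
    (hypothetical samples, one list entry per core).
    """
    # rate(): per-second increase of each core's idle counter
    idle_rates = [(end - start) / window_seconds
                  for start, end in zip(idle_start, idle_end)]
    # avg by(instance): mean idle fraction across all cores
    avg_idle = sum(idle_rates) / len(idle_rates)
    # Everything that is not idle counts as usage
    return 100 - avg_idle * 100


# Two cores over a 5-minute (300 s) window:
# core 0 was idle for 30 s, core 1 for 15 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1030.0, 2015.0], 300)
print(f"{usage:.1f}%")  # 92.5%
```

A value like this only fires the alert once it has stayed above 90 for the full 5 minutes set by the rule's for: clause.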

Alertmanager configuration:

cat > alertmanager/alertmanager.yml << 'EOF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s        # Wait 30s before sending first alert (groups related alerts)
  group_interval: 5m     # How long to wait before sending additional alerts in the same group
  repeat_interval: 4h    # Re-send unresolved alerts every 4 hours
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'   # Replace with your Slack webhook
        channel: '#alerts'
        title: '{{ template "slack.default.title" . }}'
        text: '{{ template "slack.default.text" . }}'
        send_resolved: true

  # Email alternative — uncomment and configure if using email
  # - name: 'email'
  #   email_configs:
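  #     - to: 'ALERT_RECIPIENT_EMAIL'           # Replace with the address that receives alerts
  #       from: 'ALERT_SENDER_EMAIL'            # Replace with the sender address
  #       smarthost: 'smtp.gmail.com:587'
  #       auth_username: 'SMTP_USERNAME_EMAIL'  # Replace with your SMTP login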
  #       auth_password: 'app-specific-password'
EOF
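
As a rough illustration of how group_wait and repeat_interval interact in the route above, the following Python sketch lists when notifications would go out for a single alert that keeps firing without changes. This is a simplified model under stated assumptions: it ignores group_interval, inhibition, silences, and timing jitter, and the function name notification_offsets is hypothetical.

```python
from datetime import timedelta


def notification_offsets(firing_for, group_wait, repeat_interval):
    """Offsets from the first alert at which one unchanged alert notifies.

    Simplified model: first notification after group_wait, then one
    repeat every repeat_interval for as long as the alert keeps firing.
    """
    offsets = []
    t = group_wait
    while t <= firing_for:
        offsets.append(t)
        t += repeat_interval
    return offsets


# An alert that stays unresolved for 9 hours, using the values
# from alertmanager.yml (group_wait: 30s, repeat_interval: 4h)
offsets = notification_offsets(
    firing_for=timedelta(hours=9),
    group_wait=timedelta(seconds=30),
    repeat_interval=timedelta(hours=4),
)
for offset in offsets:
    print(offset)
```

Under this model the alert produces three Slack messages in nine hours, which is why a short repeat_interval quickly becomes noisy for long-running incidents.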

Docker Compose file:

cat > docker-compose.yml << 'EOF'
name: monitoring

services:
  prometheus:
    image: prom/prometheus:v2.51.0
    container_name: prometheus
    restart: unless-stopped
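    ports:
      - "127.0.0.1:9090:9090"
    extra_hosts:
      # node-exporter uses network_mode: host, so it gets no Compose DNS name.
      # Map the name to the Docker host so the 'node-exporter:9100' target resolves.
      # With UFW enabled, also allow the Docker bridge subnet to reach port 9100.
      - "node-exporter:host-gateway"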
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of metrics
      - '--web.enable-lifecycle'               # Allow hot-reload via HTTP

  grafana:
    image: grafana/grafana:10.4.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-changeme_on_first_login}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_DOMAIN=monitoring.example.com
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter:v1.8.0
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host    # Access host network metrics directly
    pid: host             # Access host process info
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'    # Report filesystem stats for the host root mounted at /rootfs
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true    # Required for container metrics access

  alertmanager:
    image: prom/alertmanager:v0.27.0
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "127.0.0.1:9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager

volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:
EOF

cat > .env << 'EOF'
GRAFANA_PASSWORD=set_a_strong_password_here
EOF

docker compose up -d
sleep 10
docker compose ps

Expected output:

NAME            IMAGE                          STATUS
alertmanager    prom/alertmanager:v0.27.0      Up 8 seconds
cadvisor        gcr.io/cadvisor/cadvisor:v...  Up 8 seconds
grafana         grafana/grafana:10.4.0         Up 9 seconds
node-exporter   prom/node-exporter:v1.8.0      Up 9 seconds
prometheus      prom/prometheus:v2.51.0        Up 10 seconds

Part 2: Verify Prometheus is Scraping

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
for t in data['data']['activeTargets']:
    print(f\"  {t['labels']['job']:15s} {t['health']:6s} {t['labels'].get('instance','')}\")
"

Expected output:

  prometheus      up     localhost:9090
  node            up     node-exporter:9100
  cadvisor        up     cadvisor:8080

# Query a metric to confirm data is flowing
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)' | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
val = data['data']['result'][0]['value'][1]
print(f'Current CPU usage: {float(val):.1f}%')
"

Expected output:

Current CPU usage: 12.3%
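
For repeated checks, the same targets response can be reduced to a list of problems. The sketch below assumes the JSON has already been fetched from /api/v1/targets and parsed; the function name unhealthy_targets and the sample payload are hypothetical, trimmed to the fields used here.

```python
def unhealthy_targets(payload):
    """Return (job, instance, last_error) for every target that is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            problems.append((
                labels["job"],
                labels.get("instance", ""),
                target.get("lastError", ""),
            ))
    return problems


# Hypothetical response, trimmed to the fields this function reads
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "node", "instance": "10.0.0.2:9100"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

for job, instance, error in unhealthy_targets(sample):
    print(f"{job} {instance}: {error}")
```

An empty result means every configured target was scraped successfully on its last attempt.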

Part 3: Grafana Dashboard Setup

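# Load GRAFANA_PASSWORD from .env (run from ~/monitoring) so the curl commands
# below authenticate with the same password Grafana was started with
. ./.env

# Add Prometheus as a data source via Grafana API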
curl -s -X POST http://localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD:-changeme_on_first_login}" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true
  }' | python3 -m json.tool | grep '"message"'

Expected output:

    "message": "Datasource added",

# Import the Node Exporter Full dashboard (ID 1860, a widely used community dashboard)
# The import API needs the dashboard JSON itself, so download it first
curl -s https://grafana.com/api/dashboards/1860/revisions/37/download -o /tmp/dashboard-1860.json

# Wrap the dashboard JSON in an import request and post it to Grafana
python3 -c "
import json
payload = {
    'dashboard': json.load(open('/tmp/dashboard-1860.json')),
    'overwrite': True,
    'inputs': [{'name': 'DS_PROMETHEUS', 'type': 'datasource', 'pluginId': 'prometheus', 'value': 'Prometheus'}],
    'folderId': 0,
}
print(json.dumps(payload))
" | curl -s -X POST http://localhost:3000/api/dashboards/import \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD:-changeme_on_first_login}" \
  -d @- | python3 -c "import json,sys; d=json.load(sys.stdin); print('Dashboard imported:', d.get('slug') or d)" 2>/dev/null || \
  echo "Import via UI: Dashboards → Import → Enter ID 1860 → Load"

Manual import via UI (more reliable):

  1. Open http://localhost:3000 in browser
  2. Login: admin / your password
  3. Left menu → Dashboards → New → Import
  4. Enter 1860 in the “Import via grafana.com” box → Load
  5. Select Prometheus as the data source → Import

You now have a complete server monitoring dashboard with CPU, memory, disk, network, and systemd service status.


Part 4: Monitor Additional Servers

# ── ON EACH ADDITIONAL SERVER TO MONITOR ─────────────────────────────────
# Install and start Node Exporter

# Create a dedicated system user
sudo useradd --system --no-create-home --shell /bin/false prometheus

# Download Node Exporter
NODE_EXPORTER_VERSION="1.8.0"
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar xzf node_exporter-*.tar.gz
sudo cp node_exporter-*/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter

# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter --no-pager | grep "Active:"

Expected output:

     Active: active (running)

# ── ON THE MONITORING SERVER: add the new server to prometheus.yml ───────
# Edit prometheus/prometheus.yml, add target under the 'node' job:
# - '10.0.0.2:9100'

# Hot-reload Prometheus (no restart needed); the message prints only if the reload succeeds
curl -fsS -X POST http://localhost:9090/-/reload && echo "Prometheus config reloaded"

# Verify new target appears
sleep 5
curl -sG http://localhost:9090/api/v1/targets | python3 -c "
import json, sys
for t in json.load(sys.stdin)['data']['activeTargets']:
    if t['labels']['job'] == 'node':
        print(f\"  {t['labels']['instance']:25s} {t['health']}\")
"

Expected output:

  node-exporter:9100        up
  10.0.0.2:9100             up

Part 5: Useful PromQL Queries

# CPU usage (%) per host
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage (%) for root partition
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100

# Network receive rate (Mbps)
rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 / 1e6

# Container CPU usage (per container)
rate(container_cpu_usage_seconds_total{image!=""}[5m]) * 100

# Container memory usage (MB)
container_memory_usage_bytes{image!=""} / 1e6

# System load average
node_load1

# Disk I/O (reads/sec)
rate(node_disk_reads_completed_total[5m])

# HTTP requests rate (if using nginx-prometheus-exporter)
rate(nginx_http_requests_total[5m])
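
The network query above combines two steps: a counter rate and a unit conversion. The Python sketch below walks through both on made-up samples, including a counter reset (for example after a reboot). It is a simplified model of rate(): the function names are hypothetical and Prometheus's extrapolation to the window boundaries is left out.

```python
def increase_with_resets(samples):
    """Total increase of a counter across samples, tolerating resets.

    When a sample is lower than the previous one, the counter is assumed
    to have restarted from zero, so the new value itself is the increase.
    """
    total = 0.0
    prev = samples[0]
    for cur in samples[1:]:
        if cur < prev:
            total += cur          # counter reset
        else:
            total += cur - prev   # normal monotonic increase
        prev = cur
    return total


def receive_mbps(byte_samples, window_seconds):
    """Bytes counter samples -> average megabits per second."""
    bytes_per_second = increase_with_resets(byte_samples) / window_seconds
    return bytes_per_second * 8 / 1e6


# Hypothetical node_network_receive_bytes_total samples over 300 s;
# the drop from 1.15e9 to 7.5e7 is a counter reset.
samples = [1_000_000_000, 1_150_000_000, 75_000_000, 225_000_000]
print(receive_mbps(samples, 300))  # 10.0
```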

Troubleshooting

Prometheus target shows state: down

Cause: Node Exporter is not running on the target host, or UFW is blocking port 9100. Fix:

# On target server
sudo systemctl status node_exporter
# Open port 9100 only to monitoring server IP (not public):
sudo ufw allow from MONITORING_SERVER_IP to any port 9100

Grafana shows “No data” on dashboards

Cause: Prometheus data source URL is wrong, or time range mismatch. Fix:

# Test Prometheus is reachable from Grafana container
docker exec grafana wget -qO- 'http://prometheus:9090/api/v1/query?query=up' | head -c 100
# Should return JSON. If it fails, check container networking.

Alertmanager not sending Slack alerts

Cause: Webhook URL wrong, or Alertmanager config syntax error. Fix:

# Check Alertmanager config is valid
docker exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml

# Send a test alert
curl -s -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"TestAlert","severity":"info"},"annotations":{"summary":"Test"}}]'
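
If you want the test alert to clear itself, the JSON body can carry explicit start and end times. The Python sketch below builds such a body; build_test_alert is a hypothetical helper, and the five-minute lifetime is an arbitrary assumption. The output can be saved to a file and passed to the curl command above with -d @file.

```python
import json
from datetime import datetime, timedelta, timezone


def build_test_alert(now, lifetime_minutes=5):
    """Build a one-alert JSON body that resolves itself after a few minutes."""
    return [{
        "labels": {"alertname": "TestAlert", "severity": "info"},
        "annotations": {"summary": "Test"},
        # RFC 3339 timestamps; endsAt tells Alertmanager when to resolve
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=lifetime_minutes)).isoformat(),
    }]


# Fixed timestamp so the output is reproducible; use
# datetime.now(timezone.utc) when sending a real test alert.
body = build_test_alert(datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```

Because send_resolved is true in the Slack receiver, a body like this should produce both a firing and a resolved message.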

Conclusion

The full Prometheus + Grafana + Alertmanager stack is running: Node Exporter feeds host metrics into Prometheus every 15 seconds, cAdvisor provides per-container metrics, Grafana displays them on imported dashboards, and Alertmanager sends Slack notifications when CPU, memory, or disk thresholds are crossed or an instance goes down. All data stays on your infrastructure: no Datadog, no New Relic, no SaaS monitoring bill.

See Docker Compose Tutorial 2026 for the deployment pattern used here, and Python for DevOps Automation for scripting automated responses to Prometheus alerts.


People Also Ask

What is the difference between Prometheus and Grafana?

Prometheus is a metrics collection and storage system — it scrapes metric endpoints, stores the time-series data in its local database, and evaluates alerting rules. Grafana is a visualisation and dashboarding tool — it queries data sources (including Prometheus) and renders graphs, tables, and gauges. Prometheus doesn’t have a polished UI for browsing metrics; Grafana doesn’t collect or store data. They are complementary: Prometheus handles data ingestion and alerting logic; Grafana handles display and exploration.

Can Prometheus monitor application metrics, not just server metrics?

Yes — any application can expose a /metrics endpoint in Prometheus format, and Prometheus will scrape it. Most major frameworks have Prometheus client libraries: prometheus_client for Python, prom-client for Node.js, prometheus crate for Rust. A minimal FastAPI app can expose request counts, latency histograms, and error rates in under 20 lines of code. The same Grafana dashboard can then show both server metrics (from Node Exporter) and application metrics (from your service) side by side.
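
To show what a /metrics response looks like, here is a hand-rolled Python sketch that renders one counter in the Prometheus text exposition format. It deliberately uses only the standard library rather than prometheus_client; the function name render_counter and the metric values are made up, and label-value escaping is omitted for brevity.

```python
def render_counter(name, help_text, samples):
    """Render one counter metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Labels are rendered as key="value" pairs inside braces
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts for a small web service
text = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [
        ({"method": "GET", "status": "200"}, 1027),
        ({"method": "POST", "status": "500"}, 3),
    ],
)
print(text, end="")
```

In a real service a client library would maintain these counters and serve the text for you; the sketch only illustrates the format Prometheus expects to scrape.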

How long does Prometheus store data by default?

Prometheus defaults to 15 days of retention. In the Docker Compose config above, --storage.tsdb.retention.time=30d extends this to 30 days. For longer retention, either increase the retention time (at the cost of more disk space, approximately 1–2 bytes per sample) or add a long-term storage layer: Thanos keeps Prometheus blocks in S3-compatible object storage indefinitely, while VictoriaMetrics accepts remote-write data and stores it in its own disk-based storage.
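
The 1–2 bytes per sample figure can be turned into a rough disk estimate. The Python sketch below does the arithmetic for the setup in this guide; the figure of 1,000 active series per server is an assumption for illustration (Node Exporter alone exposes 800+), and the function name tsdb_disk_bytes is hypothetical.

```python
def tsdb_disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample):
    """Rough TSDB size: series x samples kept per series x bytes per sample."""
    samples_per_series = retention_days * 86_400 / scrape_interval_s
    return active_series * samples_per_series * bytes_per_sample


# Assumed workload: 10 servers x ~1,000 series each,
# scraped every 15 s and kept for 30 days.
series = 10 * 1_000
low = tsdb_disk_bytes(series, 15, 30, bytes_per_sample=1)
high = tsdb_disk_bytes(series, 15, 30, bytes_per_sample=2)
print(f"{low / 1e9:.1f} to {high / 1e9:.1f} GB")  # 1.7 to 3.5 GB
```

The estimate scales linearly, so doubling either the retention time or the number of series doubles the disk needed.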


Tested on: Ubuntu 24.04 LTS (Hetzner CX22). Prometheus 2.51.0, Grafana 10.4.0, Node Exporter 1.8.0, cAdvisor 0.49.1. Last verified: April 28, 2026.

Anju Kushwaha

About the Author

Founder & Editorial Director

B-Tech Electronics & Communication Engineering | Founder of Vucense | Technical Operations & Editorial Strategy

Anju Kushwaha is the founder and editorial director of Vucense, driving the publication's mission to provide independent, expert analysis of sovereign technology and AI. With a background in electronics engineering and years of experience in tech strategy and operations, Anju curates Vucense's editorial calendar, collaborates with subject-matter experts to validate technical accuracy, and oversees quality standards across all content. Her role combines editorial leadership (ensuring author expertise matches topics, fact-checking and source verification, coordinating with specialist contributors) with strategic direction (choosing which emerging tech trends deserve in-depth coverage). Anju works directly with experts like Noah Choi (infrastructure), Elena Volkov (cryptography), and Siddharth Rao (AI policy) to ensure each article meets E-E-A-T standards and serves Vucense's readers with authoritative guidance. At Vucense, Anju also writes curated analysis pieces, trend summaries, and editorial perspectives on the state of sovereign tech infrastructure.
