Monitoring in Docker

This guide sets up a monitoring stack for Story with Docker Compose (Prometheus + Grafana + Alertmanager), including alert rules. The stack consists of:

Prometheus scraping Story metrics
Grafana dashboards (pre-provisioned datasource)
Alertmanager to route alerts (Telegram/Discord/email later)
Node Exporter for host disk/CPU/RAM

This assumes your Story node runs on the host, and monitoring runs in Docker on the same server.

1. Enable metrics on the Story node

A) Consensus (CometBFT) metrics

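Enable Prometheus metrics in the [instrumentation] section of your config.toml, then restart the node:

prometheus = true
prometheus_listen_addr = ":26660"

Check that the endpoint responds: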

curl -s http://127.0.0.1:26660/metrics

B) Execution (story-geth) metrics

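Enable geth metrics by adding these flags to the story-geth start command, then restart it. Prometheus runs in a container and reaches the host through host.docker.internal, so a listener bound only to 127.0.0.1 is not reachable from it. Bind to 0.0.0.0 (or to the Docker bridge address) and block port 6060 from outside with your firewall:

--metrics
--metrics.addr 0.0.0.0
--metrics.port 6060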
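
Section A includes a curl check for the consensus endpoint; the same check is useful for geth, since the Prometheus config later assumes the path /debug/metrics/prometheus. The following is a minimal sketch that probes both endpoints from the host. check_metrics is a hypothetical helper name, and the ports and path are the ones used elsewhere in this guide; adjust them if your node differs.

```shell
#!/bin/sh
# check_metrics is a hypothetical helper, not part of Story or geth.
# It fetches a URL and reports whether the body looks like Prometheus text
# exposition (which contains "# TYPE <name> <kind>" lines).
check_metrics() {
  url="$1"
  # -f: treat HTTP errors as failures, -s: quiet, --max-time: do not hang
  if curl -fs --max-time 5 "$url" 2>/dev/null | grep -q '^# TYPE '; then
    echo "OK   $url"
  else
    echo "FAIL $url"
  fi
}

# Ports and path as assumed in this guide.
check_metrics http://127.0.0.1:26660/metrics
check_metrics http://127.0.0.1:6060/debug/metrics/prometheus
```

If the geth line reports FAIL while geth is running with --metrics, try the plain /metrics path by hand and update metrics_path in the Prometheus config accordingly.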

2. Repo layout

Create a folder like story-monitoring-stack/:

story-monitoring-stack/
  docker-compose.yml
  prometheus/
    prometheus.yml
    rules.story.yml
  alertmanager/
    alertmanager.yml
  grafana/
    provisioning/
      datasources/ds.yml
      dashboards/dashboards.yml
    dashboards/
      (optional JSON dashboards)
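
The layout above can be created in one go. This is a small sketch, assuming the stack lives in $HOME/story-monitoring-stack (the path used in the "Run it" step); STACK_DIR is a variable introduced here only for illustration.

```shell
#!/bin/sh
set -e
# STACK_DIR is an assumed location; override it to put the stack elsewhere.
STACK_DIR="${STACK_DIR:-$HOME/story-monitoring-stack}"

# Directories from the layout above.
mkdir -p \
  "$STACK_DIR/prometheus" \
  "$STACK_DIR/alertmanager" \
  "$STACK_DIR/grafana/provisioning/datasources" \
  "$STACK_DIR/grafana/provisioning/dashboards" \
  "$STACK_DIR/grafana/dashboards"

# Empty placeholders, filled in by the following steps.
# touch does not overwrite the contents of files that already exist.
touch \
  "$STACK_DIR/docker-compose.yml" \
  "$STACK_DIR/prometheus/prometheus.yml" \
  "$STACK_DIR/prometheus/rules.story.yml" \
  "$STACK_DIR/alertmanager/alertmanager.yml" \
  "$STACK_DIR/grafana/provisioning/datasources/ds.yml" \
  "$STACK_DIR/grafana/provisioning/dashboards/dashboards.yml"

# Show the result.
find "$STACK_DIR" | sort
```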

3. Create docker-compose.yml

services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
    volumes:
      - ./prometheus:/etc/prometheus:ro
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
    volumes:
      - ./alertmanager:/etc/alertmanager:ro
    ports:
      - "9093:9093"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=change_me
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    pid: host
    volumes:
      - /:/host:ro,rslave
    command:
      - --path.rootfs=/host
    ports:
      - "9100:9100"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

4. Prometheus config

in path prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules.story.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["prometheus:9090"]

  - job_name: alertmanager
    static_configs:
      - targets: ["alertmanager:9093"]

  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]

  # Story consensus (CometBFT)
  - job_name: story_consensus
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:26660"]

  # Story geth (adjust metrics_path after curl test)
  - job_name: story_geth
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ["host.docker.internal:6060"]
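
The two Story jobs are scraped through host.docker.internal, so what matters is whether the endpoints answer from inside the Prometheus container, not just from the host. Once the stack is running (step 8), a check along these lines can confirm that. It is a sketch: probe_from_prometheus is a hypothetical helper name, and it assumes the prom/prometheus image ships a BusyBox wget, which is the case for the official image at the time of writing.

```shell
#!/bin/sh
# probe_from_prometheus is a hypothetical helper.
# It fetches a URL from inside the running "prometheus" service container.
# Run it from the stack directory so docker compose finds docker-compose.yml.
probe_from_prometheus() {
  url="$1"
  # -T (docker compose exec): no TTY; wget: -q quiet, -O /dev/null discard body,
  # -T 5 network timeout in seconds (BusyBox wget options).
  if docker compose exec -T prometheus wget -q -O /dev/null -T 5 "$url"; then
    echo "reachable    $url"
  else
    echo "unreachable  $url"
  fi
}
```

Usage, from the stack directory after the containers are up:

probe_from_prometheus http://host.docker.internal:26660/metrics
probe_from_prometheus http://host.docker.internal:6060/debug/metrics/prometheus

An endpoint that answers on the host but is unreachable here is usually bound to 127.0.0.1 only or blocked by the host firewall.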

5. Alert rules

in path prometheus/rules.story.yml

groups:
- name: story.rules
  rules:
  # Endpoints down
  - alert: StoryConsensusMetricsDown
    expr: up{job="story_consensus"} == 0
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Story consensus metrics endpoint is DOWN (26660)"

  - alert: StoryGethMetricsDown
    expr: up{job="story_geth"} == 0
    for: 2m
    labels: { severity: warning }
    annotations:
      summary: "Story geth metrics endpoint is DOWN (6060)"

  # Peers low (tune threshold)
  - alert: StoryLowPeers
    expr: p2p_peers{job="story_consensus"} < 10
    for: 5m
    labels: { severity: warning }
    annotations:
      summary: "Low peers: {{ $value }} (p2p_peers < 10)"

  # Height not increasing (node stalled)
  - alert: StoryHeightNotIncreasing
    expr: increase(consensus_latest_block_height{job="story_consensus"}[5m]) < 1
    for: 5m
    labels: { severity: critical }
    annotations:
      summary: "Block height not increasing for 5m"

  # Still syncing for a long time
  - alert: StoryStillSyncing
    expr: blocksync_syncing{job="story_consensus"} == 1
    for: 20m
    labels: { severity: warning }
    annotations:
      summary: "Node is syncing for >20m (blocksync_syncing=1)"

  # “Jailed risk” heuristics (not signing)
  # These metric names can vary by CometBFT build — verify in /metrics and adjust.
  - alert: StoryValidatorNotSigning
    expr: (consensus_height{job="story_consensus"} - consensus_validator_last_signed_height{job="story_consensus"}) > 10
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Validator last signed height lags > 10 blocks"

  - alert: StoryMissedBlocksIncreasing
    expr: increase(consensus_validator_missed_blocks{job="story_consensus"}[10m]) > 0
    for: 0m
    labels: { severity: warning }
    annotations:
      summary: "Missed blocks increased in last 10m"

  - alert: StoryValidatorPowerZero
    expr: consensus_validator_power{job="story_consensus"} == 0
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Validator power is 0 (may be jailed / not in active set)"

- name: host.rules
  rules:
  - alert: HostDiskLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
    for: 5m
    labels: { severity: critical }
    annotations:
      summary: "Disk free < 10% on /"

  - alert: HostCPUHigh
    expr: (100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 85
    for: 10m
    labels: { severity: warning }
    annotations:
      summary: "CPU usage > 85% for 10m"

6. Alertmanager

in path alertmanager/alertmanager.yml

This is a “starter” config that only groups alerts. The default receiver has no integrations, so no notifications are sent until you add one (Telegram, Discord, or email).

global: {}

route:
  group_by: ["alertname", "job", "instance"]
  group_wait: 10s
  group_interval: 2m
  repeat_interval: 2h
  receiver: "default"

receivers:
  - name: "default"
    # Add integrations here (webhook, email, etc.)
    # webhook_configs:
    #   - url: "http://your-webhook:port/..."
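
At this point the Compose file, the Prometheus config with its rules, and the Alertmanager config are all written, and they can be validated before anything is started. The sketch below uses docker compose config, plus the promtool and amtool binaries that ship inside the official prom/prometheus and prom/alertmanager images. validate_stack is a hypothetical function name; it assumes you run it from the stack directory.

```shell
#!/bin/sh
# validate_stack is a hypothetical helper. Run it from story-monitoring-stack/.
validate_stack() {
  # Compose file: parse and validate, print nothing on success.
  docker compose config --quiet || return 1

  # prometheus.yml, including the rule files it references.
  docker run --rm --entrypoint promtool \
    -v "$PWD/prometheus:/etc/prometheus:ro" \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml || return 1

  # alertmanager.yml
  docker run --rm --entrypoint amtool \
    -v "$PWD/alertmanager:/etc/alertmanager:ro" \
    prom/alertmanager:latest \
    check-config /etc/alertmanager/alertmanager.yml || return 1

  echo "all configs valid"
}
```

Usage:

cd $HOME/story-monitoring-stack && validate_stack

The function stops at the first failing check, and the failing tool prints the file and line that need fixing.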

7. Grafana provisioning

in path grafana/provisioning/datasources/ds.yml

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

in path grafana/provisioning/dashboards/dashboards.yml

apiVersion: 1
providers:
  - name: "Local Dashboards"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: true
    updateIntervalSeconds: 60
    options:
      path: /var/lib/grafana/dashboards

You can drop JSON dashboards into grafana/dashboards/ later.

8. Run it

From the stack directory:

cd $HOME/story-monitoring-stack && docker compose up -d

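Then open Grafana at http://<server-ip>:3000 and log in with the credentials set in docker-compose.yml:
user: admin, pass: change_me (the GF_SECURITY_ADMIN_PASSWORD value; change it before exposing Grafana)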
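
After the containers start, a quick pass over each service's health endpoint shows whether the stack came up. This is a sketch using the ports published in docker-compose.yml; check_health and HOST are names introduced here, and the final line assumes the Prometheus targets API returns compact JSON containing "health":"up" style fields.

```shell
#!/bin/sh
# HOST is an assumption: run this on the server itself, or set HOST to its IP.
HOST="${HOST:-127.0.0.1}"

# check_health is a hypothetical helper: one HTTP GET per service.
check_health() {
  name="$1"
  url="$2"
  if curl -fs --max-time 5 -o /dev/null "$url"; then
    echo "UP    $name"
  else
    echo "DOWN  $name ($url)"
  fi
}

check_health prometheus    "http://$HOST:9090/-/healthy"
check_health alertmanager  "http://$HOST:9093/-/healthy"
check_health grafana       "http://$HOST:3000/api/health"
check_health node-exporter "http://$HOST:9100/metrics"

# Count scrape targets by health state as reported by Prometheus.
curl -fs --max-time 5 "http://$HOST:9090/api/v1/targets" \
  | grep -o '"health":"[a-z]*"' | sort | uniq -c
```

A target listed as down here corresponds to a job that will trigger the matching "MetricsDown" alert; the Prometheus UI at port 9090 under Status > Targets shows the scrape error.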
