# Monitoring in Docker

{% hint style="info" %}

This guide sets up:

* **Prometheus** scraping Story metrics
* **Grafana** dashboards (pre-provisioned datasource)
* **Alertmanager** to route alerts (Telegram/Discord/email can be added later)
* **Node Exporter** for host disk/CPU/RAM

This assumes your **Story node runs on the host**, and monitoring runs in Docker on the same server.
{% endhint %}

## 1. Enable metrics on the Story node

#### A) Consensus (CometBFT) metrics

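Enable Prometheus metrics in the `[instrumentation]` section of your node's `config.toml`:

`prometheus = true`

CometBFT serves the metrics on the address set by `prometheus_listen_addr` (port `26660` in this guide). Restart the node after changing the setting.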
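If the key is currently set to `false`, it can be flipped from the command line. The helper below is a minimal sketch, not an official tool: the function name `enable_prometheus` is made up for this guide, and the default path `$HOME/.story/story/config/config.toml` is an assumption, so pass your real path as the first argument.

```bash
# Sketch: set `prometheus = true` in a CometBFT config.toml.
# ASSUMPTION: the default path below may not match your install; pass the real one.
enable_prometheus() {
  local cfg="${1:-$HOME/.story/story/config/config.toml}"
  # Keep a backup (<file>.bak) and rewrite only the exact `prometheus = ...` key,
  # leaving keys such as `prometheus_listen_addr` untouched.
  sed -i.bak -E 's/^prometheus[[:space:]]*=[[:space:]]*(true|false)/prometheus = true/' "$cfg"
  # Print the resulting line so the change can be confirmed by eye.
  grep -E '^prometheus[[:space:]]*=' "$cfg"
}
```

Usage: `enable_prometheus /path/to/config.toml`, then restart the node so the change takes effect.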

Check that the endpoint responds:

```bash
curl -s http://127.0.0.1:26660/metrics
```

#### B) Execution (story-geth) metrics

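Start `story-geth` with metrics enabled:

* `--metrics`
* `--metrics.addr 127.0.0.1`
* `--metrics.port 6060`

Prometheus runs in a container and reaches the host through `host.docker.internal`, which resolves to the Docker gateway address, not the host's loopback. If the `story_geth` target stays down with `--metrics.addr 127.0.0.1`, bind geth metrics to an address the container can reach (for example the Docker bridge IP) and keep the port firewalled from the outside.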
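The Prometheus job in section 4 uses `/debug/metrics/prometheus` as the geth metrics path and asks you to confirm it with a curl test. The sketch below shows one way to run that test. The function name `probe_geth_metrics` and the list of candidate paths are assumptions made for illustration; it prints the first path that returns Prometheus-style text.

```bash
# Sketch: find which path on the geth metrics port serves Prometheus text.
# ASSUMPTION: the candidate paths are guesses to try, not a confirmed list.
probe_geth_metrics() {
  local base="${1:-http://127.0.0.1:6060}"
  local path
  for path in /debug/metrics/prometheus /metrics; do
    # -f: treat HTTP errors as failure, -s: quiet, --max-time: do not hang on a closed port
    if curl -fs --max-time 3 "${base}${path}" | grep -q '^# TYPE'; then
      echo "$path"
      return 0
    fi
  done
  echo "no metrics endpoint found at ${base}" >&2
  return 1
}
```

Run `probe_geth_metrics` on the host and use the printed path as `metrics_path` for the `story_geth` job.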

## 2. Repo layout

Create a folder like `story-monitoring-stack/`:

```
story-monitoring-stack/
  docker-compose.yml
  prometheus/
    prometheus.yml
    rules.story.yml
  alertmanager/
    alertmanager.yml
  grafana/
    provisioning/
      datasources/ds.yml
      dashboards/dashboards.yml
    dashboards/
      (optional JSON dashboards)
```
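The layout above can be created in one step. This is a minimal sketch; the function name `scaffold_stack` is made up for this guide, and the default location `$HOME/story-monitoring-stack` matches the path used in the run step.

```bash
# Sketch: create the directory tree and empty config files for the stack.
scaffold_stack() {
  local root="${1:-$HOME/story-monitoring-stack}"
  mkdir -p "$root/prometheus" \
           "$root/alertmanager" \
           "$root/grafana/provisioning/datasources" \
           "$root/grafana/provisioning/dashboards" \
           "$root/grafana/dashboards"
  # touch creates missing files and leaves the contents of existing ones alone.
  touch "$root/docker-compose.yml" \
        "$root/prometheus/prometheus.yml" \
        "$root/prometheus/rules.story.yml" \
        "$root/alertmanager/alertmanager.yml" \
        "$root/grafana/provisioning/datasources/ds.yml" \
        "$root/grafana/provisioning/dashboards/dashboards.yml"
}
```

Call `scaffold_stack` once, then fill in each file from the sections below.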

## 3. Create docker-compose.yml

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
    volumes:
      - ./prometheus:/etc/prometheus:ro
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    command:
      - --config.file=/etc/alertmanager/alertmanager.yml
    volumes:
      - ./alertmanager:/etc/alertmanager:ro
    ports:
      - "9093:9093"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=change_me
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    pid: host
    volumes:
      - /:/host:ro,rslave
    command:
      - --path.rootfs=/host
    ports:
      - "9100:9100"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```

## 4. Prometheus config

File: `prometheus/prometheus.yml`

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules.story.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["prometheus:9090"]

  - job_name: alertmanager
    static_configs:
      - targets: ["alertmanager:9093"]

  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]

  # Story consensus (CometBFT)
  - job_name: story_consensus
    metrics_path: /metrics
    static_configs:
      - targets: ["host.docker.internal:26660"]

  # Story geth (adjust metrics_path after curl test)
  - job_name: story_geth
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ["host.docker.internal:6060"]
```
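Before starting the stack, the config can be checked with `promtool`, which ships inside the `prom/prometheus` image. The wrapper below is a sketch; the function name `check_prometheus_config` is an assumption of this guide. Because `prometheus.yml` lists `rules.story.yml` under `rule_files`, the same check also parses the rule file.

```bash
# Sketch: validate prometheus.yml (and the rule files it references) with promtool.
# The stack directory is mounted at the same path the compose file uses.
check_prometheus_config() {
  local stack="${1:-$HOME/story-monitoring-stack}"
  docker run --rm \
    --entrypoint promtool \
    -v "$stack/prometheus:/etc/prometheus:ro" \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}
```

A non-zero exit status means promtool found a syntax or rule error and printed it.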

## 5. Alert rules

File: `prometheus/rules.story.yml`

```yaml
groups:
- name: story.rules
  rules:
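  # NOTE: CometBFT prefixes metric names with the `namespace` value from the
  # [instrumentation] section of config.toml (for example `cometbft_p2p_peers`).
  # If your /metrics output shows a prefix, add it to the metric names below.

  # Endpoints down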
  - alert: StoryConsensusMetricsDown
    expr: up{job="story_consensus"} == 0
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Story consensus metrics endpoint is DOWN (26660)"

  - alert: StoryGethMetricsDown
    expr: up{job="story_geth"} == 0
    for: 2m
    labels: { severity: warning }
    annotations:
      summary: "Story geth metrics endpoint is DOWN (6060)"

  # Peers low (tune threshold)
  - alert: StoryLowPeers
    expr: p2p_peers{job="story_consensus"} < 10
    for: 5m
    labels: { severity: warning }
    annotations:
      summary: "Low peers: {{ $value }} (p2p_peers < 10)"

  # Height not increasing (node stalled)
  - alert: StoryHeightNotIncreasing
    expr: increase(consensus_latest_block_height{job="story_consensus"}[5m]) < 1
    for: 5m
    labels: { severity: critical }
    annotations:
      summary: "Block height not increasing for 5m"

  # Still syncing for a long time
  - alert: StoryStillSyncing
    expr: blocksync_syncing{job="story_consensus"} == 1
    for: 20m
    labels: { severity: warning }
    annotations:
      summary: "Node is syncing for >20m (blocksync_syncing=1)"

  # “Jailed risk” heuristics (not signing)
  # These metric names can vary by CometBFT build — verify in /metrics and adjust.
  - alert: StoryValidatorNotSigning
    expr: (consensus_height{job="story_consensus"} - consensus_validator_last_signed_height{job="story_consensus"}) > 10
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Validator last signed height lags > 10 blocks"

  - alert: StoryMissedBlocksIncreasing
    expr: increase(consensus_validator_missed_blocks{job="story_consensus"}[10m]) > 0
    for: 0m
    labels: { severity: warning }
    annotations:
      summary: "Missed blocks increased in last 10m"

  - alert: StoryValidatorPowerZero
    expr: consensus_validator_power{job="story_consensus"} == 0
    for: 2m
    labels: { severity: critical }
    annotations:
      summary: "Validator power is 0 (may be jailed / not in active set)"

- name: host.rules
  rules:
  - alert: HostDiskLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
    for: 5m
    labels: { severity: critical }
    annotations:
      summary: "Disk free < 10% on /"

  - alert: HostCPUHigh
    expr: (100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 85
    for: 10m
    labels: { severity: warning }
    annotations:
      summary: "CPU usage > 85% for 10m"
```
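Since the rules only fire when the metric names match what the node exports, it helps to list the names the endpoint really serves. The filter below is a small sketch; the function name `list_metric_names` is made up for this guide. It reads Prometheus exposition text on stdin and prints the unique metric names that match a pattern.

```bash
# Sketch: print unique metric names from Prometheus exposition text on stdin.
# Optional first argument: an extended regex to filter the names (default: all).
list_metric_names() {
  local pattern="${1:-.}"
  # Drop "# HELP" / "# TYPE" comments, cut labels and values, de-duplicate, filter.
  grep -v '^#' | sed -E 's/[{ ].*$//' | sort -u | grep -E "$pattern"
}
```

Example: `curl -s http://127.0.0.1:26660/metrics | list_metric_names 'consensus_(height|validator)'`. Compare the output with the names used in `rules.story.yml` and adjust the rules where they differ.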

## 6. Alertmanager

File: `alertmanager/alertmanager.yml`

This is a starter config that only groups alerts; the `default` receiver has no integrations, so no notifications are sent yet. Replace it with Telegram/Discord/email later.

```yaml
global: {}

route:
  group_by: ["alertname", "job", "instance"]
  group_wait: 10s
  group_interval: 2m
  repeat_interval: 2h
  receiver: "default"

receivers:
  - name: "default"
    # Add integrations here (webhook, email, etc.)
    # webhook_configs:
    #   - url: "http://your-webhook:port/..."
```
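Once the stack is running, routing can be tested by pushing a hand-made alert to Alertmanager's `/api/v2/alerts` endpoint. The sketch below assumes Alertmanager is reachable on `127.0.0.1:9093` as published in the compose file; the function names and the `manual_test` job label are invented for illustration.

```bash
# Sketch: build and send a manual test alert to Alertmanager (API v2).
# ASSUMPTION: the alert name is a plain word without quotes or backslashes.
test_alert_payload() {
  local name="${1:-StoryTestAlert}"
  printf '[{"labels":{"alertname":"%s","severity":"warning","job":"manual_test"},"annotations":{"summary":"Manual test alert"}}]' "$name"
}

send_test_alert() {
  local am="${1:-http://127.0.0.1:9093}"
  # --data @- makes curl read the JSON body from stdin.
  test_alert_payload "${2:-StoryTestAlert}" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data @- "${am}/api/v2/alerts"
}
```

After `send_test_alert`, the alert should appear in the Alertmanager UI. With the starter config it goes no further, because the `default` receiver has no integrations.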

## 7. Grafana provisioning

File: `grafana/provisioning/datasources/ds.yml`

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

File: `grafana/provisioning/dashboards/dashboards.yml`

```yaml
apiVersion: 1
providers:
  - name: "Local Dashboards"
    orgId: 1
    folder: ""
    type: file
    disableDeletion: true
    updateIntervalSeconds: 60
    options:
      path: /var/lib/grafana/dashboards
```

You can drop JSON dashboards into `grafana/dashboards/` later.

## 8. Run it

Change into the stack directory and start the containers:

```bash
cd "$HOME/story-monitoring-stack" && docker compose up -d
```

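Log in to Grafana:

* `http://<server-ip>:3000`
* user: `admin`, password: the value of `GF_SECURITY_ADMIN_PASSWORD` in `docker-compose.yml` (`change_me` in the example above; set your own before starting)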
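To confirm that every container came up, each service can be polled on its published port. The loop below is a sketch; the function name `check_stack` is made up for this guide, and the ports are the ones published in the compose file above.

```bash
# Sketch: poll each service of the stack and report OK/FAIL per service.
# Returns non-zero if any endpoint fails to answer with a successful HTTP status.
check_stack() {
  local host="${1:-127.0.0.1}" rc=0 target name url
  for target in \
    "prometheus=http://$host:9090/-/ready" \
    "alertmanager=http://$host:9093/-/ready" \
    "grafana=http://$host:3000/api/health" \
    "node-exporter=http://$host:9100/metrics"; do
    name="${target%%=*}"   # text before the first "="
    url="${target#*=}"     # text after the first "="
    if curl -fs --max-time 3 -o /dev/null "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name ($url)"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_stack` on the server. For any `FAIL` line, `docker compose logs <service>` is the first place to look. Scrape target status is shown in the Prometheus UI at `http://<server-ip>:9090/targets`.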

