Tools to monitor your validator

Special thanks to @p1xel32 for important parts of this guide

Hello friends! This guide gives an overview of the software you need to keep your validator running reliably.

Monitoring is set up here with three utilities (Prometheus, Grafana Cloud and Node Exporter), covered in five parts:

  • 1st Part - Prometheus

  • 2nd Part - Grafana cloud

  • 3rd Part - Node exporter

  • 4th Part - Dashboard setting up

  • 5th Part - Conclusion

Before we start: deploy and connect to a new server

Update dependencies

sudo apt update && sudo apt upgrade -y
sudo apt install nano

Setup Prometheus

1.1 Create a dedicated user and group for Prometheus on your server

sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus

1.2 Download Prometheus (this guide uses v2.51.2; check the project's releases page for a newer version)

wget https://github.com/prometheus/prometheus/releases/download/v2.51.2/prometheus-2.51.2.linux-amd64.tar.gz
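
Before extracting, it can be worth comparing the download against the SHA-256 digest published for that release. The sketch below assumes you copy the expected digest by hand from the Prometheus download page; verify_sha256 is a hypothetical helper name, not part of any Prometheus tooling.

```shell
# Sketch: compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper name.
verify_sha256() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$actual" = "$expected" ]
}

# Example (paste the digest published for your tarball):
# verify_sha256 prometheus-2.51.2.linux-amd64.tar.gz "<expected-digest>" && echo "checksum OK"
```

The function returns a non-zero status on a mismatch, so it can gate the extraction step with &&.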

1.3 Extract Prometheus

tar -xvf prometheus*.tar.gz

1.4 Change the directory to the extracted directory

cd prometheus-2.51.2.linux-amd64

1.5 Create some required directories

sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

1.6 Move the required files into place

sudo mv prometheus.yml /etc/prometheus/prometheus.yml
sudo mv consoles/ console_libraries/ /etc/prometheus/
sudo mv prometheus promtool /usr/local/bin/

1.7 Create a systemd service file

sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=
SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
EOF
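
systemd, not the shell, is meant to substitute $MAINPID in ExecReload. When a heredoc delimiter is left unquoted, the shell expands variables in the body before tee writes the file, so it is worth confirming the token survived. A minimal check, assuming the unit path used above (unit_keeps_mainpid is a hypothetical helper name):

```shell
# Sketch: confirm a unit file still contains the literal $MAINPID token.
# unit_keeps_mainpid is a hypothetical helper name.
unit_keeps_mainpid() {
  # -F treats the pattern as a fixed string, so "$" is matched literally
  grep -qF 'ExecReload=/bin/kill -HUP $MAINPID' "$1"
}

# Example:
# unit_keeps_mainpid /etc/systemd/system/prometheus.service || echo "MAINPID was expanded; re-create the unit"
```

If the check fails, writing the heredoc with a quoted delimiter (<<'EOF') keeps the body verbatim.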

1.8 Set proper ownership and permissions on the Prometheus directories

sudo chown -R prometheus:prometheus /etc/prometheus/
sudo chmod -R 775 /etc/prometheus/
sudo chown -R prometheus:prometheus /var/lib/prometheus/

Setup Grafana Cloud

2 Create an account and an API key on the free Grafana Cloud service

https://grafana.com/auth/sign-up/

2.1 Head over to your Grafana Cloud Portal and select Send Metrics under Prometheus. Scroll up and you should see the API Key section.

Click on Generate now and create an API Key with the Role MetricsPublisher. Copy the Prometheus config and save it locally. The url and username are unique to each account. Fill the password field in both snippets with your API key.

2.2 Edit the Prometheus config and set your own url, username and password

nano /etc/prometheus/prometheus.yml

Replace 5 entries with your own values (origin_prometheus, url, username, password, and the exporter job_name/targets)

# Sample config for Prometheus.

global:
  scrape_interval: 60s # Scrape targets every 60 seconds (same as the 1-minute default).
  evaluation_interval: 60s # Evaluate rules every 60 seconds (same as the 1-minute default).

  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).

  external_labels:
    monitor: 'example'
    origin_prometheus: <AnyName>
remote_write:
- url: https://prometheus-prod-12-prod-us-central-4.grafana.net/api/prom/push
  basic_auth:
    username: 77777
    password: AOHSDJASHDKASDUhkasjdhauKSADHausdhaskj
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  #- "first_rules.yml"
  #- "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label 'job=<job_name>' to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Scrape this job every 60 seconds and allow each scrape up to 60 seconds.
    scrape_interval: 60s
    scrape_timeout: 60s

    #metrics_path defaults to '/metrics'
    #scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'exporter'
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: <AnyName>
    static_configs:
      - targets: ['localhost:9101']
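
The sample above contains <AnyName> placeholders that are easy to leave in by accident. As a rough pre-flight check, the sketch below lists any uncommented line that still holds an angle-bracket placeholder; find_placeholders is a hypothetical helper name, and the pattern only assumes placeholders look like <Word>.

```shell
# Sketch: list config lines that still contain an unedited <Placeholder>.
# find_placeholders is a hypothetical helper name.
find_placeholders() {
  # Match <Word> only when no "#" precedes it, so commented examples are skipped.
  # Exit status is 0 when a placeholder is found, 1 when the file is clean.
  grep -nE '^[^#]*<[A-Za-z]+>' "$1"
}

# Example:
# find_placeholders /etc/prometheus/prometheus.yml && echo "edit the lines above first"
```

promtool, moved to /usr/local/bin in step 1.6, can additionally validate the file's syntax with: promtool check config /etc/prometheus/prometheus.yml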

2.3 Run prometheus

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
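
Prometheus can take a few seconds to come up after the service starts. One way to wait for it is a small retry loop around its readiness endpoint (/-/ready on the listen address configured above). This is only a sketch; retry is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Sketch: run a command until it succeeds, up to a fixed number of attempts.
# retry is a hypothetical helper; arguments: <attempts> <delay-seconds> <command...>
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"   # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: poll the readiness endpoint for up to roughly 30 seconds.
# retry 10 3 curl -fsS http://localhost:9090/-/ready
```

If the loop gives up, systemctl status prometheus usually shows why the service did not start.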

Setup Node Exporter

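Next, go to the server where your validator node is installed and install Node Exporter there. The steps below reuse the prometheus user and the /etc/prometheus/ directory from Part 1, so create them first if this is a different server, and replace localhost:9100 in the Prometheus config with that server's address.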

3.1 Download node_exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.0/node_exporter-1.8.0.linux-amd64.tar.gz

3.2 Extract Node Exporter

tar -xvzf node_exporter-1.8.0.linux-amd64.tar.gz

3.3 Move the extracted directory to /etc/prometheus/

sudo mv node_exporter-1.8.0.linux-amd64 /etc/prometheus/node_exporter

3.4 Set proper ownership

sudo chown -R prometheus:prometheus /etc/prometheus/node_exporter

3.5 Create a systemd service file

sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/etc/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target
EOF

3.6 Run Node exporter

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
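
To confirm the exporter is actually serving data, you can read one metric from its /metrics endpoint on port 9100 (the port the Prometheus config above scrapes). The sketch below pulls a single unlabeled metric out of the text output; metric_value is a hypothetical helper name, and node_load1 is used only as an example metric.

```shell
# Sketch: print the value of one unlabeled metric from Prometheus text format on stdin.
# metric_value is a hypothetical helper name.
metric_value() {
  # $1 = exact metric name; "# HELP" and "# TYPE" lines are skipped because
  # their first field is "#". Exit status is 1 when the metric is absent.
  awk -v name="$1" '$1 == name { print $2; found = 1; exit } END { exit !found }'
}

# Example:
# curl -s http://localhost:9100/metrics | metric_value node_load1
```

Any number printed here means the exporter is up; an empty result with a non-zero status means the metric was not found in the response.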

Dashboard setting up

Now go to grafana.net → Dashboards → Import dashboard and import the dashboard you want. You can also import a Node Exporter dashboard with detailed server info, for example ID 11074.

In that dashboard you can also add panels for any statistic about your node that Prometheus has collected.

Useful commands:

Check status

systemctl status prometheus
systemctl status node_exporter

Stop and disable Prometheus and Node Exporter

sudo systemctl stop prometheus && sudo systemctl disable prometheus
sudo systemctl stop node_exporter && sudo systemctl disable node_exporter

That’s all you need to monitor your node. Please remember that alerts are an important part as well, since you need to react quickly to what is happening in the logs. I hope this guide helped you understand which tools you need to stay aware of your validator’s health. Enjoy your day!
