MetricHill: Centralizing Netdata
Netdata: The New Kid!
Netdata is a great tool for viewing the real-time metrics of your system or systems. There are some great benefits to using Netdata, like…
Its extremely no-nonsense setup procedure
Its out-of-the-box monitoring presets
Its ability to stay low on resources
Its great documentation wiki
Its extensibility with plugins
Its active and vibrant community
… but with a pinch of salt…
There is no centralized look at all the Netdata nodes
Brendan Gregg takes a dig at it on Hacker News
It lacks overlays, making it difficult to run live comparisons
Its very long configuration file (8000 lines!!!)
Setting the perspective
I am most interested in getting the first pinch fixed at the moment.
When I say "No centralized look at the Netdata nodes", I mean that Netdata does not offer an out-of-the-box mechanism to discover which nodes in your infrastructure have Netdata installed and how they can be individually addressed. There is of course the Netdata registry, but it has serious drawbacks when nodes are spread across multiple subnets.
Let me set the perspective for this article. Say I have a node w1, and I want it to be addressable through some internal domain, say http://w1.netdata.monitor.zapped.pigs. In other words, I want w1 to publish itself to a central system so that its Netdata dashboard becomes accessible through that URI. It would also be good to have a central dashboard that lists all registered nodes and the URIs at which their Netdata dashboards are accessible. Even better if it allowed dynamic registration of these URI routes.
Gorouter It
Netdata can be set up behind a proxy like Nginx, Apache2 or Caddy, but these are not well suited to dynamic registration in an environment where microservices or instances keep coming and going. My work with Cloud Foundry led me to look at Gorouter, an effective HTTP (TCP) traffic router that lets you add service endpoints at will.
Here is the gist of how Gorouter works. Gorouter works along with Gnatsd: it subscribes to messages on Gnatsd in which nodes publish their accessible URI (in this case w1.netdata.monitor.zapped.pigs, where w1 is the hostname), their IP address and their port (the Netdata port). Once Gorouter picks up such a message, it updates its registry with the new node's details under that URI, as you can see in the following diagram.
Gorouter sits behind a proxy like Nginx, which serves, say, *.netdata.monitor.zapped.pigs, so any request for w1.netdata.monitor.zapped.pigs is forwarded to Gorouter by the Nginx proxy. When Gorouter receives a request for w1.netdata.monitor.zapped.pigs through the Nginx proxy, it simply routes the traffic to the w1 node's IP and Netdata port, which it looks up in its registry, as you can see in the following diagram.
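To make the lookup step concrete, here is a small Python sketch of host-based routing. It is only an illustration and not Gorouter's actual code (Gorouter is written in Go); ROUTES and pick_backend are hypothetical names I made up for this example.

```python
# Illustrative only: a toy host-based routing table, not Gorouter's implementation.
# ROUTES and pick_backend are hypothetical names.
ROUTES = {
    "w1.netdata.monitor.zapped.pigs": ("172.16.30.20", 19999),
}


def pick_backend(host_header, routes=ROUTES):
    """Return the (ip, port) registered for the request's Host header, or None."""
    # Drop an optional ":port" suffix and normalise case before the lookup.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return routes.get(hostname)


print(pick_backend("w1.netdata.monitor.zapped.pigs"))  # a registered node
print(pick_backend("w9.netdata.monitor.zapped.pigs"))  # an unknown node
```

A real router would answer an unknown host with an HTTP error rather than None, but the idea is the same: the Host header is the key, the registry holds the answer.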
Advantage Gorouter
The design of Gorouter makes it well suited to dynamic environments where service nodes come and go. Every node is required to keep publishing its availability to Gorouter; otherwise Gorouter removes the node's details from its registry after a defined TTL (usually 2 minutes).
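The TTL behaviour can be sketched in a few lines of Python. Again, this is a toy model under my own assumptions, not how Gorouter is implemented: RouteRegistry and its methods are hypothetical names, and the clock is injectable only so the demo does not have to wait two minutes.

```python
import time


class RouteRegistry:
    """Toy registry that forgets routes not refreshed within `ttl` seconds."""

    def __init__(self, ttl=120, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._routes = {}  # uri -> (host, port, last_seen)

    def register(self, uri, host, port):
        # Each registration message (re)starts the TTL countdown for the URI.
        self._routes[uri] = (host, port, self.clock())

    def prune(self):
        # Drop every route whose last registration is older than the TTL.
        now = self.clock()
        stale = [u for u, (_, _, seen) in self._routes.items() if now - seen > self.ttl]
        for uri in stale:
            del self._routes[uri]

    def lookup(self, uri):
        self.prune()
        entry = self._routes.get(uri)
        return None if entry is None else entry[:2]


clock = [0.0]  # fake clock so the demo runs instantly
registry = RouteRegistry(ttl=120, clock=lambda: clock[0])
registry.register("w1.netdata.monitor.zapped.pigs", "172.16.30.20", 19999)
clock[0] = 60
print(registry.lookup("w1.netdata.monitor.zapped.pigs"))  # still registered
clock[0] = 181
print(registry.lookup("w1.netdata.monitor.zapped.pigs"))  # expired, so None
```

The point to take away is that a node never has to unregister: if it dies, it simply stops refreshing its entry and the route ages out on its own.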
The admin API of Gorouter (usually on port 8082), which is protected by basic authentication, allows users to check the health of Gorouter, get runtime variables and, most importantly, list the routes. The /routes endpoint is very helpful for seeing which Netdata nodes are registered and the URIs they are reachable at.
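If you want to turn the /routes output into a quick node list, something like the following Python sketch could help. Note the big assumption: I am assuming the response is a JSON object that maps each URI to a list of endpoint objects carrying an "address" field. This article does not show the response body, so check it against your own Gorouter before relying on it. SAMPLE and summarize_routes are hypothetical names.

```python
import json

# Assumed response shape (not confirmed by this article): a JSON object that maps
# each URI to a list of endpoint objects carrying an "address" field.
SAMPLE = '{"w1.netdata.monitor.zapped.pigs": [{"address": "172.16.30.20:19999", "ttl": 120}]}'


def summarize_routes(body):
    """Map each registered URI to the list of backend addresses behind it."""
    routes = json.loads(body)
    return {
        uri: [endpoint.get("address") for endpoint in endpoints]
        for uri, endpoints in sorted(routes.items())
    }


for uri, addresses in summarize_routes(SAMPLE).items():
    print(uri, "->", ", ".join(addresses))
```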
Scalability is one of the features of Gorouter: you can run multiple Gorouter instances behind an Nginx proxy and let Nginx load-balance between the configured Gorouters. This is enabled by running Gnatsd on an independent instance, so that each Gorouter picks up the routes from the Gnatsd queue and updates its own registry. This article will not detail the scalable setup, but it follows directly from the single-instance setup.
Play it!!!
Having understood the theory of what a centralized Netdata is and how we are going to achieve it, it is worth taking it for a ride. The resources (self-contained binaries) for the setup are available here, in the resources directory. For the setup, we require the following binaries…
gorouter
gnatsd
nats-pub
…from that directory. This setup exercise is completely manual, so as to understand the nuances, and all commands here are run on an Ubuntu 16.04 server. We require two machines…
One: Running Nginx, Gorouter and Gnatsd
Two: Node which publishes and is monitored
Supervised Gorouter
Using a process manager to manage long-running processes is always a good idea. For this setup we will use supervisord to manage both Gorouter and Gnatsd on the same machine. Let's first download these resources onto the machine that will run Gorouter.
$ # Download Gorouter
$ sudo wget -O /usr/bin/gorouter https://github.com/samof76/writtings/blob/master/metric_hill_netdata_part1/resources/gorouter?raw=true
$ sudo chmod +x /usr/bin/gorouter
$ # Download Gnatsd
$ sudo wget -O /usr/bin/gnatsd https://github.com/samof76/writtings/blob/master/metric_hill_netdata_part1/resources/gnatsd?raw=true
$ sudo chmod +x /usr/bin/gnatsd
Now that Gorouter and Gnatsd are in place on our machine, we can move on to setting up supervisord. Supervisord is a program that manages multiple cooperating or unrelated long-running processes that you would like to put into the background. Supervisord also provides a CLI-based control center called supervisorctl, which allows you to manage the processes. Installing supervisord is simple.
$ sudo apt-get install python-pip python-setuptools
$ sudo pip install supervisor
To make supervisord aware of both Gorouter and Gnatsd, we have to create a supervisord.conf file and place it in /etc, which is one of the locations from which supervisord picks up its configuration. Our supervisord.conf looks like the following.
[unix_http_server]
file=/var/run/supervisor.sock

[supervisord]
logfile=/var/log/supervisord
logfile_maxbytes=50MB
logfile_backups=10
loglevel=info
pidfile=/var/run/supervisord.pid
nodaemon=false
minfds=1024
minprocs=200

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[program:gnatsd]
command=/usr/bin/gnatsd
priority=400
stdout_logfile=/var/log/gnatsd/access.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stderr_logfile=/var/log/gnatsd/error.log
stderr_logfile_maxbytes=10MB
stderr_logfile_backups=10

[program:gorouter]
command=/usr/bin/gorouter -c /etc/gorouter.yml
priority=500
stdout_logfile=/var/log/gorouter/access.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=10
stderr_logfile=/var/log/gorouter/error.log
stderr_logfile_maxbytes=10MB
stderr_logfile_backups=10
Notice /etc/gorouter.yml: this is just a basic Gorouter configuration file, which looks like this.
status:
  port: 8082
  user: admin
  pass: 5tr0ngp@55w0rd

nats:
  - host: "localhost"
    port: 4222
    user:
    pass:

port: 8081
index: 0

go_max_procs: 5
The Gorouter configuration file is not strictly needed, but it must be created if you have specified the -c option in your /etc/supervisord.conf file. Also note the priority values in the supervisord configuration: they ensure that gorouter is launched after gnatsd, which is the idea.
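Make sure the log directories referenced in the configuration (/var/log/gorouter and /var/log/gnatsd) exist, since supervisord expects them to be there. Now we can start the supervisord daemon.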
$ sudo supervisord -c /etc/supervisord.conf
This will start both gnatsd and gorouter. Verify that they have started by making a curl request to Gorouter's admin API.
$ curl -u 'admin:5tr0ngp@55w0rd' http://localhost:8082/healthz
This should return ok, which means you are all set to register your Netdata URI. But first, one pit stop to set up the Nginx proxy.
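If you would rather script this check than run curl by hand, here is a rough Python equivalent using only the standard library. It is a sketch under the assumptions of this setup (admin API on localhost:8082, basic authentication, a body of ok when healthy); build_health_request and is_healthy are hypothetical helper names.

```python
import base64
import urllib.request


def build_health_request(user, password, host="localhost", port=8082):
    """Build a GET /healthz request carrying HTTP basic-auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(f"http://{host}:{port}/healthz")
    # Sending the header directly avoids putting the password (which contains
    # an "@") inside the URL.
    request.add_header("Authorization", f"Basic {token}")
    return request


def is_healthy(request, timeout=5):
    """Return True when the endpoint answers 200 with a body of 'ok'."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200 and response.read().decode().strip() == "ok"
    except OSError:
        # Covers connection failures as well as HTTP errors such as 401.
        return False
```

Calling is_healthy(build_health_request("admin", "5tr0ngp@55w0rd")) on the Gorouter machine should then give True once both processes are up.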
Proxy thru Nginx
Remember our second diagram: we have to route *.netdata.monitor.zapped.pigs to gorouter. This is where that happens. We set up Nginx on our machine (node One)…
$ sudo apt-get install nginx
We do that on the same server that's running gnatsd and gorouter (feel free to run it on a different server). Here is the configuration.
upstream gorouter {
    # the gorouter proxy port
    server 127.0.0.1:8081;
    keepalive 64;
}

server {
    # nginx listens to this
    listen 80;

    # the virtual host name of this
    server_name *.netdata.monitor.zapped.pigs;

    location / {
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://gorouter;
        proxy_http_version 1.1;
        proxy_pass_request_headers on;
        proxy_set_header Connection "keep-alive";
        proxy_store off;
    }
}
We save that as /etc/nginx/sites-available/star_netdata_monitor_zapped_pigs and symlink to it from /etc/nginx/sites-enabled/star_netdata_monitor_zapped_pigs. Then we restart the nginx service.
$ sudo service nginx restart
All that's left to do is point our domain *.netdata.monitor.zapped.pigs to our server in DNS. One more thing we could do is point admin.netdata.monitor.zapped.pigs to the admin API of Gorouter, which is running on the server's localhost:8082. So we create /etc/nginx/sites-available/admin_netdata_monitor_zapped_pigs and symlink to it from /etc/nginx/sites-enabled/admin_netdata_monitor_zapped_pigs. Here is its configuration.
upstream gorouter-admin {
    # the gorouter admin API
    server 127.0.0.1:8082;
    keepalive 64;
}

server {
    # nginx listens to this
    listen 80;

    # the virtual host name of this
    server_name admin.netdata.monitor.zapped.pigs;

    location / {
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://gorouter-admin;
        proxy_http_version 1.1;
        proxy_pass_request_headers on;
        proxy_set_header Connection "keep-alive";
        proxy_store off;
    }
}
Now we restart the nginx service again. We should now be able to log in from the browser at http://admin.netdata.monitor.zapped.pigs/healthz, using the username and password provided in gorouter's YAML file.
Time to Monitor
This is where we select a server (node Two) on which to set up Netdata and register it with gorouter. This is going to be our w1 host. Netdata installation is as mentioned here, but for this article's sake I will explain it here as well. First let's get the dependencies out of the way.
$ sudo apt-get install zlib1g-dev uuid-dev \
    libmnl-dev gcc make git autoconf autoconf-archive \
    autogen automake pkg-config curl
Let us then check out the latest version of Netdata and install it from source.
$ git clone https://github.com/firehol/netdata.git --depth=1 /tmp/netdata
$ cd /tmp/netdata
$ sudo ./netdata-installer.sh
Once installed, Netdata becomes one of the system services, which allows us to start and stop it at will. To ensure that Netdata is running, let us restart (or start) the service.
$ sudo service netdata restart
Let us register our newly installed service (running on port 19999) with gorouter; this is done using nats-pub. Since gorouter has a TTL of 120s, nats-pub must publish the host details to gorouter every minute to keep the entry alive. So we set up nats-pub as a cron job that runs every minute. First we download nats-pub.
$ sudo wget -O /usr/bin/nats-pub https://github.com/samof76/writtings/blob/master/metric_hill_netdata_part1/resources/nats-pub?raw=true
$ sudo chmod +x /usr/bin/nats-pub
Then we set up the nats-pub publish cron entry like the following.
* * * * * /usr/bin/nats-pub -s nats://172.16.30.23:4222 'router.register' '{"host":"172.16.30.20","port":19999,"uris":["w1.netdata.monitor.zapped.pigs"],"tags":{"name":"w1","type":"webserver"}}'
The above entry publishes the node's IP (172.16.30.20), the Netdata port (19999) and the addressable URI w1.netdata.monitor.zapped.pigs to the gnatsd service (172.16.30.23) every minute, where it is picked up by gorouter.
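Hand-writing JSON inside a crontab is easy to get wrong, so here is a small Python sketch that builds the registration message and quotes it for the shell. It uses only the fields that appear in the cron entry above; register_message is a hypothetical helper name, and I am not claiming this covers every field Gorouter understands.

```python
import json
import shlex


def register_message(host, port, uri, tags):
    """Serialize a router.register payload with the fields used in the cron entry."""
    payload = {"host": host, "port": port, "uris": [uri], "tags": tags}
    # Compact separators keep the message on one short line.
    return json.dumps(payload, separators=(",", ":"))


message = register_message(
    "172.16.30.20",
    19999,
    "w1.netdata.monitor.zapped.pigs",
    {"name": "w1", "type": "webserver"},
)

# shlex.quote wraps the JSON so the shell passes it to nats-pub as one argument.
print(shlex.quote(message))
```

One thing to watch for if you generate crontab lines this way: cron treats an unescaped % specially, so avoid it in tag values or escape it.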
Done!!!
Now we are done; we should be able to address the netdata dashboard for our w1 node at http://w1.netdata.monitor.zapped.pigs.
And we can have a look at all the Netdata addressable URIs at http://admin.netdata.monitor.zapped.pigs/routes. This way we have a kind of centrally addressable location for all our nodes monitored by Netdata, and through this mechanism we can also add routes (or nodes) dynamically.
Next Up???
You are probably already thinking it would be nice if there were some automation around this. Yes, the automation is coming in a future article. You might also have noticed the tags published by nats-pub; this is a useful feature for consolidating dashboards based on tags, which will also be covered in the future.