Burgin's Data Thing @burginsdatathing - Tumblr Blog

Hadoopi 1.2 Now With Wires, Metrics and Graphs

Hadoopi has been updated and now has wired networking (for improved performance and reliability) plus the addition of metrics collection with Prometheus and visualisation of those metrics in Grafana dashboards.

#hadoop #raspberrypi #grafana #prometheus

Visualising IOT Data on a Pi Cluster using Mesos, Spark & Kafka

The sensorpi repo on GitHub holds various ramblings, scripts and code put together for an experiment to visualise realtime sensor data processed on a cluster of Raspberry Pis. It's not dissimilar to https://www.circuits.dk/datalogger-example-using-sense-hat-influxdb-grafana/ but uses the features of the cluster for near realtime processing.

But first, a few caveats. Unlike my Hadoopi project, this is an experiment, so there isn’t chef code to set up and configure the cluster of Raspberry Pis (you’ll need to do this by hand). Originally this project was intended to implement a SMACK (Spark, Mesos, Akka, Cassandra and Kafka) stack. You’ll see the end result is more of a SMKS stack (Spark, Mesos, Kafka and Scala) acting as a transfer mechanism between two Pis: one capturing sensor metrics and one visualising the collected data. On the sensor side a Pi Zero with an EnviroPhat pushes data to Kafka via Python. On the visualisation side there is an influxdb instance and grafana server to store and serve a realtime dashboard of the data.

Despite all of those caveats, I learned a tonne about running Mesos on a cluster (on tin), writing and building Scala code, Spark streaming, IOT sensors, the Influxdb TICK stack and Grafana dashboards. If you want to play along I expect you’ll learn all those things too. It’s a case of manual setup, so please make sure your command line Fu is cranked up to 11!
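As a rough illustration of the sensor side described above (a Pi Zero reading an EnviroPhat and pushing to Kafka via Python), here is a minimal sketch. It is not the code from the sensorpi repo: the broker address `kafka.example:9092`, the topic name `sensorpi`, the five second interval and the JSON field names are all assumptions made up for this example, and it assumes the `envirophat` and `kafka-python` packages are installed on the Pi.

```python
import json
import time


def build_reading(temperature, pressure, light_level, now=None):
    """Pack one set of sensor values into a JSON-encoded message body.

    The field names are hypothetical; use whatever your consumer expects.
    """
    payload = {
        "timestamp": now if now is not None else time.time(),
        "temperature": temperature,
        "pressure": pressure,
        "light": light_level,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_forever(broker="kafka.example:9092", topic="sensorpi", interval=5):
    """Read the sensors in a loop and send each reading to Kafka.

    broker, topic and interval are placeholder values for illustration.
    """
    # Imported here so build_reading can be tried without the hardware or a broker.
    from envirophat import light, weather
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=broker)
    while True:
        body = build_reading(weather.temperature(), weather.pressure(), light.light())
        producer.send(topic, body)
        time.sleep(interval)
```

Keeping the message building separate from the sending loop means the payload format can be checked on any machine, without a Pi or a Kafka broker to hand.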

Running Hue on a Raspberry Pi Hadoop Cluster

Hadoopi - the Raspberry Pi Hadoop cluster

This project contains the configuration files and chef code to configure a cluster of five Raspberry Pi 3s as a working Hadoop cluster running Hue.

This video shows how to set up and configure the cluster using this code.

#hadoop #hue #raspberrypi #chef

Time Travel With Logfiles

One of the challenges of playing with log data is you rarely have a fresh supply of high "production" volume logs to hand, especially in a test/development environment.

#logs #logfiles #log files

Experiment 2 - Vagrant Chef setup for Beaver, Logstash and Kibana

I was never completely happy with how I'd left Experiment 2. There were too many "ignore this error" and "as a work around you'll need to..." notes. I'm pleased to say these are now gone.

#logstash #elasticsearch #kibana #chef #Vagrant

Experiment 2 - Running Logstash 1.3.1

So I've tweaked the config as per...

filters: [ { condition: 'if [type] == "apache-access"', block: { ... } } ]

TODO

There are still a couple of things I'd like to update...

geoip filter so I can use the Kibana maps feature.

Omnibus installer for vagrant instead of my inline hack.

fix those annoying apache test errors.

add an apache-error pattern so the fields are tokenised.

Play along at home

I’ve updated the vagrantfiles and berkshelf files on my github account

https://github.com/andyburgin/burginsdatathing/tree/master/experiment02

The readme.md gives full instructions on how to get the servers up and running

#logstash #kibana

es2gexf - extract logstash and elasticsearch data to a gexf file

In my last post http://data.andyburgin.co.uk/post/65706647269/visualising-logstash-apache-data-in-gephi I generated some pretty visualisations of apache webserver logs by extracting data from my logstash elasticsearch server using a python script.

You can now grab the script from https://github.com/andyburgin/es2gefx
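For anyone curious what writing a gexf file involves, here is a minimal sketch using only the Python standard library. It is not the es2gexf script itself: the function name `pairs_to_gexf` and the choice of (client IP, requested URL) pairs as edges are assumptions for illustration, and fetching the records from elasticsearch is left out entirely.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def pairs_to_gexf(pairs):
    """Build a GEXF 1.2 document from a list of (source, target) label pairs.

    Repeated pairs are collapsed into one edge whose weight is the repeat count.
    """
    root = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"defaultedgetype": "directed"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # Give every distinct label a numeric node id, in order of first appearance.
    node_ids = {}
    for pair in pairs:
        for label in pair:
            if label not in node_ids:
                node_ids[label] = str(len(node_ids))
                ET.SubElement(nodes, "node", {"id": node_ids[label], "label": label})

    # One edge per distinct pair, weighted by how often it was seen.
    for edge_id, ((source, target), weight) in enumerate(Counter(pairs).items()):
        ET.SubElement(edges, "edge", {
            "id": str(edge_id),
            "source": node_ids[source],
            "target": node_ids[target],
            "weight": str(weight),
        })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    hits = [("10.0.0.1", "/index.php"), ("10.0.0.1", "/index.php"), ("10.0.0.2", "/index.php")]
    print(pairs_to_gexf(hits))
```

The resulting text can be saved with a .gexf extension and opened in Gephi.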

#python #datavisualisation #logstash #elasticsearch #gephi

Visualising Logstash Apache data in Gephi

I've been wanting to create a data visualisation in Gephi for a while, so using the stack I've built in Experiments one and two I made this...

Look Shinys!

#datavisualisation #logstash #gephi #python

Experiment 2 - How to secure Beaver

Just to recap, I chose Beaver to ship my logs as it's written in python, which comes preinstalled on all the cloud servers I use, and it has a small runtime footprint. Using the logstash shipper would require java to be installed and would consume a not insignificant chunk of memory. Another advantage is that Beaver can create an ssh tunnel to the logstash server and make the redis server look like it's local. So the data is encrypted in transit and Beaver will reconnect automatically if the tunnel breaks.

In the instructions below I'm going to make modifications to our wordpress demo. This is just for illustration: if you get it all up and running, then when you relaunch the vm with either "vagrant up" or "vagrant provision" it will overwrite the changes. Unfortunately at present there's nothing in the logstash cookbook to configure the beaver ssh related options, so see this as more of a proof of concept than how to roll out for production.

On the Logstash Server

Firstly add a user and generate a key and store it

sudo adduser beaver
sudo su - beaver
ssh-keygen -t dsa
cp .ssh/id_dsa.pub .ssh/authorized_keys

On the Wordpress Server

Copy the id_dsa to the wordpress server, store it and make it accessible to the logstash user

copy the id_dsa file to /opt/logstash/beaver/etc/id_rsa
chown logstash /opt/logstash/beaver/etc/id_rsa
chmod 700 /opt/logstash/beaver/etc/id_rsa

Now just once connect manually via ssh to make sure the ssh fingerprint has been accepted for the logstash user

su logstash
ssh -i /opt/logstash/beaver/etc/id_rsa [email protected]

..enter yes when prompted

Next we need to get Beaver to make and use the tunnel, so edit /opt/logstash/beaver/etc/beaver.conf

[beaver]
redis_namespace: logstash
redis_url: redis://localhost:6380/0
transport: redis
ssh_key_file: /opt/logstash/beaver/etc/id_rsa
ssh_tunnel: [email protected]
ssh_tunnel_port: 6380
ssh_remote_host: 192.168.2.220
ssh_remote_port: 6379

...etc...
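To see how those ssh_* settings fit together, the sketch below reads a beaver.conf style section with Python's configparser and prints a plain ssh local port forward built from the same values. This is only an approximation for understanding the config, not a copy of what Beaver runs internally, and `beaver@logstash.example` is a placeholder for the real user and host.

```python
import configparser

# Sample config; beaver@logstash.example is a hypothetical user@host.
SAMPLE_CONF = """
[beaver]
redis_url: redis://localhost:6380/0
transport: redis
ssh_key_file: /opt/logstash/beaver/etc/id_rsa
ssh_tunnel: beaver@logstash.example
ssh_tunnel_port: 6380
ssh_remote_host: 192.168.2.220
ssh_remote_port: 6379
"""


def tunnel_command(conf_text):
    """Return an ssh command (as an argument list) matching the ssh_* settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    beaver = parser["beaver"]
    # -L local_port:remote_host:remote_port forwards a local port over ssh.
    forward = "{}:{}:{}".format(
        beaver["ssh_tunnel_port"], beaver["ssh_remote_host"], beaver["ssh_remote_port"]
    )
    # -N means "no remote command", i.e. only do the port forwarding.
    return ["ssh", "-N", "-i", beaver["ssh_key_file"], "-L", forward, beaver["ssh_tunnel"]]


if __name__ == "__main__":
    print(" ".join(tunnel_command(SAMPLE_CONF)))
```

Note that redis_url points at the local end of the tunnel (port 6380), not at the redis server directly.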

Finally bounce beaver

service logstash_beaver stop
service logstash_beaver start

Wrapping Up

You will now have what looks like a redis server running on localhost on port 6380. This is in fact a tunnel over to port 6379 on 192.168.2.220.
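A quick way to confirm the local end of the tunnel is listening is to try a TCP connection to it. The helper below is a small sketch using only the standard library; the function name `port_is_open` is made up for this example, and it only shows that something accepts connections on the port, not that redis is answering at the far end.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("tunnel up" if port_is_open("localhost", 6380) else "tunnel down")
```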

That's it, just a shame this is just a temporary config due to cookbook limitations.

#devops #beaver #redis #logstash

Experiment 2 - Now with better logstash handling

Firstly, the syslog messages are just stored as a single text message. This needs to be tokenised, so let's add a new filter to the logstash config in the Vagrantfile:

{grok: { type: "syslog", pattern: "%{SYSLOGBASE} %{GREEDYDATA:message}" }},
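To make "tokenised" concrete, here is a rough Python approximation of what that pattern pulls out of a syslog line. The regular expression is a simplified stand-in written for this example, not the real SYSLOGBASE definition from logstash (it ignores the optional facility field, for instance), and the sample log line is made up.

```python
import re

# Simplified stand-in for "%{SYSLOGBASE} %{GREEDYDATA:message}".
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logsource>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def tokenise(line):
    """Split one syslog line into named fields, or return None if it doesn't match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "Nov  3 10:15:01 wordpress CRON[1234]: (root) CMD (run-parts /etc/cron.hourly)"
    print(tokenise(sample))
```

With the fields split out like this, you can search and filter on the program or host rather than grepping one long message string.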

...also I wouldn't sleep well at night if I hadn't made you at least think about security...

# please customise these rules as per your requirements admin=192.168.2.10,user=192.168.2.58,wordpressdemo=192.168.2.222
firewall: {
  rules: [
    {http192168210: { port: "80", source: "192.168.2.10", action: "allow"}},
    {http192168258: { port: "80", source: "192.168.2.58", action: "allow"}},
    {ssh192168210: { port: "22", source: "192.168.2.10", action: "allow"}},
    {ssh1921682222: { port: "22", source: "192.168.2.222", action: "allow"}},
    {es192168210: { port: "9200", source: "192.168.2.10", action: "allow"}},
    {es192168258: { port: "9200", source: "192.168.2.58", action: "allow"}},
    {redis1921682222: { port: "6379", source: "192.168.2.222", action: "allow"}},
  ],
}

Note that by default the ufw cookbook I am using is opening port 22 to the world, you may want to change this by forking the cookbook and removing that code.

I did try to get the geoip filter working, but despite my best efforts I had no luck. If the logstash cookbook gets upgraded to support version 1.2.x I suspect that will be quite easy.

In the latest rollout it looks like the apache2 cookbook is throwing some errors relating to the security conf having the wrong server tokens. I believe this is a fault with the cookbook.

Play along at home

I’ve updated the vagrantfiles and berkshelf files on my github account

https://github.com/andyburgin/burginsdatathing/tree/master/experiment02

The readme.md gives full instructions on how to get the servers up and running.

#logstash #syslog #ufw #firewall #vagrant #chef

Experiment 2 - Graphite and a new Kibana

In experiment one I used chef to setup a logstash server which threw incoming log data from a wordpress stack into elasticsearch. This was then interrogated and analysed using the wonderful Kibana interface.

I'd be stretching the term "visualising the data" with Kibana. As pretty and functional as it is, its job isn't to create informative and immersive graphs of the data. To see data trends I could use something like a centralised Munin server, but that's not going to give me information about the logfiles, just performance metrics, and it doesn't make use of all the wonderful log data sat in elasticsearch.

#chef #devops #vagrant #graphite #logstash #kibana #elasticsearch #virtualbox #github

Experiment 1 - Vagrant, Chef, Elasticsearch, Logstash and Kibana

Time to roll my sleeves up and get stuck in with making something. Part of my day job involves being a sysadmin, looking after servers, deploying releases of code and on the odd occasion troubleshooting. It dawned on me that the servers are constantly keeping track of events on the system and applications by generating logfiles. Basically, that's a lot of data when you have a few servers. What I'm doing in this experiment is setting up a centralised logging server and using a web frontend to view and interrogate the collated logs. I'm going to use the devops notion of configuration as code by using chef to provision servers.

The initial idea here was to spin up a virtual machine using Vagrant and provision it using Chef solo to install Logstash. Next I'd create another VM (again using Vagrant and Chef solo) to run apache and send its logs to the first VM. It seemed fairly simple in principle: the Chef cookbook for logstash already existed and the "stack" is established and described in the accompanying docs http://logstash.net/docs/1.1.13/tutorials/getting-started-centralized

#chef #devops #vagrant #elasticsearch #logstash #redis #virtualbox #github

I’m a qualified data analyst

As the new year started I decided I'd learn something new, I've always been interested in "data" and decided it was time I learnt something about "Data Analysis".

I'd already discovered a really good course over on coursera for "Social Network Analysis", I'd been looking for a more "structured" way to learn about Gephi (which is awesome check out http://gephi.org/) and this fitted the bill - https://www.coursera.org/course/sna . But I started the course as it was about to finish, so although I worked my way through the 6 weeks of lectures using Gephi and NetLogo I never got the qualification as I didn't have time.

#coursera #sna #dataanalysis #r #gephi #sysadmin

Let's start at the beginning, a very good place to start

I'm putting this blog together for a number of reasons...

Learn lots of new stuff

Provide a record of who, what, why, when, where

Let whoever is interested play along at home

#about #devops #dayjob
