StatsD is a daemon that receives and aggregates metrics data. Cal Henderson wrote the original version of StatsD at Flickr, and various implementations have appeared since. Ian Malpass posted an article about Etsy’s version of StatsD in 2011; it is probably the most popular implementation by far.
Etsy’s StatsD is implemented in Node.js. It listens on a UDP port, processes the incoming metrics data, and flushes the aggregated data to a backend.
A simple use case of StatsD could be:
The application uses a StatsD client library to send metrics data to StatsD. StatsD listens on a UDP port and collects the metrics data for a period of time, e.g. 10 seconds. It then summarizes the data and flushes it to the backend. The backend could be your database, files, or even the console.
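To make the first step concrete, here is a minimal sketch of what a client library does under the hood. It assumes the usual StatsD line format of name:value|type and the port 8125 used later in this article. The helper names formatMetric and sendMetric are hypothetical and not part of any StatsD client library.

```javascript
// Minimal sketch of a StatsD client using only Node's built-in dgram module.
const dgram = require('dgram');

// Build one line of the StatsD text protocol: <name>:<value>|<type>
// Common types: 'c' (counter), 'g' (gauge), 'ms' (timer), 's' (set).
function formatMetric(name, value, type) {
  return name + ':' + value + '|' + type;
}

// Send a single metric as one UDP datagram.
// The default host and port are assumptions for a local StatsD instance.
function sendMetric(name, value, type, host, port, done) {
  const socket = dgram.createSocket('udp4');
  const payload = Buffer.from(formatMetric(name, value, type));
  socket.send(payload, port || 8125, host || '127.0.0.1', function (err) {
    socket.close();
    if (done) done(err);
  });
}
```

Calling sendMetric('my.test.1', 1, 'c') would increment the counter my.test.1 by one. Because UDP is connectionless, the call does not fail when no StatsD instance is listening; the datagram is simply dropped.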
How do I start?
Etsy describes the backend interface in the project’s GitHub repository. The best way to understand the StatsD backend interface is to play with the three reference backends that ship with StatsD: console, graphite, and repeater. More backends are available on npm. Nonetheless, you might want to change the format of the output data to match your own data model, or store the data in a different storage system. This article shows you how to create a new StatsD backend in three steps.
Step 1: Prepare your environment
Install nodejs:
# apt-get install nodejs npm
Install StatsD:
# git clone https://github.com/etsy/statsd.git
# cd statsd; cp exampleConfig.js config.js
Edit config.js:
{ port: 8125 , backends: [ "./backends/console" ] }
Start StatsD:
# node stats.js ./config.js
You should be able to see the following messages in your console now.
Step 2: Code structure and template backend
StatsD loads custom backends from statsd/node_modules, so first create a node_modules folder in your StatsD folder if it does not exist yet. Next, create a folder for your module, dummy-console in my example. You can arrange your source code however you want. I prefer to put my source code under lib and test cases under test. My code structure looks like this:
statsd/node_modules/dummy-console/
|-- lib/
|   |-- index.js
|-- test/
|   |-- test_case1.js
|-- package.json
You might want to package and publish your backend later on, so let's use npm to create the package description file.
cd dummy-console; npm init
Follow the instructions and fill in the information. Make sure you enter the correct entry point. In my example, it would be:
Main module/entry point: (none) lib/index.js
Here is a template for your index.js. It does nothing but log “Hello dummy?!” to the console for each flush event.
function dummyConsole(startupTime, config, emitter) {
    var self = this;
    emitter.on('flush', function(timestamp, metrics) {
        self.flush(timestamp, metrics);
    });
    emitter.on('status', function(callback) {
        self.status(callback);
    });
}

dummyConsole.prototype.flush = function(timestamp, metrics) {
    console.log("Hello dummy?!");
};

dummyConsole.prototype.status = function(write) {
};

exports.init = function(startupTime, config, events) {
    var instance = new dummyConsole(startupTime, config, events);
    return true;
};
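Before wiring the backend into StatsD, it can be handy to exercise it on its own. The following is a minimal sketch that plays the role of StatsD: it calls a backend's init function with a plain Node EventEmitter and emits one flush event. The helper name fireFlush is hypothetical, and the timestamps in seconds are an assumption about what StatsD passes in.

```javascript
const EventEmitter = require('events');

// Hypothetical helper: initialise a backend with a fake emitter and
// fire a single 'flush' event, roughly as StatsD would do at the end
// of a flush interval.
function fireFlush(backend, metrics, config) {
  const emitter = new EventEmitter();
  // Assumption: startup time and flush timestamp are Unix time in seconds.
  const startupTime = Math.round(Date.now() / 1000);
  const loaded = backend.init(startupTime, config || {}, emitter);
  if (!loaded) {
    throw new Error('backend init() did not return true');
  }
  const timestamp = Math.round(Date.now() / 1000);
  emitter.emit('flush', timestamp, metrics);
  return timestamp;
}
```

Run from the dummy-console folder, fireFlush(require('./lib/index.js'), { counters: {} }) would print the greeting once. The same helper can serve as a starting point for the test cases under test.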
Let’s try it out. Edit your statsd/config.js. Add the following configuration:
{ port: 8125 , backends: [ "dummy-console" ] }
Restart StatsD and you should see the following message:
Step 3: Process incoming data
Incoming data is time-series data. StatsD aggregates the data over the flush interval and sends it to the backend with two parameters: timestamp and metrics. The metrics object is structured as follows:
metrics: {
    counters: counters,
    gauges: gauges,
    timers: timers,
    sets: sets,
    counter_rates: counter_rates,
    timer_data: timer_data,
    statsd_metrics: statsd_metrics,
    pctThreshold: pctThreshold
}
In practice, a metrics object looks like this:
{
    counters: {
        'statsd.bad_lines_seen': 0,
        'statsd.packets_received': 581,
        'my.test.1': 2805
    },
    gauges: { 'my.test.1': 4, 'statsd.timestamp_lag': 0 },
    timers: { 'my.test.1': [ 0, 0, 0, 0, 0, 0, 1, 1, 1, 1 ] },
    timer_counters: { 'my.test.1': 10 },
    sets: {},
    counter_rates: {
        'statsd.bad_lines_seen': 0,
        'statsd.packets_received': 58.1,
        'my.test.1': 280.5
    },
    timer_data: {
        'my.test.1': {
            count_90: 9,
            mean_90: 0.3333333333333333,
            upper_90: 1,
            sum_90: 3,
            sum_squares_90: 3,
            std: 0.4898979485566356,
            upper: 1,
            lower: 0,
            count: 10,
            count_ps: 1,
            sum: 4,
            sum_squares: 4,
            mean: 0.4,
            median: 0
        }
    },
    pctThreshold: [ 90 ],
    histogram: undefined,
    statsd_metrics: { processing_time: 0 }
}
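The counter_rates section is derived from counters: each rate is the counter value divided by the flush interval in seconds. The sample values are consistent with a 10-second interval (581 packets become 58.1 per second). A minimal sketch of that calculation follows; counterRates is a hypothetical helper written for illustration, not StatsD's own code.

```javascript
// Sketch: derive per-second rates from raw counter values.
// flushIntervalMs is the flush interval in milliseconds
// (10000 is assumed here to match the sample data).
function counterRates(counters, flushIntervalMs) {
  const seconds = flushIntervalMs / 1000;
  const rates = {};
  Object.keys(counters).forEach(function (key) {
    rates[key] = counters[key] / seconds;
  });
  return rates;
}

const sampleRates = counterRates(
  { 'statsd.packets_received': 581, 'my.test.1': 2805 },
  10000
);
console.log(sampleRates);
```

A backend that stores only raw counters can therefore recompute rates later, as long as it also records the flush interval.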
You can rearrange the data to fit your backend's data model. For example, to print all counters in the format “name.counter = value”, iterate over the counters in the flush function as follows.
dummyConsole.prototype.flush = function(timestamp, metrics) {
    var counters = metrics.counters;
    for (var key in counters) {
        console.log(key + ".counter = ", counters[key]);
    }
};
Restart StatsD and you should see:
Voila! You have your first StatsD backend that can handle counters!
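Counters are only one section of the metrics object. As a sketch of how the same loop extends to gauges and timers, the hypothetical helper formatLines below turns one flushed metrics object into printable lines. The ".gauge" and ".timer.*" suffixes and the choice of reporting only mean and upper for timers are my own assumptions; pick whatever fits your data model.

```javascript
// Sketch: turn one flushed metrics object into printable lines.
function formatLines(metrics) {
  const lines = [];
  const counters = metrics.counters || {};
  const gauges = metrics.gauges || {};
  const timerData = metrics.timer_data || {};

  Object.keys(counters).forEach(function (key) {
    lines.push(key + '.counter = ' + counters[key]);
  });
  Object.keys(gauges).forEach(function (key) {
    lines.push(key + '.gauge = ' + gauges[key]);
  });
  Object.keys(timerData).forEach(function (key) {
    // Each timer carries several aggregates; report two of them here.
    lines.push(key + '.timer.mean = ' + timerData[key].mean);
    lines.push(key + '.timer.upper = ' + timerData[key].upper);
  });
  return lines;
}

// Possible body for the backend's flush handler.
function flush(timestamp, metrics) {
  formatLines(metrics).forEach(function (line) {
    console.log(line);
  });
}
```

Keeping the formatting in a function that returns lines, rather than logging directly, makes it easy to cover with a test case and to swap console.log for a database or file writer later.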