Personal Update: Going full time on Grafana

It’s been a year since I started at Raintank (soon to be renamed Grafana Labs). During that time, I have worked on lots of interesting stuff. Some highlights:

I’ve learnt a lot. The biggest change has been the switch from Windows to Linux, and I have worked on projects in Node and Go, both new languages for me. It’s been intense, but now it is going to get even more intense!

As Carl, one of my colleagues in the Stockholm office, is on parental leave for a few months, we decided that now was a good time for me to start focusing on Grafana full time.

It’s a bit mad starting to work on Grafana due to the sheer volume of issues, comments on issues, PRs, questions on the forum, questions on StackOverflow, questions on our public Slack channel, comments on Twitter and questions on the #grafana channel on IRC. Torkel even answers questions on Google+!

Even from the outside, it’s obvious that Grafana is a very popular project. It has just under 15 000 stars on GitHub, and the GitHub Pulse page shows a lot of activity: during the last month there have been 58 active pull requests and 252 active issues.

[Image: GitHub Pulse for the Grafana repository]

But what those stats do not show is the number of comments on issues, many of them on closed issues. Grafana currently has 900 open issues (570 of those are feature requests), but if you count closed issues and pull requests, there are more than 7000. Carl and Torkel have closed tons of issues and pull requests over the last 12 months, but they have also answered tons of follow-up questions on closed issues. Since I started writing this blog post a few minutes ago, 12... 13... 14 notifications from GitHub for Grafana issues have landed in my Gmail.

It’s crazy that just two full time people have kept up this furious tempo. You’re machines, Torkel and Carl!

The Grafana community is still growing rapidly and it is noticeable that merging pull requests and answering issues generates more pull requests and feature requests. Hopefully we’ll be growing our team soon so that we can work on more of those 500+ feature requests! The future looks very exciting (and busy).

Graphite and Grafana – How to calculate Percentage of Total/Percent Distribution

When working with Grafana and Graphite, I often need to calculate the percentage of a total from Graphite time series. There are a few variations of this problem, and they are solved in different ways.

SingleStat

[Image: disk usage SingleStat panel]

With the SingleStat panel in Grafana, you need to reduce a time series down to one number. For example, to calculate the available memory percentage for a group of servers, we need to sum the available memory of all servers, sum the total memory of all servers, and then divide the first sum by the second.

The way to do this in Grafana is to create two queries, #A for the total and #B for the subtotal, and then divide #B by #A with Graphite's divideSeries function. Then hide #A (you can see below that it is grayed out) and use #B for the SingleStat value.

The divideSeries function can be used in a Graph panel too, as long as the divisor is a single time series (for example, it will work for the sum of all servers but not when grouped by server).
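To make the two-query technique concrete, here is a small Python sketch of what happens to the datapoints. It is not Graphite's code: `sum_series`, `divide_series` and `last_value` are hypothetical helpers that imitate sumSeries, divideSeries and a SingleStat panel showing the current value, and the memory figures are made up.

```python
# Illustrative sketch only, not Graphite's implementation.
# Series are plain lists of floats, with None marking a missing datapoint.
from typing import List, Optional

Series = List[Optional[float]]


def sum_series(series_list: List[Series]) -> Series:
    """Point-by-point sum across servers, skipping missing (None) values."""
    result: Series = []
    for points in zip(*series_list):
        present = [p for p in points if p is not None]
        result.append(sum(present) if present else None)
    return result


def divide_series(dividends: List[Series], divisors: List[Series]) -> List[Series]:
    """Divide every dividend series by one divisor series, point by point."""
    if len(divisors) != 1:
        # This sketch accepts exactly one divisor series; Graphite's divideSeries
        # raises an error when the divisor matches more than one series.
        raise ValueError("divisor must be exactly one series")
    divisor = divisors[0]
    return [
        [None if a is None or b in (None, 0) else a / b for a, b in zip(dividend, divisor)]
        for dividend in dividends
    ]


def last_value(series: Series) -> Optional[float]:
    """Reduce a series to one number: the most recent non-missing value."""
    for value in reversed(series):
        if value is not None:
            return value
    return None


# Hypothetical data: available and total memory (GB) for two servers, 3 intervals.
available = [[4.0, 3.0, 2.0], [6.0, 5.0, 4.0]]
total = [[8.0, 8.0, 8.0], [16.0, 16.0, 16.0]]

query_a = [sum_series(total)]                               # #A: total, hidden in the panel
query_b = divide_series([sum_series(available)], query_a)   # #B: subtotal divided by #A
print(last_value(query_b[0]))  # 0.25, i.e. 25% of memory available
```

Passing the per-server `total` list (two series) as the divisor raises an error in this sketch, which mirrors why divideSeries only works when the divisor is a single series.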

[Image: queries for the disk usage SingleStat panel]

Graph Multiple Percentage of Totals

[Image: graph of disk usage percentage per node]

Sometimes I want to graph the percentage of total grouped by server/node, for example disk usage percentage per server. In this case, divideSeries will not work: it cannot take multiple time series and divide each one by its own total (Prometheus has vector matching, but unfortunately Graphite does not have anything quite as smooth). One way to solve this is to use a different Graphite function called reduceSeries.

[Image: Query to calculate subtotals for multiple time series]
[Image: Same query, zoomed in on the end of the query]

In the example, there are two values: capacity (the total) and usage (the subtotal). First, a groupByNode function is applied; it returns a list with the two values for each server (e.g. minion-2.capacity and minion-2.usage). The mapSeries and reduceSeries functions take this list and, for each server, apply the asPercent reduce function to the two values. The result is a list of percentages, one per server.
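The Python sketch below imitates the mapSeries/reduceSeries step on in-memory data. It assumes the series are already named `<server>.<metric>`, as after the groupByNode step in the example, and uses hypothetical helpers (`map_by_node`, `reduce_as_percent`); it illustrates the idea rather than reproducing Graphite's implementation.

```python
# Illustrative sketch of mapSeries + reduceSeries(asPercent), not Graphite's code.
# Series are keyed by hypothetical names of the form "<server>.<metric>".
from collections import defaultdict
from typing import Dict, List, Optional

Series = List[Optional[float]]


def map_by_node(series: Dict[str, Series], node: int) -> Dict[str, Dict[str, Series]]:
    """Group series by the dot-separated name component at index `node` (like mapSeries)."""
    groups: Dict[str, Dict[str, Series]] = defaultdict(dict)
    for name, values in series.items():
        groups[name.split(".")[node]][name] = values
    return dict(groups)


def as_percent(part: Series, total: Series) -> Series:
    """Point-by-point part * 100 / total; None where data is missing or total is 0."""
    return [
        None if p is None or t in (None, 0) else p * 100 / t
        for p, t in zip(part, total)
    ]


def reduce_as_percent(groups: Dict[str, Dict[str, Series]], reduce_node: int,
                      part_key: str, total_key: str) -> Dict[str, Series]:
    """For each group, pick the part and total series by the name component at
    `reduce_node` and replace the pair with one percentage series (like reduceSeries)."""
    result: Dict[str, Series] = {}
    for key, members in groups.items():
        by_metric = {name.split(".")[reduce_node]: vals for name, vals in members.items()}
        result[key] = as_percent(by_metric[part_key], by_metric[total_key])
    return result


# Hypothetical disk metrics for two servers over two intervals.
disk = {
    "minion-1.usage": [20.0, 30.0],
    "minion-1.capacity": [100.0, 100.0],
    "minion-2.usage": [50.0, 75.0],
    "minion-2.capacity": [200.0, 200.0],
}
per_server = reduce_as_percent(map_by_node(disk, 0), 1, "usage", "capacity")
print(per_server)  # {'minion-1': [20.0, 30.0], 'minion-2': [25.0, 37.5]}
```

The key design point is the pairing: grouping by server first guarantees that each usage series is divided by the capacity series of the same server, which is exactly what a single divideSeries call cannot do.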

The reduceSeries function can also apply other reduce functions, for example diffSeries to calculate a difference and divideSeries to calculate a ratio.
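Swapping the reduce function changes what comes out of each server's pair of series. In this sketch (again hypothetical Python, not Graphite's code), capacity minus usage gives free space, similar to what diffSeries would produce, and usage divided by capacity gives a ratio, similar to divideSeries. The order of the two metric names matters, just as the order of the reduce matchers does in reduceSeries.

```python
# Illustrative sketch only: the same per-server pairing with a pluggable reduce step.
from typing import Callable, Dict, List, Optional

Series = List[Optional[float]]
Pointwise = Callable[[float, float], Optional[float]]


def reduce_pairs(series: Dict[str, Series], first: str, second: str,
                 fn: Pointwise) -> Dict[str, Series]:
    """For each server, apply fn(first, second) point by point to its two series.

    Assumes hypothetical series names of the form "<server>.<metric>".
    """
    servers = sorted({name.split(".")[0] for name in series})
    result: Dict[str, Series] = {}
    for server in servers:
        a = series[f"{server}.{first}"]
        b = series[f"{server}.{second}"]
        result[server] = [
            None if x is None or y is None else fn(x, y) for x, y in zip(a, b)
        ]
    return result


disk = {
    "minion-1.usage": [20.0, 30.0],
    "minion-1.capacity": [100.0, 100.0],
    "minion-2.usage": [50.0, 75.0],
    "minion-2.capacity": [200.0, 200.0],
}

# Difference, similar in spirit to diffSeries: capacity - usage = free space.
free = reduce_pairs(disk, "capacity", "usage", lambda x, y: x - y)
# Ratio, similar in spirit to divideSeries: usage / capacity (None on a zero divisor).
ratio = reduce_pairs(disk, "usage", "capacity", lambda x, y: x / y if y else None)
print(free)   # {'minion-1': [80.0, 70.0], 'minion-2': [150.0, 125.0]}
print(ratio)  # {'minion-1': [0.2, 0.3], 'minion-2': [0.25, 0.375]}
```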

Same Result with the asPercent Function

In the query above, the two values (usage and capacity) are in the same namespace. If that is not the case, the reduceSeries technique will be difficult to use or will not work at all. Another function worth checking out is asPercent, which might work better in some cases. The example below uses the same two-query technique that we used for divideSeries, but it works with multiple time series!
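Here is a Python sketch of what asPercent does with two queries that each return one series per server. It assumes the total series are paired with the other series by position, so both queries must return the servers in the same order; how your Graphite version actually matches the series is worth checking in its documentation. The helper and data are hypothetical.

```python
# Illustrative sketch of asPercent(seriesList, totals) with a list of totals,
# not Graphite's implementation. Pairing by position is an assumption here.
from typing import List, Optional

Series = List[Optional[float]]


def as_percent(series_list: List[Series], totals: List[Series]) -> List[Series]:
    """Divide each series by the total at the same position, as a percentage."""
    if len(series_list) != len(totals):
        raise ValueError("series_list and totals must have the same length")
    return [
        [
            None if p is None or t in (None, 0) else p * 100 / t
            for p, t in zip(part, total)
        ]
        for part, total in zip(series_list, totals)
    ]


# Hypothetical query results, both ordered by server name (minion-1, minion-2).
usage = [[20.0, 30.0], [50.0, 75.0]]           # query #B: usage per server
capacity = [[100.0, 100.0], [200.0, 200.0]]    # query #A: capacity per server (hidden)
print(as_percent(usage, capacity))  # [[20.0, 30.0], [25.0, 37.5]]
```

If one query returned its servers in a different order, this positional pairing would silently divide the wrong series, which is why the sketch assumes matching order.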

[Image: disk usage query using asPercent]

I learned these three techniques by looking at Grafana dashboards built by some of the Graphite experts who work with me at Raintank/Grafana Labs. I did not know them before, so I think they will help others too.