Graphite and Grafana – How to calculate Percentage of Total/Percent Distribution

When working with Grafana and Graphite, I often need to calculate the percentage of a total from Graphite time series. There are a few variations on this problem, and each is solved in a different way.

SingleStat

[Image: SingleStat panel showing disk usage]

With the SingleStat panel in Grafana, you need to reduce a time series down to one number. For example, to calculate the available memory percentage for a group of servers, we need to sum the available memory across all servers, sum the total memory across all servers, and then divide the first sum by the second.

The way to do this in Grafana is to create two queries, #A for the total and #B for the subtotal, and then divide #B by #A. Graphite has a function called divideSeries that you can use for this. Then hide #A (you can see that it is grayed out below) and use #B for the SingleStat value.

The divideSeries function can be used in a Graph panel too, as long as the divisor is a single time series (for example, it will work for the sum of all servers but not when grouped by server).
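To make the arithmetic concrete, here is a minimal Go sketch of what the two queries compute: sum each metric across servers, then divide the subtotal by the total point by point. The function names and the sample values are hypothetical, and this is not Graphite's implementation. Note that divideSeries itself returns a ratio; the sketch multiplies by 100 to show a percentage.

```go
package main

import (
	"fmt"
	"math"
)

// sumSeries adds several series point by point, roughly what Graphite's
// sumSeries does. The series are assumed to share the same timestamps.
func sumSeries(series ...[]float64) []float64 {
	if len(series) == 0 {
		return nil
	}
	out := make([]float64, len(series[0]))
	for _, s := range series {
		for i := 0; i < len(out) && i < len(s); i++ {
			out[i] += s[i]
		}
	}
	return out
}

// percentOf divides subtotal by total point by point and scales the result
// to a percentage. A missing or zero total yields NaN instead of a number.
func percentOf(subtotal, total []float64) []float64 {
	out := make([]float64, len(subtotal))
	for i := range subtotal {
		if i >= len(total) || total[i] == 0 {
			out[i] = math.NaN()
			continue
		}
		out[i] = subtotal[i] / total[i] * 100
	}
	return out
}

func main() {
	// Hypothetical memory readings (GB) for two servers at three points in time.
	available := sumSeries([]float64{2, 3, 4}, []float64{6, 5, 4})
	total := sumSeries([]float64{16, 16, 16}, []float64{16, 16, 16})
	fmt.Println(percentOf(available, total)) // prints [25 25 25]
}
```

The divisor here is a single summed series, which is the case where divideSeries works.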

[Image: queries for the disk usage SingleStat panel]

Graph Multiple Percentage of Totals

[Image: graph of disk usage per node]

Sometimes I want to graph the percentage of total grouped by server/node, e.g. disk usage percentage per server. In this case, divideSeries will not work. It cannot take multiple time series and divide them against each other (Prometheus has vector matching, but unfortunately Graphite does not have anything quite as smooth). One way to solve this is to use a different Graphite function called reduceSeries.

[Image: query to calculate subtotals for multiple time series]
[Image: the same query, zoomed in on the end of the query]

In the example, there are two values: capacity (the total) and usage (the subtotal). First, a groupByNode function is applied, which returns a list with the two values for each server (e.g. minion-2.capacity and minion-2.usage). The mapSeries and reduceSeries functions then take this list and, for each server, apply the asPercent reduce function to the two values. The result is a list of usage percentages, one per server.
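The group-then-reduce idea can be sketched in Go as follows. This is only an illustration of the pairing logic, not the Graphite query itself: it uses one value per metric instead of a full time series, and the function name, server names, and numbers are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// percentPerServer pairs "<server>.usage" with "<server>.capacity" and returns
// usage as a percentage of capacity for each server. Servers that lack either
// value, or have a zero capacity, are skipped.
func percentPerServer(values map[string]float64) map[string]float64 {
	usage := map[string]float64{}
	capacity := map[string]float64{}
	for name, v := range values {
		// Split "minion-2.usage" into server "minion-2" and metric "usage".
		i := strings.LastIndex(name, ".")
		if i < 0 {
			continue
		}
		server, metric := name[:i], name[i+1:]
		switch metric {
		case "usage":
			usage[server] = v
		case "capacity":
			capacity[server] = v
		}
	}
	out := map[string]float64{}
	for server, u := range usage {
		c, ok := capacity[server]
		if !ok || c == 0 {
			continue
		}
		out[server] = u / c * 100
	}
	return out
}

func main() {
	// Hypothetical disk readings for two servers.
	values := map[string]float64{
		"minion-1.usage":    30,
		"minion-1.capacity": 120,
		"minion-2.usage":    50,
		"minion-2.capacity": 100,
	}
	fmt.Println(percentPerServer(values)) // prints map[minion-1:25 minion-2:50]
}
```

The sketch relies on both values living under the same server prefix, which is the same precondition the reduceSeries approach has.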

The reduceSeries function can also apply two other reduce functions: a diff function and a divide function.

Same result with the asPercent Function

In the query above, the values (usage and capacity) are in the same namespace. If that is not the case, the reduceSeries technique will be difficult to apply or will not work at all. Another function worth checking out is asPercent, which might work better in some cases. The example below uses the same two-query technique that we used for divideSeries, but it works with multiple time series!

[Image: disk usage query using asPercent]
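As a rough Go sketch of the two-query idea with several series, the function below evaluates each series as a percentage of the series at the same position in a second list. It assumes both lists contain the same servers in the same order; the function name and the numbers are hypothetical and this is not Graphite's code.

```go
package main

import "fmt"

// asPercentPairs evaluates each series as a percentage of the series at the
// same position in totals. Both lists are assumed to line up one-to-one.
// A zero total produces Inf or NaN, following Go's floating point rules.
func asPercentPairs(series, totals [][]float64) [][]float64 {
	out := make([][]float64, 0, len(series))
	for i := 0; i < len(series) && i < len(totals); i++ {
		row := make([]float64, 0, len(series[i]))
		for j := 0; j < len(series[i]) && j < len(totals[i]); j++ {
			row = append(row, series[i][j]/totals[i][j]*100)
		}
		out = append(out, row)
	}
	return out
}

func main() {
	// Hypothetical usage and capacity series for two servers, two points each.
	usage := [][]float64{{30, 60}, {50, 75}}
	capacity := [][]float64{{120, 120}, {100, 100}}
	fmt.Println(asPercentPairs(usage, capacity)) // prints [[25 50] [50 75]]
}
```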

I learned these three techniques by looking at Grafana dashboards built by some of the Graphite experts that work with me at Raintank/Grafana Labs. I did not know them before so I think they will help others too.

Profiling Golang Programs on Kubernetes

Recently I needed to profile a Go application running inside a Kubernetes pod using net/http/pprof. I got stuck for a while trying to figure out how to copy the profile file from a pod but there is an easier way.

net/http/pprof – A Short Intro

First, a little about profiling in Go. net/http/pprof is a library for profiling live Go applications and exposing the profiling data via HTTP. If you want to profile an application, it needs to be instrumented before profiling. Here are some articles that describe that process:

Once you have instrumented your application, you just need to be able to access it from outside of a Kubernetes cluster.
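For reference, a minimal sketch of what instrumenting an application with net/http/pprof can look like. It registers the standard pprof handlers on a dedicated mux and serves them on port 6060, which is an assumption chosen to match the port used in the commands in this post. The names newPprofMux and pprofAddr are made up for the example, and a real application would start this listener alongside its own work rather than as its only task.

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
)

// pprofAddr is an assumed listen address; 6060 matches the port that is
// forwarded with kubectl in this post.
const pprofAddr = ":6060"

// newPprofMux returns a mux that serves only the profiling endpoints.
// pprof.Index also serves the named profiles such as /debug/pprof/heap.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

func main() {
	// Blocks while serving the profiling endpoints; in a real application
	// this would typically run in its own goroutine.
	log.Fatal(http.ListenAndServe(pprofAddr, newPprofMux()))
}
```

A shorter alternative is a blank import of net/http/pprof, which registers the same handlers on http.DefaultServeMux. The explicit mux keeps the profiling endpoints separate from the application's own routes.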

Kubernetes and Pprof

The easiest way to get at the application in the pod is to use port forwarding with kubectl.

kubectl port-forward pod-123ab -n a-namespace 6060

The HTTP endpoint will now be available as a local port.

You can now generate the file for the CPU profile with curl and pipe the data to a file (7200 seconds is two hours):

curl "http://127.0.0.1:6060/debug/pprof/profile?seconds=7200" > cpu.pprof

It is also possible to send the data directly to the pprof tool. The pprof tool (not the same thing as the net/http/pprof library) is a tool for generating a PDF or SVG analysis from the profile data.

To save the pprof data with the pprof tool you can use the interactive mode:

go tool pprof http://localhost:6060/debug/pprof/profile

By default, the generated data will be saved as a gzip-compressed file (.pb.gz) in the pprof subdirectory of your home directory. Exit interactive mode by typing exit.

Congratulations, you now have the raw profile from your application from inside a Kubernetes pod!

Some bonus information on the pprof tool

To generate an analysis, you will need the binary file for your Go application.

Here is how to pipe the profile data in directly:

go tool pprof --pdf your-binary-file http://localhost:6060/debug/pprof/profile > profile.pdf
go tool pprof --svg your-binary-file http://localhost:6060/debug/pprof/profile > profile.svg

You can do a lot more with net/http/pprof and the pprof tool.

Memory profile for in use space:

go tool pprof --pdf your-binary-file http://localhost:6060/debug/pprof/heap > in-use-space.pdf

Memory profile for allocated objects:

1. Generate the data:

go tool pprof http://localhost:6060/debug/pprof/heap

2. Exit interactive mode by typing exit.
3. Analyse the data:

go tool pprof -alloc_objects --svg your-binary-file /home/username/pprof/pprof.localhost:6060.alloc_objects.alloc_space.003.pb.gz > alloc-objects.svg

There are also switches for in use object counts (-inuse_objects) and allocated space (-alloc_space).

The pprof tool has an interactive mode with lots of nifty commands like topN. Read more about that on the official Go blog.