Graphite Series #5: Stress Testing Carbon Caches

In the Graphite Series blog posts, I'll provide a guide to help you through all of the steps involved in setting up a monitoring and alerting system with a Graphite stack. Disclaimer: I am no expert; I am just trying to help the Graphite community by providing more detailed documentation. If something is wrong, please comment below or drop me an email at feangulo@yaipan.com.


In this blog post, I'm going to stress test the carbon-cache process. This involves publishing a large number of metrics in a very short amount of time.

Increase the Max Creates Per Minute Threshold

The carbon-cache configuration limits the number of Whisper file creations per minute. Open the configuration file and locate this setting:

# vi /opt/graphite/conf/carbon.conf

# Softly limits the number of whisper files that get created each minute.
# Setting this value low (like at 50) is a good way to ensure your graphite
# system will not be adversely impacted when a bunch of new metrics are
# sent to it. The trade off is that it will take much longer for those metrics'
# database files to all get created and thus longer until the data becomes usable.
# Setting this value high (like "inf" for infinity) will cause graphite to create
# the files quickly but at the risk of slowing I/O down considerably for a while.
MAX_CREATES_PER_MINUTE = 50

Since we are about to publish a large number of new metrics, let's raise this value to 10,000.

MAX_CREATES_PER_MINUTE = 10000

NOTE: You need to restart your carbon cache process for these changes to take effect.

The "Stresser"

I have a Stresser application that publishes a fixed number of metrics to the carbon-cache at a regular interval. Internally, it uses the Coda Hale metrics library, specifically a Timer object, to collect data.

The Stresser takes the following parameters:

  • Graphite host: in our case, the server housing our carbon-cache
  • Graphite port: in our case, the carbon-cache port
  • Number of hosts: the number of hosts to simulate publishing from
  • Number of timers: each timer generates 15 distinct metrics
  • Publishing interval: in seconds
  • Debug mode: true/false - when true, logs the metrics that are published

You can run the Stresser using the following command:

java -jar stresser.jar localhost 2003 1 128 10 false
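
If you don't have the Stresser jar handy, the general idea is easy to approximate. The sketch below is a rough Python stand-in, not the actual Stresser (which is a Java application built on the Coda Hale metrics library): it publishes a batch of metric lines to carbon-cache's plaintext port on a fixed interval. The metric names and values are made up for illustration.

#!/usr/bin/env python
# Rough stand-in for the Stresser: push a batch of metrics to carbon-cache's
# plaintext listener every INTERVAL seconds. Names and values are illustrative.
import random
import socket
import time

GRAPHITE_HOST = 'localhost'  # carbon-cache host
GRAPHITE_PORT = 2003         # carbon-cache plaintext (line receiver) port
NUM_TIMERS = 128             # each simulated timer fans out into 15 metrics
INTERVAL = 10                # publishing interval in seconds

# The 15 per-timer metric suffixes reported by a Coda Hale Timer
SUFFIXES = ['count', 'max', 'mean', 'min', 'stddev', 'p50', 'p75', 'p95',
            'p98', 'p99', 'p999', 'm1_rate', 'm5_rate', 'm15_rate', 'mean_rate']

def publish_batch():
    now = int(time.time())
    lines = []
    for t in range(NUM_TIMERS):
        for suffix in SUFFIXES:
            name = 'STRESS.host.ip-0.com.graphite.stresser.t%d.%s' % (t, suffix)
            lines.append('%s %f %d' % (name, random.random(), now))
    sock = socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT))
    sock.sendall(('\n'.join(lines) + '\n').encode('utf-8'))
    sock.close()

if __name__ == '__main__':
    while True:
        publish_batch()
        time.sleep(INTERVAL)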

The Coda Hale metrics library generates 15 distinct metrics per timer:

# ls -l /opt/graphite/storage/whisper/STRESS/host/ip-0/com/graphite/stresser/feg/
total 300
-rw-r--r--. 1 root root 17308 Jun  4 11:22 count.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m15_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m1_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m5_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 max.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 mean_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 mean.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 min.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p50.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p75.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p95.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p98.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p999.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p99.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 stddev.wsp

Using the command line above, with 128 timers specified, the number of distinct metrics published to the carbon-cache will be 1920 (128 * 15). Open the dashboard that we built in the previous blog post; it shows details about the number of create operations performed by the carbon cache.

I started the Stresser before the MAX_CREATES_PER_MINUTE configuration change was made. While it was running, I made the configuration change and restarted the carbon-cache process. The dashboard graph for create operations shows what happened: a few create operations were performed at a rate of 10 per minute before the configuration change, and after the restart close to 1800 create operations occurred within a single minute because the threshold was now set to 10,000.

You can also test this behavior by spinning up the carbon cache with the creates threshold already set to 10,000 and then kicking off the Stresser. In that case you would see all 1920 metrics created within the same minute.

Finally, the Stresser publishes a total of 1920 metrics every 10 seconds. Since carbon reports its internal metrics every 60 seconds, the number of metrics it reports as received per minute comes out to 11,520 (1920 * 6).

@ 11K metrics received per minute

Now that we have a way to publish metrics to the carbon cache and visualize its behavior, we can push it to its limit. As the metrics are being published, run the iostat command to measure the I/O performance while the carbon cache reads from and writes to disk.

I have configured the Stresser to do the following:

  • Number of timers: 128
  • Publishing interval: 10 seconds
  • Total metrics published every 10 seconds: 1920 (128 * 15) 
  • Total metrics published per minute: 11,520 (1920 * 6)

java -jar stresser.jar localhost 2003 1 128 10 false
Initializing 128 timers - publishing 1920 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)     06/04/2014  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.38    0.35    0.00   98.77

Device:         await  svctm  %util
xvdap1          5.28   0.32   0.26
xvdap1          6.56   0.39   0.16
xvdap1          40.25  0.33   3.62
xvdap1          37.94  0.31   3.13
xvdap1          6.39   0.54   0.44

@ 23K metrics received per minute

We can now increase the number of metrics that are reported by the Stresser and see how it affects I/O performance.

  • Number of timers: 256
  • Publishing interval: 10 seconds
  • Total metrics published every 10 seconds: 3,840 (256 * 15)
  • Total metrics published per minute: 23,040 (3,840 * 6)

java -jar stresser.jar localhost 2003 1 256 10 false
Initializing 256 timers - publishing 3840 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)     06/04/2014  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:         await  svctm  %util
xvdap1          51.48   0.38   6.97
xvdap1          48.06   0.38   6.96
xvdap1          21.10   0.39   1.30
xvdap1          45.21   0.58   1.48
xvdap1          60.48   0.49  13.19

It is also important to note that there isn't much CPU activity at this point.

@ 175K metrics received per minute

Now let's increase the metric publishing rates considerably to see how our setup holds up. I'll use the following Stresser configuration:

  • Number of timers: 1956
  • Publishing interval: 10 seconds
  • Total metrics published every 10 seconds: 29,340 (1,956 * 15)
  • Total metrics published per minute: 176,040 (29,340 * 6)

java -jar stresser.jar localhost 2003 1 1956 10 false
Initializing 1956 timers - publishing 29340 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)     06/04/2014  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:         await  svctm  %util
xvdap1          94.20   0.67  40.70
xvdap1          50.60   0.38  11.84
xvdap1          59.21   0.42  23.53
xvdap1          53.77   0.39  25.62
xvdap1          45.52   0.34  12.75

CPU usage and memory usage increased slightly. Notice that the cache sizes and cache queues also started to grow after the metric publishing rate was bumped.

Cache size grew but remained steady

Cache queues grew but remained steady

@ 440K metrics received per minute

Use the following Stresser configuration:

  • Number of timers: 4887
  • Publishing interval: 10 seconds
  • Total metrics published every 10 seconds: 73,305 (4,887 * 15)
  • Total metrics published per minute: 439,830 (73,305 * 6)

java -jar stresser.jar localhost 2003 1 4887 10 false
Initializing 4887 timers - publishing 73305 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)     06/04/2014  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:         await  svctm  %util
xvdap1          98.76   0.74  44.29
xvdap1          83.43   0.62  22.61
xvdap1          82.47   0.62  33.51
xvdap1          123.14  0.88  52.00
xvdap1          106.97  0.75  45.58

Compared to the previous test, memory consumption increased from 500 MB to 900 MB and CPU activity increased slightly. If you watch the htop output, you'll see a spike on one of the cores every 10 seconds (the metric publishing interval).

@ 700K metrics received per minute

Use the following Stresser configuration:

  • Number of timers: 7824
  • Publishing interval: 10 seconds
  • Total metrics published every 10 seconds: 117,360 (7,824 * 15)
  • Total metrics published per minute: 704,160 (117,360 * 6)

java -jar stresser.jar localhost 2003 1 7824 10 false
Initializing 7824 timers - publishing 117360 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)     06/05/2014  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.93    0.00    5.53    6.40    0.49   74.65

Device:         await  svctm  %util
xvdap1          241.67 1.73   58.34
xvdap1          263.33 1.83   100.00
xvdap1          256.01 1.88   54.32
xvdap1          228.32 1.63   100.00
xvdap1          474.27 3.20   100.00

It took a good 30 minutes for the carbon cache to create all of the new metrics. Compared to the previous test, memory usage and CPU utilization increased somewhat, but I/O took the biggest hit: the await column reported much larger values and %util hit 100% at multiple points. The cache queues were high but remained constant, and the cache size grew until all metrics had been created, after which it flat-lined at around 1.5M.

What is Carbon doing?

Carbon dynamically buffers as many data points as necessary to sustain an incoming data point rate that may exceed the rate of I/O operations your storage can keep up with. In the scenarios above we published between 11K and 700K metrics to the Carbon cache every minute. In a simpler scenario, let's say our Stresser application was publishing the following:

  • Total metrics published every minute: 60,000 distinct metrics

Let's also say that each of the 60,000 distinct metrics has a single data point every minute. Since a metric is represented as a single Whisper file on the filesystem, Carbon would need to perform 60,000 write operations every minute, one for each Whisper file. To keep up with that rate, Carbon would need to complete each write in less than one millisecond (60,000 writes spread over 60 seconds). These days, that is not a problem.
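
As a quick back-of-the-envelope check (just a sketch of the arithmetic, not part of any Graphite tooling):

# How fast must each Whisper write complete to keep up with 60,000 metrics/minute?
metrics_per_minute = 60000                       # one data point per metric per minute
writes_per_second = metrics_per_minute / 60.0    # one write per Whisper file
time_budget_per_write_ms = 1000.0 / writes_per_second

print(writes_per_second)           # 1000.0 writes per second
print(time_budget_per_write_ms)    # 1.0 ms available per write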

However, if the metric publishing rate climbs to very high levels, as in our scenario with 700K metrics published every minute, the rate of incoming data points exceeds the rate at which they can be written to disk. This can also happen if the Graphite server uses slower spinning disks rather than SSDs.

Seek Times

With hundreds of thousands of Whisper files being written to frequently, each receiving only ~12 bytes of data, the disks will spend most of their time performing seeks.

Write Multiple Datapoints

Let's assume that the disks' write rate isn't huge and can't be increased substantially, even by buying the fastest SSDs out there. The only way to make writes faster is to do fewer of them, and we can do fewer of them if we buffer multiple related data points and write them to their corresponding Whisper file in a single write operation. Whisper arranges consecutive data points contiguously on disk, which makes this possible.

The Update Many Function

Chris Davis modified Whisper to add the update_many function. It takes a list of data points for a single metric and compacts contiguous data points into a single write operation. Even though this makes the write larger, the difference in time it takes to write ten data points (~120 bytes) versus one data point (~12 bytes) is negligible. It would take quite a few more data points before the size of each write starts to noticeably affect the latency.
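
Here is a minimal sketch of what using update_many looks like with the whisper Python module. The file path and retention below are made up for illustration; they are not the storage schema used in this series.

# Minimal update_many example -- path and retention are illustrative only
import os
import time
import whisper

path = '/tmp/example-count.wsp'

# Create the Whisper file if needed: 60-second resolution kept for 1440 points (1 day)
if not os.path.exists(path):
    whisper.create(path, [(60, 1440)])

# Buffer several data points for the same metric as (timestamp, value) pairs...
now = int(time.time())
points = [(now - 60 * i, float(i)) for i in range(10)]

# ...and flush them to disk with a single write operation instead of ten updates
whisper.update_many(path, points)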

Buffers

To make use of the update_many function in Whisper, a buffering mechanism was implemented in Carbon. The Receiver thread maps each incoming data point to a queue based on its metric name and appends it to that queue. Another thread, the Writer thread, repeatedly iterates through all of the queues; for each one, it pulls out all of the data points and writes them to the appropriate Whisper file using the update_many function. If the rate of incoming data points exceeds the rate of write operations, the queues simply end up holding more data points. The only resource this costs the server is memory, which is acceptable because each data point is only a few bytes.
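
Conceptually, the buffering looks something like the simplified model below. This is not Carbon's actual code (the real implementation lives in the carbon package as its metric cache and writer thread); it's just a sketch of the queue-per-metric idea.

# Simplified model of Carbon's buffering: one queue of data points per metric
import threading
import time
from collections import defaultdict, deque

class MetricBuffer(object):
    def __init__(self):
        self.queues = defaultdict(deque)   # metric name -> queue of (timestamp, value)
        self.lock = threading.Lock()

    def store(self, metric, datapoint):
        # Receiver side: append the incoming data point to its metric's queue
        with self.lock:
            self.queues[metric].append(datapoint)

    def drain(self, metric):
        # Writer side: pull every buffered point for this metric in one go
        with self.lock:
            return list(self.queues.pop(metric, ()))

def writer_loop(buffer, write_func):
    # Writer side: sweep the queues and flush each metric's buffered points
    # with a single update_many-style write
    while True:
        for metric in list(buffer.queues.keys()):
            points = buffer.drain(metric)
            if points:
                write_func(metric, points)   # e.g. whisper.update_many(path_for(metric), points)
        time.sleep(0.1)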

Extra Benefit: Resiliency

A nice advantage of this approach is that it adds a degree of resiliency to handle temporary I/O slowdowns. If the system needs to do other I/O not related to Graphite, the rate of write operations will likely decrease. If this happens, the Carbon queues will simply grow. The larger the queues, the larger the writes. Since the overall throughput of data points equals the rate of write operations times the average size of each write, Carbon should be able to keep up as long as there is enough memory for the queues.
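
To make that relationship concrete with made-up numbers (an illustration only, not measured values):

# Illustrative only: throughput (points/sec) = write ops/sec * average points per write
write_ops_per_second = 500        # what the disk can sustain (assumed)
avg_points_per_write = 6          # grows as the queues get deeper
datapoints_per_second = write_ops_per_second * avg_points_per_write
print(datapoints_per_second)      # 3000 data points/sec absorbed with only 500 writes/sec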

Real-Time Graphs

If Carbon maintains queues of data points in memory waiting to be written to disk, it means that those data points will not show up in your graphite-webapp graphs, right? Wrong!

A socket listener was added to Carbon that provides a query interface for accessing the buffered data points, and the graphite-webapp uses this interface each time it needs to retrieve data. The graphite-webapp then combines the data points it retrieves from Carbon with the data points it reads from the Whisper files on disk. Therefore, your graphite-webapp graphs are real-time: data points are accessible as soon as they are received by Carbon.
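
Conceptually, that merge looks something like the sketch below. This is not the actual carbon/graphite-webapp query protocol or code, just an illustration of why buffered points still show up in graphs: points that only exist in Carbon's in-memory queues override or extend what was read from disk.

# Conceptual sketch of merging on-disk points with points still buffered in Carbon
def merge_points(disk_points, cached_points):
    # Both inputs are {timestamp: value}; cached points win because they are
    # newer and haven't been written to disk yet.
    merged = dict(disk_points)
    merged.update(cached_points)
    return sorted(merged.items())

# Example: the last two points only exist in Carbon's queues so far
disk = {1401880800: 1.0, 1401880860: 2.0}
cache = {1401880920: 3.0, 1401880980: 4.0}
print(merge_points(disk, cache))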