Graphite Series #5: Stress Testing Carbon Caches
In the Graphite Series blog posts, I'll provide a guide to help you through all of the steps involved in setting up a monitoring and alerting system using a Graphite stack. Disclaimer: I am no expert; I'm just trying to help the Graphite community by providing more detailed documentation. If something is wrong, please comment below or drop me an email at feangulo@yaipan.com.
In this blog post, I'm going to stress test the carbon-cache process. This involves publishing a large number of metrics in a very short amount of time.
Increase the Number of Creates Per Minute Threshold
The carbon-cache configuration limits the number of Whisper file creations per minute. Open the configuration file and locate the following setting.
# vi /opt/graphite/conf/carbon.conf

# Softly limits the number of whisper files that get created each minute.
# Setting this value low (like at 50) is a good way to ensure your graphite
# system will not be adversely impacted when a bunch of new metrics are
# sent to it. The trade off is that it will take much longer for those metrics'
# database files to all get created and thus longer until the data becomes usable.
# Setting this value high (like "inf" for infinity) will cause graphite to create
# the files quickly but at the risk of slowing I/O down considerably for a while.
MAX_CREATES_PER_MINUTE = 50
Since we are about to publish a bunch of metrics, let's modify this part of the configuration and set the value to 10,000.
MAX_CREATES_PER_MINUTE = 10000
NOTE: You need to restart your carbon cache process for these changes to take effect.
The "Stresser"
I have a Stresser application that publishes a fixed number of metrics to the carbon-cache at a regular interval. It uses the Coda Hale metrics library internally - more specifically, a Timer object to collect data.
- Github project: https://github.com/feangulo/graphite-stresser
The Stresser takes the following parameters:
- Graphite host: in our case, the server housing our carbon cache
- Graphite port: in our case, the carbon cache port
- Number of hosts: to simulate publishing from
- Number of timers: each timer generates 15 distinct metrics
- Publishing interval
- Debug mode: true/false - logs the metrics that are published
You can run the Stresser using the following command:
java -jar stresser.jar localhost 2003 1 128 10 false
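For reference, anything that publishes to port 2003 - the Stresser included - is speaking Carbon's plaintext protocol: one "metric.path value timestamp" line per data point. Here is a minimal Python sketch of a single publish; the Stresser itself is Java, so this is only an illustration, and the metric name simply mirrors the Whisper paths shown below.

import socket
import time

CARBON_HOST = "localhost"   # same host/port passed to the Stresser above
CARBON_PORT = 2003

def publish(metric, value, timestamp=None):
    """Send one data point to carbon-cache using the plaintext protocol."""
    timestamp = int(timestamp if timestamp is not None else time.time())
    line = "%s %s %d\n" % (metric, value, timestamp)
    with socket.create_connection((CARBON_HOST, CARBON_PORT)) as sock:
        sock.sendall(line.encode("ascii"))

# One of the 15 metrics a single timer produces:
publish("0.com.graphite.stresser.feg.p99", 42.7)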
The Coda Hale metrics library generates 15 distinct metrics per timer:
# ls -l 0/com/graphite/stresser/feg/
total 300
-rw-r--r--. 1 root root 17308 Jun  4 11:22 count.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m15_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m1_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 m5_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 max.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 mean_rate.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 mean.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 min.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p50.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p75.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p95.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p98.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p999.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 p99.wsp
-rw-r--r--. 1 root root 17308 Jun  4 11:22 stddev.wsp
Using the command line above, with 128 timers specified, the number of distinct metrics published to the carbon-cache will be 1920 (128 * 15). Open the dashboard that we built as part of the previous blog post. You will be able to see details about the number of create operations performed by the carbon cache. I started the Stresser before the MAX_CREATES_PER_MINUTE configuration change was made. While it was running, I made the configuration change and restarted the carbon-cache process. The dashboard graph for create operations shows what happened. A few create operations were performed at a rate of 10 per minute before the configuration change. After the restart, close to 1800 create operations occurred during one minute because the threshold was set to a high value of 10,000.
You can also test the behavior by spinning up the carbon cache with the creates threshold already set to 10,000 and then kicking off the Stresser. In that case you would see all 1920 metrics created within the same minute.
Finally, the Stresser publishes 1920 metrics every 10 seconds. Since Carbon reports its own process metrics every 60 seconds, the per-minute figure is the one to watch: the carbon cache receives 11,520 data points every minute (1920 * 6).
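If you want to sanity-check these figures for other Stresser configurations, the arithmetic is easy to script. Here is a minimal Python sketch, assuming 15 metrics per timer as shown in the listing above:

METRICS_PER_TIMER = 15        # count, rates, min/max/mean, percentiles (see the ls output above)
SECONDS_PER_MINUTE = 60

def metrics_per_minute(hosts, timers, publish_interval_seconds):
    """Distinct data points the Stresser publishes per minute for a given configuration."""
    distinct_metrics = hosts * timers * METRICS_PER_TIMER
    publishes_per_minute = SECONDS_PER_MINUTE // publish_interval_seconds
    return distinct_metrics * publishes_per_minute

print(metrics_per_minute(hosts=1, timers=128, publish_interval_seconds=10))   # 11520
print(metrics_per_minute(hosts=1, timers=7824, publish_interval_seconds=10))  # 704160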
@ 11K metrics received per minute
Now that we have a way to publish metrics to the carbon cache and visualize its behavior, we can push it to the limit. As the metrics are being published, run the iostat command to measure I/O performance while the carbon cache reads from and writes to the disk.
I have configured the Stresser to do the following:
- Number of timers: 128
- Publishing interval: 10 seconds
- Total metrics published every 10 seconds: 1920 (128 * 15)
- Total metrics published per minute: 11,520 (1920 * 6)
java -jar stresser.jar localhost 2003 1 128 10 false
Initializing 128 timers - publishing 1920 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)  06/04/2014  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.38    0.35    0.00   98.77

Device:    await  svctm  %util
xvdap1      5.28   0.32   0.26
xvdap1      6.56   0.39   0.16
xvdap1     40.25   0.33   3.62
xvdap1     37.94   0.31   3.13
xvdap1      6.39   0.54   0.44
@ 23K metrics received per minute
We can now increase the number of metrics that are reported by the Stresser and see how it affects I/O performance.
- Number of timers: 256
- Publishing interval: 10 seconds
- Total metrics published every 10 seconds: 3,840 (256 * 15)
- Total metrics published per minute: 23,040 (3,840 * 6)
java -jar stresser.jar localhost 2003 1 256 10 false
Initializing 256 timers - publishing 3840 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)  06/04/2014  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:    await  svctm  %util
xvdap1     51.48   0.38   6.97
xvdap1     48.06   0.38   6.96
xvdap1     21.10   0.39   1.30
xvdap1     45.21   0.58   1.48
xvdap1     60.48   0.49  13.19
It is also important to note that there isn't much CPU activity at this point.
@ 175K metrics received per minute
Now let's increase the metric publishing rates considerably to see how our setup holds up. I'll use the following Stresser configuration:
- Number of timers: 1956
- Publishing interval: 10 seconds
- Total metrics published every 10 seconds: 29,340 (1,956 * 15)
- Total metrics published per minute: 176,040 (29,340 * 6)
java -jar stresser.jar localhost 2003 1 1956 10 false
Initializing 1956 timers - publishing 29340 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)  06/04/2014  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:    await  svctm  %util
xvdap1     94.20   0.67  40.70
xvdap1     50.60   0.38  11.84
xvdap1     59.21   0.42  23.53
xvdap1     53.77   0.39  25.62
xvdap1     45.52   0.34  12.75
CPU usage and memory usage increased slightly. Notice that the cache sizes and cache queues also started to grow after the metric publishing rate was bumped.
@ 440K metrics received per minute
Use the following Stresser configuration:
- Number of timers: 4887
- Publishing interval: 10 seconds
- Total metrics published every 10 seconds: 73,305 (4,887 * 15)
- Total metrics published per minute: 439,830 (73,305 * 6)
java -jar stresser.jar localhost 2003 1 4887 10 false
Initializing 4887 timers - publishing 73305 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)  06/04/2014  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.08    0.00    0.08    0.04    0.00   99.80

Device:    await  svctm  %util
xvdap1     98.76   0.74  44.29
xvdap1     83.43   0.62  22.61
xvdap1     82.47   0.62  33.51
xvdap1    123.14   0.88  52.00
xvdap1    106.97   0.75  45.58
Compared to the previous test, memory consumption increased from 500 MB to 900 MB and CPU activity increased slightly. If you stare at the htop output, you'll see a spike every 10 seconds (the publishing interval) on one of the cores.
@ 700K metrics received per minute
Use the following Stresser configuration:
- Number of timers: 7824
- Publishing interval: 10 seconds
- Total metrics published every 10 seconds: 117,360 (7,824 * 15)
- Total metrics published per minute: 704,160 (117,360 * 6)
java -jar stresser.jar localhost 2003 1 7824 10 false
Initializing 7824 timers - publishing 117360 metrics every 10 seconds from 1 host(s)
# iostat -x 10
Linux 2.6.32-431.11.2.el6.x86_64 (ip-10-43-138-169)  06/05/2014  _x86_64_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.93    0.00    5.53    6.40    0.49   74.65

Device:    await  svctm   %util
xvdap1    241.67   1.73   58.34
xvdap1    263.33   1.83  100.00
xvdap1    256.01   1.88   54.32
xvdap1    228.32   1.63  100.00
xvdap1    474.27   3.20  100.00
It took a good 30 minutes for the carbon cache to create all of the new metrics. Compared to the previous test, memory and CPU utilization rose somewhat, but I/O performance took the biggest hit: the await column reported much larger values and %util hit 100% at multiple points. The cache queues were high but remained constant, and the cache size grew until all metrics had been created, after which it flat-lined at around 1.5M.
What is Carbon doing?
Carbon uses a strategy of dynamically buffering as many data points as necessary to sustain a rate of incoming data points that may exceed the rate of I/O operations your storage can keep up with. In the scenarios above we published between 11K and 700K metrics to the Carbon cache every minute. In a simpler scenario, let's say our Stresser application was publishing the following:
- Total metrics published every minute: 60,000 distinct metrics
Let's also say that each of the 60,000 distinct metrics has a single data point every minute. Since a metric is represented as a single Whisper file in the filesystem, Carbon would need to perform 60,000 write operations every minute - one for each Whisper file. That works out to 1,000 writes per second, or a budget of roughly one millisecond per file write. In this day and age, that is not a problem.
However, if the metric publishing rate increases to very high levels - like in our scenario with 700K metrics published every minute - the rate of incoming data points exceeds the rate at which they can be written to disk. This can also happen if the disks in the Graphite server aren't SSDs and have lower speeds.
Seek Times
With hundreds of thousands of Whisper files being written to frequently, each receiving only ~12 bytes of data per write, the disks are most definitely going to spend most of their time performing seeks.
Write Multiple Datapoints
Let's assume that the write rate for disks isn't huge and that it can't be increased substantially, even if we buy the fastest SSDs out there. The only way to make writes faster is to do fewer of them, and we can do fewer of them if we buffer multiple related data points and write them to their corresponding Whisper file in a single write operation. Whisper arranges consecutive data points contiguously on disk, which is what makes this kind of batched write possible.
The Update Many Function
Chris Davis decided to modify Whisper and add the update_many function. It takes a list of data points for a single metric and compacts contiguous data points into a single write operation. Even though this makes each write larger, the difference in the time it takes to write ten data points (~120 bytes) versus one data point (~12 bytes) is negligible. It would take quite a few more data points before the size of each write starts to noticeably affect the latency.
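To make that concrete, here is a minimal sketch of a batched write using the whisper library. The path and retention settings are made up for the example; the important part is that ten data points hit the disk in one call.

import os
import time
import whisper

# A hypothetical Whisper file path, just for illustration.
path = "/opt/graphite/storage/whisper/stresser/example/p99.wsp"

# Create the file if it doesn't exist yet: one data point per minute, kept for a day.
# These retention settings are only for the example.
if not os.path.exists(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    whisper.create(path, [(60, 1440)])

# Ten buffered data points for this metric, flushed to disk with a single call
# instead of ten separate whisper.update() calls.
now = int(time.time())
points = [(now - 60 * i, float(i)) for i in range(10)]
whisper.update_many(path, points)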
Buffers
To be able to use the update_many function in Whisper, a buffering mechanism was implemented in Carbon. The Receiver thread maps each incoming data point to a queue based on its metric name and appends it there. Another thread - the Writer thread - repeatedly iterates through all of the queues; for each one, it pulls out all of the data points and writes them to the appropriate Whisper file with a single update_many call. If the rate of incoming data points exceeds the rate of write operations, the queues simply hold more data points. The only resource this costs the server is memory, which is acceptable because each data point is only a few bytes.
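Here is a simplified Python sketch of that scheme. It is not Carbon's actual (Twisted-based) implementation - just the shape of the idea - and path_for_metric is a hypothetical helper that maps a metric name to its .wsp path.

import threading
import time
from collections import defaultdict

import whisper

# One queue of (timestamp, value) tuples per metric name.
cache = defaultdict(list)
lock = threading.Lock()

def receive(metric, timestamp, value):
    """Receiver side: append the incoming data point to that metric's queue."""
    with lock:
        cache[metric].append((timestamp, value))

def writer_loop(path_for_metric):
    """Writer side: drain every queue and flush each one with a single update_many call."""
    while True:
        with lock:
            drained = {metric: points for metric, points in cache.items() if points}
            cache.clear()
        for metric, points in drained.items():
            # One write per metric, no matter how many points were buffered.
            whisper.update_many(path_for_metric(metric), points)
        if not drained:
            time.sleep(1)   # nothing buffered; avoid a busy loop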
Extra Benefit: Resiliency
A nice advantage of this approach is that it adds a degree of resiliency to handle temporary I/O slowdowns. If the system needs to do other I/O not related to Graphite, then it is likely that the rate of write operations will decrease. If this happens the Carbon queues will simply grow. The larger the queues, the larger the writes. Since the overall throughput of data points is equal to the rate of write operations times the average size of each write, Carbon should be able to keep up as long as there is enough memory for the queues.
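As a rough back-of-the-envelope check of that throughput relationship - with made-up numbers, not measurements from the tests above - the arithmetic looks like this:

# Illustrative figures only: neither value was measured on this server.
writes_per_second = 1000        # what the disk can sustain
avg_points_per_write = 12       # average queue depth drained per update_many call

points_per_minute = writes_per_second * avg_points_per_write * 60
print(points_per_minute)        # 720000 -- comfortably above the 700K/min test above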
Real-Time Graphs
If Carbon maintains queues of data points in memory waiting to be written to disk, it means that those data points will not show up in your graphite-webapp graphs, right? Wrong!
A socket listener was added to Carbon that provides a query interface for accessing the buffered data points. The graphite-webapp uses this query interface each time it needs to retrieve data, and it combines the data points it gets from Carbon with the data points it reads from the Whisper files on disk. Therefore, your graphite-webapp graphs will be real-time! In general, data points are accessible the instant Carbon receives them.
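Conceptually, the merge looks something like the sketch below. This is not graphite-webapp's actual code; query_carbon_cache is a stand-in for the query it makes against Carbon's cache query interface and is assumed to return (timestamp, value) tuples for the requested metric.

import whisper

def fetch_realtime(metric, path, from_time, until_time, query_carbon_cache):
    """Combine on-disk Whisper data with Carbon's in-memory buffered points."""
    # Data points already written to disk.
    (start, end, step), values = whisper.fetch(path, from_time, until_time)

    # Overlay any buffered data points that haven't been flushed to disk yet.
    for timestamp, value in query_carbon_cache(metric):
        if start <= timestamp < end:
            values[(timestamp - start) // step] = value

    return (start, end, step), values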