Live Debugging

Production Debuggers - 2022 Benchmark Results - Part 3

This third case compares the overhead of Sidekick, Rookout, and an agentless setup with active tracepoints under high load.
Baris Kaya
3 min read

Case 3: Comparing Sidekick and Rookout performance with active tracepoints (1000-hit limits) under high load

Story Behind the Research

We are developing Sidekick to give developers new abilities for collecting data from their running applications. On our road to making live debugging & observability easier for developers, our performance impact is among the most questioned topics. To answer this question, we decided to conduct research to measure how much overhead Sidekick and its competitors, Lightrun and Rookout, add to applications.

This benchmarking research consists of 3 parts.

  1. Case 1: Passive impact under high load
  2. Case 2: Comparing Sidekick & Lightrun & Rookout load test performance with active tracepoints
  3. Case 3: Comparing Sidekick and Rookout performance with active tracepoints (1000-hit limits) under high load

All three cases consist of load tests, which we perform using JMeter. You can check it out here: https://jmeter.apache.org/

To test the performance impact on a level playing field, we went with the Pet Clinic sample app developed by the Spring community itself. Below you can see the system’s overall design.

https://raw.githubusercontent.com/runsidekick/sidekick-load-test/master/images/AWS-Architecture-Diagram.png?token=GHSAT0AAAAAABWWXZKAV4ANT2FNWT67ID7IYXJHBXQ

If you want to repeat this test yourself, both the Pet Clinic example and our testing repo, including the JMeter test JMX files, can be found here:

https://github.com/runsidekick/sidekick-load-test

JMeter & Throughput Shaping Timer plugin links:

https://jmeter.apache.org/

https://jmeter-plugins.org/wiki/ThroughputShapingTimer/

Hardware info:

See Part 1 for the hardware details:

https://medium.com/runsidekick/sidekick-blog-production-debuggers-2022-benchmark-results-part-1-ec173d0f8ccd

The case:

In this case, we are investigating the performance impact of the Java agents of the production debuggers in our scope while tracepoints are active under load.

Agents in this case: Sidekick and Rookout, compared against an agentless baseline.

Note: This test does not include Lightrun due to technical limitations on their side.
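
Both debuggers attach as standard Java agents and instrument bytecode at runtime. As a rough sketch of that attachment mechanism (the class name and details below are hypothetical, not any vendor's actual code):

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical skeleton of a production debugger's agent entry point.
public class DebuggerAgent {

    // The JVM calls premain() before main() when the app is started
    // with -javaagent:debugger-agent.jar
    public static void premain(String args, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // A real agent only rewrites classes that currently have an
                // active tracepoint; returning null leaves the class as-is,
                // which is why passive overhead stays near zero.
                return null;
            }
        }, /* canRetransform = */ true);
    }
}
```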

For all agents, we use separate EC2 instances, and we use JMeter for all our tests. In each setup, the only variable we change is the agent. We use each agent with its default settings, except for the hit limits.

We make the same number of requests to the same endpoints for each setup and observe the impact of each agent by comparing latencies and throughputs.

Below you can find our JMeter setup for the load test:

The pattern above (shaping-timer-load) is repeated for each setup. You can get the .jmx file from here: https://github.com/runsidekick/sidekick-load-test/blob/master/petclinic-app/src/test/jmeter/petclinic_test_plan.jmx

For both agents, we put a tracepoint at the exact same location and make the same requests:
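
For illustration, a tracepoint in the Pet Clinic app sits on a line inside an HTTP endpoint like the simplified, hypothetical controller below; the exact class and line we used are in the testing repo:

```java
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Simplified stand-in for a Pet Clinic controller, for illustration only.
@RestController
public class OwnerLookupController {

    @GetMapping("/owners/find")
    public List<String> findOwners(@RequestParam String lastName) {
        List<String> owners = List.of("Franklin", "Davis");
        // A tracepoint on this line captures lastName and owners on
        // every hit, without pausing the running application.
        return owners;
    }
}
```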

Scenario & Limitations

This case has the same number of requests per second as our first case, as you can see in the JMeter setup above. We’ve set the hit limit to 1000 for both agents; all other settings stay at their defaults. You can find details about our setup and the JMX files here: https://github.com/runsidekick/sidekick-load-test
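
A hit limit simply caps how many times a tracepoint fires before it disables itself, which is what keeps steady-state overhead low in this case. Conceptually, it works like the sketch below (our own hedged illustration, not either vendor's implementation):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch of a hit-limited tracepoint: once the limit is
// reached, the instrumented check becomes a cheap no-op, so overhead
// under sustained load drops back toward zero.
public class HitLimitedTracepoint {

    private final int hitLimit;
    private final AtomicInteger hits = new AtomicInteger();
    private volatile boolean disabled = false;

    public HitLimitedTracepoint(int hitLimit) {
        this.hitLimit = hitLimit;
    }

    // Invoked from the instrumented bytecode at the tracepoint line.
    public void onHit(Runnable captureSnapshot) {
        if (disabled) {
            return; // fast path once the 1000-hit budget is spent
        }
        if (hits.incrementAndGet() <= hitLimit) {
            captureSnapshot.run();
        } else {
            disabled = true; // a real agent would also de-instrument the class
        }
    }
}
```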

Benchmark Results

Agentless:

[Statistics table]

Sidekick:

[Statistics table]

Rookout:

[Statistics table]

Results Summary:

We made more than 1000 requests to each instance, and as a result, both the average number of transactions/s and the average latencies came out almost identical across the three setups.

Interestingly, for some metrics the agentless version came out slightly less performant, but that difference is negligible, since no agent claims to make your application run even faster.

The lowest throughput we observed is 7.72 transactions/s, which is less than 0.39% below the reference value (a 0.39% drop implies an agentless reference of roughly 7.72 / (1 − 0.0039) ≈ 7.75 transactions/s), showing how little impact the agents have on throughput.

The min and median latency values of both agents are nearly the same as the reference values.

Sidekick’s max latency value is around 50 ms higher than the reference value, and Rookout’s is just slightly higher. Investigating the graphs and the collected data shows that those max results are isolated outliers, so there is no need to take them into account.

Sidekick came out just a little more performant than Rookout in terms of average latency, but when we compare both results with the reference value, we can confidently say that the overhead introduced by both Sidekick and Rookout is nearly zero and negligible. We believe the JIT compiler’s code optimization is a key player in these results. Even though the Sidekick and Rookout Java agents follow the same approach, intercepting code at the tracepoint via bytecode instrumentation and sending events asynchronously over a websocket connection, we think small details determine the winner. The micro-optimizations we applied here are asynchronous snapshot taking and call stack collection.
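
To make the async-dispatch idea concrete, here is a minimal, hypothetical sketch (not Sidekick’s actual internals) of handing snapshot work off the request thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of asynchronous snapshot handling: the application
// thread only enqueues a lightweight event; serialization and the
// websocket send happen on a background thread.
public class AsyncSnapshotSender {

    private final ExecutorService sender = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(1024),
            new ThreadPoolExecutor.DiscardPolicy()); // drop, never block

    public void onTracepointHit(Object[] capturedLocals) {
        // Cheap on the hot path: hand off and return immediately.
        sender.execute(() -> {
            String payload = serialize(capturedLocals);
            sendOverWebsocket(payload);
        });
    }

    private String serialize(Object[] locals) { /* build a JSON event */ return "{}"; }

    private void sendOverWebsocket(String payload) { /* async websocket send */ }
}
```

The design choice that matters here is the non-blocking hand-off: the request thread pays only for an enqueue, and under backpressure events are dropped instead of slowing the application down.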

Conclusion:

Our third case compared the overhead of Sidekick, Rookout, and a vanilla (agentless) application with active tracepoints under high load.

In the end, we have seen that on average the Sidekick and Rookout agents bring almost no overhead to our applications. From the first case to the last, we have tried to find out the overall performance picture of live application debuggers. The results show that, performance-wise, it is safe to ship the Sidekick, Rookout, and Lightrun agents with your applications without worrying about their passive overhead, and if you plan to use these agents under higher loads, Sidekick and Rookout shine with their performance.

Sidekick is now open source to allow self-hosting and make live debugging more accessible.
