
Production Debuggers - 2022 Benchmark Results - Part 2

In part 2 of this research, we compare Sidekick, Lightrun, and Rookout with active tracepoints under technical limitations.
Baris Kaya

Case 2: Comparing Sidekick, Lightrun, and Rookout with active tracepoints (with technical limitations)

Story Behind the Research

We are developing Sidekick to give developers new abilities for collecting data from their running applications. On our road to making live debugging and observability easier for developers, our performance impact is among the most frequently questioned topics. To answer this question, we decided to conduct research to observe how much overhead Sidekick and its competitors Lightrun and Rookout add to applications.

This benchmarking research consists of 3 parts.

  1. Case 1: Passive impact under high load
  2. Case 2: Comparing Sidekick, Lightrun, and Rookout load test performance with active tracepoints
  3. Case 3: Comparing Sidekick and Rookout performance with active tracepoints with a hit limit of 1000 under high load

All the cases consist of load tests, which we perform using JMeter. You can check it out here:

To test the performance impact, we decided to use the pet clinic example app provided by the Spring community itself, for a fair comparison. Below you can see the system’s overall design.

You can find our testing repo below if you want to repeat this test yourself. Both the pet clinic example and our testing repo, including the JMeter test JMX files, can be found here:

JMeter & Throughput Shaping Timer plugin link:

Hardware info:

Check out Part-1 for hardware info:

The case:

In this case, we investigate the performance impact of the Java agents of the production debuggers in our scope when they have active tracepoints under load.

Agents in this case:

Each agent runs on its own EC2 instance, and we use JMeter for all our tests. For each setup, the only variable we change is the agent. We use each agent with its default settings, except for hit limits. For a fair comparison, we set the hit limit to 50 for all of them.

We will make the same number of requests to the same endpoints for each setup and observe the impact of each agent by comparing their latencies and throughputs.
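The latency and throughput figures compared in this case are simple summary statistics over the recorded samples. Below is a minimal sketch of how such figures could be computed, assuming each sample is a (start timestamp in ms, elapsed time in ms) pair, similar to the `timeStamp` and `elapsed` columns of a JMeter CSV result file. The function name `summarize` and the sample values are hypothetical and are not taken from the testing repo.

```python
from statistics import mean, median


def summarize(samples):
    """Summarize (start_ms, elapsed_ms) samples for one setup."""
    if not samples:
        raise ValueError("no samples to summarize")
    elapsed = [e for _, e in samples]
    # Throughput is measured from the first request start to the last response end.
    first_start = min(s for s, _ in samples)
    last_end = max(s + e for s, e in samples)
    duration_s = (last_end - first_start) / 1000.0
    return {
        "min_ms": min(elapsed),
        "median_ms": median(elapsed),
        "avg_ms": mean(elapsed),
        "max_ms": max(elapsed),
        "throughput_per_s": len(samples) / duration_s if duration_s > 0 else float("inf"),
    }


# Hypothetical samples, for illustration only (not benchmark data).
example = [(0, 100), (500, 120), (1000, 110), (1500, 470)]
stats = summarize(example)
```

Note how a single slow request (470 ms here) pulls the average and max up while leaving the min and median almost untouched, which is why all four values are reported side by side.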

Below you can find our JMeter setup for the load test:

The pattern above (shaping-tp) will be repeated for each setup. You can download the .jmx file from here:

For all 3 agents, we will put a tracepoint at the exact same location and make the same requests:

Scenario & Limitations

This case has fewer requests per second than our first case, so it has a different JMeter setup, as you can see above. This is due to a technical difficulty we faced with Lightrun’s agent. Since Lightrun’s Java agent only allows a maximum hit limit of 50, we decided to run this test with limitations and lowered the request rate to see how each agent performs in the same scenario. For a more realistic scenario with a higher number of requests per second, we have prepared a third case, which you can find in part 3 of this benchmark.
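To illustrate why a hit limit of 50 constrains the load, here is a conceptual sketch of a hit-limited tracepoint. It is not how any of the three agents is implemented; the class and function names are hypothetical, and the only assumption is that a tracepoint stops capturing once it reaches its hit limit.

```python
class HitLimitedTracepoint:
    """Conceptual tracepoint that captures at most `hit_limit` snapshots."""

    def __init__(self, hit_limit=50):
        self.hit_limit = hit_limit
        self.snapshots = []

    @property
    def active(self):
        # The tracepoint stays active until the limit is used up.
        return len(self.snapshots) < self.hit_limit

    def hit(self, local_vars):
        """Record a snapshot while active; return True if it was captured."""
        if not self.active:
            return False
        self.snapshots.append(dict(local_vars))
        return True


def seconds_until_exhausted(hit_limit, requests_per_second):
    """Time for a tracepoint that is hit on every request to use up its limit."""
    return hit_limit / requests_per_second


tp = HitLimitedTracepoint(hit_limit=50)
# 200 requests reach the tracepoint, but only the first 50 are captured.
captured = sum(tp.hit({"request_id": i}) for i in range(200))
```

With a limit of 50 and roughly 3 requests per second, `seconds_until_exhausted(50, 3)` is about 17 seconds of active capturing. At a hypothetical 100 requests per second the same limit would be used up in half a second, after which the test would mostly measure an inactive tracepoint.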

Benchmark Results

Results Summary:

First, we acknowledge that these load tests were done with a distinctly low number of requests per second. This is due to Lightrun’s hit limit, which is capped at 50. We decided to make a third case with a higher number of requests per second using only the Sidekick and Rookout agents.

The lowest throughput observed is 3.01 transactions/s, which is only 0.02% lower than the reference value, so the agents have very little impact on the number of transactions per second.
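The percentage gap to the reference can be checked with a one-line formula. The sketch below is only an illustration; the helper names are hypothetical, and the reference value it derives is implied by the two figures quoted above rather than read from the result tables.

```python
def percent_below_reference(reference, measured):
    """How far `measured` falls below `reference`, as a percentage of the reference."""
    return (reference - measured) / reference * 100.0


def reference_implied_by(measured, percent_below):
    """Reference value for which `measured` is `percent_below` percent lower."""
    return measured / (1.0 - percent_below / 100.0)


# Reference throughput implied by 3.01 transactions/s being 0.02% lower.
implied_reference = reference_implied_by(3.01, 0.02)
gap = percent_below_reference(implied_reference, 3.01)
```

Here `implied_reference` comes out just above 3.01 transactions/s. Keep in mind that a percentage gap and an absolute gap are different things: an absolute gap of 0.02 transactions/s on a reference near 3 would be roughly 0.66%, not 0.02%.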

The min and median latency values of all 3 agents are nearly the same as the reference values.

The max latency value of Sidekick is almost identical to the reference value, and Rookout’s value is just slightly higher.

The average latency of both Sidekick and Rookout is about 3 ms higher than the reference. Since these benchmarks contain only a few samples, the values may move even closer to each other in the long run, once the JIT compiler kicks in to profile and optimize the code.

In the case of Lightrun’s agent, the max latency is unexpectedly high at 857 ms, and the average latency is similarly high. It would not be fair to make a final judgment based on this benchmark alone, but it seems that Lightrun’s agent can result in at least 3-4 times extra latency with the default settings.


Our second case compared the overhead of Sidekick, Lightrun, Rookout, and vanilla (the agentless application) with active tracepoints.

As mentioned above, we ran this test with limitations and lowered the request rate because of the maximum hit limit of Lightrun’s Java agent. We do not claim that this benchmark covers any real-life scenario, but since each test was done under the same conditions, it can still give some idea of the 3 agents' performance.

In the end, we saw that on average the Sidekick and Rookout agents add an extra 3 ms to our application. In our next case, we will learn more about this extra overhead in the long run. Lightrun’s results were relatively high, but it can still be a useful tool in a development environment.

In the last piece of this series, we will go one step further, implement a real-life scenario on the same setup, and examine the performance overheads.
Sidekick is open source to allow self-hosting and make live debugging more accessible.
