How to Test in Production with Live Debugging

Yasin Kalafat

Anyone who has recently tried implementing a testing strategy for their software projects knows that cloud services don’t make this task any easier. 

Using an external service to do the heavy lifting for authentication or machine learning is immensely helpful; it saves lots of time and money. It has enabled teams of just a few people to build systems that once required the work of hundreds of developers. (To be fair, it still does, but most of these developers don’t need to be directly employed by your organization.)

The use of third-party services in software development is fast evolving, breaking molds, and constantly disrupting the market.

Once the honeymoon phase of your software service ends, customers will likely expect better quality. And this is where testing becomes essential.

So how do you test a system built with third-party services? You can’t replicate the entire architecture on your local machine, so what’s the alternative?

Can You Test in the Production Environment?

If you can’t recreate the production environment locally or in a development account of your cloud provider, there is still one option left: using the actual production environment your customers are accessing.

But how can testing in production work? Thankfully, it is possible to test in the production environment—and there are several ways to test your deployment with real traffic. To understand what they are and where they originate, we should first define what we mean by the production environment. 

What Is a Production Environment?

We tend to assume everyone has the same idea in mind when talking about production environments, but this isn’t necessarily the case, so here is the definition this article uses.

A production environment is the part of the software development process that directly affects end users. It has three phases: 

  1. Deployment phase: New code or config versions are moved to the production servers the customer is using.
  2. Release phase: User-facing changes are activated in the latest deployed version.
  3. Post-release phase: The latest deployed version is tweaked to ensure everything works smoothly.

Testing in Different Development Phases

Now that it’s clear what we mean by production environment, we can look into the different testing methods available to us.

Every testing method for a production environment has different goals. If at least one of those is aligned with your own goals, then testing in production is the right option for you. 

The later the phase, the more significant the impact an error will have on user experience. At the same time, each of these phases allows for multiple different types of testing, which we explore below.

1. Deployment Phase

Two popular techniques for testing in the deployment phase are tap compare and shadowing. They both work by sending production traffic—either live or recorded—to a new deployment. To do so, you usually keep the old deployment around and only switch to the latest version once the test succeeds.

The difference between these two methods is that tap compare allows you to compare the responses of the new and the old versions and see if they are different in any meaningful way. With shadowing, you just route production traffic to the latest version to see if it can handle it without throwing errors.
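To make the difference concrete, here is a minimal sketch in Python. It assumes each deployment can be called as a plain function that takes a request dictionary and returns a response dictionary. The handlers `checkout_v1` and `checkout_v2`, the `served_by` field, and the helper names are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Request = Dict[str, int]
Response = Dict[str, object]
Handler = Callable[[Request], Response]


def shadow(requests: Iterable[Request], new_version: Handler) -> List[str]:
    """Shadowing: replay traffic against the new version and record only failures."""
    errors = []
    for request in requests:
        try:
            new_version(request)  # the response itself is discarded
        except Exception as exc:
            errors.append(f"{request!r} raised {type(exc).__name__}")
    return errors


def _call(handler: Handler, request: Request, ignore: Tuple[str, ...]) -> Response:
    """Call a handler, turning a crash into a comparable pseudo-response."""
    try:
        response = handler(request)
    except Exception as exc:
        return {"error": type(exc).__name__}
    return {k: v for k, v in response.items() if k not in ignore}


def tap_compare(
    requests: Iterable[Request],
    old_version: Handler,
    new_version: Handler,
    ignore: Tuple[str, ...] = ("served_by",),
) -> List[dict]:
    """Tap compare: send each request to both versions and report differing responses."""
    mismatches = []
    for request in requests:
        old = _call(old_version, request, ignore)
        new = _call(new_version, request, ignore)
        if old != new:
            mismatches.append({"request": request, "old": old, "new": new})
    return mismatches


# Hypothetical old and new versions of a checkout handler.
def checkout_v1(request: Request) -> Response:
    return {"total": request["price"] * request["qty"], "served_by": "v1"}


def checkout_v2(request: Request) -> Response:
    if request["qty"] > 100:
        raise ValueError("bulk orders not supported")
    total = request["price"] * request["qty"]
    if request["qty"] >= 3:
        total -= 1  # deliberate bug: an unintended discount
    return {"total": total, "served_by": "v2"}
```

Shadowing would only flag the request that crashes `checkout_v2`, while tap compare would also surface the request whose total silently changed. Fields that are expected to differ between versions, such as the hypothetical `served_by`, are excluded from the comparison so they don't drown out meaningful differences.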

If your goal is to take a new version with real data and see if it crashes or how it behaves compared to a previous version, you should test in production using the above methods. It’s the surest way to cover real-life scenarios and gain confidence that your product can deliver. 

Using Sidekick, you can get more out of both methods by setting tracepoints at lines of code that may be of interest, e.g., for debugging purposes. Sidekick will then gather debug information when sending traffic to the new version.

2. Release Phase

This is when you activate new features of the freshly deployed version. The idea is to split deployment and release into two phases, separating the technical process of deploying software from the business process of releasing new features. Releasing a new feature to only a small section of your user base first is a testing technique called canarying.

In the previous phase, the goal was to see how the new deployment handles live or recorded traffic. Canarying, by contrast, tests how the app performs when users access new features. If you’re concerned that making changes to your software could lead to performance issues, you should test in production using this technique.

Live debugging with Sidekick can give you better insights into the testing process. Start by activating a new feature for 10% of your users, check for errors with an APM tool or the error tracker of your choice, and then scatter tracepoints on the lines of code you’ve changed to extract additional debug information.

You can gradually add more and more users until the feature is available to all of them, then move to the next phase.
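One common way to implement such a gradual rollout is to hash each user into a stable bucket and compare the bucket against the current rollout percentage. The sketch below assumes users are identified by a string ID. The function names and the feature name `new-checkout` are hypothetical.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    return bucket(user_id, feature) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (10, 50, 100):
        enabled = sum(in_canary(u, "new-checkout", percent) for u in users)
        print(f"{percent}% rollout: feature enabled for {enabled} of {len(users)} users")
```

Because the bucket depends only on the user ID and the feature name, a user who saw the feature at 10% keeps seeing it at 50%, which avoids the feature flickering on and off as the rollout grows.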

3. Post-Release Phase

If everything appears to be working in the final production phase, you can now shift your focus to maintenance. You can combine any of the following three methods with live debugging to elevate your testing. 

Feature Flags

Feature flags are routinely used in the release phase to enable new features. You can also use them to disable these features in the post-release phase if, for instance, you’ve found anomalies soon after release.

This is where you can set tracepoints with Sidekick to debug an issue. Sidekick lets you enable and disable features to locate the root of a problem and analyze live debugging data to fix it faster.
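As a rough illustration of a flag used as a post-release kill switch, the sketch below keeps flags in memory. This is not Sidekick's API or that of any particular flag service. The `FeatureFlags` class, the `ml_recommendations` flag, and the `recommend` function are hypothetical, and a real system would typically read flags from a shared store so they can change without a redeploy.

```python
from typing import Dict, List, Optional


class FeatureFlags:
    """A tiny in-memory flag store; unknown flags are treated as disabled."""

    def __init__(self, initial: Optional[Dict[str, bool]] = None) -> None:
        self._flags: Dict[str, bool] = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def enable(self, name: str) -> None:
        self._flags[name] = True

    def disable(self, name: str) -> None:
        self._flags[name] = False


def recommend(flags: FeatureFlags, history: List[str]) -> List[str]:
    """Pick up to three items, using the new code path only when the flag is on."""
    if flags.is_enabled("ml_recommendations"):
        # Hypothetical new behavior: de-duplicated, alphabetically ordered picks.
        return sorted(set(history))[:3]
    # Established behavior: the three most recently viewed items.
    return history[-3:]


if __name__ == "__main__":
    flags = FeatureFlags({"ml_recommendations": True})
    history = ["shoes", "hat", "shoes", "belt", "scarf"]
    print(recommend(flags, history))  # new code path
    flags.disable("ml_recommendations")  # anomaly spotted: switch the feature off
    print(recommend(flags, history))  # back to the established code path
```

Toggling the flag while watching the same requests makes it easier to tell whether an anomaly comes from the new code path or was already present in the old one.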

A/B Testing

If you aren’t sure how to implement a new feature, A/B testing is an excellent way to find out. It allows you to test two different feature versions by introducing each to a separate user group.

If one version of a feature is more successful than the other, you can activate the preferred version for all of your users.

If the results of the A/B testing are inconclusive, live debugging can provide the information you need to make a decision.
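The mechanics of an A/B test can be sketched in a few lines: assign each user to a variant deterministically, then compare how often each variant led to the desired outcome. The function names, the experiment name, and the event format below are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to variant "A" or "B"."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def conversion_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the share of converting users per variant.

    Each event is a (variant, converted) pair, one per user who saw the feature.
    """
    shown: Dict[str, int] = defaultdict(int)
    converted: Dict[str, int] = defaultdict(int)
    for variant, did_convert in events:
        shown[variant] += 1
        if did_convert:
            converted[variant] += 1
    return {variant: converted[variant] / shown[variant] for variant in shown}


if __name__ == "__main__":
    # Hypothetical outcomes: variant A converted 1 of 2 users, variant B 1 of 1.
    events = [("A", True), ("A", False), ("B", True)]
    print(conversion_rates(events))
```

With only a handful of users the rates say very little. Before activating a variant for everyone, you would normally check that the difference is statistically significant for your traffic volume.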


Teeing

Teeing is similar to both tap compare and shadowing: You replay traffic to a specific version of your system. The difference is that with teeing, you do it explicitly for debugging purposes.

If an error occurs, request the user’s permission to record traffic. Next, fire up Sidekick and set tracepoints, then replay the recorded traffic over and over until you find the source of the problem.
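The record-and-replay loop described above might look like the following sketch. It assumes requests can be serialized to JSON and that the version under test can be called as a function. `TrafficRecorder`, `replay`, and `handle_order` are hypothetical names, and setting tracepoints is left to your debugging tool.

```python
import json
from typing import Callable, Dict, List


class TrafficRecorder:
    """Stores requests as JSON lines, but only when the user has consented."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, request: Dict, user_consented: bool) -> bool:
        if not user_consented:
            return False  # never store traffic without permission
        self._lines.append(json.dumps(request, sort_keys=True))
        return True

    def export(self) -> str:
        return "\n".join(self._lines)


def replay(recording: str, handler: Callable[[Dict], Dict]) -> List[Dict]:
    """Send every recorded request to the handler and collect responses or errors."""
    results = []
    for line in recording.splitlines():
        request = json.loads(line)
        try:
            results.append({"request": request, "response": handler(request), "error": None})
        except Exception as exc:
            results.append({"request": request, "response": None, "error": repr(exc)})
    return results


# Hypothetical handler with a defect that only shows up for negative quantities.
def handle_order(request: Dict) -> Dict:
    if request["qty"] < 0:
        raise ValueError("negative quantity")
    return {"accepted": True, "qty": request["qty"]}
```

Because the recording is plain text, the same traffic can be replayed as many times as needed while you move tracepoints around, and each run exercises exactly the same inputs.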

Debugging can also be enhanced by distributed tracing. It shows you not only what’s going on in the code of one service but also which services overall were part of the user interaction.
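The core idea behind distributed tracing is that every service involved in a request records a span carrying the same trace ID. The sketch below models this in memory with hypothetical `Span` and `Tracer` classes and made-up service names. Production systems usually rely on a tracing standard such as OpenTelemetry rather than hand-rolled code like this.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str  # shared by every span that belongs to one user interaction
    span_id: str  # unique to this unit of work
    service: str
    parent_id: Optional[str] = None


class Tracer:
    """Collects spans in memory; a real tracer would export them to a backend."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def start_span(self, service: str, parent: Optional[Span] = None) -> Span:
        # A child span inherits the trace ID; a root span starts a new trace.
        trace_id = parent.trace_id if parent else uuid.uuid4().hex
        parent_id = parent.span_id if parent else None
        span = Span(trace_id, uuid.uuid4().hex, service, parent_id)
        self.spans.append(span)
        return span

    def services_in_trace(self, trace_id: str) -> List[str]:
        """List the services that took part in a trace, in the order they were called."""
        return [span.service for span in self.spans if span.trace_id == trace_id]


if __name__ == "__main__":
    tracer = Tracer()
    gateway = tracer.start_span("api-gateway")
    auth = tracer.start_span("auth", parent=gateway)
    tracer.start_span("orders-db", parent=auth)
    print(tracer.services_in_trace(gateway.trace_id))
```

Looking up a trace ID then answers the question of which services were part of a given user interaction, which is the information that complements what a tracepoint captures inside a single service.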

Should You Test in the Production Environment? 

As we’ve seen, the production environment is not just a single monolithic concept in the software lifecycle; it comprises different phases that are equally important. Each phase invites a different way of testing, with distinct goals. What these different tests have in common is that they are best done in an actual deployment with real services.

To understand whether you should test in production, examine your goals. If you want to maintain software quality but can’t recreate the whole stack, testing in production is the right option for you. 


Testing can be done in any of the three production phases: deployment, release, or post-release. Each type of production test has its own purpose, but they all benefit from the live debugging features Sidekick provides.

Tracepoints allow you to capture the state of your systems at any point in time, and together with distributed tracing, they reveal what other services were part of the debugging request.

Sidekick is an excellent addition to APM tools or error trackers because it provides deeper insights into the state of your tracked system, and it works within your IDE. Thanks to tracepoints, live debugging keeps you on familiar ground even though the system is now running remotely.

Elevate your testing with live debugging. Get started with Sidekick.
