Debugging your production bugs with Deja vu

Working for Jayway, I know that when you have an itch, you have to scratch it. Lately I’ve had an itch: why is it that when a bug shows up in your production system, you have to spend a lot of brain power re-creating what really happened from a stack trace in the log file?

The execution path of a program is undecidable in general, and the same is true for backtracing: given a point in the code and the stack, it is undecidable which execution path led you there. If a bug occurs in production, the best way to fix it is to create a local test that reproduces the error. Such a test gives us a deterministic execution path that highlights the problem.

Now we are back to my itch: why do I have to do all the work of reproducing the bug in a test? Why don’t I get such a test setup from my execution environment when a bug occurs?

I decided to start an experimental framework that could soothe my itch. When a bug occurs, the framework should generate a “structure” that makes it possible to re-run the “scenario” in a sandboxed environment. And what other name to give it than Deja vu? It is open source and written in Java (Java is my favorite language when it comes to framework writing).

Déjà vu

The framework is as non-invasive as possible. It requires some XML configuration, some initialization code, and that AspectJ is available. You then use Deja vu by adding annotations to your code base:

  • @Traced: This annotation can be added to methods. It means every call to that method is traced by Deja vu. For the trace to work correctly, the input parameters of the method must not be mutated (this is not enforced by the framework, so be careful).
  • @Impure: All non-deterministic, randomized, or externally dependent calls must be made in methods marked with the Impure annotation. Everything returned from such a method must also be immutable; otherwise a re-run of the trace will not produce the same result as the original.
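As a rough illustration of how the two annotations divide the work, here is a minimal sketch. The annotation declarations are stand-ins so the snippet compiles on its own (the real ones ship with Deja vu), and DiceGame with its methods is a hypothetical example, not code from the framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Random;

// Stand-in annotations so the sketch is self-contained;
// in a real project these come from the Deja vu library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Traced {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Impure {}

// Hypothetical use case class.
class DiceGame {
    private final Random random = new Random();

    // Entry point of the use case: every call would be traced.
    // The argument is an immutable String and is never mutated.
    @Traced
    public String play(String player) {
        int roll = rollDie();
        return player + (roll >= 4 ? " wins" : " loses");
    }

    // The randomness is isolated here, and the result is an immutable Integer.
    @Impure
    public Integer rollDie() {
        return random.nextInt(6) + 1;
    }
}
```

Everything in play is deterministic once the value returned by rollDie is known, which is the property a re-run relies on.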

Upon every finished call to a method marked with the Traced annotation, the framework passes the trace (and an optional exception) to a callback implemented in your application. Based on this trace, a so-called DejaVuTrace can be created and executed. The re-run of the trace has the exact same execution path as the original. The only points missing from the re-run are the Impure-marked methods.
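One plausible way to picture why a re-run can follow the same path is record and replay of impure values: record what the impure calls returned the first time, and hand back the same values the second time. The sketch below illustrates that idea only. ImpureRecorder and Pricing are hypothetical names, and this is not how Deja vu itself is implemented (it uses AspectJ rather than an explicit wrapper).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: records impure results, or replays them in order.
class ImpureRecorder {
    private final List<Object> recorded = new ArrayList<>();
    private Iterator<Object> replay; // null while recording

    @SuppressWarnings("unchecked")
    <T> T impure(Supplier<T> call) {
        if (replay != null) {
            return (T) replay.next(); // re-run: reuse the recorded value
        }
        T value = call.get();         // original run: perform the real call
        recorded.add(value);
        return value;
    }

    // Copy of the values recorded so far.
    List<Object> trace() {
        return new ArrayList<>(recorded);
    }

    // Creates a recorder that replays a previously recorded trace.
    static ImpureRecorder replaying(List<Object> trace) {
        ImpureRecorder recorder = new ImpureRecorder();
        List<Object> copy = new ArrayList<>(trace);
        recorder.replay = copy.iterator();
        return recorder;
    }
}

// Hypothetical code under trace: the branch taken depends on an impure value.
class Pricing {
    static String quote(ImpureRecorder recorder) {
        Long now = recorder.impure(System::nanoTime);
        return (now % 2 == 0) ? "even" : "odd";
    }
}
```

Calling Pricing.quote with a fresh recorder and then again with ImpureRecorder.replaying(recorder.trace()) takes the same branch both times, because the clock is only read during the first run.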

Suppose a class A has a method B that is annotated with Traced. The framework initiates a re-run by first instantiating A (assuming a default constructor) and then invoking B on A with the input arguments given by the trace.
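The instantiate-then-invoke step can be sketched with plain Java reflection. Replayer and its rerun method are hypothetical names for illustration, and matching the method by name and argument count is a simplification; the framework's own mechanism may look different.

```java
import java.lang.reflect.Method;

// Hypothetical helper showing the re-run step with plain reflection.
class Replayer {
    static Object rerun(Class<?> type, String methodName, Object... args) {
        try {
            // Step 1: instantiate the class through its default constructor.
            Object target = type.getDeclaredConstructor().newInstance();
            // Step 2: find the method by name and arity, then invoke it
            // with the arguments taken from the trace.
            for (Method method : type.getDeclaredMethods()) {
                if (method.getName().equals(methodName)
                        && method.getParameterCount() == args.length) {
                    return method.invoke(target, args);
                }
            }
            throw new IllegalArgumentException("No method " + methodName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Re-run failed", e);
        }
    }
}

// The class A with method B from the text (b in Java naming).
class A {
    public String b(String name, Integer count) {
        return name + ":" + count;
    }
}
```

With these definitions, Replayer.rerun(A.class, "b", "x", 3) creates a fresh A and calls b("x", 3) on it.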

Code Example

The following illustrates the core principles of the framework (the full example can be found in the code repository).

The randomized and non-reproducible parts of the code are “hidden” behind Impures.

To set up the tracing framework, one has to provide an implementation of TraceCallback, which is called after each completed invocation of a Traced method.

The log shows it clearly:

Note that a real setup would probably serialize the trace and persist it. It could then later be run (and debugged) in a development environment.
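To make the callback idea more concrete, here is a minimal sketch of a callback that keeps only the traces that ended in an exception. The signature of the real TraceCallback is not shown in this post, so HypotheticalTraceCallback, RecordedTrace and FailedTraceStore are all assumed stand-ins, and the real interface may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in for the framework's trace object; the fields are assumptions.
final class RecordedTrace {
    final String method;
    final List<Object> arguments;

    RecordedTrace(String method, Object... arguments) {
        this.method = method;
        this.arguments =
                Collections.unmodifiableList(Arrays.asList(arguments.clone()));
    }
}

// Stand-in for TraceCallback; the real signature may differ.
interface HypotheticalTraceCallback {
    // failure is null when the traced call completed normally.
    void traced(RecordedTrace trace, Throwable failure);
}

// Keeps only the traces that ended in an exception.
class FailedTraceStore implements HypotheticalTraceCallback {
    private final List<RecordedTrace> failed = new ArrayList<>();

    @Override
    public void traced(RecordedTrace trace, Throwable failure) {
        if (failure != null) {
            failed.add(trace);
        }
    }

    // Read-only view of the failed traces collected so far.
    List<RecordedTrace> failedTraces() {
        return Collections.unmodifiableList(failed);
    }
}
```

A production version would persist the failed traces instead of holding them in memory, so they can be loaded and re-run in a development environment.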

By following this principle we have actually promoted bug reports (i.e. failed traces) to first-class citizens among the other domain entities. That is a fundamental shift in how to view an application, because the application now contains its own history. In my current project I use the tracing on all use cases of the application. This makes it easier for the development team to communicate with the managers: How often do bugs occur for a use case? How fast do we fix bugs? What is the cost of maintaining this use case? Is the expense of pushing a “half-baked” use case to production higher than that of a well-tested one?

Performance

Since the framework uses AspectJ, once the code weaving is done (once for each affected class), there is virtually no invocation overhead. However, the immutability requirement means an overhead in replicating data when mutation is required. The actual trace is a list of pointers to the immutable input values, so the real performance penalty only occurs when serialization of a trace is needed.
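The cost profile described here can be sketched as follows: recording only stores a reference, and bytes are produced only when someone asks for them. LazyTrace is a hypothetical class for illustration, and Java serialization is an assumed choice of format, not the framework's actual trace type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace that stores references and serializes only on demand.
class LazyTrace {
    private final List<Serializable> values = new ArrayList<>();

    // Cheap: stores a reference to an (assumed immutable) value, no copying.
    void record(Serializable immutableValue) {
        values.add(immutableValue);
    }

    int size() {
        return values.size();
    }

    // Expensive: only called when the trace has to leave the process,
    // for example after a traced call has failed.
    byte[] serialize() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(values));
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the recorded values are immutable, holding references is safe: nothing can change them between the time they are recorded and the time they are serialized.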

This post is part of a series about the framework Deja vu. The next post is about generating test code: JSON marshaller for Deja vu
