jstack to the rescue

On my current assignment we have amongst other things a Web Service. We use Apache CXF 2.3.2 as Web Services framework. The number of requests has increased heavily the last couple of months and currently we have about 20 requests a second, around the clock.

We decided to move the Web Service to its own server since we predict the load will increase even more. In the move we also changed from Resin application server to the Tomcat servlet container. In front we are running an Apache HTTP server.

Very soon we understood that everything was not as it should be. In the access log for Apache we could see that all of a sudden the response times were several seconds up to a minute. But most of the time it went along as expected. Something like “stop the world” seemed to occur a couple of times each day. In scenarios like these it would be great to get a snapshot of what the code is doing when the system “freezes”. And its here that jstack comes to the rescue. To run jstack you just issue the command with the pid of the process. Look here for the documentation.

When looking at the jstack response when we had the “freeze”, we had many threads having the BLOCKED state. All looked the same:

All the threads are waiting to get a lock on 0x8a2d7c68. We probably have a synchronized code block on line 106 in the StaxInInterceptor class. Let’s have a look:

So now we understand that something is taking very long to create a XMLStreamReader object. But we skip that at the moment. We can see in the code that we only get to the synchronized code block if the factory is set. Why is the factory set then? Looking in the Spring configuration for the Web Service we see:

The interesting part here is the properties that are specified. With these properties you can control the implementation of different interfaces if you don’t want to let the running container decide for you. Previously we were running on Resin and if these properties were not specified we didn’t get the implementation that we wanted. We got com.caucho.xml.stream.XMLInputFactoryImpl for instance in the javax.xml.stream.XMLInputFactory case. But now we are running on Tomcat. Maybe we don’t need to specify the properties anymore, hence not getting into the synchronized code block? And that is just the case. Running on Tomcat will give us the com.sun.xml.internal.stream.XMLInputFactoryImpl implementation without the need to configure it. Problem solved. We are out of the synchronized code block.

So, why did it take so long to create the XMLStreamReader object? We take a look on the jstack response to see who is holding the lock on 0x8a2d7c68:

Here we see that the thread is reading from a socket. There can be a number of reasons for a socket to “die”, making the code just wait for the socket to timeout. In our case the socket timeout was 60 seconds between Apache and Tomcat. Understanding timeouts is important. In short, the socket timeout means that unless something is happening on the socket, it will end after the timeout, which means that the port will be available for reuse. If we didn’t have the timeout, eventually there would be no socket resources left on the server to handle client request to the Web Service.

To wrap up, it was because of the properties configuration we entered the synchronized code block and that stopped us from handling request for one minute when there was a problem reading from a socket. My guess for having the synchronized code block is that you could configure any implementation as javax.xml.stream.XMLInputFactory and that implementation might not be thread safe.

Conclusion is that it is the little things, that can make the big difference.

This Post Has One Comment

  1. This is an interesting analyze with a good investigation. Programming is like that, we cannot predicate every problem but we can handle them well and quickly with good methodology. The methodology is for me the best quality of a development team more than only competences because method makes you better.

Leave a Reply

Close Menu