Yet Another Akka Benchmark

I have exercised my Scala and Akka skills by creating a sample application in the trading domain. I have experience developing trading systems, so even though the sample is simplified, it is rooted in real-world architecture.
I found it interesting to compare the performance of different technical solutions using this sample application. Therefore, I implemented several variants of the trading system and benchmarked them against each other:
  1. Ordinary synchronous method invocations.
  2. Scala Actors
  3. Akka Actors

I can tell you right away that the performance of Akka Actors is outstanding compared to Scala Actors.

Before looking at the benchmark I will briefly describe the sample application.

A trading system is essentially about matching buy and sell orders. A limit order is an order to buy a security at not more, or sell at not less, than a specific price. For example, if an investor wants to buy a stock, but doesn’t want to pay more than $20 for it, the investor can place a limit order to buy the stock at $20 “or better”. There are many other types of orders and special constraints, but the sample handles only plain limit orders.

Orders that are away from the current best price in the market are collected in an order book for the security, for later execution.
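To make the matching rule concrete, here is a minimal sketch in Python. It is not code from the sample application (which is written in Scala); the names `Order`, `OrderBook` and `add` are made up for illustration, prices are whole dollars, and the assumption that a trade executes at the resting order's price is mine.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: int   # limit price in whole dollars (simplification)
    volume: int


@dataclass
class OrderBook:
    symbol: str
    bids: list = field(default_factory=list)  # resting buy orders
    asks: list = field(default_factory=list)  # resting sell orders

    def add(self, order):
        """Match an incoming limit order against the opposite side.

        Returns the trades as (price, volume) tuples. Any unfilled
        remainder is kept in the book for later execution.
        """
        if order.side == "buy":
            opposite, same = self.asks, self.bids
            crosses = lambda resting: resting.price <= order.price
            best_first = lambda o: o.price      # lowest ask first
        else:
            opposite, same = self.bids, self.asks
            crosses = lambda resting: resting.price >= order.price
            best_first = lambda o: -o.price     # highest bid first

        # list.sort is stable, so equal prices keep their arrival order.
        opposite.sort(key=best_first)

        trades = []
        while order.volume > 0 and opposite and crosses(opposite[0]):
            resting = opposite[0]
            traded = min(order.volume, resting.volume)
            trades.append((resting.price, traded))  # assumed: resting price
            order.volume -= traded
            resting.volume -= traded
            if resting.volume == 0:
                opposite.pop(0)

        if order.volume > 0:
            same.append(order)  # away from the market: rest in the book
        return trades
```

A buy at $20 placed while the best sell is at $21 simply rests in the book; a later sell at $20 or lower trades against it.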

A matching engine manages one or more order books, i.e. the marketplace is sharded by order book. The matching engines hold the current state of the order books. Clients connect to an order receiver service, which is responsible for routing each order to the correct matching engine. The order receiver is stateless, so clients can use any order receiver regardless of order book.
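The routing idea can be sketched as follows. This is an illustration in Python under my own assumptions, not the sample's Scala code: the class and method names, the `build_marketplace` helper and the order book symbols ("A1" to "C5") are all hypothetical.

```python
class MatchingEngine:
    """Owns a fixed set of order books (one shard of the marketplace)."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = set(symbols)
        self.received = []  # stand-in for the order book state

    def handle(self, symbol, order):
        self.received.append((symbol, order))
        return f"{self.name} accepted {symbol}"


class OrderReceiver:
    """Keeps only a routing table, so any receiver can serve any client."""

    def __init__(self, engines):
        self.routes = {s: engine for engine in engines for s in engine.symbols}

    def place(self, symbol, order):
        engine = self.routes.get(symbol)
        if engine is None:
            raise KeyError(f"no matching engine for order book {symbol}")
        return engine.handle(symbol, order)


def build_marketplace():
    """Hypothetical layout: 15 order books spread over 3 engines."""
    engines = []
    for index, prefix in enumerate("ABC"):
        symbols = [f"{prefix}{n}" for n in range(1, 6)]
        engines.append(MatchingEngine(f"ME{index}", symbols))
    return engines
```

Two receivers built over the same engines route an order for a given book to the same engine, which is why a client does not need to care which receiver it talks to.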

For redundancy, the matching engines work in pairs. Each order is processed by both matching engines. The order is also stored in a persistent transaction log by both matching engines. In a real setup the primary and standby matching engines are typically deployed in separate data centers.

Now, over to the benchmark. The test scenario places buy and sell orders in 15 order books, divided among 3 matching engines. The orders are at different price levels, so order book depth builds up, but in the end all orders are traded, which is verified by the JUnit test that runs the benchmark.

The scenario was run at different load levels by varying the number of simulated clients from 1 to 40.

The benchmark runs reported here were performed on a real 8-core box (dual-CPU Xeon 5500 machine, 2.26 GHz per core).

Here are the results of processing 750,000 orders at each load level.

The Basic solution uses ordinary synchronous method invocations. It is extremely fast, but not an option for a truly scalable solution. Asynchronous message passing is a better alternative for scaling out across multiple cores or multiple nodes.

In the Scala and Akka Actors solutions the clients send each order message to an order receiver and wait on the response Future (the !? operator in Scala and !! in Akka). The order receiver forwards the request to the matching engine responsible for the order book, so the order receiver thread/dispatcher can immediately be used for the next request. The matching engine sends the order message to the standby, and both matching engines run the matching logic and transaction logging in parallel. An acknowledgment is sent back to the client when both are done.
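The send-and-wait flow can be approximated outside of any actor library. The sketch below uses Python's standard `concurrent.futures` purely as an analogy; it is not how the Scala or Akka solutions are implemented, and `MatchingEngine`, `tx_log` and `place_order_and_wait` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class MatchingEngine:
    def __init__(self, name):
        self.name = name
        self.tx_log = []  # stand-in for the persistent transaction log

    def process(self, order):
        self.tx_log.append(order)  # "log" the order
        return self.name           # stand-in for the matching result


def place_order_and_wait(pool, primary, standby, order):
    """Hand the order to both engines, then block until both have replied."""
    replies = [pool.submit(engine.process, order)
               for engine in (primary, standby)]
    # The calling thread is blocked here, much like a client waiting on a Future.
    done_by = [reply.result() for reply in replies]
    return ("ack", order, done_by)


if __name__ == "__main__":
    primary, standby = MatchingEngine("primary"), MatchingEngine("standby")
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(place_order_and_wait(pool, primary, standby, "buy A1 100@20"))
```

Both engines work on the order concurrently, and the caller gets its acknowledgment only after the slower of the two has finished.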

The benchmark results show that Akka Actors are able to process three times as many orders as Scala Actors at the same load. The result for latency is similar: the latency of Akka Actors is one third of that of Scala Actors. This also holds at low load. Average latency is not always the best measure, so let us look at some percentiles.
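A small example shows why percentiles can tell a different story than the average. The latency values below are invented for illustration and are not benchmark results; the nearest-rank definition of a percentile is one common choice, not necessarily the one used by the benchmark.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: mostly fast, with a few slow outliers.
latencies = [1] * 90 + [5] * 9 + [200]

mean = sum(latencies) / len(latencies)
print(f"mean={mean:.2f} ms")  # mean=3.35 ms
for p in (50, 95, 99, 100):
    print(f"p{p}={percentile(latencies, p)} ms")
```

A single 200 ms outlier pulls the mean to 3.35 ms even though 90% of the requests took 1 ms, while the percentiles show both the typical case and the tail.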

So far, messages have been sent using operations that wait for a Future to complete. This has a scalability price tag, since the thread is blocked while waiting. Better scalability can be achieved with one-way message passing, which is illustrated by the Scala/Akka Actor one-way solutions. They use the bang operator (!) for sending all messages.
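The difference from the blocking style can be shown with a toy mailbox. This Python sketch is only an analogy for one-way sends; it is not how Scala or Akka actors are implemented, and `OneWayActor`, `tell` and the stop sentinel are hypothetical names.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel used to shut the worker down


class OneWayActor:
    """A mailbox plus one worker thread, standing in for an actor."""

    def __init__(self, handler):
        self.handler = handler
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def tell(self, message):
        """Fire and forget: enqueue the message and return immediately."""
        self.mailbox.put(message)

    def _run(self):
        # Messages are handled one at a time, in arrival order.
        while True:
            message = self.mailbox.get()
            if message is _STOP:
                break
            self.handler(message)

    def stop(self):
        """Process everything already queued, then stop the worker."""
        self.mailbox.put(_STOP)
        self.thread.join()


if __name__ == "__main__":
    processed = []
    actor = OneWayActor(processed.append)
    for order_id in range(5):
        actor.tell(order_id)  # the sender never waits for a reply
    actor.stop()
    print(processed)  # [0, 1, 2, 3, 4]
```

The sender's thread is free as soon as `tell` returns, which is the property that lets the one-way solutions scale better than the ones that wait for a reply.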

The matching engines write each order to a transaction log file. This is a blocking I/O bottleneck. To push the test of message passing one step further, the benchmark has also been run without the transaction log. Here the Akka solution shines even more: its transaction rate is more than three times that of Scala Actors at the same load when using the solution based on sending messages and waiting for replies. For the one-way message passing solutions, Akka Actors are twice as fast as Scala Actors.

Akka offers great flexibility in specifying different dispatching mechanisms. The Akka Actor hawt solution is included in the benchmark as a comparison with the Akka Actor one-way solution. It uses the HawtDispatch threading library, which is a Java clone of libdispatch. The last test, without the transaction log, shows that HawtDispatcher performs slightly better than the event-based dispatcher used for Akka Actor one-way.

The complete source code for the sample application is located at: http://github.com/patriknw/akka-sample-trading

To run the benchmark yourself, you can download and unzip the distribution, which contains all the needed jar files. The included README describes how to run the tests.

Update note Aug 15: Added Scala Actor one-way solution and new description of how to run the benchmark.

Update note Aug 22: New benchmark run on real 8 core box.

This Post Has 5 Comments

  1. Good stuff… I just quickly flipped through some of the code for “regular” Scala actors and a couple things jumped out at me. The first is that rather than using the default ForkJoinPool you have customized your schedulers a fair amount. There are plenty of good reasons to do that, but I’m wondering why you did? Event-based (react) Scala Actors do suffer a performance penalty when they are not run on the ForkJoinPool. The second is that you’re using loop/react. I’d suggest that instead of using loop you make a recursive call to the function containing react. In my experience that’s substantially faster.

    But it’s good to see someone doing real benchmarks and posting the code. Keep it up!

  2. Thanks for the feedback. Initially I had problems running everything on my laptop with the default thread pool (starvation), but now I think I understand why, so it might be good to go back to the default thread pool for some actors.

    My initial tests of default thread pool and also receive show that it is only a small improvement. I will test more and probably use your suggestions as default. Thanks again.

  3. Thanks for this benchmark but … who cares about 40 concurrent clients? 40 concurrent clients can easily be handled with a one-thread-per-client model.
    Why not run this benchmark with 40,000 clients? That would show real scalability problems (starved actors?) and highlight Akka’s overhead when switching between actors.

  4. Would you please give us the source code?
    It has disappeared from GitHub.
    thx
