Post-mature optimization

I leave optimizations to the end, as much as possible. Get the code working, then profile it and optimize the problem areas. It was not always thus, but on one of the occasions when I found myself deleting code that I had spent hours tweaking and perfecting, the penny dropped and I opted for pragmatism and patience.

Of course, it’s possible to go too far the other way, and for too long put off optimizations which are [a] blindingly obvious and [b] not that difficult to apply.

Yesterday, my attention was drawn to the performance of Simple.Web, compared to ASP.NET MVC and Web API. Mattias Nordberg, an application performance consultant working in London, sent me a pull request. Not a huge change, Github shows it at 37 additions and 18 deletions. But it increased the requests-per-second of Simple.Web from 600 to 4,500, putting it roughly on par with ASP.NET MVC and 1,000 reqs/sec slower than Web API.

I ended up in a Skype chat with Mattias and one of his colleagues, Tomas. They said they were evaluating frameworks and performance was the main issue for them. Obviously they’d taken a pretty good look at the code, run it through a profiler and so on. The other thing they were worried about, they said, was the routing implementation. That was completely understandable: I threw together the original routing code in a bit of a hurry, if I’m honest, and I’d always intended to go back and tweak it, but I wasn’t sure how much of a block it really was, and how much I could improve it.

How not to do URI routing

  1. Turn every URI template into a regular expression with captures for the parameter names.
  2. Every time a request runs, run all the regular expressions.
  3. If more than one result is found, filter based on media types.

Seriously, that’s wrong, don’t do it. It’s slow and it absolutely devours CPU time.
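To make the problem concrete, here is a minimal sketch of that regex-per-template approach. This is a hypothetical Python illustration, not the actual Simple.Web code (which is C#), and the names are invented for the example. The key flaw is visible in `match`: every lookup potentially runs every regex in the table.

```python
import re

def compile_template(template):
    # Turn "/users/{id}/posts" into a regex with a named capture for {id}.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")

class RegexRouter:
    def __init__(self):
        self.routes = []  # list of (compiled regex, handler)

    def add(self, template, handler):
        self.routes.append((compile_template(template), handler))

    def match(self, uri):
        # Worst case: every regex in the table runs against every request.
        for regex, handler in self.routes:
            m = regex.match(uri)
            if m:
                return handler, m.groupdict()
        return None, None
```

With n routes, each request costs O(n) regex evaluations, which is why the cost climbs so steeply as the routing table grows.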

In the Skype chat, Mattias mentioned something called the Rete algorithm. I was not familiar with it, and he admitted that he’d only just learned about it himself. I read the Wikipedia page and it sounded similar to the “tweak” I had in mind for the routing table, so I went ahead and had a go at implementing it.

I copied the regex-based Simple.Web routing classes into a console app, set up a routing table with 4,096 entries, and timed how long it took a single thread to run 4,096 lookups: 2.8 seconds. Slow. Much slower than I’d expected, but when I thought about it, not really surprising.
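The shape of that measurement is easy to reconstruct. The following is a hypothetical harness, not the one I actually used: it builds an N-entry regex table and times N single-threaded lookups against it, scaled down to 512 entries so it runs in a reasonable time for illustration.

```python
import re
import time

N = 512  # the real test used 4,096; scaled down here for illustration

# One compiled regex per route template, with a named capture.
routes = [re.compile(f"^/resource{i}/(?P<id>[^/]+)$") for i in range(N)]

def lookup(uri):
    # Linear scan: this is the bottleneck being measured.
    for regex in routes:
        m = regex.match(uri)
        if m:
            return m.groupdict()
    return None

start = time.perf_counter()
for i in range(N):
    lookup(f"/resource{i}/42")
elapsed = time.perf_counter() - start
print(f"{N} lookups against {N} routes: {elapsed:.3f}s")
```

Because route i only matches on the i-th regex tried, the average lookup scans half the table, so total work grows quadratically with the number of routes.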

A better way to do URI routing

  1. Break each URI template down by its delimiter (i.e., forward slash).
  2. Construct a tree using the unique values at each level of the templates.
  3. Populate the tree with “matcher” objects that only check the relevant part of a URL.
  4. Every time a request runs, break its URI down and run it through the tree.

I think this is at least a nod in the general direction of the Rete algorithm, although in this case it’s been heavily adapted to the specific task of matching URIs to templates. I don’t know if I’ve made it sound more or less complicated than it is, but you can look at the code if you’re interested. (One of the things that surprised me a little was that it didn’t actually need any regular expressions at all to reproduce the existing functionality.)
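The steps above can be sketched roughly as follows. Again, this is an invented Python illustration, not the Simple.Web implementation; in particular, the rule that a literal segment wins over a parameter segment is a simplifying assumption for the sketch.

```python
class Node:
    def __init__(self):
        self.literals = {}   # exact segment -> child node
        self.param = None    # (param name, child node) for "{x}" segments
        self.handler = None  # set when a full template ends at this node

class TreeRouter:
    def __init__(self):
        self.root = Node()

    def add(self, template, handler):
        # Break the template down by its delimiter and build one node per
        # unique segment at each level.
        node = self.root
        for segment in template.strip("/").split("/"):
            if segment.startswith("{") and segment.endswith("}"):
                if node.param is None:
                    node.param = (segment[1:-1], Node())
                node = node.param[1]
            else:
                node = node.literals.setdefault(segment, Node())
        node.handler = handler

    def match(self, uri):
        # Break the URI down the same way and walk the tree; each step only
        # checks the relevant part of the URL.
        node, params = self.root, {}
        for segment in uri.strip("/").split("/"):
            if segment in node.literals:
                node = node.literals[segment]
            elif node.param is not None:
                name, child = node.param
                params[name] = segment
                node = child
            else:
                return None, None
        return node.handler, params
```

A lookup now does one dictionary probe per URI segment instead of a regex pass per template, so its cost depends on the depth of the URI rather than the size of the routing table.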

Anyway, initial implementation in a spike project took under an hour, and I ran the same performance test, 4,096 entries, 4,096 lookups: 0.04 seconds. 70 times faster. I reported back to Tomas and Mattias, who immediately asked “and with 10,000 routes?” My test harness was working in multiples of eight, so I bumped it up to 32,768 routes, and doing 32,768 lookups.

The old code took 19.5 seconds.

The new code takes 0.27 seconds.

When a route matched to the last template added to the routing table, the old code took 0.2 seconds. The new code takes 0.0005 seconds.

Oh, this is all still on a single thread, of course.

I integrated this new algorithm into Simple.Web last night, and the demands of reality meant that things slowed down a little, but a single lookup against a 32,768 route table still only takes a shade under 1.5 milliseconds, on an Ultrabook with a ULV Core i7 CPU.

What a difference a day makes

In summary, then, when I woke up yesterday, Simple.Web was an epic performance failure, managing 600 reqs/sec. Out of the blue I get a pull request which increases that almost by a factor of 8. Inspired, I finally get round to fixing the other performance bottleneck, with a speed boost somewhere between one and two orders of magnitude depending on how many routes you have. End result, it’s probably a good 10 times faster than it was yesterday, and it won’t slow down as the number of routes increases.

I’ve got some profiling software installed now, and weighttp, and I plan to run a few more tests and see if I can identify any other hideous bottlenecks that can be optimized without too much effort. I want to be faster than Web API.

I’ll be pushing new packages to NuGet in a day or two, but if you want the CI builds, you can add to your package sources and install them direct from the TeamCity server.
