Monday, 22 February 2010

Performance matters

Over the Christmas holidays, I wrote a small benchmarking framework for Noda Time. It's not very clever - it just finds appropriately attributed methods and runs them repeatedly. There's always the overhead of a delegate invocation, which I don't even try to take account of or remove.
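That kind of runner can be sketched with reflection. Noda Time itself is C#, but the same idea in Java (Joda Time's language) looks something like this; the @Benchmark annotation and all the names here are invented for illustration, not Noda Time's actual framework:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker playing the role of the benchmark attribute.
@Retention(RetentionPolicy.RUNTIME)
@interface Benchmark {}

class BenchmarkRunner {
    /** Runs every @Benchmark method on the target; returns method name -> average ns per call. */
    static Map<String, Long> runAll(Object target, int iterations) {
        Map<String, Long> results = new LinkedHashMap<>();
        for (Method method : target.getClass().getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Benchmark.class)) {
                continue;
            }
            long start = System.nanoTime();
            try {
                for (int i = 0; i < iterations; i++) {
                    method.invoke(target);  // reflective-call overhead is included, not subtracted
                }
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
            results.put(method.getName(), (System.nanoTime() - start) / iterations);
        }
        return results;
    }
}

class SampleBenchmarks {
    @Benchmark
    public void constructSomething() {
        new java.util.Date();  // stand-in for constructing a LocalDateTime
    }
}
```

Note that, exactly as described above, the per-call invocation overhead is simply part of the measurement.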

Until now, I haven't actually done anything with the results - but over the last few days, I've made significant improvements to the time taken to construct instances of LocalDateTime (which have a calendar system but no time zone) and ZonedDateTime (which are the same but with a time zone). The latter is particularly hairy - it needs to do more work than you might expect, in order to cope with daylight saving time transitions. (A local date and time can correspond to 0, 1 or 2 instants on the time line; we're careful to throw an exception for 0, and return the later possible result when there's ambiguity.)
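Java's java.time API happens to expose the same 0/1/2-candidate model, so the resolution policy described above can be demonstrated with it; this is an illustrative sketch of the policy, not Noda Time's code:

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;

class Resolver {
    /**
     * Maps a local date/time to an instant in the given zone: throws for a
     * "gap" (0 candidate instants) and, for an "overlap" (2 candidates),
     * returns the later instant.
     */
    static Instant resolve(LocalDateTime local, ZoneId zone) {
        ZoneRules rules = zone.getRules();
        List<ZoneOffset> offsets = rules.getValidOffsets(local);
        if (offsets.isEmpty()) {
            throw new DateTimeException("Skipped local time: " + local);
        }
        // Offsets come back in order of occurrence, so the last one
        // corresponds to the later instant on the time line.
        return local.toInstant(offsets.get(offsets.size() - 1));
    }
}
```

For America/Los_Angeles in 2010, 02:30 on 14 March falls in the spring-forward gap (throws), while 01:30 on 7 November occurs twice and resolves to the later, standard-time (-08:00) instant.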

Recent improvements

Joda Time already had caching built in, but we've modified it slightly (and probably will again); in particular, while a time zone cached its offset at any UTC instant, it didn't cache transitions. These can be relatively painful to calculate, as they involve working out the exact time of "the last Sunday of the month" and so on. Caching the transitions as well made some operations more than 30 times faster (yes, a roughly 3,000% speedup). I should point out that currently Joda Time doesn't use the transitions as often as we do, which is probably why it doesn't cache them.

Admittedly this very caching makes the benchmarks slightly less useful: as we're building the same date and time repeatedly, the cache is always going to be hit after the first iteration. That sounds like real-world performance will be a lot worse, until you realise that the cache is based on the assumption that most applications will use a relatively narrow set of dates and times; so long as most of your uses fall within the same 50-year period, you're very likely to hit the cache too.
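The shape of the idea is to cache the whole "stable period" between two transitions, so that repeated lookups anywhere inside that period skip the expensive rule evaluation entirely. Here's an illustrative Java sketch under that assumption; UnderlyingZone and all the names are hypothetical, not Noda Time's actual types:

```java
import java.time.Duration;

// Hypothetical interface over the expensive rule-based calculations
// ("last Sunday of the month" and so on).
interface UnderlyingZone {
    long previousTransition(long instantMillis);
    long nextTransition(long instantMillis);
    Duration computeOffset(long instantMillis);
}

class TransitionCachingZone {
    // One cached stable period: [periodStart, periodEnd) with a fixed offset.
    private long periodStartMillis = Long.MAX_VALUE;
    private long periodEndMillis = Long.MIN_VALUE;
    private Duration cachedOffset;

    private final UnderlyingZone zone;

    TransitionCachingZone(UnderlyingZone zone) {
        this.zone = zone;
    }

    Duration getOffset(long instantMillis) {
        if (instantMillis < periodStartMillis || instantMillis >= periodEndMillis) {
            // Cache miss: do the expensive maths once, remembering the
            // surrounding transitions as well as the offset itself.
            periodStartMillis = zone.previousTransition(instantMillis);
            periodEndMillis = zone.nextTransition(instantMillis);
            cachedOffset = zone.computeOffset(instantMillis);
        }
        return cachedOffset;
    }
}
```

Any query falling between the two remembered transitions is then answered with two comparisons and no rule evaluation at all.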

The other big win was applying caching in a new situation. By far the most commonly-used calendar system (so commonly used that we dare to default to it; not something we do with time zones) is IsoCalendarSystem. The tricky bit when it comes to working out the representation of a local date and time is the "start of the right month" problem: the day of the month, hours, minutes and so on are just a matter of multiplication and addition (remember that the time zone doesn't apply to the local date and time), but finding where the month starts is harder. I've made the assumption that the vast majority of values used will fall between 1900 and 2100, so I precalculate the start of each month in that range for easy access later. This leads to another fairly dramatic improvement - about 300% in the cost of creating a LocalDateTime. (This was already a much faster operation than creating a ZonedDateTime, so I wasn't expecting quite as much improvement.)
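The precalculation itself is straightforward; here's an illustrative Java sketch under the same 1900-2100 assumption. The 100-nanosecond tick matches the tick unit the post discusses, but the class and method names are made up:

```java
class IsoMonthStartCache {
    static final int MIN_YEAR = 1900, MAX_YEAR = 2100;
    static final long TICKS_PER_DAY = 864_000_000_000L;  // 100ns ticks per day

    // Ticks since 1900-01-01 for the start of every month in range.
    private static final long[] monthStartTicks = new long[(MAX_YEAR - MIN_YEAR + 1) * 12];

    static {
        long days = 0;
        int index = 0;
        for (int year = MIN_YEAR; year <= MAX_YEAR; year++) {
            for (int month = 1; month <= 12; month++) {
                monthStartTicks[index++] = days * TICKS_PER_DAY;
                days += daysInMonth(year, month);
            }
        }
    }

    /** O(1) lookup replacing the expensive "start of the right month" calculation. */
    static long startOfMonth(int year, int month) {
        return monthStartTicks[(year - MIN_YEAR) * 12 + (month - 1)];
    }

    static int daysInMonth(int year, int month) {
        switch (month) {
            case 4: case 6: case 9: case 11: return 30;
            case 2: return isLeapYear(year) ? 29 : 28;
            default: return 31;
        }
    }

    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
}
```

The whole table is only about 2,400 longs (under 20 KB), so trading that memory for an array index per construction is an easy call.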

The final win is a real micro-optimisation which I wouldn't normally put in. You can create a LocalDateTime by specifying the time down to the minute, second, millisecond or tick. Until now, we've only had one method in the ICalendarSystem interface which was used for all four of these calls - we just supplied 0 as the argument for anything we hadn't been told. However, each value then has to be validated (two comparisons), multiplied by a constant and added to the number of ticks. If you've only specified the time down to the minute, that's a whole 12 wasted operations, as well as another 16 bytes of stack space for the unused arguments! Yes, it really does sound ridiculous - but adding overloads specific to each of these cases leads to another 20% improvement, which is enough to make it worthwhile, in my view.
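The overload trick looks something like this - again an illustrative Java sketch with invented names, not the actual ICalendarSystem members:

```java
class TimeArithmetic {
    static final long TICKS_PER_HOUR = 36_000_000_000L;
    static final long TICKS_PER_MINUTE = 600_000_000L;
    static final long TICKS_PER_SECOND = 10_000_000L;
    static final long TICKS_PER_MILLISECOND = 10_000L;

    // Before: one method served every call site, with zeros supplied for the
    // unspecified parts - so seconds, milliseconds and ticks were validated,
    // multiplied and added even when they were known to be zero.
    static long getTicks(int hour, int minute, int second, int millisecond, int tick) {
        validate(hour, 0, 23);
        validate(minute, 0, 59);
        validate(second, 0, 59);
        validate(millisecond, 0, 999);
        validate(tick, 0, 9_999);
        return hour * TICKS_PER_HOUR + minute * TICKS_PER_MINUTE
             + second * TICKS_PER_SECOND + millisecond * TICKS_PER_MILLISECOND + tick;
    }

    // After: an overload for the minute-precision case skips three
    // validations, three multiplications and three additions.
    static long getTicks(int hour, int minute) {
        validate(hour, 0, 23);
        validate(minute, 0, 59);
        return hour * TICKS_PER_HOUR + minute * TICKS_PER_MINUTE;
    }

    private static void validate(int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException("Value out of range: " + value);
        }
    }
}
```

Both overloads must of course agree for the cases they share; only the work done for the always-zero arguments disappears.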

We're not your bottleneck

Micro-optimisation is a lot of fun, but almost never worth doing at an application level. However, it makes a lot more sense for a library. I have no idea how Noda Time is going to be used: but I don't want it to become a problem. If anyone is ever able to take a profiling report from a real application, and say "Hmm... Noda Time appears to be taking up a significant portion of our CPU time; maybe we'd better hand-code that area or go back to the BCL" then that will represent a failure in my view.

That's why I'm willing to make changes which seem a little strange to start with, in order to push the performance further than I normally would. I wouldn't do so in interfaces which are likely to be used directly (most users will never care about ICalendarSystem - advanced users may wish to specify which calendar system to use, but it'll primarily be code within Noda Time which actually calls methods within it) but if we can build an API which is not only easy to use but also lightning fast, that will be a real point of pride.

On my main laptop at home, I can now create about 18 million LocalDateTime values per second, and 5 million ZonedDateTimes (in the America/Los_Angeles time zone). I reckon that for the moment at least, that's fast enough.

2 comments:

  1. "so long as most of your uses fall within the same 50 year period, you're very likely to hit the cache too"
    I've noticed most apps that use times massively are either calendar-y, normally using dates in a time period of a few years, or lifetime-y, using dates in an 80-100 year period (consider date of birth and date of end of a savings or pension plan, for example). Which is why 50 seems to me like a completely wrong number. I think the cache should be optimized to be useful either for about 10-20 years, or for about 100 years. Or, it can have a dynamic cache size, with suggestions in the docs for 10-year and 100-year values.

  2. Awesome stuff, Jon! I can't wait for v1.0.

    Keep up the great work.


    Cheers,

    Charles
