Tuesday, 30 November 2010

The joys of date/time arithmetic

(Cross-posted to my main blog and the Noda Time blog, in the hope that the overall topic is still of interest to those who aren't terribly interested in Noda Time per se.)

I've been looking at the "period" part of Noda Time recently, trying to redesign the API to simplify it somewhat. This part of the API is what we use to answer questions such as:

  • What will the date be in 14 days?
  • How many hours are there between now and my next birthday?
  • How many years, months and days have I been alive for?

I've been taking a while to get round to this because there are some tricky choices to make. Date and time arithmetic is non-trivial - not because of complicated rules which you may be unaware of, but simply because of the way calendaring systems work. As ever, time zones make life harder too. This post won't talk very much about the Noda Time API details, but will give the results of various operations as I currently expect to implement them.

The simple case: arithmetic on the instant time line

One of the key concepts to understand when working with time is that the usual human "view" on time isn't the only possible one. We don't have to break time up into months, days, hours and so on. It's entirely reasonable (in many cases, at least) to consider time as just a number which progresses linearly. In the case of Noda Time, it's the number of ticks (there are 10 ticks in a microsecond, 10,000 ticks in a millisecond, and 10 million ticks in a second) since midnight on January 1st 1970 UTC.

Leaving relativity aside, everyone around the world can agree on an instant, even if they disagree about everything else. If you're talking over the phone (using a magic zero-latency connection) you may think you're in different years, using different calendar systems, in different time zones - but still both think of "now" as "634266985845407773 ticks".

That makes arithmetic really easy - but also very limited. You can only add or subtract numbers of ticks, effectively. Of course you can derive those ticks from some larger units which have a fixed duration - for example, you could convert "3 hours" into ticks - but some other concepts don't really apply. How would you add a month? The instant time line has no concept of months, and in most calendars different months have different durations (28-31 days in the ISO calendar, for example). Even the idea of a day is somewhat dubious - it's convenient to treat a day as 24 hours, but you need to at least be aware that when you translate an instant into a calendar that a real person would use, days don't always last for 24 hours, due to daylight saving changes.

Anyway, the basic message is that it's easy to do arithmetic like this. In Noda Time we have the Instant structure for the position on the time line, and the Duration structure as a number of ticks which can be added to an Instant. This is the most appropriate pair of concepts to use to measure how much time has passed, without worrying about daylight saving transitions: ideal for things like timeouts, cache purging and so on.
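To make that concrete, here's a minimal sketch of time line arithmetic; Instant.FromUtc and Duration.FromHours are assumed names for the kinds of operation described, not necessarily the final API:

    // Sketch only: member names are assumptions, not a finished API.
    Instant start = Instant.FromUtc(2010, 11, 30, 0, 0);  // a point on the time line
    Duration threeHours = Duration.FromHours(3);          // a fixed number of ticks
    Instant later = start + threeHours;                   // trivially well-defined: just tick addition

    // There's no equivalent for "start + 1 month": a month isn't a fixed
    // number of ticks, so it has no meaning on the instant time line.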

Things start to get messy: local dates, times and date/times

The second type of arithmetic is what humans tend to actually think in. We talk about having a meeting in a month's time, or how many days it is until Christmas (certainly my boys do, anyway). We don't tend to consciously bring time zones into the equation - which is a good job, as we'll see later.

Now just to make things clear, I'm not planning on talking about recurrent events - things like "the second Tuesday and the last Wednesday of every month". I'm not planning on supporting recurrences in Noda Time, and having worked on the calendar part of Google Mobile Sync for quite a while, I can tell you that they're not fun. But even without recurrences, life is tricky.

Introducing periods and period arithmetic

The problem is that our units are inconsistent. I mentioned before that "a month" is an ambiguous length of time... but it doesn't just change by the month, but potentially by the year as well: February is either 28 or 29 days long depending on the year. (I'm only considering the ISO calendar for the moment; that gives enough challenges to start with.)

If we have inconsistent units, we need to keep track of those units during arithmetic, and even request that the arithmetic be performed using specific units. So, it doesn't really make sense to ask "how long is the period between June 10th 2010 and October 13th 2010" but it does make sense to ask "how many days are there between June 10th 2010 and October 13th 2010" or "how many years, months and days are there between June 10th 2010 and October 13th 2010".

Once you've got a period - which I'll describe as a collection of unit/value pairs, e.g. "0 years, 4 months and 3 days" (for the last example above) - you can still get unexpected behaviour. If you add that period to your original start date, you should get the original end date... but if you advance the start date by one day, you may not advance the end date by one day. It depends on how you handle things like "one month after January 30th 2010" - some valid options are:

  • Round down to the end of the month: February 28th
  • Round up to the start of the next month: March 1st
  • Work out how far we've overshot, and apply that to the next month: March 2nd
  • Throw an exception

All of these are justifiable. Currently, Noda Time will always take the first approach. I believe that JSR-310 (the successor to Joda Time) will allow the behaviour to be resolved according to a strategy provided by the user... it's unclear to me at the moment whether we'll want to go that far in Noda Time.

Arithmetic in Noda Time is easily described, but the consequences can be subtle. When adding or subtracting a period from something like a LocalDate, we simply iterate over all of the field/value pairs in the period, starting with the most significant, and add each one in turn. When finding the difference between two LocalDate values with a given set of field types (e.g. "months and days") we get as close as we can without overshooting using the most significant field, then the next field etc.

The "without overshooting" part means that if you add the result to the original start value, the result will always either be the target end value (if sufficiently fine-grained fields are available) or somewhere between the original start and the target end value. So "June 2nd 2010 to October 1st 2010 in months" gives a result of "3 months" even though if we chose "4 months" we'd only overshoot by a tiny amount.

Now we know what approach we're taking, let's look at some consequences.

Asymmetry and other oddities

It's trivial to show some asymmetry just using a period of a single month. For example:

  • January 28th 2010 + 1 month = February 28th 2010
  • January 29th 2010 + 1 month = February 28th 2010
  • January 30th 2010 + 1 month = February 28th 2010
  • February 28th 2010 - 1 month = January 28th 2010

It gets even more confusing when we add days into the mix:

  • January 28th 2010 + 1 month + 1 day = March 1st 2010
  • January 29th 2010 + 1 month + 1 day = March 1st 2010
  • March 1st 2010 - 1 month - 1 day = January 31st 2010

And leap years (these examples are all reproduced in the code sketch below):

  • March 30th 2013 - 1 year - 1 month - 10 days = February 19th 2012 (as "February 30th 2012" is truncated to February 29th 2012)
  • March 30th 2012 - 1 year - 1 month - 10 days = February 18th 2011 (as "February 30th 2011" is truncated to February 28th 2011)
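These examples are easy to reproduce in code. This is a hypothetical usage sketch: LocalDate, PlusMonths, Minus and the Period factory methods are illustrative names rather than a committed API:

    var feb28 = new LocalDate(2010, 1, 30).PlusMonths(1);   // February 28th 2010 (truncated)
    var jan28 = new LocalDate(2010, 2, 28).PlusMonths(-1);  // January 28th 2010: not where feb28 started

    var acrossLeapDay = new LocalDate(2013, 3, 30)
        .Minus(Period.FromYears(1) + Period.FromMonths(1) + Period.FromDays(10));
    // => February 19th 2012: "February 30th 2012" truncates to the 29th
    //    before the 10 days are subtracted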

Then we need to consider how rounding works when finding the difference between days... (forgive the pseudocode):

  • Between(January 31st 2010, February 28th 2010, Months & Days) = ?
  • Between(February 28th 2010, January 31st 2010, Months & Days) = -28 days

The latter case is relatively obvious - because if you take a whole month off February 28th 2010 you end up with January 28th 2010, which is an overshoot... but what about the first case?

Should we determine the number of months by "the largest number such that start + period <= end"? If so, we get a result of "1 month" - which makes sense given the first set of results in this section.

What worries me most about this situation is that I honestly don't know offhand what the current implementation will do. I think it would be best to return "28 days" as there isn't genuinely a complete month between the two... <tappety tappety>

Since writing the previous paragraph, I've tested it, and it returns 1 month and 0 days. I don't know how hard it would be to change this behaviour or whether we want to. Whatever we do, however, we need to document it.
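In code, the two cases look like this (again with illustrative names: Period.Between and PeriodUnits are assumptions):

    var forward = Period.Between(new LocalDate(2010, 1, 31), new LocalDate(2010, 2, 28),
                                 PeriodUnits.Months | PeriodUnits.Days);
    // => 1 month, 0 days: the behaviour observed above

    var backward = Period.Between(new LocalDate(2010, 2, 28), new LocalDate(2010, 1, 31),
                                  PeriodUnits.Months | PeriodUnits.Days);
    // => -28 days: a whole month backwards would overshoot, so the months field is 0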

That's really at the heart of this: we must make Noda Time predictable. Where there are multiple feasible results, there should be a simple way of doing the arithmetic by hand and getting the same results as Noda Time. Of course, picking the best option out of the ones available would be good - but I'd rather be consistent and predictable than "usually right" but unpredictably so.

Think it's bad so far? It gets worse...

ZonedDateTime: send in the time zones... (well maybe next year?)

I've described the "instant time line" and its simplicity.

I've described the local date/time complexities, where there's a calendar but there's no time zone.

So far, the two worlds have been separate: you can't add a Duration to a LocalDateTime (etc), and you can't add a Period to an Instant. Unfortunately, sooner or later many applications will need ZonedDateTime.

Now, you can think of ZonedDateTime in two different ways:

  • It's an Instant which knows about a calendar and a time zone
  • It's a LocalDateTime which knows about a time zone and the offset from UTC

The "offset from UTC" part sounds redundant at first - but during daylight saving transitions the same LocalDateTime occurs at two different instants; the time zone is the same in both cases, but the offset is different.

The latter way of thinking is how we actually represent a ZonedDateTime internally, but it's important to know that a ZonedDateTime still unambiguously maps to an Instant.
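Here's a sketch of that second view; the field layout and names are purely illustrative, not the actual implementation:

    // Purely illustrative: not the real Noda Time type.
    public readonly struct ZonedDateTimeSketch
    {
        public LocalDateTime LocalDateTime { get; }  // what a wall clock in the zone shows
        public DateTimeZone Zone { get; }            // e.g. Pacific/Los_Angeles
        public Offset Offset { get; }                // disambiguates repeated local times

        public ZonedDateTimeSketch(LocalDateTime localDateTime, DateTimeZone zone, Offset offset) =>
            (LocalDateTime, Zone, Offset) = (localDateTime, zone, offset);

        // The unambiguous mapping back to the time line: the local time and
        // the offset together determine a single instant.
        public Instant ToInstant() => LocalDateTime.WithOffset(Offset).ToInstant();
    }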

So, what should we be able to do with a ZonedDateTime in terms of arithmetic? I think the answer is that we should be able to add both Periods and Durations to a ZonedDateTime - but expect them to give different results.

When we add a Duration, that should work out the Instant represented by the current ZonedDateTime, advance it by the given duration, and return a new ZonedDateTime based on that result with the same calendar and time zone. In other words, this is saying, "If I were to wait for the given duration, what date/time would I see afterwards?"

When we add a Period, that should add it to the LocalDateTime represented by the ZonedDateTime, and then return a new ZonedDateTime with the result, the original time zone and calendar, and whatever offset is suitable for the new LocalDateTime. (That's deliberately woolly - I'll come back to it.) This is the sort of arithmetic a real person would probably perform if you asked them to tell you what time it would be "three hours from now". Most people don't take time zones into account...
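In sketch form, the two additions might look like this (the method names are assumptions, and AtLeniently stands in for the deliberately woolly "suitable offset" policy):

    // Elapsed-time view: go via the instant time line, keeping zone and calendar.
    static ZonedDateTime PlusDuration(ZonedDateTime zdt, Duration duration) =>
        new ZonedDateTime(zdt.ToInstant().Plus(duration), zdt.Zone, zdt.Calendar);

    // Calendar view: go via the local date/time, then pick a suitable offset again.
    static ZonedDateTime PlusPeriod(ZonedDateTime zdt, Period period) =>
        zdt.Zone.AtLeniently(zdt.LocalDateTime.Plus(period));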

In most cases, where a period can be represented as a duration (for example "three hours") the two forms of addition will give the same result. Around daylight saving transitions, however, they won't. Let's consider some calculations on Sunday November 7th 2010 in the "Pacific/Los_Angeles" time zone. It had a daylight saving transition from UTC-7 to UTC-8 at 2am local time. In other words, the clock went 1:58, 1:59, 1:00. Let's start at 12:30am (local time, offset = -7) and add a few different values:

  • 12:30am + 1 hour duration = 1:30am, offset = -7
  • 12:30am + 2 hours duration = 1:30am, offset = -8
  • 12:30am + 3 hours duration = 2:30am, offset = -8
  • 12:30am + 1 hour period = 1:30am, offset = ???
  • 12:30am + 2 hour period = 2:30am, offset = -8
  • 12:30am + 3 hour period = 3:30am, offset = -8

The ??? value is the most problematic one, because 1:30 occurs twice... when thinking of the time in a calendar-centric way, what should the result be? Options here:

  • Always use the earlier offset
  • Always use the later offset
  • Use the same offset as the start date/time
  • Use the offset in the direction of travel (so adding one hour from 12:30am would give 1:30am with an offset of -7, but subtracting one hour from 2:30am would give 1:30am with an offset of -8)
  • Throw an exception
  • Allow the user to pass in an argument which represents a strategy for resolving this (a hypothetical shape for this is sketched after the list)
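A purely hypothetical shape for that last option - nothing like this exists yet:

    // Hypothetical: a caller-supplied strategy for ambiguous local times.
    public delegate ZonedDateTime AmbiguousTimeResolver(
        ZonedDateTime earlier,   // the mapping using the earlier offset (-7 above)
        ZonedDateTime later);    // the mapping using the later offset (-8 above)

    // e.g. "always use the earlier offset" would just be: (earlier, later) => earlier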

This is currently unimplemented in Noda Time, so I could probably choose whatever behaviour I want, but frankly none of them has much appeal.

At the other daylight saving transition, when the clocks go forward, we have the opposite problem: adding one hour to 12:30am can't give 1:30am because that time never occurs. Options in this case include:

  • Return the first valid time after the transition (this has problems if we're subtracting time, where we'd presumably want to return the latest valid time before the transition... but the valid local times before the gap have an exclusive upper bound, so there's no such "latest valid time" really)
  • Add the offset difference, so we'd skip to 2:30am
  • Throw an exception
  • Allow the user to pass in a strategy

Again, nothing particularly appeals.

All of this is just involved in adding a period to a ZonedDateTime - then the same problems occur all over again when trying to find the period between them. What's the difference (as a Period rather than a simple Duration) between 1:30am with an offset of -7 and 1:30am with an offset of -8? Nothing, or an hour? Again, at the moment I really don't know the best course of action.

Conclusion

This post has ended up being longer than I'd expected, but hopefully you've got a flavour of the challenges we're facing. Even without time zones getting involved, date and time arithmetic is pretty messy - and with time zones it becomes very hard to reason about, and very hard to work out what the "right" result for an API to return should be, let alone implement it.

Above all, it's important to me that Noda Time is predictable and clearly documented. Very often, if a library doesn't behave exactly the way you want it to, but you can tell what it's going to do, you can work around that - but if you're having to experiment to guess the behaviour, you're on a hiding to nothing.

Saturday, 10 April 2010

Documentation with Sandcastle - a notebook

(Posted to both my main code blog and the Noda Time blog.)

I apologise in advance if this blog post becomes hard to get good information from. It's a record of trying to get Sandcastle to work for Noda Time; as much as anything it's meant to be an indication of how smooth or otherwise the process of getting started with Sandcastle is. My aim is to be completely honest. If I make stupid mistakes, they'll be documented here. If I have decisions to make, they'll be documented here.

I should point out that I ruled out NDoc (it just didn't make sense to use a dead project) and docu (I'm not keen on the output style, and it threw an exception when I tried running it on Noda Time anyway). I didn't try any other options, simply because I'm not aware of any. Hmm.

Starting point and aims

My eventual aim is to include "build the docs" as a task in the build procedure for Noda Time. I don't have much in the way of preconceived ideas of what the output should be: my guess is a CHM file and something to integrate into MSDN, as well as some static web pages. Ideally I'd like to be able to put the web pages on the Google Code project page, but I don't know how feasible that will be. If the best way forward turns out to be something completely different, that's fine.

(I've mailed Scott Hanselman and Matt Hawley about the idea of having an ASP.NET component of some form which could dynamically generate all this stuff on the fly - you'd just need to upload the .xml and .dll files, and let it do the rest. I'm not expecting that idea to be useful for Noda Time in the near future, but you never know.)

Noda Time has some documentation, but there are plenty of public members which don't have any XML documentation at all at the moment. Obviously there's a warning available for this so we'll be able to make sure that eventually everything's documented, but we also need to be able to build documentation before we reach that point.

Step 0: building the XML file

The build project doesn't currently even create the .xml file, so that's the first port of call - just a case of ticking a box and then changing the default filename slightly... because for some bizarre reason, Visual Studio defaults to creating a ".XML" file instead of ".xml". Why? Where else are capitals used in file extensions?

Rebuild the solution, gaze in fear at the 496 warnings generated, and we have everything we should need from Visual Studio. My belief is that I should now be able to close Visual Studio and not reopen it (with the Noda Time solution, anyway) during the course of this blog post.

Step 1: building Sandcastle

First real choice: which version of Sandcastle do I go for? There was a binary release on May 29th 2008, a source release on July 2nd 2008, and three commits to source control since then, the latest of which was in July 2009. Personally I like the idea of not having to actually install anything: tools which can just be run without installation are nicer for Open Source projects, particularly if you can check the binaries into source control with appropriate licence files. That way anyone can build after just fetching. On the other hand, I'm not sure how well the Sandcastle licence fits in with the Apache 2 licence we're using for Noda Time. I can investigate that later.

What the heck, let's try building it from source. It's probably easier to go from that to the installed version than the other way round. Change set 26202 downloaded and unpacked... now how do we build it, and what do we need to build? Okay, there's a solution file, which opens up in VS2008 (unsurprising and not a problem). Gosh, what a lot of projects (but no unit tests?) - still, everything builds with nary a warning. I've no idea what to do with it now, but it's a start. It looks like it's copied four executables and a bunch of DLLs into the ProductionTools directory, which is promising.

Shockingly, it's only just occurred to me to check for some documentation to see whether or not I'm doing the right thing. Looking at the Sandcastle web page, it seems I'm not missing much. Well, I was aware that this might be the case.

Step 2: Sandcastle Help File Builder

I've heard about SHFB from a few places, and it certainly sounds like it's the way to go - it even has a getting started guide and installation instructions! It looks like there's a heck of a lot of documentation for something that sounds like it should be simple, but hey, let's dive in. (I know it sounds inconsistent to go from complaining about no documentation to complaining about too much - but I'm really going from complaining about no documentation to complaining about something being so complicated that it needs a lot of documentation. I'm very grateful to the SHFB team for documenting everything, even if I plan to read it on a Just-In-Time basis.)

A few notes from the requirements page:

  • It looks like I'll need to install the HTML Help Workshop if I want CHM files; the Help 2 compiler should already be part of the VS2008 SDK which I'm sure is already installed. I have no idea where Help 3 fits into this :(
  • It looks like I need a DXROOT environment variable pointing at my Sandcastle "installation". I wonder what that means in my home-built version? I'll assume it just means the Development directory containing ProductionTools and ProductionTransforms.
  • There's a further set of patches available in the Sandcastle Styles project. Helpfully, this says it includes all the updates in the July 2009 source code, and can be applied to the binary installation from May 2008. It's not clear, however, whether it can also be applied to a home-built version of Sandcastle. Given that I can get all the latest stuff in conjunction with an installed version, it sounds like it's worth installing the binary release after all. (Done, and patches installed.)
  • It sounds like I need to install the H2 Viewer and H2Reg. (I suspect that H2Reg will be something we direct our users at rather than shipping and running ourselves; I don't intend to have an MSI-style "installer" for Noda Time at the moment, although the recent CoApp announcement sounds interesting. It's too early to worry about that for the moment though.)
  • We're not documenting a web site project, so I'm not bothering with "Custom Web Code Providers". I've installed quite enough by this point, thank you very much. Oh, except I haven't installed SHFB itself yet. I'd better do that now...

Step 3: creating a Help File Builder project

This feels like it could be reasonably straightforward, so long as I don't try to do anything fancy. Let's follow (roughly) the instructions. (I'm doing it straight to Noda Time rather than using the example project.)

Open the GUI, create a new project, add a documentation source of NodaTime.csproj... and hit Build Project. Wow, this takes quite a while - and this is a pretty beefy laptop. However, it seems to work! I have a CHM file which looks like it includes all the right stuff. Hoorah! It's a pretty huge CHM file (just over 3MB) for a relatively small project, but never mind.

Let's build it again, this time with all the output enabled (Help 1, Help 2, MSHelpViewer and Website).

Hmm... no MS Help 2 compiler found. Maybe I didn't have the VS2008 SDK installed after all. After a bit of hunting, it's here. Time to install it - and make sure it doesn't mess up the Sandcastle installation, as the SHFB docs warned me about. Yikes - 109MB. Ah well.

Okay, so after the SDK installation, rebuild the help... which will take even longer of course, as it's now building four different output formats. 3 minutes 18 seconds in the end... not too bad, but not something I'll want to do after every build :)

Step 4: checking the results

  • Help 1 (CHM): looks fine, if old-fashioned :)
  • Help 2 (HxS): via H2Viewer, looks fine - I'm not sure whether I dare to integrate it with MSDN just yet though.
  • ASP.NET web site: works even in Chrome
  • Static HTML: causes Chrome to flicker, constantly reloading. Works fine in Firefox. Maybe I need to submit a bug report.

I'm not entirely sure which output option corresponds to which result here; in particular, is "Website" the static one or the ASP.NET one? What's MSHelpViewer? It's easy enough to find out of course - I'll just experiment at a later date.

Step 5: building from the command line

I can't decide whether this is crucial (as it should be part of a continuous build server) or irrelevant (as there are so many tools to install, I may never get the ability to run a CB server with everything installed). However, it would certainly be nice.

Having set SHFBROOT appropriately, running msbuild gives this error:

SHFB : error BE0067: Unable to obtain assembly name from project file '[...]' using Configuration 'Debug', Platform 'X64'

Using Debug is definitely correct, but X64 sounds wrong... I suspect I want AnyCPU instead. Fortunately, this can be set in the SHFB project file (it was previously just defaulting). Once that's been fixed, the build works with one warning: BHT0001: Unable to get executing project: Unable to obtain internal reference. Supposedly this may indicate a problem in SHFB itself... I shall report it later on. It doesn't seem to affect the ability to produce help though.

Conclusion

That wasn't quite as painful as I'd feared. I'm nearly ready to check in the SHFB project file now - but I need to work out a few other things first, and probably create a specific "XML" configuration for the main project itself. I'm somewhat alarmed at the number of extra bits and pieces that I had to install though - and the lack of any mention of Help 3 is also a bit worrying.

I've just remembered one other option that I haven't tried, too - MonoDoc. I may have another look at that at a later date, although the fact that it needs a GTK# help viewer isn't ideal.

I still think the Open Source community for .NET has left a hole at the moment. It may be that SHFB + Sandcastle is as good as it gets, given the limitations of how much needs to be installed to build MS help files. I'd still like to see a better way of providing docs for web sites though... ideally one which doesn't involve shipping hundreds of files around when only two are really required.

Friday, 9 April 2010

Time zones, singletons, deployment options, names and source control

Yesterday (April 8th), the province of San Luis in Argentina declared that it wouldn't be observing a daylight saving transition after all. It was due to go back to standard time on April 11th. Thanks for the heads-up, guys. (The rest of Argentina isn't observing daylight saving at all at the moment - another decision which came with little warning.)

This highlights an important fact about time zones: they're more volatile than you might expect. This affects Noda Time in interesting ways.

Switching time zone provider

Obviously we ship with a time zone database (based on zoneinfo data) but when we eventually get to the stage of having public releases, we don't want to have to ship another release at the whim of politicians. We already have the code to load a time zone database (in our own compressed format) from a resource; we ought to expose the ability to do it from an arbitrary stream as well. (The code is there, we just need to expose it.)

So, we can then ship new time zone files and developers could use them... if they decide to care. There are two ways they could do this. Either way, they first need to load the database as an IDateTimeZoneProvider. After that, they could pass the provider around as a dependency everywhere that requires a time zone based on an ID. Alternatively, they could add it to the "system" time zone provider, which will make DateTimeZones.ForID(...) work.

Normally, I would absolutely say it's worth going with the dependency injection approach... but do you really want that dependency everywhere? It's not something you expect to need to mock out for tests (unlike a clock, for example). I'm torn between the "wrongness" of the singleton system provider list and the pragmatic approach it provides. Ultimately, I suspect we'll make sure that both options work (I believe we'll need to expose the built-in database as a provider, which I don't think we do at the moment) and let individual teams decide what works best for them. Even that feels slightly wrong though.
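Here's a hypothetical sketch of the two styles. LoadFromStream, AddProvider, ForID-on-a-provider and the filename are all invented for illustration; only IDateTimeZoneProvider and DateTimeZones.ForID come from the discussion above:

    using (Stream stream = File.OpenRead("updated-tzdb.nzd"))  // invented filename
    {
        // Invented name: some way of loading a database from an arbitrary stream.
        IDateTimeZoneProvider provider = DateTimeZoneProviders.LoadFromStream(stream);

        // Option 1: dependency injection - pass the provider to whatever needs it.
        var sanLuis = provider.ForID("America/Argentina/San_Luis");

        // Option 2: register it with the "system" provider list (invented name),
        // so that the static lookup sees the new data.
        DateTimeZones.AddProvider(provider);
        var same = DateTimeZones.ForID("America/Argentina/San_Luis");
    }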

There's another awkward decision around practicality, too. If you're not going to use the built-in database, why do you even want it to be present?

Deployment options

It's extremely convenient that NodaTime comes as a single DLL with no extra dependencies. You can just drop it in, no problem. It doesn't need to read any files - just the built-in resources. That's handy in terms of sandboxing and permissions... but doesn't lend itself to having a split between code and data.

We could potentially have two different versions: a "skinny" build which doesn't include the time zone database, and a "fat" build which does. The skinny build would automatically try to load the time zone database from some well-known place... but how?

It could be a dependent assembly - but that means it's got to be strongly named (as Noda Time itself is) which could make it hard for developers to build their own time zone database assembly if they want it to be trusted by a release build (which will be signed with a private snk file only available to core Noda Time developers).

It could be a satellite assembly, which comes with the same problems but may be slightly simpler in terms of the code to load it - and may even mean that there's no difference in the code used by the fat and skinny builds to load the "default" database.

It could be a file, but then we get back into the business of permissions and trust, as well as using reflection to find where the heck the assembly is in the first place (blech).

In short, it's all a bit of a mess - but I'm hoping we'll find some pleasant way of coping with it which makes it a breeze to replace the database without any inefficiency. Of course, even having two builds (per platform, don't forget!) is a pain to start with. More testing etc. That isn't even the limit of the problem...

Changing names

This isn't the first bit of time zone fun I've had this week. Without going into details, I've been caught out by the fact that Asia/Calcutta changed to Asia/Kolkata at some point.

Currently, I don't believe Noda Time has any support for this sort of change. Depending on what's in the zoneinfo database, we may have some aliases - but I don't believe we've got any "translation" support; a sort of "if I've got this time zone name, what else might it be called?" piece of functionality.

At some point, I need to think about this further. The idea that your data could become invalid at any moment (and that new data may be unreadable by old systems) scares me.

Versioned time zone databases - and dates

All of this makes me think of source control. We're used to the idea of "a specific path, at a particular revision" (where a revision can be a number or a hash or whatever). The mess of changing time zones suggests we need to apply the same sort of idea to dates and times. So we might have a zoned instant represented by "2010-04-12T18:42:00 America/Argentina/San_Luis@2010-04-07T00:00:00Z". In other words, "the instant that we thought would be represented by April 12th 2010, 6:42pm in San Luis, when we considered it on April 7th".

Somehow, I can't see that taking off. It's worth reading Tony Finch's thoughts on the same issue. I don't know to what extent I agree with his suggestions, but they're interesting ideas.

We really have screwed things up, haven't we?

Monday, 22 February 2010

Performance matters

Over the Christmas holidays, I wrote a small benchmarking framework for Noda Time. It's not very clever - it just finds appropriately attributed methods and runs them repeatedly. There's always the overhead of a delegate invocation, which I don't even try to take account of or remove.
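For the curious, it's roughly this shape - a minimal sketch with invented names, not the actual Noda Time benchmarking code:

    using System;
    using System.Diagnostics;
    using System.Reflection;

    [AttributeUsage(AttributeTargets.Method)]
    public sealed class BenchmarkAttribute : Attribute { }

    public static class SimpleBenchmarkRunner
    {
        public static void RunAll(Type type, int iterations)
        {
            foreach (MethodInfo method in type.GetMethods(BindingFlags.Public | BindingFlags.Static))
            {
                if (!method.IsDefined(typeof(BenchmarkAttribute), false))
                {
                    continue;
                }
                var action = (Action) Delegate.CreateDelegate(typeof(Action), method);
                Stopwatch stopwatch = Stopwatch.StartNew();
                for (int i = 0; i < iterations; i++)
                {
                    action();  // the delegate invocation overhead mentioned above
                }
                stopwatch.Stop();
                Console.WriteLine("{0}: {1} iterations in {2}ms",
                                  method.Name, iterations, stopwatch.ElapsedMilliseconds);
            }
        }
    }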

Until now, I haven't actually done anything with the results - but over the last few days, I've made significant improvements to the time taken to construct instances of LocalDateTime (which have a calendar system but no time zone) and ZonedDateTime (which are the same but with a time zone). The latter is particularly hairy - it needs to do more work than you might expect, in order to cope with daylight saving time transitions. (A local date and time can correspond to 0, 1 or 2 instants on the time line; we're careful to throw an exception for 0, and return the later possible result when there's ambiguity.)

Recent improvements

Joda Time already had caching built in, but we've modified it slightly (and probably will again); in particular, while a time zone cached its offset at any UTC instant, it didn't cache transitions. These can be relatively painful to calculate, as they involve working out the exact time of "the last Sunday of the month" and so on. Caching the transitions as well improved some operations more than 30-fold (yes, a 3000% improvement). Currently Joda Time doesn't use the transitions as often as we do, which is probably why it doesn't cache them.

This very caching makes the benchmarks slightly less useful: as we're building the same date and time repeatedly, the cache is always hit after the first iteration. That sounds as if real-world performance will be a lot worse - until you realise that the cache is built on the assumption that most applications use a relatively narrow set of dates and times. So long as most of your values fall within the same 50-year period, you're very likely to hit the cache too.

The other big win was applying caching in a new situation. By far the most commonly-used calendar system (so commonly used that we dare to default to it; not something we do with time zones) is IsoCalendarSystem. The tricky bit when it comes to working out the representation of a local date and time is the "start of the right month" problem: days of the month, hours, minutes and so on are just a matter of multiplication (remember that the time zone doesn't apply to the local date and time). I've made the assumption that the vast majority of values used will be between 1900 and 2100 - so I precalculate the start of each month in that range, for easy access later. This leads to another fairly dramatic win - a roughly 300% improvement in the speed of creating a LocalDateTime. (This was already a much faster operation than creating a ZonedDateTime, so I wasn't expecting quite as much improvement.)
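The precalculation itself is trivial. In this sketch the BCL's DateTime stands in for the real ISO calendar computation, and the names are mine rather than Noda Time's:

    // Cache the tick value at the start of every month from 1900 to 2100 inclusive.
    static readonly long[] MonthStartTicks = BuildMonthStartCache();

    static long[] BuildMonthStartCache()
    {
        var cache = new long[(2100 - 1900 + 1) * 12];
        var epoch = new DateTime(1970, 1, 1);  // Noda Time measures ticks from the Unix epoch
        for (int year = 1900; year <= 2100; year++)
        {
            for (int month = 1; month <= 12; month++)
            {
                // The expensive part, now done exactly once per month in the range.
                cache[(year - 1900) * 12 + (month - 1)] =
                    (new DateTime(year, month, 1) - epoch).Ticks;
            }
        }
        return cache;
    }

    // Days, hours, minutes and so on are then simple multiplications
    // added on top of the cached month start.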

The final win is a real micro-optimisation which I wouldn't normally put in. You can create a LocalDateTime by specifying the time down to the minute, second, millisecond or tick. Until now, we've only had one method in the ICalendarSystem interface which was used for all four of these calls - we just supplied 0 as the argument for anything we hadn't been told. However, each value has to then be validated (two comparisons), multiplied by a constant and added to the number of ticks. If you've only specified the time down to the minute, that's a whole 12 operations, as well as another 16 bytes of stack space, being wasted! Yes, it really does sound ridiculous - but adding overloads specific to each of these cases leads to another 20% improvement, which is enough to make it worthwhile, in my view.
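In sketch form, with invented names and constants (the real ICalendarSystem methods differ):

    const long TicksPerMinute = 600000000L;  // 60 seconds * 10 million ticks
    const long TicksPerHour = 60 * TicksPerMinute;

    static void Validate(int value, int max, string name)
    {
        if (value < 0 || value > max)
        {
            throw new ArgumentOutOfRangeException(name);
        }
    }

    // Before: one method for everything, so callers pass 0 for unused fields,
    // which still get validated, multiplied and added.
    static long TimeToTicks(int hour, int minute, int second, int millisecond, int tick)
    {
        Validate(hour, 23, "hour");
        Validate(minute, 59, "minute");
        Validate(second, 59, "second");             // wasted work when the caller passed 0
        Validate(millisecond, 999, "millisecond");  // ditto
        Validate(tick, 9999, "tick");               // ditto
        return hour * TicksPerHour + minute * TicksPerMinute
            + second * 10000000L + millisecond * 10000L + tick;
    }

    // After: a dedicated overload skips the comparisons, multiplications and
    // stack space for fields the caller never specified.
    static long TimeToTicks(int hour, int minute)
    {
        Validate(hour, 23, "hour");
        Validate(minute, 59, "minute");
        return hour * TicksPerHour + minute * TicksPerMinute;
    }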

We're not your bottleneck

Micro-optimisation is a lot of fun, but almost never worth doing at an application level. However, it makes a lot more sense for a library. I have no idea how Noda Time is going to be used: but I don't want it to become a problem. If anyone is ever able to take a profiling report from a real application, and say "Hmm... Noda Time appears to be taking up a significant portion of our CPU time; maybe we'd better hard code that area or go back to the BCL" then that will represent a failure in my view.

That's why I'm willing to make changes which seem a little strange to start with, in order to push the performance further than I normally would. I wouldn't do so in interfaces which are likely to be used directly (most users will never care about ICalendarSystem - advanced users may wish to specify which calendar system to use, but it'll primarily be code within Noda Time which actually calls methods within it) but if we can build an API which is not only easy to use but also lightning fast, that will be a real point of pride.

On my main laptop at home, I can now create about 18 million LocalDateTime values per second, and 5 million ZonedDateTimes (in the America/Los_Angeles time zone). I reckon that for the moment at least, that's fast enough.