Tuesday, 30 November 2010

The joys of date/time arithmetic

(Cross-posted to my main blog and the Noda Time blog, in the hope that the overall topic is still of interest to those who aren't terribly interested in Noda Time per se.)

I've been looking at the "period" part of Noda Time recently, trying to redesign the API to simplify it somewhat. This part of the API is what we use to answer questions such as:

  • What will the date be in 14 days?
  • How many hours are there between now and my next birthday?
  • How many years, months and days have I been alive for?

I've been taking a while to get round to this because there are some tricky choices to make. Date and time arithmetic is non-trivial - not because of complicated rules which you may be unaware of, but simply because of the way calendaring systems work. As ever, time zones make life harder too. This post won't talk very much about the Noda Time API details, but will give the results of various operations as I currently expect to implement them.

The simple case: arithmetic on the instant time line

One of the key concepts to understand when working with time is that the usual human "view" on time isn't the only possible one. We don't have to break time up into months, days, hours and so on. It's entirely reasonable (in many cases, at least) to consider time as just a number which progresses linearly. In the case of Noda Time, it's the number of ticks (there are 10 ticks in a microsecond, 10,000 ticks in a millisecond, and 10 million ticks in a second) since midnight on January 1st 1970 UTC.

Leaving relativity aside, everyone around the world can agree on an instant, even if they disagree about everything else. If you're talking over the phone (using a magic zero-latency connection) you may think you're in different years, using different calendar systems, in different time zones - but still both think of "now" as "634266985845407773 ticks".

That makes arithmetic really easy - but also very limited. You can only add or subtract numbers of ticks, effectively. Of course you can derive those ticks from some larger units which have a fixed duration - for example, you could convert "3 hours" into ticks - but some other concepts don't really apply. How would you add a month? The instant time line has no concept of months, and in most calendars different months have different durations (28-31 days in the ISO calendar, for example). Even the idea of a day is somewhat dubious - it's convenient to treat a day as 24 hours, but you need to at least be aware that when you translate an instant into a calendar that a real person would use, days don't always last for 24 hours due to daylight savings.

Anyway, the basic message is that it's easy to do arithmetic like this. In Noda Time we have the Instant structure for the position on the time line, and the Duration structure as a number of ticks which can be added to an Instant. This is the most appropriate pair of concepts to use to measure how much time has passed, without worrying about daylight savings and so on: ideal for things like timeouts, cache purging and so on.

Things start to get messy: local dates, times and date/times

The second type of arithmetic is what humans tend to actually think in. We talk about having a meeting in a month's time, or how many days it is until Christmas (certainly my boys do, anyway). We don't tend to consciously bring time zones into the equation - which is a good job, as we'll see later.

Now just to make things clear, I'm not planning on talking about recurrent events - things like "the second Tuesday and the last Wednesday of every month". I'm not planning on supporting recurrences in Noda Time, and having worked on the calendar part of Google Mobile Sync for quite a while, I can tell you that they're not fun. But even without recurrences, life is tricky.

Introducing periods and period arithmetic

The problem is that our units are inconsistent. I mentioned before that "a month" is an ambiguous length of time... but it doesn't just change by the month, but potentially by the year as well: February is either 28 or 29 days long depending on the year. (I'm only considering the ISO calendar for the moment; that gives enough challenges to start with.)

If we have inconsistent units, we need to keep track of those units during arithmetic, and even request that the arithmetic be performed using specific units. So, it doesn't really make sense to ask "how long is the period between June 10th 2010 and October 13th 2010" but it does make sense to ask "how many days are there between June 10th 2010 and October 13th 2010" or "how many years, months and days are there between June 10th 2010 and October 13th 2010".

Once you've got a period - which I'll describe as a collection of unit/value pairs, e.g. "0 years, 4 months and 3 days" (for the last example above) you can still give unexpected behaviour. If you add that period to your original start date, you should get the original end date... but if you advance the start date by one day, you may not advance the end date by one day. It depends on how you handle things like "one month after January 30th 2010" - some valid options are:

  • Round down to the end of the month: February 28th
  • Round up to the start of the next month: March 1st
  • Work out how far we've overshot, and apply that to the next month: March 2nd
  • Throw an exception

All of these are justifiable. Currently, Noda Time will always take the first approach. I believe that JSR-310 (the successor to Joda Time) will allow the behaviour to be resolved according to a strategy provided by the user... it's unclear to me at the moment whether we'll want to go that far in Noda Time.

Arithmetic in Noda Time is easily described, but the consequences can be subtle. When adding or subtracting a period from something like a LocalDate, we simply iterate over all of the field/value pairs in the period, starting with the most significant, and add each one in turn. When finding the difference between two LocalDate values with a given set of field types (e.g. "months and days") we get as close as we can without overshooting using the most significant field, then the next field etc.

The "without overshooting" part means that if you add the result to the original start value, the result will always either be the target end value (if sufficiently fine-grained fields are available) or somewhere between the original start and the target end value. So "June 2nd 2010 to October 1st 2010 in months" gives a result of "3 months" even though if we chose "4 months" we'd only overshoot by a tiny amount.

Now we know what approach we're taking, let's look at some consequences.

Asymmetry and other oddities

It's trivial to show some assymetry just using a period of a single month. For example:

  • January 28th 2010 + 1 month = February 28th 2010
  • January 29th 2010 + 1 month = February 28th 2010
  • January 30th 2010 + 1 month = February 28th 2010
  • February 28th 2010 - 1 month = January 28th 2010

It gets even more confusing when we add days into the mix:

  • January 28th 2010 + 1 month + 1 day = March 1st 2010
  • January 29th 2010 + 1 month + 1 day = March 1st 2010
  • March 1st 2010 - 1 month - 1 day = January 31st 2010

And leap years:

  • March 30th 2013 - 1 year - 1 month - 10 days = February 19th 2012 (as "February 30th 2012" is truncated to February 29th 2012)
  • March 30th 2012 - 1 year - 1 month - 10 days = February 18th 2012 (as "February 30th 2011" is truncated to February 28th 2011)

Then we need to consider how rounding works when finding the difference between days... (forgive the pseudocode):

  • Between(January 31st 2010, February 28th 2010, Months & Days) = ?
  • Between(February 28th 2010, January 31st 2010, Months & Days) = -28 days

The latter case is relatively obvious - because if you take a whole month of February 28th 2010 you end up with January 28th 2010, which is an overshoot... but what about the first case?

Should we return the determine the number of months by "the largest number such that start + period <= end"? If so, we get a result of "1 month" - which makes sense given the first set of results in this section.

What worries me most about this situation is that I honestly don't know offhand what the current implementation will do. I think it would be best to return "28 days" as there isn't genuinely a complete month between the two... <tappety tappety>

Since writing the previous paragraph, I've tested it, and it returns 1 month and 0 days. I don't know how hard it would be to change this behaviour or whether we want to. Whatever we do, however, we need to document it.

That's really at the heart of this: we must make Noda Time predictable. Where there are multiple feasible results, there should be a simple way of doing the arithmetic by hand and getting the same results as Noda Time. Of course, picking the best option out of the ones available would be good - but I'd rather be consistent and predictable than "usually right" be unpredictably so.

Think it's bad so far? It gets worse...

ZonedDateTime: send in the time zones... (well maybe next year?)

I've described the "instant time line" and its simplicity.

I've described the local date/time complexities, where there's a calendar but there's no time zone.

So far, the two worlds have been separate: you can't add a Duration to a LocalDateTime (etc), and you can't add a Period to an Instant. Unfortunately, sooner or later many applications will need ZonedDateTime.

Now, you can think of ZonedDateTime in two different ways:

  • It's an Instant which knows about a calendar and a time zone
  • It's a LocalDateTime which knows about a time zone and the offset from UTC

The "offset from UTC" part sounds redundant at first - but during daylight saving transitions the same LocalDateTime occurs at two different instants; the time zone is the same in both cases, but the offset is different.

The latter way of thinking is how we actually represent a ZonedDateTime internally, but it's important to know that a ZonedDateTime still unambiguously maps to an Instant.

So, what should we be able to do with a ZonedDateTime in terms of arithmetic? I think the answer is that we should be able to add both Periods and Durations to a ZonedDateTime - but expect them to give different results.

When we add a Duration, that should work out the Instant represented by the current DateTime, advance it by the given duration, and return a new ZonedDateTime based on that result with the same calendar and time zone. In other words, this is saying, "If I were to wait for the given duration, what date/time would I see afterwards?"

When we add a Period, that should add it to the LocalDateTime represented by the ZonedDateTime, and then return a new ZonedDateTime with the result, the original time zone and calendar, and whatever offset is suitable for the new LocalDateTime. (That's deliberately woolly - I'll come back to it.) This is the sort of arithmetic a real person would probably perform if you asked them to tell you what time it would be "three hours from now". Most people don't take time zones into account...

In most cases, where a period can be represented as a duration (for example "three hours") the two forms of addition will give the same result. Around daylight saving transitions, however, they won't. Let's consider some calculations on Sunday November 7th 2010 in the "Pacific/Los_Angeles" time zone. It had a daylight saving transition from UTC-7 to UTC-8 at 2am local time. In other words, the clock went 1:58, 1:59, 1:00. Let's start at 12:30am (local time, offset = -7) and add a few different values:

  • 12:30am + 1 hour duration = 1:30am, offset = -7
  • 12:30am + 2 hours duration = 1:30am, offset = -8
  • 12:30am + 3 hours duration = 2:30am, offset = -8
  • 12:30am + 1 hour period = 1:30am, offset = ???
  • 12:30am + 2 hour period = 2:30am, offset = -8
  • 12:30am + 3 hour period = 3:30am, offset = -8

The ??? value is the most problematic one, because 1:30 occurs twice... when thinking of the time in a calendar-centric way, what should the result be? Options here:

  • Always use the earlier offset
  • Always use the later offset
  • Use the same offset as the start date/time
  • Use the offset in the direction of travel (so adding one hour from 12:30am would give 1:30am with an offset of -7, but subtracting one hour from 2:30am would give 1:30am with an offset of -8)
  • Throw an exception
  • Allow the user to pass in an argument which represents a strategy for resolving this

This is currently unimplemented in Noda Time, so I could probably choose whatever behaviour I want, but frankly none of them has much appeal.

At the other daylight saving transition, when the clocks go forward, we have the opposite problem: adding one hour to 12:30am can't give 1:30am because that time never occurs. Options in this case include:

  • Return the first valid time after the transition (this has problems if we're subtracting time, where we'd presumably want to return the latest valid time before the transition... but the transition has an exclusive lower bound, so there's no such "latest valid time" really)
  • Add the offset difference, so we'd skip to 2:30am
  • Throw an exception
  • Allow the user to pass in a strategy

Again, nothing particularly appeals.

All of this is just involved in adding a period to a ZonedDateTime - then the same problems occur all over again when trying to find the period between them. What's the difference (as a Period rather than a simple Duration) between 1:30am with an offset of -7 and 1:30am with an offset of -8? Nothing, or an hour? Again, at the moment I really don't know the best course of action.

Conclusion

This post has ended up being longer than I'd expected, but hopefully you've got a flavour of the challenges we're facing. Even without time zones getting involved, date and time arithmetic is pretty silly - and with time zones, it becomes very hard to reason about - and to work out what the "right" result to be returned by an API should be, let alone implement it.

Above all, it's important to me that Noda Time is predictable and clearly documented. Very often, if a library doesn't behave exactly the way you want it to, but you can tell what it's going to do, you can work around that - but if you're having to experiment to guess the behaviour, you're on a hiding to nothing.