A few times after tweeting about Noda Time, people have asked why they should use Noda Time - they believe that the .NET date and time support is already good enough. Now obviously I haven't seen their code, but I suspect that pretty much any code base doing any work with dates and times will be clearer using Noda Time - and quite possibly more correct, due to the way that Noda Time forces you into making some decisions which are murky in .NET. This post is about the shortcomings of the .NET date and time API. Obviously I'm biased, and I hope this post isn't seen as disrespectful to the BCL team - aside from anything else, they work under a different set of constraints regarding COM interop etc.
What's does a DateTime mean?
When there's a Stack Overflow question about DateTime not doing quite what the questioner expected, I often find myself wondering what a particular value is meant to represent, exactly. It sounds simple - it's a date and time, right? But it gets rather more complicated as soon as you start thinking about it more carefully. For example, assuming the clock doesn't tick between the two property invocations, what should the value of "mystery" be in the following snippet?
DateTime local = DateTime.Now;
bool mystery = local == utc;
I honestly don't know what this will do. There are three options which all make a certain amount of sense:
- It should always be true: the two values are associated with the same instant in time, it's just that one is expressed locally and one is expressed universally
- It should always be false: the two values represent different kinds of data, so are automatically unequal
- It should return true if your local time zone is currently in sync with UTC, i.e. when time zones are disregarded completely, the two values are equal
I don't care much what the actual behaviour is - the fact that the behaviour is unobvious is a symptom of a deeper problem. It all comes back to the DateTime.Kind property which allows a DateTime to represent one of three kinds of value:
- DateTimeKind.Utc: A UTC date and time
- DateTimeKind.Local: A date and time which is a local time for the system the code is executing on
- DateTimeKind.Unspecified: Um, tricky. Depends on what you do with it.
The value of the property affects various different operations in different ways. For example, if you call ToUniversalTime() on an "unspecified" DateTime, it will assume that you really meant it as a local value before. On the other hand, if you call ToLocalTime() on an "unspecified" DateTime, it will assume that you really meant it as a UTC value before. That's one model of behaviour.
If you construct a DateTimeOffset from a DateTime and a TimeSpan, the behaviour is somewhat different:
- A UTC value is simple - you've given it UTC, and you want to represent "UTC + the specified offset"
- A local value is only sometimes valid: the constructor validates that the offset from UTC at the specified local time in the system default time zone is the same as the offset you've specified.
- An unspecified value is always valid, and represents the local time in some unspecified time zone, such that the offset is valid at the time.
I don't know about you, but this sort of thing gives me the semantic heebie-jeebies. It's like having a "number" type which has a sequence of digits - but you have to ask another property whether those digits are hex or decimal, and the answer can sometimes be "Well, what do you think?"
Of course, in .NET 1.1, DateTimeKind didn't even exist. This didn't mean the problem didn't exist - it means that the confusing behaviour which tries to make sense of a type which represents different kinds of value couldn't even try to be consistent. It had to be based on the context: it was as if it were permanently Unspecified.
Doesn't DateTimeOffset fix this?
Okay, so now we know we don't like DateTime much. Does DateTimeOffset help us? Well, somewhat. A DateTimeOffset value has a very definite meaning: it's a local date and time with a specific offset from UTC. I should probably take a moment to explain what I mean by "local" date and times - and instants - at this point.
A local date and time isn't tied to any particular time zone. At this moment, is it before or after "10pm on August 20th 2011"? It depends where you are in the world. (I'm leaving aside any non-ISO calendar representations for the moment, by the way.) So a DateTimeOffset contains a time-zone-independent component (that "10pm on ..." part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.
(Still with me? Keep going - I'm hoping that the previous paragraph will end up being the hardest in this post. It's a hugely important conceptual point though.)
So a DateTimeOffset maps to an instant, but also deals with a local date and time. That means it's not really an ideal type if we only want to represent a local date and time - but then neither is DateTime. A DateTime with a kind of DateTimeKind.Local isn't really local in the same sense - it's tied to the default time zone of the system it's running on. A DateTime with a kind of DateTimeKind.Unspecified is closer in some cases - such as when constructing a DateTimeOffset - but the semantics are odd in other cases, as described above. So neither DateTimeOffset nor DateTime are good types to use for genuinely local date and time values.
DateTimeOffset also isn't a good type to use if you want to tie yourself to a specific time zone, because it has no idea of the time zone which gave the relevant offset in the first place. As of .NET 3.5 there's a pretty reasonable TimeZoneInfo class, but no type which talks about "a local time in a particular time zone". So with DateTimeOffset you know what that particular time is in some unspecified time zone, but you don't know what the local time will be a minute later, as the offset for that time zone could change (usually due to daylight saving time changes).
What about dates and times?
So far I've only been talking about "date and time" values. What about date values and time values - values which only have one component or the other. It's more common to want to represent a date than a time, but both are common enough to be worth considering.
Now yes, you can use a DateTime for a date - heck, there's even the DateTime.Date property which will return the date for a particular date and time... but as another DateTime which happens to be at midnight. That's not at all the same as having a separate type which is readily identifiable as "just a date" (and likewise "just a time of day" - .NET uses TimeSpan for that, which again doesn't really feel quite right to me).
What about time zones themselves? Surely TimeZoneInfo is fine there.
As I said before, TimeZoneInfo isn't bad. It suffers from two major problems and some minor ones:
First, it's all based on Windows time zone IDs. That's natural enough - but it's not what the rest of the world uses. Every non-Windows system I've ever seen is based on the Olson (aka tz aka zoneinfo) time zone database, and the IDs assigned there. You may have seen IDs such as "Europe/London" or "America/Los_Angeles" - those are Olson identifiers. Talk to a web service offering geo information, chances are it'll talk in Olson identifiers. Interact with another calendaring system, chances are it'll talk in Olson identifiers. Now there are problems there too in terms of identifier stability, which the Unicode Consortium tries to address with CLDR... but at least you've got a good chance. It would be nice if TimeZoneInfo offered some kind of mapping between the two identifier schemes, or somewhere else in .NET did. (Noda Time knows about both sets of identifiers, although the mapping isn't publicly accessible just yet. This will be fixed before release.)
Second, it's based on DateTime and DateTimeOffset, which means you've got to be careful when you use it - if you assume one kind of DateTime when you're actually giving or receiving another kind, you may have problems. It's reasonably well documented, but frankly explaining this sort of thing is intrinsically hard enough without having to put everything in terms which are inconsistent.
Then there are a few issues around ambiguous or invalid local date and time values. These occur due to daylight saving changes: if the clock goes forward (e.g. from 1am to 2am) that introduces some invalid local date and time values (e.g. 1.30am doesn't occur on that day). If the clock goes backward (e.g. from 2am to 1am) that introduces ambiguities: 1.30am occurs twice. You can explicitly ask TimeZoneInfo whether a particular value is invalid or ambiguous, but it's easy to miss that it's even a possibility. If you try to convert a local value to a UTC value via a time zone, it will throw an exception if it's invalid but silently assume standard time (as opposed to daylight saving time) if it's ambiguous. That sort of decision leads developers to not even consider the possibilities involved. Speaking of which...
This all sounds too complicated.
You may be thinking at this point, "You're making a big deal out of nothing. I don't want to think about this stuff - why are you trying to make everything so complicated? I've been using the .NET API for years, and not had problems." If so, I suspect there are three broad possibilities:
- You're far, far smarter than I am, and understood all of these intricacies through intuition. Your code always makes use of the right kind of DateTime, uses DateTimeOffset appropriately, and will always do the right thing with invalid or ambiguous local date and time values. No doubt you also write lock-free multi-threaded code sharing state in a way which is as efficient as possible but still rock solid. What the heck are you doing reading this in the first place?
- You have run into these issues, but have mostly forgotten them - after all, they've only sucked away 10 minutes of your life at a time, as you experimented to get something that appeared to work (or at least made the unit tests pass; the unit tests which may well be conceptually wrong too). Maybe you've wondered about it, but decided that the problem was with you rather than the API.
- You've never seen the problems, but only because you don't bother testing your code, which has so far only ever run in a single time zone, on computers which are always turned off at night (thus missing all daylight saving transitions). In some ways you're lucky, but you've got a time zone.
Okay, that was somewhat facetious, but it really is a problem. If you've never really thought about the difference between "local" times and "global" instants before, you should have done. It's an important distinction - similar to the distinction between binary floating point and decimal floating point types. Failures can be subtle, hard to diagnose, hard to explain, hard to correct, pervasive, and easy to reintroduce at another point of the program.
Handling date and time values is intrinsically tricky. There are nasty cases to think about like days which don't start at midnight due to daylight saving changes (for example, Sunday October 17th 2010 in Brazil started at 1am). If you're particularly unlucky you'll have to work with multiple calendar systems (Gregorian, Julian, Coptic, Buddhist etc). If you deal with dates and time around the start of the 20th century you may see some very odd time zone transitions as countries went from strictly-longitudinal offsets to mostly "round" values (e.g. Paris in 1911). You may need to deal with governments changing time zone transitions with only a couple of weeks' notice. You may need to deal with time zone identifiers changing (e.g. Asia/Calcutta to Asia/Kolcata).
All of this is on top of the actual business rules you're trying to implement, of course. They may be complicated too. Given all this complexity, you should at least have an API which allows you to express what you mean relatively clearly.
So is Noda Time perfect then?
Of course not. Noda Time suffers several problems:
- Despite all of the above, I'm a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven't quite implemented it yet. As far as I'm aware, no-one involved in Noda Time is an expert - although Stephen Colebourne, the author of Joda Time and lead of JSR-310 lurks on the mailing list. (Point of trivia: He was present at my first presentation on Noda Time. I asked if anyone happened to know the difference between the Gregorian calendar and the ISO-8601 calendar. He raised his hand and gave the correct answer, obviously. I asked how he happened to know it, and he replied, "I'm Stephen Colebourne." I nearly collapsed.)
- We haven't finished yet. A beautifully designed API is useless if it isn't implemented.
- There are bound to be bugs - the BCL team's code is exercised on hundreds of thousands of machines around the world all the time. Errors are likely to be picked up quickly.
- We don't have any resources - we're a small group of active developers doing this for fun. I'm not saying that for pity (it's great fun) but for the inevitable issues around the amount of time that can be spent on features, documentation etc.
- We're not part of the BCL. Want to use Noda Time in a LINQ to SQL (or even NHibernate) query? Good luck with that. Even if we succeed beyond my expectations, I'm not expecting other open source projects to take a dependency on us for ages.
Having said that, I am pleased with the overall design. We've tried to keep a balance between flexibility and providing one simple way of achieving any particular goal (with more to do, of course). I'll write another post some time about the design style we've been gradually evolving towards, comparing it with both Joda Time and .NET. The best outcome is the set of types to come out of it, each of which has a reasonably clear role. I won't bore you with all the details here - see other posts, documentation etc.
Ironically, the best outcome for the world would probably be for the BCL team to pick up on this post and decide to overhaul the API radically for .NET 6 (I'm assuming the ship has effectively sailed on .NET 5). While I'm enjoying doing this, I'm sure there are other projects I'd enjoy too - and frankly date and time is too important a concept to rest on my shoulders for the .NET community for long.
I hope I've persuaded you that the .NET API has significant flaws. I may have also persuaded you that Noda Time is worth looking at more closely, but that's a secondary goal really. If you truly understand the flaws in the built-in types - in particular the semantic ambiguity around DateTime - then you're more likely to use those types carefully and accurately in your code. That alone is enough to make me happy.