Saturday, 20 August 2011

What's wrong with DateTime anyway?

A few times after tweeting about Noda Time, people have asked why they should use Noda Time - they believe that the .NET date and time support is already good enough. Now obviously I haven't seen their code, but I suspect that pretty much any code base doing any work with dates and times will be clearer using Noda Time - and quite possibly more correct, due to the way that Noda Time forces you into making some decisions which are murky in .NET. This post is about the shortcomings of the .NET date and time API. Obviously I'm biased, and I hope this post isn't seen as disrespectful to the BCL team - aside from anything else, they work under a different set of constraints regarding COM interop etc.

What does a DateTime mean?

When there's a Stack Overflow question about DateTime not doing quite what the questioner expected, I often find myself wondering what a particular value is meant to represent, exactly. It sounds simple - it's a date and time, right? But it gets rather more complicated as soon as you start thinking about it more carefully. For example, assuming the clock doesn't tick between the two property invocations, what should the value of "mystery" be in the following snippet?

DateTime utc = DateTime.UtcNow;
DateTime local = DateTime.Now;
bool mystery = local == utc;

I honestly don't know what this will do. There are three options which all make a certain amount of sense:

  • It should always be true: the two values are associated with the same instant in time, it's just that one is expressed locally and one is expressed universally
  • It should always be false: the two values represent different kinds of data, so are automatically unequal
  • It should return true if your local time zone is currently in sync with UTC, i.e. when time zones are disregarded completely, the two values are equal

I don't care much what the actual behaviour is - the fact that the behaviour is unobvious is a symptom of a deeper problem. It all comes back to the DateTime.Kind property which allows a DateTime to represent one of three kinds of value:

  • DateTimeKind.Utc: A UTC date and time
  • DateTimeKind.Local: A date and time which is a local time for the system the code is executing on
  • DateTimeKind.Unspecified: Um, tricky. Depends on what you do with it.

The value of the property affects various different operations in different ways. For example, if you call ToUniversalTime() on an "unspecified" DateTime, it will assume that you really meant it as a local value before. On the other hand, if you call ToLocalTime() on an "unspecified" DateTime, it will assume that you really meant it as a UTC value before. That's one model of behaviour.
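A minimal sketch of that model, using only BCL calls. The class and method names (KindDemo and friends) are made up for illustration. It also checks one documented detail: the == operator compares ticks only, so two values with the same ticks but different kinds compare equal.

```csharp
using System;

static class KindDemo
{
    // == ignores Kind entirely: only the tick counts are compared.
    public static bool SameTicksDifferentKindAreEqual()
    {
        var utc = new DateTime(2011, 8, 20, 22, 0, 0, DateTimeKind.Utc);
        var local = new DateTime(2011, 8, 20, 22, 0, 0, DateTimeKind.Local);
        return utc == local;
    }

    // The value is treated as if it had been local all along.
    public static DateTimeKind KindAfterToUniversalTime(DateTime value)
    {
        return value.ToUniversalTime().Kind;
    }

    // The value is treated as if it had been UTC all along.
    public static DateTimeKind KindAfterToLocalTime(DateTime value)
    {
        return value.ToLocalTime().Kind;
    }

    static void Main()
    {
        // No kind given, so Kind == DateTimeKind.Unspecified.
        var unspecified = new DateTime(2011, 8, 20, 22, 0, 0);

        Console.WriteLine(unspecified.Kind);
        Console.WriteLine(KindAfterToUniversalTime(unspecified));
        Console.WriteLine(KindAfterToLocalTime(unspecified));
        Console.WriteLine(SameTicksDifferentKindAreEqual());
    }
}
```

The same "unspecified" value is interpreted in two opposite ways depending on which method you happen to call.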

If you construct a DateTimeOffset from a DateTime and a TimeSpan, the behaviour is somewhat different:

  • A UTC value is simple but restrictive - you've given it UTC, so the only offset the constructor accepts is zero; any other offset throws an ArgumentException
  • A local value is only sometimes valid: the constructor validates that the offset from UTC at the specified local time in the system default time zone is the same as the offset you've specified.
  • An unspecified value is always valid, and represents the local time in some unspecified time zone, such that the offset is valid at the time.
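Here is a rough sketch of those rules in action. OffsetConstruction and IsAccepted are hypothetical names; the only real API involved is the DateTimeOffset(DateTime, TimeSpan) constructor and TimeZoneInfo.Local. The "wrong" offset for the local value is computed from the system offset, so the example shouldn't depend on where it runs.

```csharp
using System;

static class OffsetConstruction
{
    // Returns true if the DateTimeOffset constructor accepts the pair,
    // false if it rejects it with an ArgumentException (or a subclass).
    public static bool IsAccepted(DateTime dateTime, TimeSpan offset)
    {
        try
        {
            var ignored = new DateTimeOffset(dateTime, offset);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    static void Main()
    {
        var unspecified = new DateTime(2011, 8, 20, 22, 0, 0);
        var utc = DateTime.SpecifyKind(unspecified, DateTimeKind.Utc);
        var local = DateTime.SpecifyKind(unspecified, DateTimeKind.Local);

        // The offset the system default time zone uses at that local time,
        // and a deliberately different one.
        TimeSpan systemOffset = TimeZoneInfo.Local.GetUtcOffset(local);
        TimeSpan otherOffset = systemOffset + TimeSpan.FromHours(1);

        Console.WriteLine(IsAccepted(unspecified, TimeSpan.FromHours(5)));
        Console.WriteLine(IsAccepted(utc, TimeSpan.Zero));
        Console.WriteLine(IsAccepted(local, systemOffset));
        Console.WriteLine(IsAccepted(local, otherOffset));
    }
}
```

Only the last call should be rejected: the local value is checked against the system time zone, while the unspecified value is taken on trust.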

I don't know about you, but this sort of thing gives me the semantic heebie-jeebies. It's like having a "number" type which has a sequence of digits - but you have to ask another property whether those digits are hex or decimal, and the answer can sometimes be "Well, what do you think?"

Of course, in .NET 1.1, DateTimeKind didn't even exist. This didn't mean the problem didn't exist - it meant that the confusing behaviour which tries to make sense of a type which represents different kinds of value couldn't even try to be consistent. It had to be based on the context: it was as if it were permanently Unspecified.

Doesn't DateTimeOffset fix this?

Okay, so now we know we don't like DateTime much. Does DateTimeOffset help us? Well, somewhat. A DateTimeOffset value has a very definite meaning: it's a local date and time with a specific offset from UTC. I should probably take a moment to explain what I mean by "local" dates and times - and instants - at this point.

A local date and time isn't tied to any particular time zone. At this moment, is it before or after "10pm on August 20th 2011"? It depends where you are in the world. (I'm leaving aside any non-ISO calendar representations for the moment, by the way.) So a DateTimeOffset contains a time-zone-independent component (that "10pm on ..." part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.

(Still with me? Keep going - I'm hoping that the previous paragraph will end up being the hardest in this post. It's a hugely important conceptual point though.)

So a DateTimeOffset maps to an instant, but also deals with a local date and time. That means it's not really an ideal type if we only want to represent a local date and time - but then neither is DateTime. A DateTime with a kind of DateTimeKind.Local isn't really local in the same sense - it's tied to the default time zone of the system it's running on. A DateTime with a kind of DateTimeKind.Unspecified is closer in some cases - such as when constructing a DateTimeOffset - but the semantics are odd in other cases, as described above. So neither DateTimeOffset nor DateTime are good types to use for genuinely local date and time values.

DateTimeOffset also isn't a good type to use if you want to tie yourself to a specific time zone, because it has no idea of the time zone which gave the relevant offset in the first place. As of .NET 3.5 there's a pretty reasonable TimeZoneInfo class, but no type which talks about "a local time in a particular time zone". So with DateTimeOffset you know what that particular time is in some unspecified time zone, but you don't know what the local time will be a minute later, as the offset for that time zone could change (usually due to daylight saving time changes).
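As a small illustration of that last point, take the minute before clocks in London went back on 30 October 2011. OffsetDrift and OneMinuteLater are names invented for this sketch; the arithmetic is plain DateTimeOffset.AddMinutes.

```csharp
using System;

static class OffsetDrift
{
    // DateTimeOffset arithmetic moves along the time line but keeps the
    // original offset: it has no time zone to consult.
    public static DateTimeOffset OneMinuteLater(DateTimeOffset value)
    {
        return value.AddMinutes(1);
    }

    static void Main()
    {
        // 01:59 with the +01:00 offset London was using for summer time.
        var beforeChange = new DateTimeOffset(2011, 10, 30, 1, 59, 0, TimeSpan.FromHours(1));
        DateTimeOffset later = OneMinuteLater(beforeChange);

        // Still +01:00 and 02:00 local, although a wall clock in London
        // would have gone back to 01:00 with an offset of zero.
        Console.WriteLine(later.Offset);
        Console.WriteLine(later.DateTime.TimeOfDay);
    }
}
```

The result is still the right instant, but its local part no longer matches what anyone in that time zone would have seen.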

What about dates and times?

So far I've only been talking about "date and time" values. What about date values and time values - values which only have one component or the other? It's more common to want to represent a date than a time, but both are common enough to be worth considering.

Now yes, you can use a DateTime for a date - heck, there's even the DateTime.Date property which will return the date for a particular date and time... but as another DateTime which happens to be at midnight. That's not at all the same as having a separate type which is readily identifiable as "just a date" (and likewise "just a time of day" - .NET uses TimeSpan for that, which again doesn't really feel quite right to me).
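To see what that means in practice, here is a tiny sketch (DateOnlyDemo is just an illustrative name). The "date" comes back as a full DateTime at midnight, and the "time of day" comes back as a TimeSpan, a type which otherwise represents a length of time.

```csharp
using System;

static class DateOnlyDemo
{
    // The "date" of a DateTime is just another DateTime, at midnight.
    public static DateTime DatePart(DateTime value)
    {
        return value.Date;
    }

    // The "time of day" is a TimeSpan measured from midnight.
    public static TimeSpan TimePart(DateTime value)
    {
        return value.TimeOfDay;
    }

    static void Main()
    {
        var value = new DateTime(2011, 8, 20, 22, 15, 0);

        Console.WriteLine(DatePart(value).GetType().Name);
        Console.WriteLine(DatePart(value).TimeOfDay);
        Console.WriteLine(TimePart(value).GetType().Name);
    }
}
```

Nothing in the type system stops you passing either result to code which expects a real date and time, or a real duration.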

What about time zones themselves? Surely TimeZoneInfo is fine there.

As I said before, TimeZoneInfo isn't bad. It suffers from two major problems and some minor ones:

First, it's all based on Windows time zone IDs. That's natural enough - but it's not what the rest of the world uses. Every non-Windows system I've ever seen is based on the Olson (aka tz aka zoneinfo) time zone database, and the IDs assigned there. You may have seen IDs such as "Europe/London" or "America/Los_Angeles" - those are Olson identifiers. Talk to a web service offering geo information, chances are it'll talk in Olson identifiers. Interact with another calendaring system, chances are it'll talk in Olson identifiers. Now there are problems there too in terms of identifier stability, which the Unicode Consortium tries to address with CLDR... but at least you've got a good chance. It would be nice if TimeZoneInfo offered some kind of mapping between the two identifier schemes, or somewhere else in .NET did. (Noda Time knows about both sets of identifiers, although the mapping isn't publicly accessible just yet. This will be fixed before release.)
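To give an idea of the kind of mapping meant here, this is a hand-rolled sketch with just two entries. ZoneIdMapping and TryGetWindowsId are hypothetical names, not part of TimeZoneInfo or Noda Time, and the two pairs are the ones I believe CLDR lists for these zones; a real table would be generated from the CLDR data rather than typed in.

```csharp
using System;
using System.Collections.Generic;

static class ZoneIdMapping
{
    // Illustrative subset only: Olson identifier -> Windows identifier.
    private static readonly Dictionary<string, string> OlsonToWindows =
        new Dictionary<string, string>
        {
            { "Europe/London", "GMT Standard Time" },
            { "America/Los_Angeles", "Pacific Standard Time" },
        };

    // Returns false (and a null windowsId) for identifiers not in the table.
    public static bool TryGetWindowsId(string olsonId, out string windowsId)
    {
        return OlsonToWindows.TryGetValue(olsonId, out windowsId);
    }

    static void Main()
    {
        string windowsId;
        if (TryGetWindowsId("Europe/London", out windowsId))
        {
            Console.WriteLine(windowsId);
        }
        Console.WriteLine(TryGetWindowsId("Mars/Olympus_Mons", out windowsId));
    }
}
```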

Second, it's based on DateTime and DateTimeOffset, which means you've got to be careful when you use it - if you assume one kind of DateTime when you're actually giving or receiving another kind, you may have problems. It's reasonably well documented, but frankly explaining this sort of thing is intrinsically hard enough without having to put everything in terms which are inconsistent.

Then there are a few issues around ambiguous or invalid local date and time values. These occur due to daylight saving changes: if the clock goes forward (e.g. from 1am to 2am) that introduces some invalid local date and time values (e.g. 1.30am doesn't occur on that day). If the clock goes backward (e.g. from 2am to 1am) that introduces ambiguities: 1.30am occurs twice. You can explicitly ask TimeZoneInfo whether a particular value is invalid or ambiguous, but it's easy to miss that it's even a possibility. If you try to convert a local value to a UTC value via a time zone, it will throw an exception if it's invalid but silently assume standard time (as opposed to daylight saving time) if it's ambiguous. That sort of decision leads developers to not even consider the possibilities involved. Speaking of which...
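The following sketch shows both cases. To avoid depending on which time zone IDs the operating system knows about, it builds a custom TimeZoneInfo with UK-style rules (forward at 1am on the last Sunday of March, back at 2am on the last Sunday of October). AwkwardLocalTimes, CreateExampleZone and TryConvertToUtc are names made up for the example, as is the "Example/UkStyle" ID.

```csharp
using System;

static class AwkwardLocalTimes
{
    // A hand-built zone: UTC in winter, UTC+1 in summer.
    public static TimeZoneInfo CreateExampleZone()
    {
        // Clocks go forward at 01:00 on the last Sunday of March...
        var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(
            new DateTime(1, 1, 1, 1, 0, 0), 3, 5, DayOfWeek.Sunday);
        // ...and back at 02:00 on the last Sunday of October.
        var end = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(
            new DateTime(1, 1, 1, 2, 0, 0), 10, 5, DayOfWeek.Sunday);
        var rule = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule(
            DateTime.MinValue.Date, DateTime.MaxValue.Date,
            TimeSpan.FromHours(1), start, end);
        return TimeZoneInfo.CreateCustomTimeZone(
            "Example/UkStyle", TimeSpan.Zero, "Example zone",
            "Example Standard Time", "Example Daylight Time", new[] { rule });
    }

    // Returns null when the conversion throws because the local time is invalid.
    public static DateTime? TryConvertToUtc(DateTime local, TimeZoneInfo zone)
    {
        try
        {
            return TimeZoneInfo.ConvertTimeToUtc(local, zone);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    static void Main()
    {
        TimeZoneInfo zone = CreateExampleZone();
        var skipped = new DateTime(2011, 3, 27, 1, 30, 0);   // never occurs
        var repeated = new DateTime(2011, 10, 30, 1, 30, 0); // occurs twice

        Console.WriteLine(zone.IsInvalidTime(skipped));
        Console.WriteLine(zone.IsAmbiguousTime(repeated));
        // The skipped time can't be converted; the repeated one quietly
        // becomes 01:30 UTC, i.e. the standard-time interpretation.
        Console.WriteLine(TryConvertToUtc(skipped, zone).HasValue);
        Console.WriteLine(TryConvertToUtc(repeated, zone).Value.TimeOfDay);
    }
}
```

Note that nothing forces the caller to ask IsInvalidTime or IsAmbiguousTime first, which is exactly the problem.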

This all sounds too complicated.

You may be thinking at this point, "You're making a big deal out of nothing. I don't want to think about this stuff - why are you trying to make everything so complicated? I've been using the .NET API for years, and not had problems." If so, I suspect there are three broad possibilities:

  • You're far, far smarter than I am, and understood all of these intricacies through intuition. Your code always makes use of the right kind of DateTime, uses DateTimeOffset appropriately, and will always do the right thing with invalid or ambiguous local date and time values. No doubt you also write lock-free multi-threaded code sharing state in a way which is as efficient as possible but still rock solid. What the heck are you doing reading this in the first place?
  • You have run into these issues, but have mostly forgotten them - after all, they've only sucked away 10 minutes of your life at a time, as you experimented to get something that appeared to work (or at least made the unit tests pass; the unit tests which may well be conceptually wrong too). Maybe you've wondered about it, but decided that the problem was with you rather than the API.
  • You've never seen the problems, but only because you don't bother testing your code, which has so far only ever run in a single time zone, on computers which are always turned off at night (thus missing all daylight saving transitions). In some ways you're lucky, but you've got a time zone.

Okay, that was somewhat facetious, but it really is a problem. If you've never really thought about the difference between "local" times and "global" instants before, you should have done. It's an important distinction - similar to the distinction between binary floating point and decimal floating point types. Failures can be subtle, hard to diagnose, hard to explain, hard to correct, pervasive, and easy to reintroduce at another point of the program.

Handling date and time values is intrinsically tricky. There are nasty cases to think about like days which don't start at midnight due to daylight saving changes (for example, Sunday October 17th 2010 in Brazil started at 1am). If you're particularly unlucky you'll have to work with multiple calendar systems (Gregorian, Julian, Coptic, Buddhist etc). If you deal with dates and times around the start of the 20th century you may see some very odd time zone transitions as countries went from strictly-longitudinal offsets to mostly "round" values (e.g. Paris in 1911). You may need to deal with governments changing time zone transitions with only a couple of weeks' notice. You may need to deal with time zone identifiers changing (e.g. Asia/Calcutta to Asia/Kolkata).

All of this is on top of the actual business rules you're trying to implement, of course. They may be complicated too. Given all this complexity, you should at least have an API which allows you to express what you mean relatively clearly.

So is Noda Time perfect then?

Of course not. Noda Time suffers several problems:

  • Despite all of the above, I'm a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven't quite implemented it yet. As far as I'm aware, no-one involved in Noda Time is an expert - although Stephen Colebourne, the author of Joda Time and lead of JSR-310, lurks on the mailing list. (Point of trivia: He was present at my first presentation on Noda Time. I asked if anyone happened to know the difference between the Gregorian calendar and the ISO-8601 calendar. He raised his hand and gave the correct answer, obviously. I asked how he happened to know it, and he replied, "I'm Stephen Colebourne." I nearly collapsed.)
  • We haven't finished yet. A beautifully designed API is useless if it isn't implemented.
  • There are bound to be bugs - the BCL team's code is exercised on hundreds of thousands of machines around the world all the time. Errors are likely to be picked up quickly.
  • We don't have any resources - we're a small group of active developers doing this for fun. I'm not saying that for pity (it's great fun) but for the inevitable issues around the amount of time that can be spent on features, documentation etc.
  • We're not part of the BCL. Want to use Noda Time in a LINQ to SQL (or even NHibernate) query? Good luck with that. Even if we succeed beyond my expectations, I'm not expecting other open source projects to take a dependency on us for ages.

Having said that, I am pleased with the overall design. We've tried to keep a balance between flexibility and providing one simple way of achieving any particular goal (with more to do, of course). I'll write another post some time about the design style we've been gradually evolving towards, comparing it with both Joda Time and .NET. The best outcome is the set of types to come out of it, each of which has a reasonably clear role. I won't bore you with all the details here - see other posts, documentation etc.

Ironically, the best outcome for the world would probably be for the BCL team to pick up on this post and decide to overhaul the API radically for .NET 6 (I'm assuming the ship has effectively sailed on .NET 5). While I'm enjoying doing this, I'm sure there are other projects I'd enjoy too - and frankly date and time is too important a concept to rest on my shoulders for the .NET community for long.

Conclusion

I hope I've persuaded you that the .NET API has significant flaws. I may have also persuaded you that Noda Time is worth looking at more closely, but that's a secondary goal really. If you truly understand the flaws in the built-in types - in particular the semantic ambiguity around DateTime - then you're more likely to use those types carefully and accurately in your code. That alone is enough to make me happy.

Friday, 19 August 2011

First NuGet package of Noda Time available - feedback please!

Having been stuck in a bit of an API dilemma recently and having also found time to release Unconstrained Melody as a NuGet package, I figured it would be worth doing the same for Noda Time, in the hope of getting more potential "customers" to have a look. Even though fetching the source and building it is pretty painless, it's more than I'd expect for a casual onlooker, where NuGet makes the whole process pretty simple.

So, while you're reading the rest of this post, why not download the Noda Time 0.1 packages? There are three to pick up, although you only really need the first:

  • NodaTime: the core library
  • NodaTime.Experimental: our sort of sandbox; when there are multiple ways of attacking a problem, I'll probably try to have them all in here at the same time. This only works for new classes and things which can be implemented as extension methods of course - but that should cover a lot of ground. This is more for comment than production use, basically.
  • NodaTime.Testing: Extra library which is designed for testing code which uses Noda Time - for example, a fake clock to be used where you're passing in IClock for dependency injection.
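For anyone who hasn't met the fake clock idea before, here is the general pattern sketched with BCL types only. IExampleClock, SystemExampleClock, FixedExampleClock and TrialLicence are all hypothetical names for this post; they are not the types in NodaTime or NodaTime.Testing, whose clock deals in Instant values rather than DateTime.

```csharp
using System;

// Hypothetical stand-in for a clock abstraction.
interface IExampleClock
{
    DateTime UtcNow { get; }
}

// Production implementation: asks the system.
sealed class SystemExampleClock : IExampleClock
{
    public DateTime UtcNow { get { return DateTime.UtcNow; } }
}

// Test implementation: always reports the value it was constructed with.
sealed class FixedExampleClock : IExampleClock
{
    private readonly DateTime utcNow;

    public FixedExampleClock(DateTime utcNow)
    {
        this.utcNow = utcNow;
    }

    public DateTime UtcNow { get { return utcNow; } }
}

// Code under test is handed a clock instead of calling DateTime.UtcNow itself.
sealed class TrialLicence
{
    private readonly IExampleClock clock;
    private readonly DateTime expiresUtc;

    public TrialLicence(IExampleClock clock, DateTime expiresUtc)
    {
        this.clock = clock;
        this.expiresUtc = expiresUtc;
    }

    public bool IsExpired { get { return clock.UtcNow >= expiresUtc; } }
}

static class Program
{
    static void Main()
    {
        var expiry = new DateTime(2011, 9, 1, 0, 0, 0, DateTimeKind.Utc);
        var dayBefore = new FixedExampleClock(expiry.AddDays(-1));
        var dayAfter = new FixedExampleClock(expiry.AddDays(1));

        Console.WriteLine(new TrialLicence(dayBefore, expiry).IsExpired);
        Console.WriteLine(new TrialLicence(dayAfter, expiry).IsExpired);
    }
}
```

The test decides what "now" is, so time-dependent behaviour can be checked without waiting or changing the system clock.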

What do I need to know?

It's worth being aware of the core concepts in Noda Time:

  • Instant: defines a universal point in time (we don't handle relativity) without reference to a time zone or calendar system. (Internally it's the number of ticks since the Unix epoch, but don't let that concern you too much.)
  • CalendarSystem: A way of breaking down time into years, months, days, hours, minutes etc. You'll almost certainly only want to use the ISO calendar system, which is the universal default in Noda Time.
  • DateTimeZone: Essentially, a mapping between UTC and local time. But full of subtleties like local times that don't exist or are ambiguous...
  • LocalDateTime: A date and time in a particular calendar but not in a particular time zone. Something like "15th January 2010, 10:30am (ISO)". This is the type you're actually most likely to perform arithmetic on - it makes sense to do things like adding a week or a month. Closely related are LocalDate and LocalTime, for situations where you only need a date or only need a time of day respectively.
  • ZonedDateTime: A date and time in a particular calendar and time zone. Think of it as a local date and time + DateTimeZone + offset from UTC. A ZonedDateTime can be mapped unambiguously to an instant.

All of the above are immutable in Noda Time. All except CalendarSystem and DateTimeZone are structs.

Those are the important concepts. There are some more details on the wiki page, but hopefully that's enough to get you going.
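To make the "ticks since the Unix epoch" description of Instant a little more concrete, here is the same idea computed with BCL types. This is only an illustration of the representation, not Noda Time's code; UnixEpochTicks and FromUtc are hypothetical names, and a tick here is the usual .NET 100 nanoseconds.

```csharp
using System;

static class UnixEpochTicks
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Number of 100ns ticks between the Unix epoch and the given UTC value.
    // Values before 1970 come out negative.
    public static long FromUtc(DateTime utc)
    {
        if (utc.Kind != DateTimeKind.Utc)
        {
            throw new ArgumentException("Expected a UTC value", "utc");
        }
        return (utc - UnixEpoch).Ticks;
    }

    static void Main()
    {
        // Exactly one day after the epoch: 864000000000 ticks.
        Console.WriteLine(FromUtc(new DateTime(1970, 1, 2, 0, 0, 0, DateTimeKind.Utc)));
    }
}
```

A single number like this has no time zone and no calendar attached, which is the whole point of an instant.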

What's in the package?

Currently, I'm reasonably confident in the API and implementation of:

  • CalendarSystem (ISO, Gregorian, Julian and Coptic are implemented; a few more to come)
  • DateTimeZone (Olson identifiers; we have a mapping for the Windows identifiers, but I don't think we expose them yet.)
  • ZonedDateTime: Conversion to and from a LocalDateTime (with various options for handling awkward situations)
  • LocalDateTime / LocalDate / LocalTime: Various conversions between them (and construction of course). Arithmetic adding periods (e.g. "this date/time plus two weeks") is implemented but may not be the final API or semantics; manipulation is still up in the air. There are some extension methods in the experimental package to say localDateTime.WithMonthOfYear(10) etc - other options include a builder API, more direct ways of adding periods (e.g. localDateTime.AddDays(10)) and more. Feedback very welcome on this.
  • Instant/Duration/Interval: These are fairly primitive types which don't have much need for extra work
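If the extension method style isn't familiar, this is roughly the shape of it, written against the BCL DateTime so that it runs without the package. It is not the NodaTime.Experimental code: the class name is invented, and clamping the day to the end of a shorter month is just one assumption about semantics, which is precisely the sort of decision still being made.

```csharp
using System;

static class DateTimeWithExtensions
{
    // Returns a copy of the value with the month replaced. If the original
    // day doesn't exist in the new month, the last day of that month is used.
    public static DateTime WithMonthOfYear(this DateTime value, int month)
    {
        int day = Math.Min(value.Day, DateTime.DaysInMonth(value.Year, month));
        DateTime result = new DateTime(value.Year, month, day) + value.TimeOfDay;
        // Preserve the Kind of the original value.
        return DateTime.SpecifyKind(result, value.Kind);
    }
}

static class Program
{
    static void Main()
    {
        var endOfJanuary = new DateTime(2011, 1, 31, 10, 30, 0);

        // The original is untouched; a new value is returned.
        DateTime inFebruary = endOfJanuary.WithMonthOfYear(2);

        Console.WriteLine(inFebruary.Day);
        Console.WriteLine(endOfJanuary.Month);
    }
}
```

Here 31 January becomes 28 February; whether that, an exception or something else is the right answer is the kind of feedback which would be useful.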

LocalDateTime and ZonedDateTime have conversions to/from .NET types (DateTime and DateTimeOffset) - that may be your best bet for getting "into" the API, unless you want to construct values explicitly.

What's not there yet?

The big feature missing from all of this is formatting and parsing. We have an implementation for Instant and Offset, but those are fairly limited types - obviously ZonedDateTime and LocalDateTime are the big ones here. It's worth understanding the basic plan though...

For each appropriate type, we're going to support the framework convention of ToString on the type itself, as well as static Parse and TryParse methods. However, I personally regard those as legacy approaches. There'll also be INodaFormatter<T> and INodaParser<T> which define interfaces for formatting and parsing - with concrete implementations like LocalDateTimeParser etc. These will be thread-safe and reusable, so you'll be able to set up the parser or formatter once for each pattern you want to use. I see this as giving better maintainability and performance, as well as feeling like better OO. Thoughts on this general principle would be welcome - the actual interfaces are there now, but somewhat in flux.

We have yet to decide how to expose what I'm calling pseudo-mutators - methods which get you from one value to another, e.g. the WithMonthOfYear method mentioned earlier. What we're really missing is use cases, so if you can dump a list of feature requests based on what you actually want to do that would be wonderful.

There are no doubt some other features we'll want for v1.0 - and some that might make it, but may not. If there's something you regard as a must-have, holler about it.

Where's the documentation?

The API documentation is actually pretty reasonable - at least, some exists for pretty much everything (no warnings about missing summaries) but it may very well be skimpy in places, and there's just a chance some of it is out of date. Before the full release we'll do a proper pass through, of course. This documentation is built nightly (or on demand) so by the time you read it, it may not reflect the package you've downloaded, if I've applied some fixes etc. I wouldn't expect any really big changes in the very near future though. Before release I'll set up versioned documentation.

The project wiki is in a worse state - the key concepts page is reasonably accurate, but obviously there's a lot of work to be done overall.

How do I give feedback?

The main point of releasing 0.1 as a NuGet package is to get feedback. Lots of feedback. This should be done on the project issues page. If it's a non-trivial matter which you'd like to discuss further, then it'd be great if you'd join the Google group / mailing list, but don't feel you have to get involved to leave feedback.

While high quality feedback is obviously desirable, at this point I'd really just like lots of it. So in particular:

  • If the NuGet packaging is wrong, please let me know - I'm a complete newbie on this.
  • I'm not going to care much whether you report something as a bug or a feature request.
  • If you don't have time to see if your feedback is already covered elsewhere, don't bother.
  • Even if it's only a vague idea, I still want to hear it.
  • Positive feedback is useful too - say what you like as well as what you don't.

Obviously I'll go through and prune the feedback, mark duplicates etc, but that's work that I can do afterwards. If you find any "barriers to entry" when it comes to giving feedback, let me know by email (skeet@pobox.com) and I'll do what I can to fix it.