Saturday 10 April 2010

Documentation with Sandcastle - a notebook

(Posted to both my main code blog and the Noda Time blog.)

I apologise in advance if this blog post becomes hard to get good information from. It's a record of trying to get Sandcastle to work for Noda Time; as much as anything it's meant to be an indication of how smooth or otherwise the process of getting started with Sandcastle is. My aim is to be completely honest. If I make stupid mistakes, they'll be documented here. If I have decisions to make, they'll be documented here.

I should point out that I considered using NDoc (it just didn't make sense to use a dead project) and docu (I'm not keen on the output style, and it threw an exception when I tried running it on Noda Time anyway). I didn't try any other options as I'm unaware of them. Hmm.

Starting point and aims

My eventual aim is to include "build the docs" as a task in the build procedure for Noda Time. I don't have much in the way of preconceived ideas of what the output should be: my guess is a CHM file and something to integrate into MSDN, as well as some static web pages. Ideally I'd like to be able to put the web pages on the Google Code project page, but I don't know how feasible that will be. If the best way forward turns out to be something completely different, that's fine.

(I've mailed Scott Hanselman and Matt Hawley about the idea of having an ASP.NET component of some form which could dynamically generate all this stuff on the fly - you'd just need to upload the .xml and .dll files, and let it do the rest. I'm not expecting that idea to be useful for Noda Time in the near future, but you never know.)

Noda Time has some documentation, but there are plenty of public members which don't have any XML documentation at all at the moment. Obviously there's a warning available for this so we'll be able to make sure that eventually everything's documented, but we also need to be able to build documentation before we reach that point.

Step 0: building the XML file

The build project doesn't currently even create the .xml file, so that's the first port of call - just a case of ticking a box and then changing the default filename slightly... because for some bizarre reason, Visual Studio defaults to creating a ".XML" file instead of ".xml". Why? Where else are capitals used in file extensions?

Rebuild the solution, gaze in fear at the 496 warnings generated, and we have everything we should need from Visual Studio. My belief is that I should now be able to close Visual Studio and not reopen it (with the Noda Time solution, anyway) during the course of this blog post.

Step 1: building Sandcastle

First real choice: which version of Sandcastle do I go for? There was a binary release on May 29th 2008, a source release on July 2nd 2008, and three commits to source control since then, the latest of which was in July 2009. Personally I like the idea of not having to actually install anything: tools which can just be run without installation are nicer for Open Source projects, particularly if you can check the binaries into source control with appropriate licence files. That way anyone can build after just fetching. On the other hand, I'm not sure how well the Sandcastle licence fits in with the Apache 2 licence we're using for Noda Time. I can investigate that later.

What the heck, let's try building it from source. It's probably easier to go from that to the installed version than the other way round. Change set 26202 downloaded and unpacked... now how do we build it, and what do we need to build? Okay, there's a solution file, which opens up in VS2008 (unsurprising and not a problem). Gosh, what a lot of projects (but no unit tests?) - still, everything builds with nary a warning. I've no idea what to do with it now, but it's a start. It looks like it's copied four executables and a bunch of DLLs into the ProductionTools directory, which is promising.

Shockingly, it's only just occurred to me to check for some documentation to see whether or not I'm doing the right thing. Looking at the Sandcastle web page, it seems I'm not missing much. Well, I was aware that this might be the case.

Step 2: Sandcastle Help File Builder

I've heard about SHFB from a few places, and it certainly sounds like it's the way to go - it even has a getting started guide and installation instructions! It looks like there's a heck of a lot of documentation for something which sounds like it should be simple, but hey, let's dive in. (I know it sounds inconsistent to go from complaining about no documentation to complaining about too much - but I'm really going from complaining about no documentation to complaining about something being so complicated that it needs a lot of documentation. I'm very grateful to the SHFB team for documenting everything, even if I plan to read it on a Just-In-Time basis.)

A few notes from the requirements page:

  • It looks like I'll need to install the HTML Help Workshop if I want CHM files; the Help 2 compiler should already be part of the VS2008 SDK which I'm sure is already installed. I have no idea where Help 3 fits into this :(
  • It looks like I need a DXROOT environment variable pointing at my Sandcastle "installation". I wonder what that means in my home-built version? I'll assume it just means the Development directory containing ProductionTools and ProductionTransforms.
  • There's a further set of patches available in the Sandcastle Styles project. Helpfully, this says it includes all the updates in the July 2009 source code, and can be applied to the binary installation from May 2008. It's not clear, however, whether it can also be applied to a home-built version of Sandcastle. Given that I can get all the latest stuff in conjunction with an installed version, it sounds like it's worth installing the binary release after all. (Done, and patches installed.)
  • It sounds like I need to install the H2 Viewer and H2Reg. (I suspect that H2Reg will be something we direct our users at rather than shipping and running ourselves; I don't intend to have an MSI-style "installer" for Noda Time at the moment, although the recent CoApp announcement sounds interesting. It's too early to worry about that for the moment though.)
  • We're not documenting a web site project, so I'm not bothering with "Custom Web Code Providers". I've installed quite enough by this point, thank you very much. Oh, except I haven't installed SHFB itself yet. I'd better do that now...

Step 3: creating a Help File Builder project

This feels like it could be reasonably straightforward, so long as I don't try to do anything fancy. Let's follow (roughly) the instructions. (I'm doing it straight to Noda Time rather than using the example project.)

Open the GUI, create a new project, add a documentation source of NodaTime.csproj... and hit Build Project. Wow, this takes quite a while - and this is a pretty beefy laptop. However, it seems to work! I have a CHM file which looks like it includes all the right stuff. Hoorah! It's a pretty huge CHM file (just over 3MB) for a relatively small project, but never mind.

Let's build it again, this time with all the output enabled (Help 1, Help 2, MSHelpViewer and Website).

Hmm... no MS Help 2 compiler found. Maybe I didn't have the VS2008 SDK installed after all. After a bit of hunting, it's here. Time to install it - and make sure it doesn't mess up the Sandcastle installation, as the SHFB docs warned me about. Yikes - 109MB. Ah well.

Okay, so after the SDK installation, rebuild the help... which will take even longer of course, as it's now building four different output formats. 3 minutes 18 seconds in the end... not too bad, but not something I'll want to do after every build :)

Step 4: checking the results

  • Help 1 (CHM): looks fine, if old-fashioned :)
  • Help 2 (HxS): via H2Viewer, looks fine - I'm not sure whether I dare to integrate it with MSDN just yet though.
  • ASP.NET web site: works even in Chrome
  • Static HTML: causes Chrome to flicker, constantly reloading. Works fine in Firefox. Maybe I need to submit a bug report.

I'm not entirely sure which output option corresponds to which result here; in particular, is "Website" the static one or the ASP.NET one? What's MSHelpViewer? It's easy enough to find out of course - I'll just experiment at a later date.

Step 5: building from the command line

I can't decide whether this is crucial (as it should be part of a continuous build server) or irrelevant (as there are so many tools to install, I may never get the ability to run a CB server with everything installed). However, it would certainly be nice.

Having set SHFBROOT appropriately, running msbuild gives this error:

SHFB : error BE0067: Unable to obtain assembly name from project file '[...]' using Configuration 'Debug', Platform 'X64'

Using Debug is definitely correct, but X64 sounds wrong... I suspect I want AnyCPU instead. Fortunately, this can be set in the SHFB project file (it was previously just defaulting). Once that's been fixed, the build works with one warning: BHT0001: Unable to get executing project: Unable to obtain internal reference. Supposedly this may indicate a problem in SHFB itself... I shall report it later on. It doesn't seem to affect the ability to produce help though.

Conclusion

That wasn't quite as painful as I'd feared. I'm nearly ready to check in the SHFB project file now - but I need to work out a few other things first, and probably create a specific "XML" configuration for the main project itself. I'm somewhat alarmed at the number of extra bits and pieces that I had to install though - and the lack of any mention of Help 3 is also a bit worrying.

I've just remembered one other option that I haven't tried, too - MonoDoc. I may have another look at that at a later date, although the fact that it needs a GTK# help viewer isn't ideal.

I still think the Open Source community for .NET has left a hole at the moment. It may be that SHFB + Sandcastle is as good as it gets, given the limitations of how much needs to be installed to build MS help files. I'd still like to see a better way of providing docs for web sites though... ideally one which doesn't involve shipping hundreds of files around when only two are really required.

Friday 9 April 2010

Time zones, singletons, deployment options, names and source control

Yesterday (April 8th), the province of San Luis in Argentina declared that it wouldn't be observing a daylight saving transition after all. It was due to go back to standard time on April 11th. Thanks for the heads-up, guys. (The rest of Argentina isn't observing daylight saving at all at the moment - another decision which came with little warning.)

This highlights an important fact about time zones: they're more volatile than you might expect. This affects Noda Time in interesting ways.

Switching time zone provider

Obviously we ship with a time zone database (based on zoneinfo data) but when we eventually get to the stage of having public releases, we don't want to have to ship another release at the whim of politicians. We already have the code to load a time zone database (in our own compressed format) from a resource; the code to load one from an arbitrary stream is there too, and we just need to expose it.

So, we can then ship new time zone files and developers could use them... if they decide to care. There are two ways they could do this. First (either way) they need to load the database as an IDateTimeZoneProvider. After that, they could pass that around as a dependency everywhere that requires a time zone based on an ID. Alternatively, they could add it to the "system" time zone provider, which will make DateTimeZones.ForID(...) work.
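As a rough sketch of those two options (in Python purely for illustration; none of these names are Noda Time's API, and the "most recently added provider wins" lookup order is an assumption of mine rather than a decision we've made):

```python
from typing import Dict, List, Optional, Protocol


class ZoneProvider(Protocol):
    """Hypothetical stand-in for something like IDateTimeZoneProvider."""

    def get_zone(self, zone_id: str) -> Optional[str]: ...


class DictProvider:
    """Toy provider backed by a dict of zone ID -> description of its rules."""

    def __init__(self, zones: Dict[str, str]) -> None:
        self._zones = dict(zones)

    def get_zone(self, zone_id: str) -> Optional[str]:
        return self._zones.get(zone_id)


# Option 1: pass the provider to everything that needs to look up a zone.
class MeetingScheduler:
    def __init__(self, provider: ZoneProvider) -> None:
        self._provider = provider

    def describe(self, zone_id: str) -> str:
        zone = self._provider.get_zone(zone_id)
        return zone if zone is not None else "unknown zone"


# Option 2: a process-wide ("system") list of providers.
# Assumption: providers added later are consulted first.
_system_providers: List[ZoneProvider] = []


def add_system_provider(provider: ZoneProvider) -> None:
    _system_providers.insert(0, provider)


def for_id(zone_id: str) -> Optional[str]:
    """Return the first match from the system providers, or None."""
    for provider in _system_providers:
        zone = provider.get_zone(zone_id)
        if zone is not None:
            return zone
    return None
```

With the first option every consumer names its dependency explicitly; with the second, a single call at start-up changes what every lookup returns, which is both the convenience and the "wrongness" of it.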

Normally, I would absolutely say it's worth going with the dependency injection approach... but do you really want that dependency everywhere? It's not something you expect to need to mock out for tests (unlike a clock, for example). I'm torn between the "wrongness" of the singleton system provider list and the pragmatic approach it provides. Ultimately, I suspect we'll make sure that both options work (I believe we'll need to expose the built-in database as a provider, which I don't think we do at the moment) and let individual teams decide what works best for them. Even that feels slightly wrong though.

There's another awkward decision around practicality, too. If you're not going to use the built-in database, why do you even want it to be present?

Deployment options

It's extremely convenient that NodaTime comes as a single DLL with no extra dependencies. You can just drop it in, no problem. It doesn't need to read any files - just the built-in resources. That's handy in terms of sandboxing and permissions... but doesn't lend itself to having a split between code and data.

We could potentially have two different versions: a "skinny" build which doesn't include the time zone database, and a "fat" build which does. The skinny build would have to automatically try to load the time zone database from some well-known place... but how?

It could be a dependent assembly - but that means it's got to be strongly named (as Noda Time itself is) which could make it hard for developers to build their own time zone database assembly if they want it to be trusted by a release build (which will be signed with a private snk file only available to core Noda Time developers).

It could be a satellite assembly, which comes with the same problems but may be slightly simpler in terms of the code to load it - and may even mean that there's no difference in the code used by the fat and skinny builds to load the "default" database.

It could be a file, but then we get back into the business of permissions and trust, as well as using reflection to find where the heck the assembly is in the first place (blech).

In short, it's all a bit of a mess - but I'm hoping we'll find some pleasant way of coping with it which makes it a breeze to replace the database without any inefficiency. Of course, even having two builds (per platform, don't forget!) is a pain to start with. More testing etc. That isn't even the limit of the problem...

Changing names

This isn't the first bit of time zone fun I've had this week. Without going into details, I've been caught out by the fact that Asia/Calcutta changed to Asia/Kolkata at some point.

Currently, I don't believe Noda Time has any support for this sort of change. Depending on what's in the zoneinfo database, we may have some aliases - but I don't believe we've got any "translation" support; a sort of "if I've got this time zone name, what else might it be called?" piece of functionality.
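To make the "what else might it be called?" idea a bit more concrete, here's a minimal sketch in Python. It's entirely hypothetical: the function names are made up, and the alias table is hand-written sample data containing only the Asia/Calcutta rename (to Asia/Kolkata in the zoneinfo data), not anything Noda Time actually loads.

```python
from typing import Dict, Set

# Assumed sample data: old ID -> newer ID.
ALIASES: Dict[str, str] = {"Asia/Calcutta": "Asia/Kolkata"}


def canonical_id(zone_id: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Follow renames until we reach an ID that has no newer name."""
    seen: Set[str] = set()
    while zone_id in aliases and zone_id not in seen:
        seen.add(zone_id)  # guards against accidental cycles in the table
        zone_id = aliases[zone_id]
    return zone_id


def all_names(zone_id: str, aliases: Dict[str, str] = ALIASES) -> Set[str]:
    """Every known ID which resolves to the same zone as zone_id."""
    target = canonical_id(zone_id, aliases)
    names = {target}
    names.update(old for old in aliases if canonical_id(old, aliases) == target)
    return names
```

Here all_names("Asia/Kolkata") and all_names("Asia/Calcutta") both give the same pair of IDs, so data written under either name could still be matched up. It does nothing for an old system reading a new name it has never heard of, of course.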

At some point, I need to think about this further. The idea that your data could become invalid at any moment (and that new data may be unreadable by old systems) scares me.

Versioned time zone databases - and dates

All of this makes me think of source control. We're used to the idea of "a specific path, at a particular revision" (where a revision can be a number or a hash or whatever). The mess of changing time zones suggests we need to apply the same sort of idea to dates and times. So we might have a zoned instant represented by "2010-04-12T18:42:00 America/Argentina/San_Luis@2010-04-07T00:00:00Z". In other words, "the instant that we thought would be represented by April 12th 2010, 6:42pm in San Luis, when we considered it on April 7th".
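Purely as an illustration of that notation (this isn't a proposed Noda Time type, and the class and function names are invented), a value like this could be pulled apart along the following lines in Python. The sketch only splits the string into its three pieces; it makes no attempt to look up the zone rules as they stood at the "as of" moment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

_FORMAT = "%Y-%m-%dT%H:%M:%S"


@dataclass(frozen=True)
class VersionedZonedTime:
    local: datetime   # local date and time, with no offset attached
    zone_id: str      # zoneinfo-style ID
    as_of: datetime   # UTC moment at which the zone rules were considered


def parse_versioned(text: str) -> VersionedZonedTime:
    """Parse 'LOCAL ZONE@AS_OF' where AS_OF is a UTC timestamp ending in Z."""
    local_part, rest = text.split(" ", 1)
    zone_id, as_of_part = rest.split("@", 1)
    if not as_of_part.endswith("Z"):
        raise ValueError("expected a UTC 'as of' timestamp ending in Z")
    local = datetime.strptime(local_part, _FORMAT)
    as_of = datetime.strptime(as_of_part[:-1], _FORMAT).replace(tzinfo=timezone.utc)
    return VersionedZonedTime(local, zone_id, as_of)
```

Resolving the result to an actual instant would need the version of the time zone database in force at as_of, which is exactly the bit that makes this awkward.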

Somehow, I can't see that taking off. It's worth reading Tony Finch's thoughts on the same issue. I don't know to what extent I agree with his suggestions, but they're interesting ideas.

We really have screwed things up, haven't we?