The nroff formatted source file to the man page for the curl command line tool was some 110K and consisted of more than 2500 lines by the time this overhaul, or disentanglement if you will, started. At the moment of me writing this, the curl version in git right now, supports 204 command line options.
Working with such a behemoth of a document has gotten a bit daunting to people and the nroff formatting itself is quirky and esoteric. For some time I've also been interested in creating some sort of system that would allow us to generate a single web page for each individual command line option. And then possibly allow for expanded descriptions in those single page versions.
To avoid having duplicated info, I decided to create a new system in which we can document each individual command line option in a separate file and from that collection of hundreds of files we can generate the big man page, we can generate the "curl --help" output and we can create all those separate pages suitable for use to render web pages. And we can automate some of the nroff syntax to make it less error-prone and cause less sore eyes for the document editors!
With this system we also get a unified handling of things added in certain curl versions, affecting only specific protocols or dealing with references like "see also" mentions. It gives us a whole lot of meta-data for the command line options if you will and this will allow us to do more fun things going forward I'm sure.
You'll find the the new format documented, and you can check out the existing files to get a quick glimpse on how it works. As an example, look at the --resolve documentation source.
Today I generated the first full curl.1 replacement and pushed to git, but eventually that file will be removed from git and instead generated at build time by the regular build system. No need to commit a generated file in the long term.
As I mentioned before, the curl documentation needs improvement. As a first step I converted the man page for curl_easy_setopt into no less than 210(!) individual man pages. One new for each option the function supports.
The man page was originally (a few days ago) almost 3000 lines, and now with them all split up we end up with a lot more text. This because the new format encourages more text per option and each page now has to detail itself more. This should also make each option much easier to google/search and to link to when we help users understand the options.
I've made some server-side scripts to generate html versions of them all, I generate a list of all options we have and the examples we host on the web site now have all mentions of the options linked directly to these new pages.
The curl_easy_setopt man page will then get most explanations cut out and mainly be used as an index with the options grouped into logical sections to help users find the options they want to use. I could cut out almost 2500 lines.
The new man pages add about 7500 lines of documentation (excluding the headers in each file)...
Many moons ago I created a little tool I named roffit. It is just a tiny perl script that converts a man page written in the nroff format to good-looking HTML. I should perhaps also add that I didn't find any decent alternatives then so I wrote up my own version. I've been using it since in projects such as curl, c-ares and libssh2 to produce web versions of the docs.
It has just done its job and I haven't had any needs to fiddle with it. The project page lists it as last modified in 2004, even though I actually moved it to a sourceforge CVS repo back in 2007.
Just a few days ago, I got emailed and was notified that Debian has it included as a package in the distribution and someone was annoyed on some particular flaws.
This resulted in a bunch of bugs getting submitted to the Debian bug tracker, I started up the brand new roffit-devel mailing list to easier host roffit discussions and I switched over the CVS repo to a git one on github.
If you like seeing man pages turned into web pages, consider joining up and help us improve this thing!
There's one thing the GNU project has done wrong (and thus the followers of it, like the Debian Linux distribution and others) and it is with their stupid preference to not provide proper man pages but instead insist that the user runs "info [whatever]". In Debian you also very often have to install a separate doc package to get those info files, and I fail to see the logic in providing tools and libs etc without the proper docs. (and in fact in many cases the info page shows the man page until you get that proper package installed!)
Man pages may not be the best format in the world for docs, but I rather have a proper man page for all commands and then I'll go html online for extended information. Info is just plain annoying and we should bury it. The sooner the better!
And yes, it is not a coincidence that no project I'm actively driving as a proper contributor are producing any Info documents...