Tag Archives: documentation

Making world-class docs takes effort

Here are six requirements that I have on a project for it to reach my gold approval for stellar docs. Then something about what I’ve done recently to further improve the docs for curl.

Your docs belong in the code repository

It needs to be next to the code so that authors and contributors can update/read the docs while working on the code or docs. Providing it in a separate repository or otherwise separated will undoubtedly lead to discrepancies sooner or later. Similar to how all wikis are always wrong.

Your docs is not extracted from code

Admit it. If you browse around you will realize that the best documented projects you find never provide that docs generated directly from code. Such generated docs can still provide value, but you will not reach gold level without more effort.

Separated docs encourage more writing and writing by people who are otherwise scared of touching the code or thinking that fixing spelling errors in the docs isn’t worth patching the code for. It also allows for other and better formatting on the docs.

Your docs features examples

Users always ask for (more) examples. You can never go wrong by providing examples. If your docs don’t have enough examples, you’re not doing the docs good enough yet.

You document every API call you provide

Fewer things make me sigh more than when I have to dig up the source code and from that try to figure out exactly how an external and publicly provided API works. Yet this still happens regularly even for libraries that have been released and maintained for decades.

In one fairly recent instance I reported such an omission in a popular library. It took them over two years to add it.

Long-living libraries should also provide information about from which versions certain functionality or options exist or can be expected to work or not work.

Your docs is easily accessible and browsed

Good docs means that we can find what we’re looking for and that the documentation flows and is easily read and understood. Ideally, even simple google searches for API details in your library should lead us to suitable entry points.

Preferably, the documentation should also be provided for proper off-line reading, meaning man pages or something similar that can be browsed when disconnected from the Internet.

Your docs should be easy to contribute to

The docs should be easy for contributors to help out with (independently from the code if desired). That also includes that they should be easy for contributors to build and render locally so that they can test and view their updates while working on them.

Documentation in the curl project

I want the documentation for curl and libcurl to be known, recognized and widely admitted to be world-class.

I want the curl documentation to be of a quality and content to make users not able to find competitors or similar projects with better docs.

Documentation in curl is not an after-thought. It is not a second-tier component. It is a crucial and important foundation that allows users to use, trust and rely on our products. We require that new changes or improved functionality are provided with the corresponding updated and accurate documentation.

We also try to verify and check the docs as much as possible with scanners , tests and tools.

Non-stop iterating is key

I maintain that our documentation is as good as it is today a lot thanks to us very rigidly sticking to our guiding principles: compatibility and not breaking existing behavior. Documentation we wrote decades ago is still valid. It gives us plenty of time to keep refining and polishing the documentation of a feature that doesn’t change. No documentation was perfect already at the first attempt, but after numerous iterations and improvements chances are it is better. Time is on our side. And we are never done, documentation can always be improved.

I’m putting in the work

A few days ago I talked documentation with someone and when doing so I thought about what guiding principles I think we should put on project documentation. What I’ve listed above basically.

In then dawned on me that the current man curl.1 man page is actually not featuring that many examples, in spite of it being 3535 lines long. I pondered a little bit on how best add some, and then dove in and extended our system that generates it from the hundreds of individual files that each describe a single command line option. They should of course all offer at least one example!

Having 242 command line options (as of right now, it will be more soon), that sudden idea that seemed simple enough turned into a quite gruesome work and I spent many hours walking over the options to make up examples. I also made sure that our build system now returns an error if there’s a command option without an example in the documentation! This way, we can be sure that also all future command line options will have examples in the man page.

This made the curl.1 man page grew with over 1200 lines!

libcurl options too

A few years ago I did a manual effort and made sure most man pages for libcurl options include examples, but I never made that into a test or anything so there’s nothing that forces us to stick to this.

Having started this journey, I decided now was the time to add that requirement to the scripts. I extended test case 1173, which already scans all man pages to verify some basic syntax, to also check that man pages for options feature an EXAMPLE section.

There are at this moment 374 stand-alone man pages for libcurl options. Only ten of them were detected to not feature good enough examples and it wasn’t very hard to fix this.

Consistency is good

Having just extended the man page checker, another idea came up.

I made the script also check that each man page features the correct set of sections, in the required order! I’m a true believer in consistency and that using the same set in the same order will make the docs easier to read and find information in, and checking for these things will make sure that all future additions will be forced stick to the same.

The mandatory sections for libcurl option man pages are right now, in this order:

NAME
SYNOPSIS
DESCRIPTION
PROTOCOLS
EXAMPLE
AVAILABILITY
RETURN VALUE
SEE ALSO

These man pages are allowed to have other sections as well, and they can be placed anywhere among the mandatory ones, but the eight section headers that has to be there has to be in that order.

Cross-references

While at it, I also extended the man page scanner to check that all references in all curl man pages to libcurl options are verified to actually refer to existing options, to find typos. Ironically this extra check turned out finding exactly no such typos in the current 463 man pages!

Future

The outcome of the work I write about here will of course be merged asap and will be part of future releases and on the website.

We should keep thinking of more ways to improve the documentation and for more ways to verify and cross-reference things mentioned in the docs to increase its accuracy and detect typos.

If you find a problem, an inferior wording or just something you think we should improve in any curl documentation, file a bug or a PR! We also try to make this as easy as possible for users to do directly from the curl website by providing bug-reporting direct links from the documentation pages.

Updates

I added the sixth “rule” days after the initial post after Willy Tarreau’s feedback.

The libcurl transfer state machine

I’ve worked hard on making the presentation I ended up calling libcurl under the hood. A part of that presentation is spent on explaining the main libcurl transfer state machine and here I’ll try to document some of what, in a written form. Understanding the main transfer state machine in libcurl could be valuable and interesting for anyone who wants to work on libcurl internals and maybe improve it.

Background

The state is kept in easy handle in the struct field called mstate. The source file for this state machine is called multi.c.

An easy handle is always in exactly one of these states for as long as it exists.

This transfer state machine is designed to work for all protocols libcurl supports, but basically no protocol will transition through all states. As you can see in the drawing, there are many different possible transitions from a lot of the states.

libcurl transfer state machine

(click the image for a larger version)

Start

A transfer starts up there above the surface in the INIT state. That’s a yellow box next to the little start button. Basically the boat shows how it goes from INIT to the right over to MSGSENT with it’s finish flag, but the real path is all done under the surface.

The yellow boxes (states) are the ones that exist before or when a connection is setup. The striped background is for all states that has a single and specific connectdata struct associated with the transfer.

CONNECT

If there’s a connection limit, either in total or per host etc, the transfer can get sent to the PENDING state to wait for conditions to change. If not, the state probably moves on to one of the blue ones to resolve host name and connect to the server etc. If a connection could be reused, it can shortcut immediately over to the green DO state.

The green states are all about setting up the connection to a state of fully connected, authenticated and logged in. Ready to send the first request.

DO

The green DO states are all about sending the request with one or more commands so that the file transfer can begin. There are several such states to properly support all protocols but also for historical reasons. We could probably remove a state there by some clever reorgs if we wanted.

PERFORMING

When a request has been issued and the transfer starts, it transitions over to PERFORMING. In the white states data is flowing. Potentially a lot. Potentially in both or either direction. If during the transfer curl finds out that the transfer is faster than allowed, it will move into RATELIMITING until it has cooled down a bit.

DONE

All the post-transfer states are red in the picture. The DONE is the first of them and after having done what it needs to round up the transfer, it disassociates with the connection and moves to COMPLETED. There’s no stripes behind that state. Disassociate here means that the connection is returned back to the connection pool for later reuse, or in the worst case if deemed that it can’t be reused or if the application has instructed it so, closed.

As you’ll note, there’s no disconnect anywhere in the state machine. This is simply because the disconnect is not really a part of the transfer at all.

COMPLETED

This is the end of the road. In this state a message will be created and put in the outgoing queue for the application to read, and then as a final last step it moves over to MSGSENT where nothing more happens.

A typical handle remains in this state until the transfer is reused and restarted, in which it will be set back to the INIT state again and the journey begins again. Possibly with other transfer parameters and URL this time. Or perhaps not.

State machines within each state

What this state diagram and explanation doesn’t show is of course that in each of these states, there can be protocol specific handling and each of those functions might in themselves of course have their own state machines to control what to do and how to handle the protocol details.

Each protocol in libcurl has its own “protocol handler” and most of the protocol specific stuff in libcurl is then done by calls from the generic parts to the protocol specific parts with calls like protocol_handler->proto_connect() that calls the protocol specific connection procedure.

This allows the generic state machine described in this blog post to not really know the protocol specifics and yet all the currently support 26 transfer protocols can be supported.

libcurl under the hood – the video

Here’s the full video of libcurl under the hood.

If you want to skip directly to the state machine diagram and the following explanation, go here.

Credits

Image by doria150 from Pixabay

curl help remodeled

curl 4.8 was released in 1998 and contained 46 command line options. curl --help would list them all. A decent set of options.

When we released curl 7.72.0 a few weeks ago, it contained 232 options… and curl --help still listed all available options.

What was once a long list of options grew over the decades into a crazy long wall of text shock to users who would enter this command and option, thinking they would figure out what command line options to try next.

–help me if you can

We’ve known about this usability flaw for a while but it took us some time to figure out how to approach it and decide what the best next step would be. Until this year when long time curl veteran Dan Fandrich did his presentation at curl up 2020 titled –help me if you can.

Emil Engler subsequently picked up the challenge and converted ideas surfaced by Dan into reality and proper code. Today we merged the refreshed and improved --help behavior in curl.

Perhaps the most notable change in curl for many users in a long time. Targeted for inclusion in the pending 7.73.0 release.

help categories

First out, curl --help will now by default only list a small subset of the most “important” and frequently used options. No massive wall, no shock. Not even necessary to pipe to more or less to see proper.

Then: each curl command line option now has one or more categories, and the help system can be asked to just show command line options belonging to the particular category that you’re interested in.

For example, let’s imagine you’re interested in seeing what curl options provide for your HTTP operations:

$ curl --help http
Usage: curl [options…]
http: HTTP and HTTPS protocol options
--alt-svc Enable alt-svc with this cache file
--anyauth Pick any authentication method
--compressed Request compressed response
-b, --cookie Send cookies from string/file
-c, --cookie-jar Write cookies to after operation
-d, --data HTTP POST data
--data-ascii HTTP POST ASCII data
--data-binary HTTP POST binary data
--data-raw HTTP POST data, '@' allowed
--data-urlencode HTTP POST data url encoded
--digest Use HTTP Digest Authentication
[...]

list categories

To figure out what help categories that exists, just ask with curl --help category, which will show you a list of the current twenty-two categories: auth, connection, curl, dns, file, ftp, http, imap, misc, output, pop3, post, proxy, scp, sftp, smtp, ssh, telnet ,tftp, tls, upload and verbose. It will also display a brief description of each category.

Each command line option can be put into multiple categories, so the same one may be displayed in both in the “http” category as well as in “upload” or “auth” etc.

--help all

You can of course still get the old list of every single command line option by issuing curl --help all. Handy for grepping the list and more.

“important”

The meta category “important” is what we use for the options that we show when just curl --help is issued. Presumably those options should be the most important, in some ways.

Credits

Code by Emil Engler. Ideas and research by Dan Fandrich.

Reporting documentation bugs in curl got easier

After I watched a talk by Marcus Olsson about docs as code (at foss-sthlm on December 12 2019), I got inspired to provide links on the curl web site to make it easier for users to report bugs on documentation.

Starting today, there are two new links on the top right side of all libcurl API function call documentation pages.

File a bug about this page – takes the user directly to a new issue in the github issue tracker with the title filled in with the name of the function call, and the label preset to ‘documentation’. All there’s left is for the user to actually provide a description of the problem and pressing submit (and yeah, a github account is also required).

View man page source – instead takes the user over to browsing that particular man pages’s source file in the github source code repository.

Click the image for full resolution version.

Since this is also already live on the site, you can also browse the documentation there. Like for example the curl_easy_init man page.

If you find mistakes or omissions in the docs – no matter how big or small – feel free to try out these links!

Credits

Image by Pexels from Pixabay

libcurl video tutorial

I’ve watched how my thirteen year old son goes about to acquire information about things online. I am astonished how he time and time again deliberately chooses to get it from a video on YouTube rather than trying to find the best written documentation for whatever he’s looking for. I just have to accept that some people, even some descendants in my own family tree, prefer video as a source of information. And I realize he’s not alone.

So therefore, I bring you, the…

libcurl video tutorial

My intent is to record a series of short and fairly independent episodes, each detailing a specific libcurl area. A particular “thing”, feature, option or area of the APIs. Each episode is also thoroughly documented and all the source code seen on the video is available on the site so that viewers can either follow along while viewing, or go back to the code afterward as a reference. Or both!

I’ve done the four first episodes so far, and they range from five minutes to nineteen minutes a piece. I expect that it might take me a while to just complete the list of episodes I could come up with myself. I also hope and expect that readers and viewers will think of other areas that I could cover so the list of video episodes could easily expand over time.

Feedback

If you have comments on the episodes. If you have suggestion of what to improve or subjects to cover, head over to the libcurl-video-tutorials github page and file an issue or two!

Video setup

I use a Debian Linux installation to develop on. I figure it should be similar enough to many other systems.

Video wise, in each episode I show you my text editor for code, a terminal window for building the code, running what we build in the episode and also for looking up man page information etc. And a small image of myself. Behind those three squares, there’s a photo of a forest (taken by me).

I plan to make each episode use the same basic visuals.

In the initial “setup” episode I create a generic Makefile, which we can reuse in several (all?) other episodes to build the example code easily and swiftly. I’ve previously learned that people consider Makefiles difficult, or sometimes even magic, to work with so I wanted to get that out of the way from the start and then focus more directly on actual C code that uses libcurl.

Receive data episode

Here’s the “receive data” episode as an example of how this can look.

The link: https://bagder.github.io/libcurl-video-tutorials/

HTTP/3 Explained

I’m happy to tell that the booklet HTTP/3 Explained is now ready for the world. It is entirely free and open and is available in several different formats to fit your reading habits. (It is not available on dead trees.)

The book describes what HTTP/3 and its underlying transport protocol QUIC are, why they exist, what features they have and how they work. The book is meant to be readable and understandable for most people with a rudimentary level of network knowledge or better.

These protocols are not done yet, there aren’t even any implementation of these protocols in the main browsers yet! The book will be updated and extended along the way when things change, implementations mature and the protocols settle.

If you find bugs, mistakes, something that needs to be explained better/deeper or otherwise want to help out with the contents, file a bug!

It was just a short while ago I mentioned the decision to change the name of the protocol to HTTP/3. That triggered me to refresh my document in progress and there are now over 8,000 words there to help.

The entire HTTP/3 Explained contents are available on github.

If you haven’t caught up with HTTP/2 quite yet, don’t worry. We have you covered for that as well, with the http2 explained book.

Easier HTTP requests with h2c

I spend a large portion of my days answering questions and helping people use curl and libcurl. With more than 200 command line options it certainly isn’t always easy to find the correct ones, in combination with the Internet and protocols being pretty complicated things at times… not to mention the constant problem of bad advice. Like code samples on stackoverflow that repeats non-recommended patterns.

The notorious -X abuse is a classic example, or why not the widespread disease called too much use of the –insecure option (at a recent count, there were more than 118,000 instances of “curl –insecure” uses in code hosted by github alone).

Sending HTTP requests with curl

HTTP (and HTTPS) is by far the most used protocol out of the ones curl supports. curl can be used to issue just about any HTTP request you can think of, even if it isn’t always immediately obvious exactly how to do it.

h2c to the rescue!

h2c is a new command line tool and associated web service, that when passed a complete HTTP request dump, converts that into a corresponding curl command line. When that curl command line is then run, it will generate exactly(*) the HTTP request you gave h2c.

h2c stands for “headers to curl”.

Many times you’ll read documentation somewhere online or find a protocol/API description showing off a full HTTP request. “This is what the request should look like. Now send it.” That is one use case h2c can help out with.

Example use

Here we have an HTTP request that does Basic authentication with the POST method and a small request body. Do you know how to tell curl to send it?

The request:

POST /receiver.cgi HTTP/1.1
Host: example.com
Authorization: Basic aGVsbG86eW91Zm9vbA==
Accept: */*
Content-Length: 5
Content-Type: application/x-www-form-urlencoded

hello

I save the request above in a text file called ‘request.txt’ and ask h2c to give the corresponding curl command line:

$ ./h2c < request.txt
curl --http1.1 --header User-Agent: --user "hello:youfool" --data-binary "hello" https://example.com/receiver.cgi

If we add “–trace-ascii dump” to that command line, run it, and then inspect the dump file after curl has completed, we can see that it did indeed issue the HTTP request we asked for!

Web Site

Maybe you don’t want to install another command line tool written by me in your system. The solution is the online version of h2c, which is hosted on a separate portion of the official curl web site:

https://curl.haxx.se/h2c/

The web site lets you paste a full HTTP request into a text form and the page then shows the corresponding curl command line for that request.

h2c “as a service”

Inception alert: you can also use the web version of h2c by sending over a HTTP request to it using curl. You’ll then get nothing but the correct curl command line output on stdout.

To send off the same file we used above:

curl --data-urlencode http@request.txt https://curl.haxx.se/h2c/

or of course if you rather want to pass your HTTP request to curl on stdin, that’s equally easy:

cat request.txt | curl --data-urlencode http@- https://curl.haxx.se/h2c/

Early days, you can help!

h2c was created just a few days ago. I’m sure there are bugs, issues and quirks to iron out. You can help! Files issues or submit pull-requests!

(*) = barring bugs, there are still some edge cases where the exact HTTP request won’t be possible to repeat, but where we instead will attempt to do “the right thing”.

curl: read headers from file

Starting in curl 7.55.0 (since this commit), you can tell curl to read custom headers from a file. A feature that has been asked for numerous times in the past, and the answer has always been to write a shell script to do it. Like this:

#!/bin/sh
while read line; do
  args="$args -H '$line'";
done
curl $args $URL

That’s now a response of the past (or for users stuck on old curl versions). We can now instead tell curl to read headers itself from a file using the curl standard @filename way:

$ curl -H @headers https://example.com

… and this also works if you want to just send custom headers to the proxy you do CONNECT to:

$ curl --proxy-headers @headers --proxy proxy:8080 https://example.com/

(this is a pure curl tool change that doesn’t affect libcurl, the library)

curl man page disentangled

The nroff formatted source file to the man page for the curl command line tool was some 110K and consisted of more than 2500 lines by the time this overhaul, or disentanglement if you will, started. At the moment of me writing this, the curl version in git right now, supports 204 command line options.

Working with such a behemoth of a document has gotten a bit daunting to people and the nroff formatting itself is quirky and esoteric. For some time I’ve also been interested in creating some sort of system that would allow us to generate a single web page for each individual command line option. And then possibly allow for expanded descriptions in those single page versions.

To avoid having duplicated info, I decided to create a new system in which we can document each individual command line option in a separate file and from that collection of hundreds of files we can generate the big man page, we can generate the “curl –help” output and we can create all those separate pages suitable for use to render web pages. And we can automate some of the nroff syntax to make it less error-prone and cause less sore eyes for the document editors!

With this system we also get a unified handling of things added in certain curl versions, affecting only specific protocols or dealing with references like “see also” mentions. It gives us a whole lot of meta-data for the command line options if you will and this will allow us to do more fun things going forward I’m sure.

You’ll find the the new format documented, and you can check out the existing files to get a quick glimpse on how it works. As an example, look at the –resolve documentation source.

Today I generated the first full curl.1 replacement and pushed to git, but eventually that file will be removed from git and instead generated at build time by the regular build system. No need to commit a generated file in the long term.