improving the curl docs, step 1

Saturday, June 21st, 2014

As I mentioned before, the curl documentation needs improvement. As a first step I converted the man page for curl_easy_setopt into no less than 210(!) individual man pages. One new for each option the function supports.

The man page was originally (a few days ago) almost 3000 lines, and now with them all split up we end up with a lot more text. This because the new format encourages more text per option and each page now has to detail itself more. This should also make each option much easier to google/search and to link to when we help users understand the options.

I’ve made some server-side scripts to generate html versions of them all, I generate a list of all options we have and the examples we host on the web site now have all mentions of the options linked directly to these new pages.

The curl_easy_setopt man page will then get most explanations cut out and mainly be used as an index with the options grouped into logical sections to help users find the options they want to use. I could cut out almost 2500 lines.

The new man pages add about 7500 lines of documentation (excluding the headers in each file)…

curl the next few years

Thursday, June 19th, 2014

Roadmap of things Daniel Stenberg and Steve Holme want to work on next. It is intended to serve as a guideline for others for information, feedback and possible participation.

If you agree, disagree or would like to add stuff you want to work on, please join us on the curl-library list! This “roadmap” is likely to change over time. We’ll keep the updated ROADMAP in git.

New stuff – libcurl

  1. http2 test suite

  2. http2 multiplexing/pipelining

  3. SPDY

  4. SRV records

  5. HTTPS to proxy

  6. make sure there’s an easy handle passed in to curl_formadd(), curl_formget() and curl_formfree() by adding replacement functions and deprecating the old ones to allow custom mallocs and more

  7. HTTP Digest authentication via Windows SSPI

  8. GSSAPI authentication in the email protocols

  9. add support for third-party SASL libraries such as Cyrus SASL – may need to move existing native and SSPI based authentication into vsasl folder after reworking HTTP and SASL code

  10. SASL authentication in LDAP

  11. Simplify the SMTP email interface so that programmers don’t have to construct the body of an email that contains all the headers, alternative content, images and attachments – maintain raw interface so that programmers that want to do this can

  12. Allow the email protocols to return the capabilities before authenticating. This will allow an application to decide on the best authentication mechanism

  13. Allow Windows threading model to be replaced by Win32 pthreads port

  14. Implement a dynamic buffer size to allow SFTP to use much larger buffers and possibly allow the size to be customizable by applications. Use less memory when handles are not in use?

New stuff – curl

  1. Embed a language interpreter (lua?). For that middle ground where curl isn’t enough and a libcurl binding feels “too much”. Build-time conditional of course.

  2. Simplify the SMTP command line so that the headers and multi-part content don’t have to be constructed before calling curl

Improve

  1. build for windows (considered hard by many users)

  2. curl -h output (considered overwhelming to users)

  3. we have  > 160 command line options, is there a way to redo things to simplify or improve the situation as we are likely to keep adding features/options in the future too

  4. docs (considered “bad” by users but how do we make it better?)

  5. authentication framework (consider merging HTTP and SASL authentication to give one API for protocols to call)

  6. Perform some of the clean up from the TODO document, removing old definitions and such like that are currently earmarked to be removed years ago

Remove

  1. cmake support (nobody maintains it)

  2. makefile.vc files as there is no point in maintaining two sets of Windows makefiles. Note: These are currently being used by the Windows autobuilds

The curl and libcurl 2014 survey

Thursday, June 19th, 2014

Reading through the answers to the curl project’s survey “curl and libcurl 2014″ is very interesting and educational.

After having lead and participated in this project for so long I have my own picture of what we’re good and bad at. That’s not exactly the same image I get when I read the survey responses. That’s of course the educating part and I really want to learn from this poll and see where to put in some efforts and attempt to improve. At the same time I’ve been working for a while to put together a roadmap for the project, and the survey will help guide us with that work as well.

The full generated summary of the answers can be found on the site, but I thought I do the extra effort here and try to extrapolate data, compare and try to get to the real story that lurks in the shadows.

Over the almost 10 days the poll was open, we received 194 responses. I was hoping for more participation, but on the other hand I don’t think more people would’ve given a much different view. My only concern would be that I’m not sure exactly how well we reached out.

Almost all curl users use it for HTTP and HTTPS. Sure, we also use a lot of other protocols and in fact all supported protocols did up having at least two users according to the survey, but only a single digit percentage did not mark HTTP and HTTPS as protocols they use. The least used supported protocol gopher, is used among 1.5% of the users who responded.

FTPS and SFTP are basically equally much used and they are the 4th and 5th most used protocols. HTTP, HTTPS and FTP are clearly our most popular protocols.

Only one in five users use curl on a single platform. All others use it on two or more, and one if four use it on four or more with an unexpectedly high 11% saying they use it on 5 or more platforms! That’s a pretty strong message to me that our multi-platform strategy is important.

Our users have been with us for a long time. Half of the users have been using curl for five years or more! A fifth has been with us for 8 years or more! And yet there seems to be a healthy amount of newcomers finding us as 14% is within their first year.

The above numbers combined, I’m not surprised but only happy to see that 4 out of 5 users are also involved in other open source projects. curl is just one piece in a large ecosystem and I think it is good that we all participate in several projects so that we learn and cross-pollinate where possible!

Less than half of the respondents are subscribed to a curl mailing list, and curl-library is the most popular one. This also reflects in subscriber numbers on the actual mailing lists where curl-library with its 1400+ members has almost twice as many subscribers as curl-users. One way to view this is that we are old enough, established enough and working enough so that users don’t have to subscribe to our lists to keep up. The less optimistic way to see it could be that this is because we haven’t reached out good enough or that our mailing list culture/setup isn’t welcoming enough.

Perhaps most surprising to me: that several persons got upset and reacted strongly to the question about how good we treat “female and other minorities” in the project. To me there’s no doubt that female contributors are a minority in the curl community and I want to learn if we’re doing our best to be inclusive and open to all possible contributors. Or at least how good/bad people think we are doing.

29% of the respondents have contributed patches, meaning 56 individuals. I think that tells more about the ones who took part of the survey than it measures participation level among “regular users”.

Documentation

A big revelation for me was the question where I asked people to identify the “worst parts” of the project. The image here below is the look of the summary.

It quite clearly identifies “documentation” as the area in most need of improvements.

I don’t think the amount of docs is the problem. After discussing with people I think the primary issues are:

  • Some collections of docs are just too big and hard to find in, like the curl man page and the curl_easy_setopt man pages. We need to split them up and/or rearrange somehow to help people find the info they need. Work has started on this. I’ll follow up with details later.
  • We get slightly bad “reviews” on this when people confuse the libcurl bindings’ lack of docs to be our problem. Lots of libcurl bindings are not very good documented – but they are separate projects not controlled or documented by us. I don’t know what we can do to help that situation. Suggestions are very welcome!
  • We don’t have much step-by-step tutorials on how to get started and how to knit things together. We mostly provide reference manuals. I will appreciate help with improving this!

cURL

Firefox and partial content

Monday, June 16th, 2014

Firefox BallOne of the first bugs that fell into my lap when I started working for Mozilla not a very long time ago, was bug 237623. Anyone involved in Mozilla knows a bug in that range is fairly old (we just recently passed one million filed bugs). This particular bug was filed in March 2004 and there are (right now) 26 other bugs marked as duplicates of this. Today, the fix for this problem has landed.

The core of the problem is that when a HTTP server sends contents back to a client, it can send a header along indicating the size of the data in the response. The header is called “Content-Length:”. If the connection gets broken during transfer for whatever reason and the browser hasn’t received as much data as was initially claimed to be delivered, that’s a very good hint that something is wrong and the transfer was incomplete.

The perhaps most annoying way this could be seen is when you download a huge DVD image or something and for some reason the connection gets cut off after only a short time, way before the entire file is downloaded, but Firefox just silently accept that as the end of the transfer and think everything was fine and dandy.

What complicates the issue is the eternal problem: not everything abides to the protocol. This said, if there are frequent violators of the protocol we can’t strictly fail on each case of problem we detect but we must instead do our best to handle it anyway.

Is Content-Length a frequently violated HTTP response header?

Let’s see…

  1. Back in the HTTP 1.0 days, the Content-Length header was not very important as the connection was mostly shut down after each response anyway. Alas, clients/browsers would swiftly learn to just wait for the disconnect anyway.
  2. Back in the old days, there were cases of problems with “large files” (files larger than 2 or 4GB) which every now and then caused the Content-Length: header to turn into negative or otherwise confused values when it wrapped. That’s not really happening these days anymore.
  3. With HTTP 1.1 and its persuasive use of persistent connections it is important to get the size right, as otherwise the chain of requests get messed up and we end up with tears and sad faces
  4. In curl’s HTTP parser we’ve always been strictly abiding to this header and we’ve bailed out hard on mismatches. This is a very rare error for users to get and based on this (admittedly unscientific data) I believe that there is not a widespread use of servers sending bad Content-Length headers.
  5. It seems Chrome at least in some aspects is already much more strict about this header.

My fix for this problem takes a slightly careful approach and only enforces the strictness for HTTP 1.1 or later servers. But then as a bonus, it has grown to also signal failure if a chunked encoded transfer ends without the ending trailer or if a SPDY or http2 transfer gets prematurely stopped.

This is basically a 6-line patch at its core. The rest is fixing up old test cases, added new tests etc.

As a counter-point, Eric Lawrence apparently worked on adding stricter checks in IE9 three years ago as he wrote about in Content-Length in the Real World. They apparently subsequently added the check again in IE10 which seems to have caused some problems for them. It remains to be seen how this change affects Firefox users out in the real world. I believe it’ll be fine.

This patch also introduces the error code for a few other similar network situations when the connection is closed prematurely and we know there are outstanding data that never arrived, and I got the opportunity to improve how Firefox behaves when downloading an image and it gets an error before the complete image has been transferred. Previously (when a partial transfer wasn’t an error), it would always throw away the image on an error and instead show the “image not found” picture. That really doesn’t make sense I believe, as a partial image is better than that default one – especially when a large portion of the image has been downloaded already.

Follow-up effects

Other effects of this change that possibly might be discovered and cause some new fun reports: prematurely cut off transfers of javascript or CSS will discard the entire javascript/CSS file. Previously the partial file would be used.

Of course, I doubt that these are the files that are as commonly cut off as many other file types but still on a very slow and bad connection it may still happen and the new behavior will make Firefox act as if the file wasn’t loaded at all, instead of previously when it would happily used the portions of the files that it had actually received. Partial CSS and partial javascript of course could lead to some “fun” effects of brokenness.

Http2 interim meeting NYC

Sunday, June 8th, 2014

On June 5th, around thirty people sat down around a huge table in a conference room on the 4th floor in the Google offices in New York City, with a heavy rain pouring down outside.

It was time for another IETF http2 interim meeting. The attendees were all participants in the HTTPbis work group and came from a wide variety of companies and countries. The major browser vendors were represented there, and so were operators and big service providers and some proxy people. Most of the people who have been speaking up on the mailing list over the last year or so, unfortunately with a couple of people notably absent. (And before anyone asks, yes we are a group where the majority is old males like me.)

Most people present knew many of the others already, which helped to create a friendly familiar spirit and we quickly got started on the Thursday morning working our way through the rather long lits of issues to deal with. When we had our previous interim meeting in London, I think most of us though we would’ve been further along today but recent development and discussions on the list had actually brought back a lot of issues we though we were already done with and we now reiterated a whole slew of subjects. We weren’t allowed to take photographs indoors so you won’t see any pictures of this opportunity from me here.

Google offices building logo

We did close many issues and I’ll just quickly mention some of the noteworthy ones here…

Extensions

We started out with the topic of “extensions”. Should we revert the decision from Zurich (where it was decided that we shouldn’t allow extensions in http2) or was the current state of the protocol the right one? The arguments for allowing extensions included that we’d keep getting requests for new things to add unless we have a way and that some of the recent stuff we’ve added really could’ve been done as extensions instead. An argument against it is that it makes things much simpler and reliable if we just document exactly what the protocol has and is, and removing “optional” behavior from the protocol has been one of the primary mantas along the design process.

The discussion went back and forth for a long time, and after almost three hours we had kind of a draw. Nobody was firmly against “the other” alternative but the two sides also seemed to have roughly the same amount of support. Then it was yet again time for the coin toss to guide us. Martin brought out an Australian coin and … the next protocol draft will allow extensions. Again. This also forces implementation to have to read and skip all unknown frames it receives compared to the existing situation where no unknown frames can ever occur.

BLOCKED as an extension

A rather given first candidate for an extension was the BLOCKED frame. At the time BLOCKED was added to the protocol it was explicitly added into the spec because we didn’t have extensions – and it is now being lifted out into one.

ALTSVC as an extension

What received slightly more resistance was the move to move out the ALTSVC frame as well. It was argued that the frame isn’t mandatory to support and therefore easily can be made into an extension.

Simplified padding

Another small change of the wire format since draft-12 was the removal of the high byte for padding to simplify. It reduces the amount you can pad a single frame but you can easily pad more using other means if you really have to, and there were numbers presented that said that 255 bytes were enough with HTTP 1.1 already so probably it will be enough for version 2 as well.

Schedule

There will be a new draft out really soon: draft -13. Martin, our editor of the spec, says he’ll be able to ship it in a week. That is intended to be the last draft, intended for implementation and it will then be expected to get deployed rather widely to allow us all in the industry to see how it works and be able to polish details or wordings that may still need it.

We had numerous vendors and HTTP stack implementers in the room and when we discussed schedule for when various products will be able to see daylight. If we all manage to stick to the plans. we may just have plenty of products and services that support http2 by the September/October time frame. If nothing major is found in this latest draft, we’re looking at RFC status not too far into 2015.

Meeting summary

I think we’re closing in for real now and I have good hopes for the protocol and our progress to a really wide scale deployment across the Internet. The HTTPbis group is an awesome crowd to work with and I had a great time. Our hosts took good care of us and made sure we didn’t lack any services or supplies. Extra thanks go to those of you who bought me dinners and to those who took me out to good beer places!

My http2 document

Yeah, it will now become somewhat out of date and my plan is to update it once the next draft ships. I’ll also do another http2 presentation already this week so I hope to also post an updated slide set soonish. Stay tuned!

Wireshark

My plan is to cooperate with the other Wireshark hackers and help making sure we have the next draft version supported in Wireshark really soon after its published.

curl and nghttp2

Most of the differences introduced are in the binary format so nghttp2 will need to be updated again – it is the library curl uses for the wire format of http2. The curl parts will need some adjustments, for example for Content-Encoding gzip that no longer is implicit but there should be little to do in the curl code for this draft bump.

Why SFTP is still slow in curl

Wednesday, May 14th, 2014

Okay, there’s no point in denying this fact: SFTP transfers in curl and libcurl are much slower than if you just do them with your ordinary OpenSSH sftp command line tool or similar. The difference in performance can even be quite drastic.

Why is this so and what can we do about it? And by “we” I fully get that you dear reader think that I or someone else already deeply involved in the curl project should do it.

Background

I once blogged a lengthy post on how I modified libssh2 to do SFTP transfers much faster. curl itself uses libssh2 to do SFTP so there’s at least a good start. The problem is only that the speedup we did in libssh2 was because of SFTP’s funny protocol design so we had to:

  1. send off requests for a (large) set of data blocks at once, each block being N kilobytes big
  2. using a several hundred kilobytes big buffer (when downloading the received data would be stored in the big buffer)
  3. then return as soon as there’s one block (or more) that has returned from the server with data
  4. over time and in a loop, there are then blocks constantly in transit and a number of blocks always returning. By sending enough outgoing requests in the “outgoing pipe”, the “incoming pipe” and CPU can be kept fairly busy.
  5. never wait until the entire receive buffer is complete before we go on, but instead use a sliding buffer so that we avoid “halting points” in the transfer

This is more or less what the sftp tool does. We’ve also done experiments with using libssh2 directly and then we can reach quite decent transfer speeds.

libcurl

The libcurl transfer core is basically the same no matter which protocol that is being transferred. For a normal download this is what it does:

  1. waits for data to become available
  2. read as much data as possible into a 16KB buffer
  3. send the data to the application
  4. goto 1

So, there are two problems with this approach when it comes to the SFTP problems as described above.

The first one is that a 16KB buffer is very small in SFTP terms and immediately becomes a bottle neck in itself. In several of my experiments I could see how a buffer of 128, 256 or even 512 kilobytes would be needed to get high bandwidth high latency transfers to really fly.

The second being that with a fixed buffer it will come to a point every 16KB byte where it needs to wait for that specific response to come back before it can continue and ask for the next 16KB of data. That “sync point” is really not helping performance either – especially not when it happens so often as every 16KB.

A solution?

For someone who just wants a quick-fix and who builds their own libcurl, rebuild with CURL_MAX_WRITE_SIZE set to 256000 or something like that and you’ll get a notable boost. But that’s neither a nice nor clean fix.

A proper fix should first of all only be applied for SFTP transfers, thus deciding at run-time if it is necessary or not. Then it should dynamically provide a larger buffer and thirdly, for upload it should probably make the buffer “sliding” as in the libssh2 example code sftp_write_sliding.c.

This is also already mentioned in the TODO document as “Modified buffer size approach“.

There’s clearly room for someone to step forward and help us improve in this area. Welcome!

curl dot-to-dot

#MeraKrypto

Tuesday, April 29th, 2014

A whole range of significant Swedish network organizations (ISOC, SNUS, DFRI and SUNET) organized a full-day event today, managed by the great mr Olle E Johansson. The event, called “MeraKrypto” (MoreCrypto would be the exact translation), was a day with introductions to TLS and a lot of talks around TLS and other encryption and security related topics.

I was there and held a talk on the topic of “curl and TLS” and I basically talked some basics around what curl and libcurl are, how we do TLS, some common problems and hwo verifying the server cert is a common usage mistake and then I continued on to quickly mention how http2 and TLS relate..See my slides below, but please be aware that as usual you may not grasp the whole thing only by the (English) slides. The event was fully booked so there was around one hundred peeps in the audience and there were a lot of interested minds that asked good questions proving they really understood the topics.

The discussion almost got heated during the talk about how companies do MITMing of SSL sessions and this guy from Bluecoat pretty much single-handedly argued for the need for this and how “it fills a useful purpose”.

It was a great afternoon!

The event was streamed live and recorded on video. I’ll post a link as soon as it gets available to me.

curl and TLS #MeraKrypto from Daniel Stenberg

Wireshark dissector work

Thursday, April 24th, 2014

WiresharkRecently I cloned the Wireshark git repository and started updating the http2 dissector. That’s the piece of code that gets called to analyze a stream of data that Wireshark thinks is http2.

The current http2 dissector was left at draft-09 state, while the current draft at the time was number 11 and there have been several changes on the binary format since so any reasonably updated client or server would send or receive byte streams that Wireshark couldn’t properly display.

I never wrote any dissector code before but I must say Wireshark didn’t disappoint. It was straight forward and mostly downright easy to fix most of the wrong details. I’m not pretending to be a master at this nor is the dissector code anywhere near “finished” yet but I still enjoyed the API and how to write a thing like this.

I’ve since dissected plain-text http2 streams that I’ve done with curl+nghttp2 and I’ve also used the SSLKEYLOGFILE trick with Firefox to automatically decrypt the TLS session and have the dissector figure out the underlying http2 parts.

If there’s any little snag to mention, it is the fact that they insist on getting patches submitted directly to gerrit instead of any mailing list or similar. This required me to create a gerrit account, and really figure out how to push my stuff from git to there, instead of the more traditional and simpler approach of just sending my patch to a mailing list or possibly submitting it to a bug/patch tracker somewhere with my browser.

Call me old-style but in fact the hip way of today with a pull-request github style would also have been much easier. Here’s what my gerrit submission looks like. But I get it, gerrit does push a little more work over to the submitter and I figure that once a submitter such as myself finally has fixed all the nits in the patch it is very easy for the project to actually merge it. I actually got someone else to help me point out how to even find the link to view the code review after the first one was submitted on the site… (when I post this, my patch has not yet been accepted or merged into the wireshark git repo)

Here’s a basic screenshot showing a trace of Firefox requesting https://nghttp2.org using http2. Click it for the full thing.

wireshark-screenshot

.. and what happens this morning my time? There’s a brand new http2 draft-12 out with more changes on the on-the-wire format! Well to be honest, that really wasn’t a surprise. I’ll get the new stuff supported too, but I’ll do that in a separate patch as I prefer to hold off until I see a live stream by at least one implementation to test against.

curl and proxy headers

Friday, April 4th, 2014

Starting in the next curl release, 7.37.0, the curl tool supports the new command line option –proxy-header. (Completely merged at this commit.)

It works exactly like –header does, but will only include the headers in requests sent to a proxy, while the opposite is true for –header: that will only be sent in requests that will go to the end server. But of course, if you use a HTTP proxy and do a normal GET for example, curl will include headers for both the proxy and the server in the request. The bigger difference is when using CONNECT to a proxy, which then only will use proxy headers.

libcurl

For libcurl, the story is slightly different and more complicated since we’re having things backwards compatible there. The new libcurl still works exactly like the former one by default.

CURLOPT_PROXYHEADER is the new option that is the new proxy header option that should be set up exactly like CURLOPT_HTTPHEADER is

CURLOPT_HEADEROPT is then what an application uses to set how libcurl should use the two header options. Again, by default libcurl will keep working like before and use the CURLOPT_HTTPHEADER list in all HTTP requests. To change that behavior and use the new functionality instead, set CURLOPT_HEADEROPT to CURLHEADER_SEPARATE.

Then, the header lists will be handled as separate. An application can then switch back to the old behavior with a unified header list by using CURLOPT_HEADEROPT set to CURLHEADER_UNIFIED.

Presentation: what is http2

Wednesday, April 2nd, 2014

We had the 14th meetup with foss-sthlm yesterday and I talked about http2 for the almost 100 attendees in the audience. See my slides at slideshare:

Http2 from Daniel Stenberg