Category Archives: cURL and libcurl

curl and/or libcurl related

Video perhaps?

I decided to try making a short video about my current work and make it available to you all. I try to keep it short (5-7 minutes) and I’m certainly no pro at it, but I will try to make a weekly one for a while and see if it turns out to be any fun. I’m going to read your comments and responses very eagerly, and they will help me decide how to proceed with this experiment.

Enjoy.

Credits in the curl project

Friends!

When we receive patches, improvements, suggestions, advice and whatever else that leads to a change in curl or libcurl, I make an effort to log the contributor’s name in association with that change. Ideally, I add a line in the commit message. We use “Reported-by: <full name>” quite frequently, but also other forms of “…-by: <full name>”, like when there was an original patch by someone, or testing and similar. It shouldn’t matter what the nature of the contribution is: if it helped us, it is a contribution and we say thanks!

curl-give-credits

I want all patch providers and all of us who have push rights to use this approach so that we give credit where credit is due. Giving credit is the only payment we can offer in this project and we should do it with generosity.

The green bars on the right show the results from the 2014 curl survey question on how good we are at giving credit in the project, where 5 is really good and 1 is really bad. Not too shabby, but I’d say we can do even better! (59% checked the top score, 15% checked the 3.)

I have a script called contributors.sh that extracts all contributors since a tag (typically the previous release) and I use that to get a list of names to thank in the RELEASE-NOTES file for the pending curl release. Easy and convenient.

After every release (which means every 8th week) I then copy the list of names from RELEASE-NOTES into docs/THANKS. So all contributors get remembered and honored after having helped us in one way or another.
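
The name extraction described above can be sketched in a few lines of Python. This is not the actual contributors.sh script, only a minimal illustration under assumptions: the function name `extract_contributors` is made up, the input is assumed to be plain commit-message text (for example the output of `git log` for the range since the previous release tag), and an optional trailing `<email>` part is assumed to be unwanted in the resulting list.

```python
import re

# Matches trailer lines such as "Reported-by: Jane Doe" or
# "Tested-by: John Smith <john@example.com>" at the start of a line.
TRAILER = re.compile(r"^[ \t]*[A-Za-z][A-Za-z-]*-by:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_contributors(log_text):
    """Return a sorted list of unique names found in "...-by:" trailers.

    Hypothetical helper for illustration; the real script may differ.
    """
    names = set()
    for match in TRAILER.finditer(log_text):
        # Drop a trailing "<email>" part if the trailer carried one.
        name = re.sub(r"\s*<[^>]*>\s*$", "", match.group(1)).strip()
        if name:
            names.add(name)
    return sorted(names)


example_log = """\
smtp: fix a crash

Reported-by: Jane Doe

http: tidy up header parsing

Tested-by: John Smith <john@example.com>
Reported-by: Jane Doe
"""

contributors = extract_contributors(example_log)
```

The resulting list is the kind of thing that could be pasted into RELEASE-NOTES; contributors who only gave a handle would still need to be handled by a human.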

When there’s no name

When contributors don’t provide a real name but only a nickname like foobar123, user_5678 and so on, I tend to consider that a request to not include the person’s name anywhere, and hence I tend to not include it in THANKS or RELEASE-NOTES. This is also sometimes the result of me not wanting to bother people by asking over and over again for their real name in case they want to be given proper and detailed credit for what they’ve provided to us.

Unfortunately, a notable share of all contributions we get to the project are provided by people “hiding” behind a made up handle. I’m fine with that as long as it truly is what the helpers actually want.

So please, if you help us out, we will happily credit you, but please tell us your name!

keep-calm-and-improve-curl

My home setup

I work in my home office which is upstairs in my house, perhaps 20 steps from my kitchen and the coffee refill. I have a largish desk with room for a number of computers. The photo below shows the three meter beauty. My two kids have their two machines on the left side while I use the right side of it for my desktop and laptop.

Daniel's home office

Many computers

The kids use my old desktop computer with a 20″ Dell screen and my old 15.6″ dual-core Asus laptop. My wife has her laptop downstairs and we have a permanent computer installed underneath the TV for media (an Asus VivoPC).

My desktop computer

I’m primarily developing C and C++ code and I’m frequently compiling rather large projects – repeatedly. I use a desktop machine for my ordinary development, equipped with a fairly powerful 3.5GHz quad-core Core-I7 CPU. I have my OS, my home dir and all source code on an SSD, and a larger HDD for larger and slower content. With ccache and friends, this baby can build Firefox really fast. I put my machine together from parts myself, as I couldn’t find a suitable one that focused on horse power yet had a “normal” 2D graphics card that works fine with Linux. I use a Radeon HD 5450 based ASUS card, which works fine with fully open source drivers.

I have two basic 24 inch LCD monitors (Benq and Dell) both using 1920×1200 resolution. I like having lots of windows up, nothing runs full-screen. I use KDE as desktop and I edit everything in Emacs. Firefox is my primary browser. I don’t shut down this machine, it runs a few simple servers for private purposes.

My machines (and my kids’) all run Debian Linux, typically of the unstable flavor allowing me to get new code reasonably fast.

My desktop keyboard is a Func KB-460, a mechanical keyboard with some funky extra candy such as red backlight and two USB ports. Both my keyboard and my mouse are wired, not wireless, to take away the need for batteries or recharging in this environment. My mouse is a basic and old Logitech MX 310.

I have a crufty old USB headset with a mic that works fine for hangouts and listening to music when the rest of the family is home. I have a Logitech webcam thing sitting on the screen too, but I hardly ever use it for anything.

When on the move

I sometimes need to move around and work from other places, like when going to conferences or to our regular Mozilla work weeks. Hence I also have a laptop that is powerful enough to build Firefox in a sane amount of time: a Lenovo Thinkpad W540 with a 2.7GHz quad-core Core-I7, 16GB of RAM and 512GB of SSD. It has the most annoying touch pad on it. I don’t like that it doesn’t have explicit buttons, so for example both-clicking (to simulate a middle-click), like when pasting text in X11, is virtually impossible.

On this machine I also run a VM with win7 installed and associated development environment so I can build and debug Firefox for Windows on it.

I have a second portable. A small and lightweight netbook, an Eeepc S101, 10.1″ that I’ve been using when I go and just do presentations at places but recently I’ve started to simply use my primary laptop even for those occasions – primarily because it is too slow to do anything else on.

I do video conferences a couple of times a week and we use Vidyo for that. Its Linux client is shaky to say the least, so I tend to use my Nexus 7 tablet for it since the Vidyo app at least works decently on that. It also allows me to quite easily change location when it turns necessary, which it sometimes does since my meetings tend to occur in the evenings and then there’s also varying amounts of “family activities” going on!

Backup

For backup, I have a Synology NAS (a DS211j) equipped with 2TB of disk in a RAID, stashed downstairs on the wired in-house gigabit Ethernet. I run an rsync job every night that syncs the important stuff to the NAS, and I run a second rsync that also mirrors relevant data over to a friend’s house just in case something terribly bad would go down. My NAS backup has already saved me really well at least once.

Printer

Next to the NAS downstairs is the house printer (an HP Officejet 8500A), also attached to the gigabit network even if it has a wifi interface of its own. I just like the increased reliability of having the “fixed services” in the house on the wired network.

The printer also has scanning capability, which actually has come in handy several times. The thing works nicely from my Linux machines as well as my wife’s Windows laptop.

Internet

I have fiber going directly into my house. It is still “just” a 100/100 connection in the other end of the fiber since at the time I installed this they didn’t yet have equipment to deliver beyond 100 megabit in my area. I’m sure I’ll upgrade this to something more impressive in the future but this is a pretty snappy connection already. I also have just a few milliseconds latency to my primary servers.

Having the fast uplink is perfect for doing good remote backups.

Router and wifi

I have a lowly D-Link DIR 635 router and wifi access point providing wifi for the 2.4GHz and 5GHz bands and gigabit speed on the wired side. It was dead cheap and it just works. It NATs my traffic and port forwards some ports through to my desktop machine.

The router itself can also update the dyndns info which ultimately allows me to use a fixed name to my home machine even without a fixed ip.

Frequent Wifi users in the household include my wife’s laptop, the TV computer and all our phones and tablets.

Telephony

When I installed the fiber I gave up the copper connection to my home, and since then I use IP telephony for the “land line”. Basically a little box (a Ping Communication Voice Catcher 201E) translates IP to old phone tech and I keep using my old DECT phone. We basically only have our parents that still call this number, and it has been useful to have the kids use this for outgoing calls up until they’ve gotten their own mobile phones to use.

It doesn’t cost very much, but the usage is dropping over time so I guess we’ll just give it up one of these days.

Mobile phones and tablets

I have a Nexus 5 as my daily phone. I also have a Nexus 7 and Nexus 10 that tend to be used by the kids mostly.

I have two Firefox OS devices for development/work.

I’m eight months in on my Mozilla adventure

I started working for Mozilla in January 2014. Here are some reflections from my first months as a Mozilla employee.

Working from home

I’ve worked completely from home during some short periods before in my life so I had an idea what it would be like. So far, it has been even better than I had anticipated. It suits me so well it is almost scary! No commutes. No delays due to traffic. No problems ever with over-crowded trains or buses. No time wasted going to work and home again. And I’m around when my kids get home from school and it’s easy to receive deliveries all days. I don’t think I ever want to work elsewhere again… 🙂

Another effect of my work place is also that I probably have become somewhat more active on social networks and IRC. If I don’t use those means, I may spend whole days without talking to any humans.

Also, I’m the only Mozilla developer in Sweden – although we have a few more employees in Sweden. (Update: apparently this is wrong and there’s also a Mats here!)

The freedom

I have freedom at work. I control and decide a lot of what I do and I get to do a lot of what I want at work. I can work during the hours I want. As long as I deliver, my employer doesn’t mind. The freedom isn’t just about working hours: I also have a lot of control and say about what I want to work on and what I think we as a team should work on going forward.

The not counting hours

For the last 16 years I’ve been a consultant where my customers almost always have paid for my time. Paid by the hour I spent working for them. For the last 16 years I’ve counted every single hour I’ve worked and made sure to keep detailed logs and tracking of whatever I do so that I can present that to the customer and use that to send invoices. Counting hours has been tightly integrated in my work life for 16 years. No more. I don’t count my work time. I start work in the morning, I stop work in the evening. Unless I work longer, and sometimes I start later. And sometimes I work on the weekend or late at night. And I do meetings after regular “office hours” many times. But I don’t keep track – because I don’t have to and it would serve no purpose!

The big code base

I work with Firefox, in the networking team. Firefox has about 10 million lines of C and C++ code alone. Add to that everything else that is in other languages: glue logic, build files, tests and lots and lots of JavaScript.

It takes time to get acquainted with such a large and old code base, and lots of the architecture, or traces of the original architecture, were designed almost 20 years ago in ways that not many people would still call good or preferable.

Mozilla is using Mercurial as the primary revision control tool, and I started out convinced I should too and really get to learn it. But darn it, it is really too similar to git, and yet lots of words are intermixed and used as commands that don’t do the same thing as in git, so it turns out really confusing and yeah, I felt I got handicapped a little bit too often. I’ve switched over to using the git mirror and I’m now a much happier person. A couple of months in, I’ve not once been forced to switch away from git, mostly thanks to fancy scripts and helpers from colleagues who made this jump before me and already paved the road.

C++ and code standards

I’m a C guy (note the absence of “++”). I’ve primarily developed in C for the whole of my professional developer life – which is approaching 25 years. Firefox is a C++ fortress. I know my way around most C++ stuff but I’m not “at home” with C++ in any way just yet (I never was) so sometimes it takes me a little time and reading up to get all the C++-ishness correct. Templates, casting, different code styles, subtleties that aren’t in C and more. I’m slowly adapting but some things and habits are hard to “unlearn”…

The publicness and Bugzilla

I love working full time for an open source project. Everything I do during my work days is public knowledge. We work a lot with Bugzilla where all bugs (well, except the security sensitive ones) are open and public. My comments, my reviews, my flaws and my patches can all be reviewed, ridiculed or improved by anyone out there who feels like doing it.

Development speed

There are several hundred developers involved in basically the same project and products. The commit frequency and speed in which changes are being crammed into the source repository is mind boggling. Several hundred commits daily. Many hundred and sometimes up to a thousand new bug reports are filed – daily.

yet slowness of moving some bugs forward

Moving a particular bug forward into actually getting it landed and included in pending releases can be a lot of work and it can be tedious. It is a large project with lots of legacy, traditions and people with opinions on how things should be done. Getting something to change from an old behavior can take a whole lot of time, massaging and discussions until it can get through. Don’t get me wrong, it is a good thing, it just stands in direct conflict with my previous paragraph about the development speed.

In the public eye

I knew about Mozilla before I started here. I knew Firefox. Just about every person I’ve ever mentioned those two brands to has known about at least Firefox. This is different from what I’m used to. Of course hardly anyone fully grasps what I’m actually doing on a day to day basis, but I’ve long given up on even trying to explain that to family and friends. Unless they really insist.

Vitriol and expectations of high standards

I must say that being in the Mozilla camp when changes are made or announced has given me a less favorable view of the human race. Almost any change is met by a certain number of users who are very aggressively against it. All changes really. “If you’ll do that I’ll be forced to switch to Chrome” is a very common “threat” – as if that would A) work or B) be a browser that would care more about such “conservative loonies” (you should consider that my personal term for such people). I can only assume that the Chrome team also gets a fair share of that sort of threat in the other direction…

Still, it seems a lot of people out there and perhaps especially in the Free Software world seem to hold Mozilla to very high standards. This is both good and bad. This expectation of being very good also comes from people who aren’t even Firefox users – we must remain the bright light in a world that goes darker. In my (biased) view that tends to lead to unfair criticisms. The other browsers can do some of those changes without anyone raising an eyebrow but when Mozilla does similar for Firefox, a shitstorm breaks out. Lots of those people criticizing us for doing change NN already use browser Y that has been doing NN for a good while already…

Or maybe I’m just not seeing these things with clear enough eyes.

How does Mozilla make money?

Yeps. This is by far the most common question I’ve gotten from friends when I mention who I work for. In fact, that’s just about the only question I get from a lot of people… (possibly because after that we get into complicated questions such as what exactly do I do there?)

curl and IETF

I’m grateful that Mozilla allows me to spend part of my work time working on curl.

I’m also happy to now work for a company that allows me to attend IETF/httpbis and related activities much better than I’ve ever had the opportunity to in the past. Previously I’ve pretty much had to spend spare time and my own money, which has limited my participation a great deal. The support from Mozilla has allowed me to attend two meetings so far during the year, in London and in NYC, and I suspect there will be more chances in the future.

Future

I only just started. I hope to grab on to more and bigger challenges and tasks as I get warmer and more into everything. I want to make a difference. See you in bugzilla.

libressl vs boringssl for curl

I tried to use two OpenSSL forks yesterday with curl. I built both from source first (of course, as I wanted the latest and greatest), an interesting thing in itself since both projects have modified the original build system, so there are now three different ways to build.

libressl 2.0.0 installed and built flawlessly with curl and I’ve pushed a change that shows LibreSSL instead of OpenSSL when doing curl -V etc.

boringssl didn’t compile from git until I had manually fixed a minor nit, and then it has no “make install” target at all so I had to manually copy the libs and header files to a place suitable for curl’s configure to detect. Then the curl build failed because boringssl isn’t API compatible with some of the really old DES stuff – code we use for NTLM. I asked Adam Langley about it and he told me that calling code using DES “needs a tweak” – but I haven’t yet walked down that road so I don’t know how much of a nuisance that actually is or isn’t.

Summary: as an openssl replacement, libressl wins this round over boringssl with 3 – 0.

improving the curl docs, step 1

As I mentioned before, the curl documentation needs improvement. As a first step I converted the man page for curl_easy_setopt into no less than 210(!) individual man pages. One new page for each option the function supports.

The man page was originally (a few days ago) almost 3000 lines, and now with them all split up we end up with a lot more text. This is because the new format encourages more text per option and each page now has to detail itself more. This should also make each option much easier to google/search and to link to when we help users understand the options.

I’ve made some server-side scripts to generate html versions of them all, I generate a list of all options we have and the examples we host on the web site now have all mentions of the options linked directly to these new pages.

The curl_easy_setopt man page will then get most explanations cut out and mainly be used as an index with the options grouped into logical sections to help users find the options they want to use. I could cut out almost 2500 lines.

The new man pages add about 7500 lines of documentation (excluding the headers in each file)…

curl the next few years

Roadmap of things Daniel Stenberg and Steve Holme want to work on next. It is intended to serve as a guideline for others for information, feedback and possible participation.

If you agree, disagree or would like to add stuff you want to work on, please join us on the curl-library list! This “roadmap” is likely to change over time. We’ll keep the updated ROADMAP in git.

New stuff – libcurl

  1. http2 test suite

  2. http2 multiplexing/pipelining

  3. SPDY

  4. SRV records

  5. HTTPS to proxy

  6. make sure there’s an easy handle passed in to curl_formadd(), curl_formget() and curl_formfree() by adding replacement functions and deprecating the old ones to allow custom mallocs and more

  7. HTTP Digest authentication via Windows SSPI

  8. GSSAPI authentication in the email protocols

  9. add support for third-party SASL libraries such as Cyrus SASL – may need to move existing native and SSPI based authentication into vsasl folder after reworking HTTP and SASL code

  10. SASL authentication in LDAP

  11. Simplify the SMTP email interface so that programmers don’t have to construct the body of an email that contains all the headers, alternative content, images and attachments – maintain raw interface so that programmers that want to do this can

  12. Allow the email protocols to return the capabilities before authenticating. This will allow an application to decide on the best authentication mechanism

  13. Allow Windows threading model to be replaced by Win32 pthreads port

  14. Implement a dynamic buffer size to allow SFTP to use much larger buffers and possibly allow the size to be customizable by applications. Use less memory when handles are not in use?

New stuff – curl

  1. Embed a language interpreter (lua?). For that middle ground where curl isn’t enough and a libcurl binding feels “too much”. Build-time conditional of course.

  2. Simplify the SMTP command line so that the headers and multi-part content don’t have to be constructed before calling curl

Improve

  1. build for windows (considered hard by many users)

  2. curl -h output (considered overwhelming to users)

  3. we have > 160 command line options, is there a way to redo things to simplify or improve the situation as we are likely to keep adding features/options in the future too

  4. docs (considered “bad” by users but how do we make it better?)

  5. authentication framework (consider merging HTTP and SASL authentication to give one API for protocols to call)

  6. Perform some of the clean up from the TODO document, removing old definitions and such like that are currently earmarked to be removed years ago

Remove

  1. cmake support (nobody maintains it)

  2. makefile.vc files as there is no point in maintaining two sets of Windows makefiles. Note: These are currently being used by the Windows autobuilds

The curl and libcurl 2014 survey

Reading through the answers to the curl project‘s survey “curl and libcurl 2014” is very interesting and educational.

After having led and participated in this project for so long I have my own picture of what we’re good and bad at. That’s not exactly the same image I get when I read the survey responses. That’s of course the educating part and I really want to learn from this poll and see where to put in some efforts and attempt to improve. At the same time I’ve been working for a while to put together a roadmap for the project, and the survey will help guide us with that work as well.

The full generated summary of the answers can be found on the site, but I thought I’d make the extra effort here and try to extrapolate data, compare and try to get to the real story that lurks in the shadows.

Over the almost 10 days the poll was open, we received 194 responses. I was hoping for more participation, but on the other hand I don’t think more people would’ve given a much different view. My only concern would be that I’m not sure exactly how well we reached out.

Almost all curl users use it for HTTP and HTTPS. Sure, we also use a lot of other protocols and in fact all supported protocols ended up having at least two users according to the survey, but only a single digit percentage did not mark HTTP and HTTPS as protocols they use. The least used supported protocol, gopher, is used by 1.5% of the users who responded.

FTPS and SFTP are basically equally much used and they are the 4th and 5th most used protocols. HTTP, HTTPS and FTP are clearly our most popular protocols.

Only one in five users use curl on a single platform. All others use it on two or more, and one in four use it on four or more, with an unexpectedly high 11% saying they use it on 5 or more platforms! That’s a pretty strong message to me that our multi-platform strategy is important.

Our users have been with us for a long time. Half of the users have been using curl for five years or more! A fifth has been with us for 8 years or more! And yet there seems to be a healthy amount of newcomers finding us, as 14% are within their first year.

The above numbers combined, I’m not surprised but only happy to see that 4 out of 5 users are also involved in other open source projects. curl is just one piece in a large ecosystem and I think it is good that we all participate in several projects so that we learn and cross-pollinate where possible!

Less than half of the respondents are subscribed to a curl mailing list, and curl-library is the most popular one. This is also reflected in subscriber numbers on the actual mailing lists, where curl-library with its 1400+ members has almost twice as many subscribers as curl-users. One way to view this is that we are old enough, established enough and working well enough that users don’t have to subscribe to our lists to keep up. The less optimistic way to see it could be that this is because we haven’t reached out well enough or that our mailing list culture/setup isn’t welcoming enough.

Perhaps most surprising to me: several persons got upset and reacted strongly to the question about how well we treat “female and other minorities” in the project. To me there’s no doubt that female contributors are a minority in the curl community and I want to learn if we’re doing our best to be inclusive and open to all possible contributors. Or at least how good/bad people think we are doing.

29% of the respondents have contributed patches, meaning 56 individuals. I think that tells more about the ones who took part in the survey than it measures participation level among “regular users”.

Documentation

A big revelation for me was the question where I asked people to identify the “worst parts” of the project. The image here below is the look of the summary.

It quite clearly identifies “documentation” as the area in most need of improvements.

I don’t think the amount of docs is the problem. After discussing with people I think the primary issues are:

  • Some collections of docs are just too big and hard to find in, like the curl man page and the curl_easy_setopt man pages. We need to split them up and/or rearrange somehow to help people find the info they need. Work has started on this. I’ll follow up with details later.
  • We get slightly bad “reviews” on this when people mistake the libcurl bindings’ lack of docs for our problem. Lots of libcurl bindings are not very well documented – but they are separate projects not controlled or documented by us. I don’t know what we can do to help that situation. Suggestions are very welcome!
  • We don’t have many step-by-step tutorials on how to get started and how to knit things together. We mostly provide reference manuals. I will appreciate help with improving this!

Http2 interim meeting NYC

On June 5th, around thirty people sat down around a huge table in a conference room on the 4th floor in the Google offices in New York City, with a heavy rain pouring down outside.

It was time for another IETF http2 interim meeting. The attendees were all participants in the HTTPbis work group and came from a wide variety of companies and countries. The major browser vendors were represented there, and so were operators and big service providers and some proxy people. Most of the people who have been speaking up on the mailing list over the last year or so were there, unfortunately with a couple of people notably absent. (And before anyone asks, yes we are a group where the majority is old males like me.)

Most people present knew many of the others already, which helped to create a friendly familiar spirit, and we quickly got started on the Thursday morning working our way through the rather long list of issues to deal with. When we had our previous interim meeting in London, I think most of us thought we would’ve been further along today, but recent development and discussions on the list had actually brought back a lot of issues we thought we were already done with, and we now reiterated a whole slew of subjects. We weren’t allowed to take photographs indoors so you won’t see any pictures of this opportunity from me here.

Google offices building logo

We did close many issues and I’ll just quickly mention some of the noteworthy ones here…

Extensions

We started out with the topic of “extensions”. Should we revert the decision from Zurich (where it was decided that we shouldn’t allow extensions in http2) or was the current state of the protocol the right one? The arguments for allowing extensions included that we’d keep getting requests for new things to add unless we have a way, and that some of the recent stuff we’ve added really could’ve been done as extensions instead. An argument against it is that it makes things much simpler and more reliable if we just document exactly what the protocol has and is, and removing “optional” behavior from the protocol has been one of the primary mantras along the design process.

The discussion went back and forth for a long time, and after almost three hours we had kind of a draw. Nobody was firmly against “the other” alternative but the two sides also seemed to have roughly the same amount of support. Then it was yet again time for the coin toss to guide us. Martin brought out an Australian coin and … the next protocol draft will allow extensions. Again. This also forces implementations to read and skip all unknown frames they receive, compared to the existing situation where no unknown frames can ever occur.
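
What “read and skip unknown frames” means for a receiver can be sketched roughly like this. Note the assumptions: the sketch uses the frame header layout and the frame type numbers from the HTTP/2 specification as it was eventually published (RFC 7540: a 24-bit length, an 8-bit type, 8-bit flags and a 31-bit stream id), which is not identical to the draft discussed at this meeting, and the names `make_frame` and `iter_known_frames` are made up for the illustration.

```python
# Frame types defined by RFC 7540; any other type value is treated as an
# unknown extension frame and skipped.
KNOWN_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}
FRAME_HEADER_LEN = 9  # assumption: RFC 7540 layout, not the draft-era header


def make_frame(frame_type, flags, stream_id, payload):
    """Hypothetical helper: serialize one frame, used only for the demo."""
    return (len(payload).to_bytes(3, "big") + bytes([frame_type, flags])
            + (stream_id & 0x7FFFFFFF).to_bytes(4, "big") + payload)


def iter_known_frames(buf):
    """Yield (name, flags, stream_id, payload) for each complete known frame.

    Unknown frame types are consumed using their length field and ignored.
    """
    offset = 0
    while offset + FRAME_HEADER_LEN <= len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type = buf[offset + 3]
        flags = buf[offset + 4]
        # The top bit of the stream id field is reserved and masked off.
        stream_id = int.from_bytes(buf[offset + 5:offset + 9], "big") & 0x7FFFFFFF
        end = offset + FRAME_HEADER_LEN + length
        if end > len(buf):
            break  # incomplete frame, a real parser would wait for more data
        payload = buf[offset + FRAME_HEADER_LEN:end]
        offset = end
        if frame_type not in KNOWN_TYPES:
            continue  # extension frame we don't understand: skip it
        yield KNOWN_TYPES[frame_type], flags, stream_id, payload


stream = (make_frame(0x0, 0, 1, b"hi")
          + make_frame(0xFA, 0, 1, b"xyz")      # made-up extension type
          + make_frame(0x6, 0, 0, b"\x00" * 8))
frames = list(iter_known_frames(stream))
```

The point is only that the length field lets a receiver step over a frame without understanding it, which is what makes extensions possible without breaking older implementations.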

BLOCKED as an extension

A rather given first candidate for an extension was the BLOCKED frame. At the time BLOCKED was added to the protocol it was explicitly added into the spec because we didn’t have extensions – and it is now being lifted out into one.

ALTSVC as an extension

What received slightly more resistance was the proposal to move out the ALTSVC frame as well. It was argued that the frame isn’t mandatory to support and therefore can easily be made into an extension.

Simplified padding

Another small change of the wire format since draft-12 was the removal of the high byte for padding to simplify. It reduces the amount you can pad a single frame but you can easily pad more using other means if you really have to, and there were numbers presented that said that 255 bytes were enough with HTTP 1.1 already so probably it will be enough for version 2 as well.

Schedule

There will be a new draft out really soon: draft-13. Martin, our editor of the spec, says he’ll be able to ship it in a week. That is intended to be the last draft, intended for implementation, and it will then be expected to get deployed rather widely to allow us all in the industry to see how it works and be able to polish details or wordings that may still need it.

We had numerous vendors and HTTP stack implementers in the room when we discussed the schedule for when various products will be able to see daylight. If we all manage to stick to the plans, we may just have plenty of products and services that support http2 by the September/October time frame. If nothing major is found in this latest draft, we’re looking at RFC status not too far into 2015.

Meeting summary

I think we’re closing in for real now and I have good hopes for the protocol and our progress to a really wide scale deployment across the Internet. The HTTPbis group is an awesome crowd to work with and I had a great time. Our hosts took good care of us and made sure we didn’t lack any services or supplies. Extra thanks go to those of you who bought me dinners and to those who took me out to good beer places!

My http2 document

Yeah, it will now become somewhat out of date and my plan is to update it once the next draft ships. I’ll also do another http2 presentation already this week so I hope to also post an updated slide set soonish. Stay tuned!

Wireshark

My plan is to cooperate with the other Wireshark hackers and help make sure we have the next draft version supported in Wireshark really soon after it’s published.

curl and nghttp2

Most of the differences introduced are in the binary format so nghttp2 will need to be updated again – it is the library curl uses for the wire format of http2. The curl parts will need some adjustments, for example for Content-Encoding gzip that is no longer implicit, but there should be little to do in the curl code for this draft bump.

Why SFTP is still slow in curl

Okay, there’s no point in denying this fact: SFTP transfers in curl and libcurl are much slower than if you just do them with your ordinary OpenSSH sftp command line tool or similar. The difference in performance can even be quite drastic.

Why is this so and what can we do about it? And by “we” I fully get that you, dear reader, think that I or someone else already deeply involved in the curl project should do it.

Background

I once blogged a lengthy post on how I modified libssh2 to do SFTP transfers much faster. curl itself uses libssh2 to do SFTP so there’s at least a good start. The problem is only that the speedup we did in libssh2 was because of SFTP’s funny protocol design so we had to:

  1. send off requests for a (large) set of data blocks at once, each block being N kilobytes big
  2. use a buffer several hundred kilobytes big (when downloading, the received data is stored in this big buffer)
  3. then return as soon as one block (or more) has come back from the server with data
  4. over time and in a loop, there are then blocks constantly in transit and a number of blocks always returning. By sending enough outgoing requests in the “outgoing pipe”, the “incoming pipe” and CPU can be kept fairly busy.
  5. never wait until the entire receive buffer is complete before we go on, but instead use a sliding buffer so that we avoid “halting points” in the transfer

This is more or less what the sftp tool does. We’ve also done experiments with using libssh2 directly and then we can reach quite decent transfer speeds.
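
To get a feel for why keeping many requests in transit matters, here is a rough back-of-the-envelope model. It is only a sketch under simplifying assumptions (a fixed round-trip time, no packet loss, no protocol overhead), and the function name and its parameters are made up for illustration; none of it is libssh2 code.

```python
def estimated_seconds(total_bytes, block_size, rtt, bandwidth, in_flight):
    """Toy estimate of download time (hypothetical model, not libssh2).

    rtt is in seconds, bandwidth in bytes per second, and in_flight is
    the number of block requests kept outstanding at the same time.
    """
    # Time for one request/response cycle of a single block.
    cycle = rtt + block_size / bandwidth
    # With in_flight requests outstanding, that many blocks complete per
    # cycle, but the link itself caps the rate at bandwidth.
    rate = min(bandwidth, in_flight * block_size / cycle)
    return total_bytes / rate


# Assumed example link: 100 Mbit/s (12.5 MB/s) with a 100 ms round trip.
one_at_a_time = estimated_seconds(100_000_000, 32768, 0.1, 12_500_000, 1)
pipelined = estimated_seconds(100_000_000, 32768, 0.1, 12_500_000, 64)
print(one_at_a_time, pipelined)
```

With a single request outstanding the round-trip time dominates completely, while a deep enough “outgoing pipe” lets the link bandwidth become the limit instead.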

libcurl

The libcurl transfer core is basically the same no matter which protocol is being used. For a normal download this is what it does:

  1. waits for data to become available
  2. reads as much data as possible into a 16KB buffer
  3. sends the data to the application
  4. goes back to 1
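
The loop above could be sketched roughly like this. This is not libcurl code, just a minimal Python stand-in using a plain socket; `download_loop` and `write_callback` are names I made up for the illustration.

```python
import select
import socket

BUFFER_SIZE = 16 * 1024  # the 16KB buffer from step 2


def download_loop(sock, write_callback, buffer_size=BUFFER_SIZE):
    """Hypothetical stand-in for the generic transfer loop."""
    total = 0
    while True:
        # 1. wait for data to become available
        select.select([sock], [], [])
        # 2. read as much data as possible into the fixed-size buffer
        chunk = sock.recv(buffer_size)
        if not chunk:
            # the peer closed the connection, the transfer is done
            return total
        # 3. send the data to the application
        write_callback(chunk)
        total += len(chunk)
        # 4. go back to 1
```

Note that the application never gets more than one buffer's worth of data per lap, whatever the protocol underneath is capable of.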

So, there are two problems with this approach when it comes to the SFTP problems as described above.

The first one is that a 16KB buffer is very small in SFTP terms and immediately becomes a bottleneck in itself. In several of my experiments I could see how a buffer of 128, 256 or even 512 kilobytes would be needed to get high-bandwidth, high-latency transfers to really fly.

The second is that with a fixed buffer it will come to a point every 16KB where it needs to wait for that specific response to come back before it can continue and ask for the next 16KB of data. That “sync point” is really not helping performance either – especially not when it happens as often as every 16KB.
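
Some quick arithmetic shows where buffer sizes of that magnitude come from. This is a sketch based on the ordinary bandwidth-delay product; the link numbers are assumptions picked for the example, not measurements from my experiments, and the helper names are made up.

```python
def buffer_for_link(bandwidth_bits_per_s, rtt_ms):
    """Bytes that must be in flight to keep the link full
    (the bandwidth-delay product)."""
    # bits -> bytes is /8 and milliseconds -> seconds is /1000
    return bandwidth_bits_per_s * rtt_ms // 8000


def ceiling_bytes_per_s(buffer_bytes, rtt_ms):
    """Upper bound on throughput when only one buffer's worth of data
    can be fetched per round trip."""
    return buffer_bytes * 1000 // rtt_ms


# Assumed link: 100 Mbit/s with a 40 ms round-trip time.
print(buffer_for_link(100_000_000, 40))    # 500000 bytes, close to 512KB
print(ceiling_bytes_per_s(16 * 1024, 40))  # 409600 bytes per second
```

So on such a link a 16KB buffer with a sync point per buffer tops out at a few hundred kilobytes per second, no matter how fat the pipe is.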

A solution?

For someone who just wants a quick fix and who builds their own libcurl: rebuild with CURL_MAX_WRITE_SIZE set to 256000 or something like that and you’ll get a notable boost. But that’s neither a nice nor a clean fix.

A proper fix should first of all only be applied for SFTP transfers, thus deciding at run-time if it is necessary or not. Then it should dynamically provide a larger buffer and thirdly, for upload it should probably make the buffer “sliding” as in the libssh2 example code sftp_write_sliding.c.
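
To illustrate what “sliding” means here, below is a minimal sketch of the idea in Python. It is not a port of sftp_write_sliding.c and not a proposed libcurl patch; `sliding_upload` and `write_some` are hypothetical names, and `write_some` is assumed to accept at least one byte per call and return how many bytes it took.

```python
import io


def sliding_upload(source, write_some, buffer_size=256 * 1024):
    """Copy source to write_some() while keeping the buffer topped up."""
    buf = bytearray()
    sent = 0
    eof = False
    while True:
        # Refill as soon as there is room, instead of waiting for the
        # whole buffer to drain first.
        if not eof and len(buf) < buffer_size:
            data = source.read(buffer_size - len(buf))
            if data:
                buf += data
            else:
                eof = True
        if not buf:
            return sent  # nothing left to read and nothing left to send
        # The sink may accept fewer bytes than offered.
        n = write_some(bytes(buf))
        # Slide: drop what was accepted, keep the rest at the front.
        del buf[:n]
        sent += n
```

The point is that a partial write never leaves the buffer half empty for long, so there is always plenty of data ready to go out.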

This is also already mentioned in the TODO document as “Modified buffer size approach”.

There’s clearly room for someone to step forward and help us improve in this area. Welcome!

curl dot-to-dot