Tag Archives: libssh2

Cheers for curl 7.58.0

Here’s to another curl release!

curl 7.58.0 is the 172nd curl release and it contains, among other things, 82 bug fixes thanks to 54 contributors (22 new). All this done with 131 commits in 56 days.

The bug fix rate is slightly lower than in the last few releases, which I tribute mostly to me having been away on vacation for a month during this release cycle. I retain my position as “committer of the Month” and January 2018 is my 29th consecutive month where I’ve done most commits in the curl source code repository. In total, almost 58% of the commits have been done by me (if we limit the count to all commits done since 2014, I’m at 43%). We now count a total of 545 unique commit authors and 1,685 contributors.

So what’s new this time? (full changelog here)

libssh backend

Introducing the pluggable SSH backend, and libssh is now the new alternative SSH backend to libssh2 that has been supported since late 2006. This change alone brought thousands of new lines of code.

Tell configure to use it with –with-libssh and you’re all set!

The libssh backend work was done by Nikos Mavrogiannopoulos, Tomas Mraz, Stanislav Zidek, Robert Kolcun and Andreas Schneider.

Security

Yet again we announce security issues that we’ve found and fixed. Two of them to be exact:

  1. We found a problem with how HTTP/2 trailers was handled, which could lead to crashes or even information leakage.
  2. We addressed a problem for users sending custom Authorization: headers to HTTP servers and who are then redirected to another host that shouldn’t receive those Authorization headers.

Progress bar refresh

A minor thing, but we refreshed the progress bar layout for when no total size is known.

Next?

March 21 is the date set for next release. Unless of course we find an urgent reason to fix and release something before then…

decent durable defect density displayed

Here’s an encouraging graph from our regular Coverity scans of the curl source code, showing that we’ve maintained a fairly low “defect density” over the last two years, staying way below the average density level.
defect density over timeClick the image to view it slightly larger.

Defect density is simply the number of found problems per 1,000 lines of code. As a little (and probably unfair) comparison, right now when curl is flat on 0, Firefox is at 0.47, c-ares at 0.12 and libssh2 at 0.21.

Coverity is still the primary static code analyzer for C code that I’m aware of. None of the flaws Coverity picked up in curl during the last two years were detected by clang-analyzer for example.

Why SFTP is still slow in curl

Okay, there’s no point in denying this fact: SFTP transfers in curl and libcurl are much slower than if you just do them with your ordinary OpenSSH sftp command line tool or similar. The difference in performance can even be quite drastic.

Why is this so and what can we do about it? And by “we” I fully get that you dear reader think that I or someone else already deeply involved in the curl project should do it.

Background

I once blogged a lengthy post on how I modified libssh2 to do SFTP transfers much faster. curl itself uses libssh2 to do SFTP so there’s at least a good start. The problem is only that the speedup we did in libssh2 was because of SFTP’s funny protocol design so we had to:

  1. send off requests for a (large) set of data blocks at once, each block being N kilobytes big
  2. using a several hundred kilobytes big buffer (when downloading the received data would be stored in the big buffer)
  3. then return as soon as there’s one block (or more) that has returned from the server with data
  4. over time and in a loop, there are then blocks constantly in transit and a number of blocks always returning. By sending enough outgoing requests in the “outgoing pipe”, the “incoming pipe” and CPU can be kept fairly busy.
  5. never wait until the entire receive buffer is complete before we go on, but instead use a sliding buffer so that we avoid “halting points” in the transfer

This is more or less what the sftp tool does. We’ve also done experiments with using libssh2 directly and then we can reach quite decent transfer speeds.

libcurl

The libcurl transfer core is basically the same no matter which protocol that is being transferred. For a normal download this is what it does:

  1. waits for data to become available
  2. read as much data as possible into a 16KB buffer
  3. send the data to the application
  4. goto 1

So, there are two problems with this approach when it comes to the SFTP problems as described above.

The first one is that a 16KB buffer is very small in SFTP terms and immediately becomes a bottle neck in itself. In several of my experiments I could see how a buffer of 128, 256 or even 512 kilobytes would be needed to get high bandwidth high latency transfers to really fly.

The second being that with a fixed buffer it will come to a point every 16KB byte where it needs to wait for that specific response to come back before it can continue and ask for the next 16KB of data. That “sync point” is really not helping performance either – especially not when it happens so often as every 16KB.

A solution?

For someone who just wants a quick-fix and who builds their own libcurl, rebuild with CURL_MAX_WRITE_SIZE set to 256000 or something like that and you’ll get a notable boost. But that’s neither a nice nor clean fix.

A proper fix should first of all only be applied for SFTP transfers, thus deciding at run-time if it is necessary or not. Then it should dynamically provide a larger buffer and thirdly, for upload it should probably make the buffer “sliding” as in the libssh2 example code sftp_write_sliding.c.

This is also already mentioned in the TODO document as “Modified buffer size approach“.

There’s clearly room for someone to step forward and help us improve in this area. Welcome!

curl dot-to-dot

Introducing curl_multi_wait

Facebook contributes fix to libcurl’s multi interface to overcome problem with more than 1024 file descriptors.

When we introduced the multi interface to libcurl about (what feels like) one hundred years ago, we went with simple in some ways. One way it shows: an application that wants to do many transfers in parallel asks libcurl to do it, and then it extracts the set of file descriptors (sockets!) from libcurl (using curl_multi_fdset) to wait for as plain fd_sets. fd_set is the variable type made for select(). This API choice made applications pretty much forced to use select. select() has  its fair share of problems, where possibly the biggest one is that it has problems with file descriptors > 1024.

Later on we introduced an enhanced version of the multi interface for libcurl that allows an application to use whatever method it pleases. I tend to refer to that variation as the multi_socket API after its main function curl_multi_socket_action. That’s the high performance, event-driven API.

As you may be aware, event-driven code make things a bit more complicated at times so many people still prefer to use the older and simpler multi interface and thus they were forced to remain using select(). But now that era has ended. Now…

curl_multi_wait() is introduced!

This poll(3)-like function basically works as a replacement for curl_multi_fdset() + select(). Starting in libcurl 7.28.0 (strictly speaking in commit de24d7bd4c03ea3), this is a function that any application can use for this purpose, and thus avoid the problem with many file descriptors.

This new function doesn’t use any struct from the “real” poll() or associated headers to make sure that it works even for systems without a real poll() implementation. It instead uses private curl versions of both the struct and the defines used. An application can of course also tell curl_multi_wait to wait for a set of private file descriptors, just like poll() or select().

The patch set that brought this function was provided by Sara Golemon, a friend from from a related project

cURL

darwin native SSL for curl

I recently mentioned the new schannel support for libcurl that allows libcurl to do SSL natively without the use of any external libraries on Windows.

This “getting native support” obviously triggered Nick Zitzmann who stepped up and sent in Secure Transport support – the native API for doing SSL on Mac OS X and iOS. This ninth supported SSL library is now called ‘darwinssl’ in the curl code base. There have been some follow-up commits too to cleanup things and make use of that API for providing the necessary function calls when doing NTLM too etc.

This functionality is merged in to curl’s master git repository and will be part of the upcoming curl 7.27.0 release, planned to hit the public at the end of July 2012.

It could be noted that if your for example build curl/libcurl to also support SCP and SFTP, you’d be linking with libssh2 for that and libssh2 is still relying on a crypto library that is either OpenSSL or gcrypt so you may in fact still end up linking with a 3rd party crypto library… Nick mentioned in a separate mail how he has looked into making libssh2 use the Secure Transport API, but that he faced some issues regarding big numbers which made him hesitate and consider how to move forward.

schannel support in libcurl

schannel is the API Microsoft provides to allow applications to for example implement SSL natively, without needing any third part library.

On Monday June 11th we merged the 30+ commits Marc Hörsken brought us. This is now the 8th SSL variation supported by libcurl, and I figure this is going to become fairly popular now in the Windows camp coming the next release: curl 7.27.0.

So now my old talk about the seven SSL libraries libcurl supported has become outdated…

It can be worth noting that as long as you build (lib)curl to also support SCP and SFTP, powered by libssh2, that library will still require a separate crypto library and libssh2 supports to get built with either OpenSSL or gcrypt. Marc mentioned that he might work on making that one use schannel as well.

cURL

550M users

(This text has been updated since first post. It used to say 300 million but then I missed all iOS devices…)

Ok, so here’s a little ego game. The rules are very simple: try to figure out all things I’ve written code in (to any noticeable degree) and count how many users the products that use such code might have. Then estimate the total amount of humans that may in fact use my code from time to time.

I’ve been doing software both for fun and professionally for over 20 years (my first code I made available to others was written in 1986 on the C64). But as I look back on what I’ve done at my day job for all this time, most of my labor have been hidden into some sort of devices or equipment that never really were distributed to many customers. I don’t think I’ve ever done software professionally for consumer stuff. My open source code however has found its way into all sorts of things so I decided I could limit this count to open source code I’ve done. It is also slightly easier. Or perhaps less hard. And when it comes to open source, none of my other projects is as popular and widely used as curl. Counting curl users will drown all others.

First some basic stats: the curl.haxx.se web site gets more than 12000 unique visitors every weekday. curl packages are downloaded from there at a rate of roughly 1 million times/year. The site sends over 200GB of data every month. We have no idea how large share of users who get curl from the main site, but a guess is that it is far less than half of the user base. But of course the number of downloads says nothing about how many users there are.

Mac OS X ships with curl (and libcurl?) by default. There are perhaps 86 million macs in the world.

libcurl is used in television sets and Bluray players made by at least five major brands (LG, Panasonic, Philips, Sony and Toshiba). I’m convinced they don’t use it in all models but probably just a few of their higher end internet-connected ones. 10% of the total? It seems in 2009 there were 35 million flat panel TVs sold in the US with a forecast of the sales growing slightly over the years. I figure that would mean perhaps 100 million ones sold in the US the three last years possibly made by these brands (and lets assume that includes some Blu-rays too), and lets say that is half the world market for them, it would make libcurl shipping in 20 million something TVs.

curl and libcurl are installed by default in some Linux distributions but not in all. In Debian it is an optional extra and the popcon overview shows perhaps 70% of Debian users install libcurl (and 56% use libssh2). Lets assume that’s a suitable average for all desktop Linux users. How many are we? Let’s for the sake of the argument say that 3% of all computers using the internet run Linux. Some numbers say there are 2.3 billion internet users. It would make 70 million Linux computers and thus 49 million libcurl installations. Roughly.

Open Office and the recent spin-off LibreOffice are both using libcurl. Open Office said they have 100 million users now in May 2012.

Games: Second Life, Warhammer 40000, Ghost Recon, Need for speed world, Game Face and “Saints Row: The third” all use libcurl. The first game alone boasts over 20 million registered users. I couldn’t find any numbers for any other game I know uses libcurl.

Other embedded uses: libcurl and libssh2 are both announced as supported packages of Wind River Linux, the perhaps most dominant provider of embedded Linux and another leading provider is Montavista which also offers curl and libcurl. How many users? I have absolutely no idea. I’d say more than just a few, but how many? Impossible to tell so let’s ignore that possibly huge install base. Spotify uses or at least used libcurl, and early 2012 they had 15 million users.

Phones. libcurl is shipped in iOS and WebOS and it is used by RIM and Apple for some (to me) unknown purposes. Lots of applications on Android still build and use libcurl, c-ares and libssh2 for their apps but it is just impossible to estimate how many users they get. Apple has sold 250 million iOS devices, at least. (This little number was missed by me in the calculation I first posted.)

ios-credits

Infrastructure. libcurl is used in the Tornado web server made by Friendfeed/Facebook and it is used by significant services at Yahoo.com. How many users of said services? Surely many millions. But really, that would be users of just 2 libcurl users so let’s not rush ahead and count those as direct users!

libcurl powers the very popular PHP/CURL extension that a large amount of PHP-running sites have enabled and use. How many? In 2008, 33% of all internet sites run PHP. Let’s say the share has decreased to 30% since then and the total amount of active sites is now 200M. That makes 60M PHP sites, and if there’s 10% of them using PHP/CURL we’re talking 6 million users.

Development. git, darcs, bazaar and Mercurial are all children of the distribution version control systems (some of them very popular) and they all use libcurl. How many users do they have? Since they’re all working on multiple platforms I would estimate the number of users of them collectively to be in the tens of millions range. Let’s say 10 million.

86 + 20 + 49 + 100 + 20 + 15 + 250 + 6 + 10 = 556 million users

550-million

And yes, of course a lot of these users will be the same actual human. But I may also just have counted all the numbers completely wrong to start with. I would say I’m probably within the correct magnitude!

550 million users out of the world’s 2.3 billion internet users. 1 out of 4 are using something that runs code I wrote. Kind of cool!

Sweden has a population of less than 10 million. 550 million is almost twice the entire USA, four times the population of Russia or almost eight times the population of Germany… As a comparison to some big browsers, a recent article claims Google Chrome has 200 million users in April 2012 which may be around 25% of the browser market and showing that basically none of the individual browsers have a lot more users than 300 million…

Of course I know that every single person who reads this is a knowing or unknowing user… Can you think of any other major users?

Travel for fun or profit

As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.

Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to get practiced during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too but if you did when would you ever get time to do that fun things you really want to do?

IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.

IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.

Something very rare instead happened that prevents me from going there! My customer (for whom I work full-time since about six months and shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’ t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.

I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.

The first month of Spindly

Let me entertain you with some info and updates from the Spindly project. (Unfortunately we don’t have any logo yet so I don’t get to show it off here.)

Since I announced my intention to proceed and write the SPDY library on my own instead of waiting for libspdy to get back to life, I have worked on a number of infrastructure details.

I converted the build to use autotools and libtool to help us really make it a portable library. I made all test cases run without memory leaks and this took some amount of changes of libspdy since it was clearly not written with carefully checking memory and there were also a lot of unnecessarily small mallocs(). Anyone who does malloc() of 8 bytes should reconsider what they’re doing.

Since I’ve had to bugfix the libspdy so much, change structs and APIs and add new functions that were missing I decided that there’s no point in us trying to keep the original libspdy code or code style intact anymore so I’ve re-indented the whole code base to a style I like better than the original style.

I’ve started to write the fundamentals of a client and server demo application that is meant to use the Spindly API to implement both sides. They don’t really do much yet but the basics are in place. I’ve worked more on my idea of what the spindly API should look like. I’ve written the code for a few functions from that API and I’ve also added a few tests for them.

Most of this work has been made by me and me alone with no particular feedback or help by others. I continue to push my changes to github without delay and I occasionally announce stuff on the mailing list to keep interested people up to date. Hopefully this will lead to someone else joining in sooner or later.

The progress has not been very fast, not only because I’ve had to do a lot of thinking about how the API should ideally work to be really useful, but also because I have quite a lot of commitments in other open source projects (primarily curl and libssh2) that require their amount of time, not to mention that my day job of course needs proper attention.

We offer a daily snapshot of the code if you can’t use or don’t want to use git.

Upcoming

I intend to add more functions from the API document, one by one and test cases for each as I go along. In parallel I hope to get the demo client and server to run so that the API proves to actually work properly.

I want the demo client and server also to allow them to run interop tests against other implementations and I want them to be able to speak SPDY with SSL switched off – for debugging reasons. Later on, I hope to be able to use the demo server in the curl test suite so that I can test that the curl SPDY integration works correctly.

We need to either fix “check” (the unit test suite) to work C89 compatible or replace it with something else.

Want to help?

If you want to help, please subscribe to the mailing list, get familiar with the code base, study the API doc and see if it makes sense to you and then help me get that API turned into code…

Three out of one hundred

If I’m not part of the solution, I’m part of the problem and I don’t want to be part of the problem. More specifically, I’m talking about female presence in tech and in particular in open source projects.

3 out of 100I’ve been an open source and free software hacker, contributor and maintainer for almost 20 years. I’m the perfect stereo-type too: a white, hetero, 40+ years old male living in a suburb of a west European city. (I just lack a beard.) I’ve done more than 20,000 commits in public open source code repositories. In the projects I maintain, and have a leading role in, and for the sake of this argument I’ll limit the discussion to curl, libssh2, and c-ares, we’re certainly no better than the ordinary average male-dominated open source projects. We’re basically only men (boys?) talking to other men and virtually all the documentation, design and coding is done by male contributors (to a significant degree).

Sure, we have female contributors in all these projects, but for example in the curl case we have over 850 named contributors and while I’m certainly not sure who is a woman and who is not when I get contributions, there’s only like 10 names in the list that are typically western female names. Let’s say there are 20. or 30. Out of a total of 850 the proportions are devastating no matter what. It might be up to 3%. Three. THREE. I know women are under-represented in technology in general and in open source in particular, but I think 3% is even lower than the already low bad average open source number. (Although, some reports claim the number of female developers in foss is as low as just above 1%, geekfeminism says 1-5%).

Numbers

Three percent. (In a project that’s been alive and kicking for thirteen years…) At this level after this long time, there’s already a bad precedent and it of course doesn’t make it easier to change now. It is also three percent of the contributors when we consider all contributors alike. If we’d count the number of female persons in leading roles in these projects, the amount would be even less.

It could be worth noting that we don’t really have any recent reliable stats for “real world” female share either. Most sources that I find on the Internet and people have quoted in talks tend to repeat old numbers that were extracted using debatable means and questions. The comparisons I’ve seen repeated many times on female participation in FOSS vs commercial software, are very often based on stats that are really not comparable. If someone has reliable and somewhat fresh data, please point them out for me!

“Ghosh, R. A.; Glott, R.; Krieger, B.;
Robles, G. 2002. Free/Libre and Open Source Software: Survey and Study. Part
IV: Survey of Developers. Maastricht: International Institute of Infonomics
/Merit.

A design problem of “the system”

I would blame “the system”. I’m working in embedded systems professionally as a consultant and contract developer. I’ve worked as a professional developer for some 20 years. In my niche, there’s not even 10% female developers. A while ago I went through my past assignments in order to find the last female developer that I’ve worked with, in a project, physically located in the same office. The last time I met a fellow developer at work who was female was early 2007. I’ve worked in 17 (seventeen!) projects since then, without even once having had a single female developer colleague. I usually work in smaller projects with like 5-10 people. So one female in 18 projects makes it something like one out of 130 or so. I’m not saying this is a number that is anything to draw any conclusions from since it just me and my guesstimates. It does however hint that the problem is far beyond “just” FOSS. It is a tech problem. Engineering? Software? Embedded software? Software development? I don’t know, but I know it is present both in my professional life as well as in my volunteer open source work.

Geekfeminism says the share is 10-30% in the “tech industry”. My experience says the share gets smaller and smaller the closer to “the metal” and low level programming you get – but I don’t have any explanation for it.

Fixing the problems

What are we (I) doing wrong? Am I at fault? Is the the way I talk or the way we run these projects in some subtle – or obvious – ways not friendly enough or downright hostile to women? What can or should we change in these projects to make us less hostile? The sad reality is that I don’t think we have any such fatal flaws in our projects that create the obstacles. I don’t think many females ever show up near enough the projects to even get mistreated in the first place.

I have a son and I have a daughter – they’re both still young and unaware of this kind of differences and problems. I hope I will be able to motivate and push and raise them equally. I don’t want to live in a world where my daughter will have a hard time to get into tech just because she’s a girl.