Archive for the ‘cURL and libcurl’ Category

HTTP/2 in curl, status update

Monday, May 4th, 2015

http2 logoI’m right now working on adding proper multiplexing to libcurl’s HTTP/2 code. So far we’ve only done a single stream per connection and while that works fine and is HTTP/2, applications will still want more when switching to HTTP/2 as the multiplexing part is one of the key components and selling features of the new protocol version.

Pipelining means multiplexed

As a starting point, I’m using the “enable HTTP pipelining” switch to tell libcurl it should consider multiplexing. It makes libcurl work as before by default. If you use the multi interface and enable pipelining, libcurl will try to re-use established connections and just add streams over them rather than creating new connections. Yes this means that A) you need to use the multi interface to get the full HTTP/2 stuff and B) the curl tool won’t be able to take advantage of it since it doesn’t use the multi interface! (An old outstanding idea is to move the tool to use the multi interface and this would yet another reason why this could be a good idea.)

We still have some decisions to make about how we want libcurl to act by default – especially when we can expect application to use both HTTP/1.1 and HTTP/2 at the same time. Since we don’t know if the server supports HTTP/2 until after a certain point in the negotiation, we need to decide on how to do when we issue N transfers at once to the same server that might speak HTTP/2… Right now, we get the best HTTP/2 behavior by telling libcurl we only want one connection per host but that is probably not ideal for an application that might use a mix of HTTP/1.1 and HTTP/2 servers.

Downsides with abusing pipelining

There are some drawbacks with using that pipelining switch to allow multiplexing since users may very well want HTTP/2 multiplexing but not HTTP/1.1 pipelining since the latter is just riddled with interop problems.

Also, re-using the same options for limited connections to host names etc for both HTTP/1.1 and HTTP/2 may not at all be what real-world applications want or need.

One easy handle, one stream

libcurl API wise, each HTTP/2 stream is its own easy handle. It makes it simple and keeps the API paradigm very much in the same way it works for all the other protocols. It comes very natural for the libcurl application author. If you setup three easy handles, all identifying a resource on the same server and you tell libcurl to use HTTP/2, it makes perfect sense that all these three transfers are made using a single connection.

As multiplexed data means that when reading from the socket, there is data arriving that belongs to other streams than just a single one. So we need to feed the received data into the different “data buckets” for the involved streams. It gives us a little internal challenge: we get easy handles with no socket activity to trigger a read, but there is data to take care of in the incoming buffer. I’ve solved this so far with a special trigger that says that there is data to take care of, that it should make a read anyway that then will get the data from the buffer.

Server push

HTTP/2 supports server push. That’s a stream that gets initiated from the server side without the client specifically asking for it. A resource the server deems likely that the client wants since it asked for a related resource, or similar. My idea is to support server push with the application setting up a transfer with an easy handle and associated options, but the URL would only identify the server so that it knows on which connection it would accept a push, and we will introduce a new option to libcurl that would tell it that this is an easy handle that should be used for the next server pushed stream on this connection.

Of course there are a few outstanding issues with this idea. Possibly we should allow an easy handle to get created when a new stream shows up so that we can better deal with a dynamic number of  new streams being pushed.

It’d be great to hear from users who have ideas on how to use server push in a real-world application and how you’d imagine it could be used with libcurl.

Work in progress code

My work in progress code for this drive can be found in two places.

First, I do the libcurl multiplexing development in the separate http2-multiplex branch in the regular curl repo:

https://github.com/bagder/curl/tree/http2-multiplex.

Then, I put all my test setup and test client work in a separate repository just in case you want to keep up and reproduce my testing and experiments:

https://github.com/bagder/curl-http2-dev

Feedback?

All comments, questions, praise or complaints you may have on this are best sent to the curl-library mailing list. If you are planning on doing a HTTP/2 capable applications or otherwise have thoughts or ideas about the API for this, please join in and tell me what you think. It is much better to get the discussions going early and work on different design ideas now before anything is set in stone rather than waiting for us to ship something semi-stable as the closer to an actual release we get, the harder it’ll be to change the API.

Not quite working yet

As I write this, I’m repeatedly doing 99 parallel HTTP/2 streams with no data corruption… But there’s a lot more to be done before I’ll call it a victory.

talking curl on the changelog

Friday, May 1st, 2015

The changelog is the name of a weekly podcast on which the hosts discuss open source and stuff.

Last Friday I was invited to participate and I joined hosts Adam and Jerod for an hour long episode about curl. It all started as a response to my post on curl 17 years, so we really got into how things started out and how curl has developed through the years, how much time I’ve spent on it and if I could mention a really great moment in time that stood out over the years?

They day before, they released the little separate teaser we made about about the little known –remote-name-all command line option that basically makes curl default to do -O on all given URLs.

The full length episode can be experienced in all its glory here: https://changelog.com/153/

Summing up the birthday festivities

Sunday, March 22nd, 2015

I blogged about curl’s 17th birthday on March 20th 2015. I’ve done similar posts in the past and they normally pass by mostly undetected and hardly discussed. This time, something else happened.

Primarily, the blog post quickly became the single most viewed blog entry I’ve ever written – and I’ve been doing it for many many years. Already in the first day it was up, I counted more than 65,000 views.

The blog post got more comments than on any other blog post I’ve ever done. Right now they have probably stopped but there are 60 of them now, almost everyone one of them saying congratulations and/or thanks.

The posting also got discussed on both hacker news and reddit, totaling in more than 260 comments. Most of those in positive spirit.

The initial tweet I made about my blog post is the most retweeted and stared tweet I’ve ever posted. At least 87 retweets and 49 favorites (it might even grow a bit more over time). Others subsequently also tweeted the link hundreds of times. I got numerous replies and friendly call-outs on twitter saying “congrats” and “thanks” in many variations.

Spontaneously (ie not initiated or requested by me but most probably because of a comment on hacker news), I also suddenly started to get donations from the curl web site’s donation web page (to paypal). Within 24 hours from my post, I had received 35 donations from friendly fans who donated a total sum of  445 USD. A quick count revealed that the total number of donations ever through the history of curl’s lifetime was 43 before this day. In one day we had basically gotten as many as we had gotten the first 17 years.

Interesting data from this donation “race”: I got donations varying from 1 USD (yes one dollar) to 50 USD and the average donation was then 12.7 USD.

Let me end this summary by thanking everyone who in various ways made the curl birthday extra fun by being nice and friendly and some even donating some of their hard earned money. I am honestly touched by the attention and all the warmth and positiveness. Thank you for proving internet comments can be this good!

Video: My curl talk from FOSDEM 2015

Friday, March 13th, 2015

I mentioned the talk before, and now the video has been made available. About 25 minutes with me presenting curl.

Internet all the things

cURL

curl: embracing github more

Tuesday, March 3rd, 2015

Pull requests and issues filed on github are most welcome!

The curl project has been around for a long time by now and we’ve been through several different version control systems. The most recent switch was when we switched to git from CVS back in 2010. We were late switchers but then we’re conservative in several regards.

When we switched to git we also switched to github for the hosting, after having been self-hosted for many years before that. By using github we got a lot of services, goodies and reliable hosting at no cost. We’ve been enjoying that ever since.

cURLHowever, as we have been a traditional mailing list driving project for a long time, I have previously not properly embraced and appreciated pull requests and issues filed at github since they don’t really follow the old model very good.

Just very recently I decided to stop fighting those methods and instead go with them. A quick poll among my fellow team mates showed no strong opposition and we are now instead going full force ahead in a more github embracing style. I hope that this will lower the barrier and remove friction for newcomers and allow more people to contribute easier.

As an effect of this, I would also like to encourage each and everyone who is interested in this project as a user of libcurl or as a contributor to and hacker of libcurl, to skip over to the curl github home and press the ‘watch’ button to get notified when future requests and issues appear.

We also offer this helpful guide on how to contribute to the curl project!

Bug finding is slow in spite of many eyeballs

Monday, February 23rd, 2015

“given enough eyeballs, all bugs are shallow”

The saying (also known as Linus’ law) doesn’t say that the bugs are found fast and neither does it say who finds them. My version of the law would be much more cynical, something like: “eventually, bugs are found“, emphasizing the ‘eventually’ part.

(Jim Zemlin apparently said the other day that it can work the Linus way, if we just fund the eyeballs to watch. I don’t think that’s the way the saying originally intended.)

Because in reality, many many bugs are never really found by all those given “eyeballs” in the first place. They are found when someone trips over a problem and is annoyed enough to go searching for the culprit, the reason for the malfunction. Even if the code is open and has been around for years it doesn’t necessarily mean that any of all the people who casually read the code or single-stepped over it will actually ever discover the flaws in the logic. The last few years several world-shaking bugs turned out to have existed for decades until discovered. In code that had been read by lots of people – over and over.

So sure, in the end the bugs were found and fixed. I would argue though that it wasn’t because the projects or problems were given enough eyeballs. Some of those problems were found in extremely popular and widely used projects. They were found because eventually someone accidentally ran into a problem and started digging for the reason.

Time until discovery in the curl project

I decided to see how it looks in the curl project. A project near and dear to me. To take it up a notch, we’ll look only at security flaws. Not only because they are the probably most important bugs we’ve had but also because those are the ones we have the most carefully noted meta-data for. Like when they were reported, when they were introduced and when they were fixed.

We have no less than 30 logged vulnerabilities for curl and libcurl so far through-out our history, spread out over the past 16 years. I’ve spent some time going through them to see if there’s a pattern or something that sticks out that we should put some extra attention to in order to improve our processes and code. While doing this I gathered some random info about what we’ve found so far.

On average, each security problem had been present in the code for 2100 days when fixed – that’s more than five and a half years. On average! That means they survived about 30 releases each. If bugs truly are shallow, it is still certainly not a fast processes.

Perhaps you think these 30 bugs are really tricky, deeply hidden and complicated logic monsters that would explain the time they took to get found? Nope, I would say that every single one of them are pretty obvious once you spot them and none of them take a very long time for a reviewer to understand.

Vulnerability ages

This first graph (click it for the large version) shows the period each problem remained in the code for the 30 different problems, in number of days. The leftmost bar is the most recent flaw and the bar on the right the oldest vulnerability. The red line shows the trend and the green is the average.

The trend is clearly that the bugs are around longer before they are found, but since the project is also growing older all the time it sort of comes naturally and isn’t necessarily a sign of us getting worse at finding them. The average age of flaws is aging slower than the project itself.

Reports per year

How have the reports been distributed over the years? We have a  fairly linear increase in number of lines of code but yet the reports were submitted like this (now it goes from oldest to the left and most recent on the right – click for the large version):

vuln-trend

Compare that to this chart below over lines of code added in the project (chart from openhub and shows blanks in green, comments in grey and code in blue, click it for the large version):

curl source code growth

We received twice as many security reports in 2014 as in 2013 and we got half of all our reports during the last two years. Clearly we have gotten more eyes on the code or perhaps users pay more attention to problems or are generally more likely to see the security angle of problems? It is hard to say but clearly the frequency of security reports has increased a lot lately. (Note that I here count the report year, not the year we announced the particular problems, as they sometimes were done on the following year if the report happened late in the year.)

On average, we publish information about a found flaw 19 days after it was reported to us. We seem to have became slightly worse at this over time, the last two years the average has been 25 days.

Did people find the problems by reading code?

In general, no. Sure people read code but the typical pattern seems to be that people run into some sort of problem first, then dive in to investigate the root of it and then eventually they spot or learn about the security problem.

(This conclusion is based on my understanding from how people have reported the problems, I have not explicitly asked them about these details.)

Common patterns among the problems?

I went over the bugs and marked them with a bunch of descriptive keywords for each flaw, and then I wrote up a script to see how the frequent the keywords are used. This turned out to describe the flaws more than how they ended up in the code. Out of the 30 flaws, the 10 most used keywords ended up like this, showing number of flaws and the keyword:

9 TLS
9 HTTP
8 cert-check
8 buffer-overflow

6 info-leak
3 URL-parsing
3 openssl
3 NTLM
3 http-headers
3 cookie

I don’t think it is surprising that TLS, HTTP or certificate checking are common areas of security problems. TLS and certs are complicated, HTTP is huge and not easy to get right. curl is mostly C so buffer overflows is a mistake that sneaks in, and I don’t think 27% of the problems tells us that this is a problem we need to handle better. Also, only 2 of the last 15 flaws (13%) were buffer overflows.

The discussion following this blog post is on hacker news.

Internet all the things

Sunday, February 1st, 2015

I talked in the Embedded devroom at FOSDEM 2015 and the talked was named “Internet all the things – curl everywhere“. Here are my slides from this talk. It was recorded on video and I will post a suitable link to that once it becomes available.

Foto from the event, taken by Olle E Johansson:

Daniel talks curl at FOSDEM

My talks at FOSDEM 2015

Wednesday, January 14th, 2015

fosdem

Sunday 13:00, embedded room (Lameere)

Tile: Internet all the things – using curl in your device

Embedded devices are very often network connected these days. Network connected embedded devices often need to transfer data to and from them as clients, using one or more of the popular internet protocols.

libcurl is the world’s most used and most popular internet transfer library, already used in every imaginable sort of embedded device out there. How did this happen and how do you use libcurl to transfer data to or from your device?

Note that this talk was originally scheduled to be at a different time!

Sunday, 09:00 Mozilla room (UD2.218A)

Title: HTTP/2 right now

HTTP/2 is the new version of the web’s most important and used protocol. Version 2 is due to be out very soon after FOSDEM and I want to inform the audience about what’s going on with the protocol, why it matters to most web developers and users and not the last what its status is at the time of FOSDEM.

curl 7.40.0: unix domain sockets and smb

Thursday, January 8th, 2015

curl and libcurl curl dot-to-dot7.40.0 was just released this morning. There’s a closer look at some of the perhaps more noteworthy changes. As usual, you can find the entire changelog on the curl web site.

HTTP over unix domain sockets

So just before the feature window closed for the pending 7.40.0 release of curl, Peter Wu’s patch series was merged that brings the ability to curl and libcurl to do HTTP over unix domain sockets. This is a feature that’s been mentioned many times through the history of curl but never previously truly implemented. Peter also very nicely adjusted the test server and made two test cases that verify the functionality.

To use this with the curl command line, you specify the socket path to the new –unix-domain option and assuming your local HTTP server listens on that socket, you’ll get the response back just as with an ordinary TCP connection.

Doing the operation from libcurl means using the new CURLOPT_UNIX_SOCKET_PATH option.

This feature is actually not limited to HTTP, you can do all the TCP-based protocols except FTP over the unix domain socket, but it is to my knowledge only HTTP that is regularly used this way. The reason FTP isn’t supported is of course its use of two connections which would be even weirder to do like this.

SMB

SMB is also known as CIFS and is an old network protocol from the Microsoft world access files. curl and libcurl now support this protocol with SMB:// URLs thanks to work by Bill Nagel and Steve Holme.

Security Advisories

Last year we had a large amount of security advisories published (eight to be precise), and this year we start out with two fresh ones already on the 8th day… The ones this time were of course discovered and researched already last year.

CVE-2014-8151 is a way we accidentally allowed an application to bypass the TLS server certificate check if a TLS Session-ID was already cached for a non-checked session – when using the Mac OS SecureTransport SSL backend.

CVE-2014-8150 is a URL request injection. When letting curl or libcurl speak over a HTTP proxy, it would copy the URL verbatim into the HTTP request going to the proxy, which means that if you craft the URL and insert CRLFs (carriage returns and linefeed characters) you can insert your own second request or even custom headers into the request that goes to the proxy.

You may enjoy taking a look at the curl vulnerabilities table.

Bugs bugs bugs

The release notes mention no less than 120 specific bug fixes, which in comparison to other releases is more than average.

Enjoy!

Can curl avoid to be in a future funnily named exploit that shakes the world?

Monday, December 15th, 2014

During this year we’ve seen heartbleed and shellshock strike (and a  few more big flaws that I’ll skip for now). Two really eye opening recent vulnerabilities in projects with many similarities:

  1. Popular corner stones of open source stacks and internet servers
  2. Mostly run and maintained by volunteers
  3. Mature projects that have been around since “forever”
  4. Projects believed to be fairly stable and relatively trustworthy by now
  5. A myriad of features, switches and code that build on many platforms, with some parts of code only running on a rare few
  6. Written in C in a portable style

Does it sound like the curl project to you too? It does to me. Sure, this description also matches a slew of other projects but I lead the curl development so let me stay here and focus on this project.

cURLAre we in jeopardy? I honestly don’t know, but I want to explain what we do in our project in order to minimize the risk and maximize our ability to find problems on our own before they become serious attack vectors somewhere!

previous flaws

There’s no secret that we have let security problems slip through at times. We’re right now working toward our 143rd release during our around 16 years of life-time. We have found and announced 28 security problems over the years. Looking at these found problems, it is clear that very few security problems are discovered quickly after introduction. Most of them linger around for several years until found and fixed. So, realistically speaking based on history: there are security bugs still in the code, and they have probably been present for a while already.

code reviews and code standards

We try to review all patches from people without push rights in the project. It would probably be a good idea to review all patches before they go in for real, but that just wouldn’t work with the (lack of) man power we have in the project while we at the same time want to develop curl, move it forward and introduce new things and features.

We maintain code standards and formatting to keep code easy to understand and follow. We keep individual commits smallish for easier review now or in the future.

test cases

As simple as it is, we test that the basic stuff works. We don’t and can’t test everything but having test cases for most things give us the confidence to change code when we see problems as we then remain fairly sure things keep working the same way as long as the test go through. In projects with much less test coverage, you become much more conservative with what you dare to change and that also makes you more vulnerable.

We always want more test cases and we want to improve on how we always add test cases when we add new features and ideally we should also add new test cases when we fix bugs so that we know that we don’t introduce any such bug again in the future.

static code analyzes

We regularly scan our code base using static code analyzers. Both clang-analyzer and coverity are good tools, and they help us by pointing out code that look wrong or suspicious. By making sure we have very few or no such flaws left in the code, we minimize the risk. A static code analyzer is better than run-time tools for cases where they can check code flows that are hard to repeat in my local environment.

valgrind

bike helmetValgrind is an awesome tool to detect memory problems in run-time. Leaks or just stupid uses of memory or related functions. We have our test suite automatically use valgrind when it runs tests in case it is present and it helps us make sure that all situations we test for are also error-free from valgrind’s point of view.

autobuilds

Building and testing curl on a plethora of platforms non-stop is also useful to make sure we don’t depend on behaviors of particular library implementations or non-standard features and more. Testing it all is basically the only way to make sure everything keeps working over the years while we continue to develop and fix bugs. We would course be even better off with more platforms that would test automatically and with more developers keeping an eye on problems that show up there…

code complexity

Arguably, one of the best ways to avoid security flaws and bugs in general, is to keep the source code as simple as possible. Complex functions need to be broken down into smaller functions that are possible to read and understand. A good way to identify functions suitable for fixing is pmccabe,

essential third parties

curl and libcurl are usually built to use a whole bunch of third party libraries in order to perform all the functionality. In order to not have any of those uses turn into a source for trouble we must of course also participate in those projects and help them stay strong and make sure that we use them the proper way that doesn’t lead to any bad side-effects.

You can help!

All this takes time, energy and system resources. Your contributions and help will be appreciated where ever among these tasks that you can insert any. We could do more of all this, more often and more thorough if we only were more people involved!