Archive for the ‘cURL and libcurl’ Category

My talks at FOSDEM 2015

Wednesday, January 14th, 2015

fosdem

Sunday 13:00, embedded room (Lameere)

Tile: Internet all the things – using curl in your device

Embedded devices are very often network connected these days. Network connected embedded devices often need to transfer data to and from them as clients, using one or more of the popular internet protocols.

libcurl is the world’s most used and most popular internet transfer library, already used in every imaginable sort of embedded device out there. How did this happen and how do you use libcurl to transfer data to or from your device?

Note that this talk was originally scheduled to be at a different time!

Sunday, 09:00 Mozilla room (UD2.218A)

Title: HTTP/2 right now

HTTP/2 is the new version of the web’s most important and used protocol. Version 2 is due to be out very soon after FOSDEM and I want to inform the audience about what’s going on with the protocol, why it matters to most web developers and users and not the last what its status is at the time of FOSDEM.

curl 7.40.0: unix domain sockets and smb

Thursday, January 8th, 2015

curl and libcurl curl dot-to-dot7.40.0 was just released this morning. There’s a closer look at some of the perhaps more noteworthy changes. As usual, you can find the entire changelog on the curl web site.

HTTP over unix domain sockets

So just before the feature window closed for the pending 7.40.0 release of curl, Peter Wu’s patch series was merged that brings the ability to curl and libcurl to do HTTP over unix domain sockets. This is a feature that’s been mentioned many times through the history of curl but never previously truly implemented. Peter also very nicely adjusted the test server and made two test cases that verify the functionality.

To use this with the curl command line, you specify the socket path to the new –unix-domain option and assuming your local HTTP server listens on that socket, you’ll get the response back just as with an ordinary TCP connection.

Doing the operation from libcurl means using the new CURLOPT_UNIX_SOCKET_PATH option.

This feature is actually not limited to HTTP, you can do all the TCP-based protocols except FTP over the unix domain socket, but it is to my knowledge only HTTP that is regularly used this way. The reason FTP isn’t supported is of course its use of two connections which would be even weirder to do like this.

SMB

SMB is also known as CIFS and is an old network protocol from the Microsoft world access files. curl and libcurl now support this protocol with SMB:// URLs thanks to work by Bill Nagel and Steve Holme.

Security Advisories

Last year we had a large amount of security advisories published (eight to be precise), and this year we start out with two fresh ones already on the 8th day… The ones this time were of course discovered and researched already last year.

CVE-2014-8151 is a way we accidentally allowed an application to bypass the TLS server certificate check if a TLS Session-ID was already cached for a non-checked session – when using the Mac OS SecureTransport SSL backend.

CVE-2014-8150 is a URL request injection. When letting curl or libcurl speak over a HTTP proxy, it would copy the URL verbatim into the HTTP request going to the proxy, which means that if you craft the URL and insert CRLFs (carriage returns and linefeed characters) you can insert your own second request or even custom headers into the request that goes to the proxy.

You may enjoy taking a look at the curl vulnerabilities table.

Bugs bugs bugs

The release notes mention no less than 120 specific bug fixes, which in comparison to other releases is more than average.

Enjoy!

Can curl avoid to be in a future funnily named exploit that shakes the world?

Monday, December 15th, 2014

During this year we’ve seen heartbleed and shellshock strike (and a  few more big flaws that I’ll skip for now). Two really eye opening recent vulnerabilities in projects with many similarities:

  1. Popular corner stones of open source stacks and internet servers
  2. Mostly run and maintained by volunteers
  3. Mature projects that have been around since “forever”
  4. Projects believed to be fairly stable and relatively trustworthy by now
  5. A myriad of features, switches and code that build on many platforms, with some parts of code only running on a rare few
  6. Written in C in a portable style

Does it sound like the curl project to you too? It does to me. Sure, this description also matches a slew of other projects but I lead the curl development so let me stay here and focus on this project.

cURLAre we in jeopardy? I honestly don’t know, but I want to explain what we do in our project in order to minimize the risk and maximize our ability to find problems on our own before they become serious attack vectors somewhere!

previous flaws

There’s no secret that we have let security problems slip through at times. We’re right now working toward our 143rd release during our around 16 years of life-time. We have found and announced 28 security problems over the years. Looking at these found problems, it is clear that very few security problems are discovered quickly after introduction. Most of them linger around for several years until found and fixed. So, realistically speaking based on history: there are security bugs still in the code, and they have probably been present for a while already.

code reviews and code standards

We try to review all patches from people without push rights in the project. It would probably be a good idea to review all patches before they go in for real, but that just wouldn’t work with the (lack of) man power we have in the project while we at the same time want to develop curl, move it forward and introduce new things and features.

We maintain code standards and formatting to keep code easy to understand and follow. We keep individual commits smallish for easier review now or in the future.

test cases

As simple as it is, we test that the basic stuff works. We don’t and can’t test everything but having test cases for most things give us the confidence to change code when we see problems as we then remain fairly sure things keep working the same way as long as the test go through. In projects with much less test coverage, you become much more conservative with what you dare to change and that also makes you more vulnerable.

We always want more test cases and we want to improve on how we always add test cases when we add new features and ideally we should also add new test cases when we fix bugs so that we know that we don’t introduce any such bug again in the future.

static code analyzes

We regularly scan our code base using static code analyzers. Both clang-analyzer and coverity are good tools, and they help us by pointing out code that look wrong or suspicious. By making sure we have very few or no such flaws left in the code, we minimize the risk. A static code analyzer is better than run-time tools for cases where they can check code flows that are hard to repeat in my local environment.

valgrind

bike helmetValgrind is an awesome tool to detect memory problems in run-time. Leaks or just stupid uses of memory or related functions. We have our test suite automatically use valgrind when it runs tests in case it is present and it helps us make sure that all situations we test for are also error-free from valgrind’s point of view.

autobuilds

Building and testing curl on a plethora of platforms non-stop is also useful to make sure we don’t depend on behaviors of particular library implementations or non-standard features and more. Testing it all is basically the only way to make sure everything keeps working over the years while we continue to develop and fix bugs. We would course be even better off with more platforms that would test automatically and with more developers keeping an eye on problems that show up there…

code complexity

Arguably, one of the best ways to avoid security flaws and bugs in general, is to keep the source code as simple as possible. Complex functions need to be broken down into smaller functions that are possible to read and understand. A good way to identify functions suitable for fixing is pmccabe,

essential third parties

curl and libcurl are usually built to use a whole bunch of third party libraries in order to perform all the functionality. In order to not have any of those uses turn into a source for trouble we must of course also participate in those projects and help them stay strong and make sure that we use them the proper way that doesn’t lead to any bad side-effects.

You can help!

All this takes time, energy and system resources. Your contributions and help will be appreciated where ever among these tasks that you can insert any. We could do more of all this, more often and more thorough if we only were more people involved!

libcurl multi_socket 3333 days later

Wednesday, December 10th, 2014

.SE-logoOn October 25, 2005 I sent out the announcement about “libcurl funding from the Swedish IIS Foundation“. It was the beginning of what would eventually become the curl_multi_socket_action() function and its related API features. The API we provide for event-driven applications. This API is the most suitable one in libcurl if you intend to scale up your client up to and beyond hundreds or thousands of simultaneous transfers.

Thanks to this funding from IIS, I could spend a couple of months working full-time on implementing the ideas I had. They paid me the equivalent of 19,000 USD back then. IIS is the non-profit foundation that runs the .se TLD and they fund projects that help internet and internet usage, in particular in Sweden. IIS usually just call themselves “.se” (dot ess ee) these days.

Event-based programming isn’t generally the easiest approach so most people don’t easily take this route without careful consideration, and also if you want your event-based application to be portable among multiple platforms you also need to use an event-based library that abstracts the underlying function calls. These are all reasons why this remains a niche API in libcurl, used only by a small portion of users. Still, there are users and they seem to be able to use this API fine. A success in my eyes.

One dollar billPart of that improvement project to make libcurl scale and perform better, was also to introduce HTTP pipelining support. I didn’t quite manage that part with in the scope of that project but the pipelining support in libcurl was born in that period  (autumn 2006) but had to be improved several times over the years until it became decently good just a few years ago – and we’re just now (still) fixing more pipelining problems.

On December 10, 2014 there are exactly 3333 days since that initial announcement of mine. I’d like to highlight this occasion by thanking IIS again. Thanks IIS!

Current funding

These days I’m spending a part of my daytime job working on curl with my employer’s blessing and that’s the funding I have – most of my personal time spent is still spare time. I certainly wouldn’t mind seeing others help out, but the best funding is provided as pure man power that can help out and not by trying to buy my time to add your features. Also, I will decline all (friendly) offers to host the web site on your servers since we already have a fairly stable and reliable infrastructure sponsored.

I’m not aware of anyone else that are spending (much) paid work time on curl code, although I’m know there are quite a few who do it every now and then – especially to fix problems that occur in commercial products or services or to add features to such.

IIS still donates money to internet related projects in Sweden but I never applied for any funding from them again. Mostly because it has been hard to sync with my normal life and job situation. If you’re a Swede or just live in Sweden, do consider checking this out for your next internet adventure!

Why curl defaults to stdout

Monday, November 17th, 2014

(Recap: I founded the curl project, I am still the lead developer and maintainer)

When asking curl to get a URL it’ll send the output to stdout by default. You can of course easily change this behavior with options or just using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you’re invoking the command line on a shell prompt you’ll immediately get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made already when I worked on the predecessors during 1997 or so that later would turn into curl.

On Unix systems there’s a common mantra that “everything is a file” but also in fact that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another program. Of course I wanted curl to work as good as the other components and I wanted it to blend in with the rest. I wanted curl to feel like cat but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then: once I had made that decision and I released curl for the first time on March 20, 1998: the call was made. The default was set. I will not change a default and hurt millions of users. I rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

cURLAs I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk. I get that, if you type the full commands by hand you’ll use about three keys less to write “wget” instead of “curl -O”, but on the other hand if this is an operation you do often and you care so much about saving key presses I would suggest you make an alias anyway that is even shorter and then the amount of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true but I read it more like someone trying to come up with a reason when in reality there’s actually another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or one that is licensed under the GPL (which wget is) or simply that wget is what they’re used to and they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget and I seriously and honestly think it has made both our projects better and I like that users can throw arguments in our face like “but X can do Y”and X can alter between curl and wget depending on which camp you talk to. I also really like wget as a tool and I am the occasional user of it, just like most Unix users. I contribute to the wget project well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as former ones.

daniel.haxx.se episode 8

Monday, October 27th, 2014

Today I hesitated to make my new weekly video episode. I looked at the viewers number and how they basically have dwindled the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating if I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewers numbers at all and just do what think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

TALKS

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Pretending port zero is a normal one

Saturday, October 25th, 2014

Speaking the TCP protocol, we communicate between “ports” in the local and remote ends. Each of these port fields are 16 bits in the protocol header so they can hold values between 0 – 65535. (IPv4 or IPv6 are the same here.) We usually do HTTP on port 80 and we do HTTPS on port 443 and so on. We can even play around and use them on various other custom ports when we feel like it.

But what about port 0 (zero) ? Sure, IANA lists the port as “reserved” for TCP and UDP but that’s just a rule in a list of ports, not actually a filter implemented by anyone.

In the actual TCP protocol port 0 is nothing special but just another number. Several people have told me “it is not supposed to be used” or that it is otherwise somehow considered bad to use this port over the internet. I don’t really know where this notion comes from more than that IANA listing.

Frank Gevaerts helped me perform some experiments with TCP port zero on Linux.

In the Berkeley sockets API widely used for doing TCP communications, port zero has a bit of a harder situation. Most of the functions and structs treat zero as just another number so there’s virtually no problem as a client to connect to this port using for example curl. See below for a printout from a test shot.

Running a TCP server on port 0 however, is tricky since the bind() function uses a zero in the port number to mean “pick a random one” (I can only assume this was a mistake done eons ago that can’t be changed). For this test, a little iptables trickery was run so that incoming traffic on TCP port 0 would be redirected to port 80 on the server machine, so that we didn’t have to patch any server code.

Entering a URL with port number zero to Firefox gets this message displayed:

This address uses a network port which is normally used for purposes other than Web browsing. Firefox has canceled the request for your protection.

… but Chrome accepts it and tries to use it as given.

The only little nit that remains when using curl against port 0 is that it seems glibc’s getpeername() assumes this is an illegal port number and refuses to work. I marked that line in curl’s output in red below just to highlight it for you. The actual source code with this check is here. This failure is not lethal for libcurl, it will just have slightly less info but will still continue to work. I claim this is a glibc bug.

$ curl -v http://10.0.0.1:0 -H "Host: 10.0.0.1"
* Rebuilt URL to: http://10.0.0.1:0/
* Hostname was NOT found in DNS cache
* Trying 10.0.0.1...
* getpeername() failed with errno 107: Transport endpoint is not connected
* Connected to 10.0.0.1 () port 0 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.1-DEV
> Accept: */*
> Host: 10.0.0.1
>
< HTTP/1.1 200 OK
< Date: Fri, 24 Oct 2014 09:08:02 GMT
< Server: Apache/2.4.10 (Debian)
< Last-Modified: Fri, 24 Oct 2014 08:48:34 GMT
< Content-Length: 22
< Content-Type: text/html

<html>testpage</html>

Why doing this experiment? Just for fun to to see if it worked.

(Discussion and comments on this post is also found at Reddit.)

curl is no POODLE

Friday, October 17th, 2014

Once again the internet flooded over with reports and alerts about a vulnerability using a funny name: POODLE. If you have even the slightest interest in this sort of stuff you’ve already grown tired and bored about everything that’s been written about this so why on earth do I have to pile on and add to the pain?

This is my way of explaining how POODLE affects or doesn’t affect curl, libcurl and the huge amount of existing applications using libcurl.

Is my application using HTTPS with libcurl or curl vulnerable to POODLE?

No. POODLE really is a browser-attack.

Motivation

The POODLE attack is a combination of several separate pieces that when combined allow attackers to exploit it. The individual pieces are not enough stand-alone.

SSLv3 is getting a lot of heat now since POODLE must be able to downgrade a connection to SSLv3 from TLS to work. Downgrade in a fairly crude way – in libcurl, only libcurl built to use NSS as its TLS backend supports this way of downgrading the protocol level.

Then, if an attacker manages to downgrade to SSLv3 (both the client and server must thus allow this) and get to use the sensitive block cipher of that protocol, it must maintain a connection to the server and then retry many similar requests to the server in order to try to work out details of the request – to figure out secrets it shouldn’t be able to. This would typically be made using javascript in a browser and really only HTTPS allows this so no other SSL-using protocol can be exploited like this.

For the typical curl user or a libcurl user, there’s A) no javascript and B) the application already knows the request it is doing and normally doesn’t inject random stuff from 3rd party sources that could be allowed to steal secrets. There’s really no room for any outsider here to steal secrets or cookies or whatever.

How will curl change

There’s no immediate need to do anything as curl and libcurl are not vulnerable to POODLE.

Still, SSLv3 is long overdue and is not really a modern protocol (TLS 1.0, the successor, had its RFC published 1999) so in order to really avoid the risk that it will be possible exploit this protocol one way or another now or later using curl/libcurl, we will disable SSLv3 by default in the next curl release. For all TLS backends.

Why? Just to be extra super cautious and because this attack helped us remember that SSLv3 is old and should be let down to die.

If possible, explicitly requesting SSLv3 should still be possible so that users can still work with their legacy systems in dire need of upgrade but placed in corners of the world that every sensible human has since long forgotten or just ignored.

In-depth explanations of POODLE

I especially like the ones provided by PolarSSL and GnuTLS, possibly due to their clear “distance” from browsers.

FOSS them students

Thursday, October 16th, 2014

On October 16th, I visited DSV at Stockholm University where I had the pleasure of holding a talk and discussion with students (and a few teachers) under the topic Contribute to Open Source. Around 30 persons attended.

Here are the slides I use, as usual possibly not perfectly telling stand-alone without the talk but there was no recording made and I talked in Swedish anyway…

Contribute to Open Source from Daniel Stenberg

internal timers and timeouts of libcurl

Friday, October 10th, 2014

wall clockBear with me. It is time to take a deep dive into the libcurl internals and see how it handles timeouts and timers. This is meant as useful information to libcurl users but even more as insights for people who’d like to fiddle with libcurl internals and work on its source code and architecture.

socket activity or timeout

Everything internally in libcurl is using the multi, asynchronous, interface. We avoid blocking calls as far as we can. This means that libcurl always either waits for activity on a socket/file descriptor or for the time to come to do something. If there’s no socket activity and no timeout, there’s nothing to do and it just returns back out.

It is important to remember here that the API for libcurl doesn’t force the user to call it again within or at the specific time and it also allows users to call it again “too soon” if they like. Some users will even busy-loop like crazy and keep hammering the API like a machine-gun and we must deal with that. So, the timeouts are mostly to be considered advisory.

many timeouts

A single transfer can have multiple timeouts. For example one maximum time for the entire transfer, one for the connection phase and perhaps even more timers that handle for example speed caps (that makes libcurl not transfer data faster than a set limit) or detecting transfers speeds below a certain threshold within a given time period.

A single transfer is done with a single easy handle, which holds a list of all its timeouts in a sorted list. It allows libcurl to return a single time left until the nearest timeout expires without having to bother with the remainder of the timeouts (yet).

Curl_expire()

… is the internal function to set a timeout to expire a certain number of milliseconds into the future. It adds a timeout entry to the list of timeouts. Expiring a timeout just means that it’ll signal the application to call libcurl again. Internally we don’t have any identifiers to the timeouts, they’re just a time in the future we ask to be called again at. If the code needs that specific time to really have passed before doing something, the code needs to make sure the time has elapsed.

Curl_expire_latest()

A newcomer in the timeout team. I figured out we need this function since if we are in a state where we need to be called no later than a certain specific future time this is useful. It will not add a new timeout entry in the timeout list in case there’s a timeout that expires earlier than the specified time limit.

This function is useful for example when there’s a state in libcurl that varies over time but has no specific time limit to check for. Like transfer speed limits and the like. If Curl_expire() is used in this situation instead of Curl_expire_latest() it would mean adding a new timeout entry every time, and for the busy-loop API usage cases it could mean adding an excessive amount of timeout entries. (And there was a scary bug reported that got “tens of thousands of entries” which motivated this function to get added.)

timeout removals

We don’t remove timeouts from the list until they expire. Like for example if we have a condition that is timing dependent, then we set a timeout with Curl_expire() and we know we should be called again at the end of that time.

If we wouldn’t add the timeout and there’s no socket activity on the socket then we may not be called again – ever.

When an internal state transition into something else and we therefore don’t need a previously set timeout anymore, we have no handle or identifier to the timeout so it cannot be removed. It will instead lead to us getting called again when the timeout triggers even though we didn’t really need it any longer. As we’re having an API that allows this anyway, this is already handled by the logic and getting called an extra time is usually very cheap and is not considered a problem worth addressing.

Timeouts are removed automatically from the list of timers when they expire. Timeouts that are in passed time are removed from the list and the timers following will then get moved to the front of the queue and be used to calculate how long the single timeout should be next.

The only internal API to remove timeouts that we have removes all timeouts, used when cleaning up a handle.

many easy handles

I’ve mentioned how each easy handle treats their timeouts above. With the multi interface, we can have any amount of easy handles added to a single multi handle. This means one list of timeouts for each easy handle.

To handle many thousands of easy handles added to the same multi handle, all with their own timeout (as each easy handle only show their closest timeout), it builds a splay tree of easy handles sorted on the timeout time. It is a splay tree rather than a sorted list to allow really fast insertions and removals.

As soon as a timeout expires from one of the easy handles and it moves to the next timeout in its list, it means removing one node (easy handle) from the splay tree and inserting it again with the new timeout timer.