Archive for the ‘cURL and libcurl’ Category

Can curl avoid to be in a future funnily named exploit that shakes the world?

Monday, December 15th, 2014

During this year we’ve seen heartbleed and shellshock strike (and a  few more big flaws that I’ll skip for now). Two really eye opening recent vulnerabilities in projects with many similarities:

  1. Popular corner stones of open source stacks and internet servers
  2. Mostly run and maintained by volunteers
  3. Mature projects that have been around since “forever”
  4. Projects believed to be fairly stable and relatively trustworthy by now
  5. A myriad of features, switches and code that build on many platforms, with some parts of code only running on a rare few
  6. Written in C in a portable style

Does it sound like the curl project to you too? It does to me. Sure, this description also matches a slew of other projects but I lead the curl development so let me stay here and focus on this project.

cURLAre we in jeopardy? I honestly don’t know, but I want to explain what we do in our project in order to minimize the risk and maximize our ability to find problems on our own before they become serious attack vectors somewhere!

previous flaws

There’s no secret that we have let security problems slip through at times. We’re right now working toward our 143rd release during our around 16 years of life-time. We have found and announced 28 security problems over the years. Looking at these found problems, it is clear that very few security problems are discovered quickly after introduction. Most of them linger around for several years until found and fixed. So, realistically speaking based on history: there are security bugs still in the code, and they have probably been present for a while already.

code reviews and code standards

We try to review all patches from people without push rights in the project. It would probably be a good idea to review all patches before they go in for real, but that just wouldn’t work with the (lack of) man power we have in the project while we at the same time want to develop curl, move it forward and introduce new things and features.

We maintain code standards and formatting to keep code easy to understand and follow. We keep individual commits smallish for easier review now or in the future.

test cases

As simple as it is, we test that the basic stuff works. We don’t and can’t test everything but having test cases for most things give us the confidence to change code when we see problems as we then remain fairly sure things keep working the same way as long as the test go through. In projects with much less test coverage, you become much more conservative with what you dare to change and that also makes you more vulnerable.

We always want more test cases and we want to improve on how we always add test cases when we add new features and ideally we should also add new test cases when we fix bugs so that we know that we don’t introduce any such bug again in the future.

static code analyzes

We regularly scan our code base using static code analyzers. Both clang-analyzer and coverity are good tools, and they help us by pointing out code that look wrong or suspicious. By making sure we have very few or no such flaws left in the code, we minimize the risk. A static code analyzer is better than run-time tools for cases where they can check code flows that are hard to repeat in my local environment.

valgrind

bike helmetValgrind is an awesome tool to detect memory problems in run-time. Leaks or just stupid uses of memory or related functions. We have our test suite automatically use valgrind when it runs tests in case it is present and it helps us make sure that all situations we test for are also error-free from valgrind’s point of view.

autobuilds

Building and testing curl on a plethora of platforms non-stop is also useful to make sure we don’t depend on behaviors of particular library implementations or non-standard features and more. Testing it all is basically the only way to make sure everything keeps working over the years while we continue to develop and fix bugs. We would course be even better off with more platforms that would test automatically and with more developers keeping an eye on problems that show up there…

code complexity

Arguably, one of the best ways to avoid security flaws and bugs in general, is to keep the source code as simple as possible. Complex functions need to be broken down into smaller functions that are possible to read and understand. A good way to identify functions suitable for fixing is pmccabe,

essential third parties

curl and libcurl are usually built to use a whole bunch of third party libraries in order to perform all the functionality. In order to not have any of those uses turn into a source for trouble we must of course also participate in those projects and help them stay strong and make sure that we use them the proper way that doesn’t lead to any bad side-effects.

You can help!

All this takes time, energy and system resources. Your contributions and help will be appreciated where ever among these tasks that you can insert any. We could do more of all this, more often and more thorough if we only were more people involved!

libcurl multi_socket 3333 days later

Wednesday, December 10th, 2014

.SE-logoOn October 25, 2005 I sent out the announcement about “libcurl funding from the Swedish IIS Foundation“. It was the beginning of what would eventually become the curl_multi_socket_action() function and its related API features. The API we provide for event-driven applications. This API is the most suitable one in libcurl if you intend to scale up your client up to and beyond hundreds or thousands of simultaneous transfers.

Thanks to this funding from IIS, I could spend a couple of months working full-time on implementing the ideas I had. They paid me the equivalent of 19,000 USD back then. IIS is the non-profit foundation that runs the .se TLD and they fund projects that help internet and internet usage, in particular in Sweden. IIS usually just call themselves “.se” (dot ess ee) these days.

Event-based programming isn’t generally the easiest approach so most people don’t easily take this route without careful consideration, and also if you want your event-based application to be portable among multiple platforms you also need to use an event-based library that abstracts the underlying function calls. These are all reasons why this remains a niche API in libcurl, used only by a small portion of users. Still, there are users and they seem to be able to use this API fine. A success in my eyes.

One dollar billPart of that improvement project to make libcurl scale and perform better, was also to introduce HTTP pipelining support. I didn’t quite manage that part with in the scope of that project but the pipelining support in libcurl was born in that period  (autumn 2006) but had to be improved several times over the years until it became decently good just a few years ago – and we’re just now (still) fixing more pipelining problems.

On December 10, 2014 there are exactly 3333 days since that initial announcement of mine. I’d like to highlight this occasion by thanking IIS again. Thanks IIS!

Current funding

These days I’m spending a part of my daytime job working on curl with my employer’s blessing and that’s the funding I have – most of my personal time spent is still spare time. I certainly wouldn’t mind seeing others help out, but the best funding is provided as pure man power that can help out and not by trying to buy my time to add your features. Also, I will decline all (friendly) offers to host the web site on your servers since we already have a fairly stable and reliable infrastructure sponsored.

I’m not aware of anyone else that are spending (much) paid work time on curl code, although I’m know there are quite a few who do it every now and then – especially to fix problems that occur in commercial products or services or to add features to such.

IIS still donates money to internet related projects in Sweden but I never applied for any funding from them again. Mostly because it has been hard to sync with my normal life and job situation. If you’re a Swede or just live in Sweden, do consider checking this out for your next internet adventure!

Why curl defaults to stdout

Monday, November 17th, 2014

(Recap: I founded the curl project, I am still the lead developer and maintainer)

When asking curl to get a URL it’ll send the output to stdout by default. You can of course easily change this behavior with options or just using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you’re invoking the command line on a shell prompt you’ll immediately get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made already when I worked on the predecessors during 1997 or so that later would turn into curl.

On Unix systems there’s a common mantra that “everything is a file” but also in fact that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another program. Of course I wanted curl to work as good as the other components and I wanted it to blend in with the rest. I wanted curl to feel like cat but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then: once I had made that decision and I released curl for the first time on March 20, 1998: the call was made. The default was set. I will not change a default and hurt millions of users. I rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

cURLAs I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk. I get that, if you type the full commands by hand you’ll use about three keys less to write “wget” instead of “curl -O”, but on the other hand if this is an operation you do often and you care so much about saving key presses I would suggest you make an alias anyway that is even shorter and then the amount of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true but I read it more like someone trying to come up with a reason when in reality there’s actually another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or one that is licensed under the GPL (which wget is) or simply that wget is what they’re used to and they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget and I seriously and honestly think it has made both our projects better and I like that users can throw arguments in our face like “but X can do Y”and X can alter between curl and wget depending on which camp you talk to. I also really like wget as a tool and I am the occasional user of it, just like most Unix users. I contribute to the wget project well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as former ones.

daniel.haxx.se episode 8

Monday, October 27th, 2014

Today I hesitated to make my new weekly video episode. I looked at the viewers number and how they basically have dwindled the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating if I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewers numbers at all and just do what think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

TALKS

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Pretending port zero is a normal one

Saturday, October 25th, 2014

Speaking the TCP protocol, we communicate between “ports” in the local and remote ends. Each of these port fields are 16 bits in the protocol header so they can hold values between 0 – 65535. (IPv4 or IPv6 are the same here.) We usually do HTTP on port 80 and we do HTTPS on port 443 and so on. We can even play around and use them on various other custom ports when we feel like it.

But what about port 0 (zero) ? Sure, IANA lists the port as “reserved” for TCP and UDP but that’s just a rule in a list of ports, not actually a filter implemented by anyone.

In the actual TCP protocol port 0 is nothing special but just another number. Several people have told me “it is not supposed to be used” or that it is otherwise somehow considered bad to use this port over the internet. I don’t really know where this notion comes from more than that IANA listing.

Frank Gevaerts helped me perform some experiments with TCP port zero on Linux.

In the Berkeley sockets API widely used for doing TCP communications, port zero has a bit of a harder situation. Most of the functions and structs treat zero as just another number so there’s virtually no problem as a client to connect to this port using for example curl. See below for a printout from a test shot.

Running a TCP server on port 0 however, is tricky since the bind() function uses a zero in the port number to mean “pick a random one” (I can only assume this was a mistake done eons ago that can’t be changed). For this test, a little iptables trickery was run so that incoming traffic on TCP port 0 would be redirected to port 80 on the server machine, so that we didn’t have to patch any server code.

Entering a URL with port number zero to Firefox gets this message displayed:

This address uses a network port which is normally used for purposes other than Web browsing. Firefox has canceled the request for your protection.

… but Chrome accepts it and tries to use it as given.

The only little nit that remains when using curl against port 0 is that it seems glibc’s getpeername() assumes this is an illegal port number and refuses to work. I marked that line in curl’s output in red below just to highlight it for you. The actual source code with this check is here. This failure is not lethal for libcurl, it will just have slightly less info but will still continue to work. I claim this is a glibc bug.

$ curl -v http://10.0.0.1:0 -H "Host: 10.0.0.1"
* Rebuilt URL to: http://10.0.0.1:0/
* Hostname was NOT found in DNS cache
* Trying 10.0.0.1...
* getpeername() failed with errno 107: Transport endpoint is not connected
* Connected to 10.0.0.1 () port 0 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.1-DEV
> Accept: */*
> Host: 10.0.0.1
>
< HTTP/1.1 200 OK
< Date: Fri, 24 Oct 2014 09:08:02 GMT
< Server: Apache/2.4.10 (Debian)
< Last-Modified: Fri, 24 Oct 2014 08:48:34 GMT
< Content-Length: 22
< Content-Type: text/html

<html>testpage</html>

Why doing this experiment? Just for fun to to see if it worked.

(Discussion and comments on this post is also found at Reddit.)

curl is no POODLE

Friday, October 17th, 2014

Once again the internet flooded over with reports and alerts about a vulnerability using a funny name: POODLE. If you have even the slightest interest in this sort of stuff you’ve already grown tired and bored about everything that’s been written about this so why on earth do I have to pile on and add to the pain?

This is my way of explaining how POODLE affects or doesn’t affect curl, libcurl and the huge amount of existing applications using libcurl.

Is my application using HTTPS with libcurl or curl vulnerable to POODLE?

No. POODLE really is a browser-attack.

Motivation

The POODLE attack is a combination of several separate pieces that when combined allow attackers to exploit it. The individual pieces are not enough stand-alone.

SSLv3 is getting a lot of heat now since POODLE must be able to downgrade a connection to SSLv3 from TLS to work. Downgrade in a fairly crude way – in libcurl, only libcurl built to use NSS as its TLS backend supports this way of downgrading the protocol level.

Then, if an attacker manages to downgrade to SSLv3 (both the client and server must thus allow this) and get to use the sensitive block cipher of that protocol, it must maintain a connection to the server and then retry many similar requests to the server in order to try to work out details of the request – to figure out secrets it shouldn’t be able to. This would typically be made using javascript in a browser and really only HTTPS allows this so no other SSL-using protocol can be exploited like this.

For the typical curl user or a libcurl user, there’s A) no javascript and B) the application already knows the request it is doing and normally doesn’t inject random stuff from 3rd party sources that could be allowed to steal secrets. There’s really no room for any outsider here to steal secrets or cookies or whatever.

How will curl change

There’s no immediate need to do anything as curl and libcurl are not vulnerable to POODLE.

Still, SSLv3 is long overdue and is not really a modern protocol (TLS 1.0, the successor, had its RFC published 1999) so in order to really avoid the risk that it will be possible exploit this protocol one way or another now or later using curl/libcurl, we will disable SSLv3 by default in the next curl release. For all TLS backends.

Why? Just to be extra super cautious and because this attack helped us remember that SSLv3 is old and should be let down to die.

If possible, explicitly requesting SSLv3 should still be possible so that users can still work with their legacy systems in dire need of upgrade but placed in corners of the world that every sensible human has since long forgotten or just ignored.

In-depth explanations of POODLE

I especially like the ones provided by PolarSSL and GnuTLS, possibly due to their clear “distance” from browsers.

FOSS them students

Thursday, October 16th, 2014

On October 16th, I visited DSV at Stockholm University where I had the pleasure of holding a talk and discussion with students (and a few teachers) under the topic Contribute to Open Source. Around 30 persons attended.

Here are the slides I use, as usual possibly not perfectly telling stand-alone without the talk but there was no recording made and I talked in Swedish anyway…

Contribute to Open Source from Daniel Stenberg

internal timers and timeouts of libcurl

Friday, October 10th, 2014

wall clockBear with me. It is time to take a deep dive into the libcurl internals and see how it handles timeouts and timers. This is meant as useful information to libcurl users but even more as insights for people who’d like to fiddle with libcurl internals and work on its source code and architecture.

socket activity or timeout

Everything internally in libcurl is using the multi, asynchronous, interface. We avoid blocking calls as far as we can. This means that libcurl always either waits for activity on a socket/file descriptor or for the time to come to do something. If there’s no socket activity and no timeout, there’s nothing to do and it just returns back out.

It is important to remember here that the API for libcurl doesn’t force the user to call it again within or at the specific time and it also allows users to call it again “too soon” if they like. Some users will even busy-loop like crazy and keep hammering the API like a machine-gun and we must deal with that. So, the timeouts are mostly to be considered advisory.

many timeouts

A single transfer can have multiple timeouts. For example one maximum time for the entire transfer, one for the connection phase and perhaps even more timers that handle for example speed caps (that makes libcurl not transfer data faster than a set limit) or detecting transfers speeds below a certain threshold within a given time period.

A single transfer is done with a single easy handle, which holds a list of all its timeouts in a sorted list. It allows libcurl to return a single time left until the nearest timeout expires without having to bother with the remainder of the timeouts (yet).

Curl_expire()

… is the internal function to set a timeout to expire a certain number of milliseconds into the future. It adds a timeout entry to the list of timeouts. Expiring a timeout just means that it’ll signal the application to call libcurl again. Internally we don’t have any identifiers to the timeouts, they’re just a time in the future we ask to be called again at. If the code needs that specific time to really have passed before doing something, the code needs to make sure the time has elapsed.

Curl_expire_latest()

A newcomer in the timeout team. I figured out we need this function since if we are in a state where we need to be called no later than a certain specific future time this is useful. It will not add a new timeout entry in the timeout list in case there’s a timeout that expires earlier than the specified time limit.

This function is useful for example when there’s a state in libcurl that varies over time but has no specific time limit to check for. Like transfer speed limits and the like. If Curl_expire() is used in this situation instead of Curl_expire_latest() it would mean adding a new timeout entry every time, and for the busy-loop API usage cases it could mean adding an excessive amount of timeout entries. (And there was a scary bug reported that got “tens of thousands of entries” which motivated this function to get added.)

timeout removals

We don’t remove timeouts from the list until they expire. Like for example if we have a condition that is timing dependent, then we set a timeout with Curl_expire() and we know we should be called again at the end of that time.

If we wouldn’t add the timeout and there’s no socket activity on the socket then we may not be called again – ever.

When an internal state transition into something else and we therefore don’t need a previously set timeout anymore, we have no handle or identifier to the timeout so it cannot be removed. It will instead lead to us getting called again when the timeout triggers even though we didn’t really need it any longer. As we’re having an API that allows this anyway, this is already handled by the logic and getting called an extra time is usually very cheap and is not considered a problem worth addressing.

Timeouts are removed automatically from the list of timers when they expire. Timeouts that are in passed time are removed from the list and the timers following will then get moved to the front of the queue and be used to calculate how long the single timeout should be next.

The only internal API to remove timeouts that we have removes all timeouts, used when cleaning up a handle.

many easy handles

I’ve mentioned how each easy handle treats their timeouts above. With the multi interface, we can have any amount of easy handles added to a single multi handle. This means one list of timeouts for each easy handle.

To handle many thousands of easy handles added to the same multi handle, all with their own timeout (as each easy handle only show their closest timeout), it builds a splay tree of easy handles sorted on the timeout time. It is a splay tree rather than a sorted list to allow really fast insertions and removals.

As soon as a timeout expires from one of the easy handles and it moves to the next timeout in its list, it means removing one node (easy handle) from the splay tree and inserting it again with the new timeout timer.

Coverity scan defect density: 0.00

Thursday, October 9th, 2014

A couple of days ago I decided to stop slacking and grab this long dangling item in my TODO list: run the coverity scan on a recent curl build again.

Among the static analyzers, coverity does in fact stand out as the very best one I can use. We run clang-analyzer against curl every night and it hasn’t report any problems at all in a while. This time I got almost 50 new issues reported by Coverity.

To put it shortly, a little less than half of them were issues done on purpose: for example we got several reports on ignored return codes we really don’t care about and there were several reports on dead code for code that are conditionally built on other platforms than the one I used to do this with.

But there were a whole range of legitimate issues. Nothing really major popped up but a range of tiny flaws that were good to polish away and smooth out. Clearly this is an exercise worth repeating every now and then.

End result

21 new curl commits that mention Coverity. Coverity now says “defect density: 0.00” for curl and libcurl since it doesn’t report any more flaws. (That’s the number of flaws found per thousand lines of source code.)

Want to see?

I can’t seem to make all the issues publicly accessible, but if you do want to check them out in person just click over to the curl project page at coverity and “request more access” and I’ll grant you view access, no questions asked.

A day in the curl project

Monday, September 29th, 2014

cURLI maintain curl and lead the development there. This is how I spend my time an ordinary day in the project. Maybe I don’t do all of these things every single day, but sometimes I do and sometimes I just do a subset of them. I just want to give you a look into what I do and why I don’t add new stuff more often or faster… I spend about one to three hours on the project every day. Let me also stress that curl is a tiny little project in comparison with many other open source projects. I’m certainly not saying otherwise.

the new bug

Someone submits a new bug in the bug tracker or on one of the mailing lists. Most initial bug reports lack sufficient details so the first thing I do is ask for more info and possibly ask the submitter to try a recent version as very often we get bug reported on very old versions. Many bug reports take several demands for more info before the necessary details have been provided. I don’t really start to investigate a problem until I feel I have a sufficient amount of details. We’re a very small core team that acts on other people’s bugs.

the question by a newbie in the project

A new person shows up with a question. The question is usually similar to a FAQ entry or an example but not exactly. It deserves a proper response. This kind of question can often be answered by anyone, but also most people involved in the project don’t feel the need or “familiarity” to respond to such questions and therefore remain quiet.

the old mail I haven’t responded to yet

I want every serious email that reaches the mailing lists to get a response, so all mails that neither I nor anyone else responds to I keep around in my inbox and when I have idle time over I go back and catch up on old mails. Some of them can then of course result in a new bug or patch or whatever. Occasionally I have to resort to simply saving away the old mail without responding in order to catch up, just to cut the list of outstanding things to do a little.

the TODO list for my own sake, things I’d like to get working on

There are always things I really want to see done in the project, and I work on them far too little really. But every once in a while I ignore everything else in my life for a couple of hours and spend them on adding a new feature or fixing something I’ve been missing. Actual development of new features is a very small fraction of all time I spend on this project.

the list of open bug reports

I regularly revisit this list to see what I can do to push the open ones forward. Follow-up questions, deep dives into source code and specifications or just the sad realization that a particular issue won’t be fixed within the nearest time (year?) so that I close it as “future” and add the problem to our KNOWN_BUGS document. I strive to keep the bug list clean and only keep relevant bugs open. Those issues that are not reproducible, are left without the proper attention from the reporter or otherwise stall will get closed. In general I feel quite lonely as responder in the bug tracker…

the mailing list threads that are sort of dying but I do want some progress or feedback on

In my primary email inbox I usually keep ongoing threads around. Lots of discussions just silently stop getting more posts and thus slowly wither away further up the list to become forgotten and ignored. With some interval I go back to see if the posters are still around, if there’s any more feedback or whatever in order to figure out how to proceed with the subject. Very often this makes me get nothing at all back and instead I just save away the entire conversation thread, forget about it and move on.

the blog post I want to do about a recent change or fix I did I’d like to highlight

I try to explain some changes to the world in blog posts. Not all changes but the ones that are somehow noteworthy as they perhaps change the way things have been or introduce new fun features perhaps not that easily spotted. Of course all features are always documented etc, but sometimes I feel I need to put some extra attention on focus on things in a more free-form style. Or I just write about meta stuff, like this very posting.

the reviewing and merging of patches

One of the most important tasks I have is to review patches. I’m basically the only person in the project who volunteers to review patches against any angle or corner of the project. When people have spent time and effort and gallantly send the results of their labor our way in the best possible format (a patch!), the submitter deserves a good review and proper feedback. Also, paving the road for more patches is one of the best way to scale the project. Helping newcomers become productive is important.

Patches are preferably posted on the mailing lists but there’s also some coming in via pull requests on github and while I strongly discourage that (due to them not getting the same attention and possible scrutiny on the list like the others) I sometimes let them through anyway just to be smooth.

When the patch looks good (or sometimes good enough and I just edit some minor detail), I merge it.

the non-disclosed discussions about a potential security problem

We’re a small project with a wide reach and security problems can potentially have grave impact on users. We take security seriously, and we very often have at least one non-public discussion going on about a problem in curl that may have security implications. We then often work on phrasing security advisories, working down exactly which versions that are vulnerable, producing patches for at least the most recent ones of those affected versions and so on.

tame stackoverflow

stackoverflow.com has become almost like a wikipedia for source code and programming related issues (although it isn’t wiki), and that site is one of the primary referrers to curl’s web site these days. I tend to glance over the curl and libcurl related questions and offer my answers at times. If nothing else, it is good to help keeping the amount of disinformation at low levels.

I strongly disapprove of people filing bug reports on such places or even very detailed (lib)curl core questions that should’ve been asked on the curl-library list.

there are idle times too

Yeah. Not very often, but sometimes I actually just need a day off all this. Sometimes I just don’t find motivation or energy enough to dig into that terrible seldom-happening bug on a platform I’ve never seen personally. A project like this never ends. The same day we release a new release, we just reset our clocks and we’re back on improving curl, fixing bugs and cleaning up things for the next release. Forever and ever until the end of time.

keep-calm-and-improve-curl