News in curl 7.24.0

We continue doing curl releases roughly bi-monthly. This time we strike back with a release holding a few interesting new things that I think are worth highlighting a little extra!

The most important and most depressing news about this release is the two security problems that were fixed. Never before have we released two security advisories for the same release.

Security fixes

The “curl URL sanitization vulnerability” is about how curl trusts user-provided URL strings a little too much. By providing sneakily crafted URLs with embedded URL-encoded carriage returns and line feeds, users could trick curl into performing unintended actions when the POP3, SMTP or IMAP protocols were used.
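
To make the problem concrete, here is a minimal sketch in Python of the kind of check an application could run on a URL before passing it on: decode the percent-escapes and look for carriage returns or line feeds. This is not curl’s actual fix, and the function name and example URLs are made up for illustration.

```python
from urllib.parse import unquote


def has_embedded_crlf(url):
    """Return True if the URL carries a raw or percent-encoded CR or LF."""
    decoded = unquote(url)  # turns %0d into "\r" and %0a into "\n"
    return "\r" in decoded or "\n" in decoded


if __name__ == "__main__":
    # A harmless URL, then one that smuggles an extra command after a CRLF.
    print(has_embedded_crlf("pop3://mail.example.com/1"))
    print(has_embedded_crlf("pop3://mail.example.com/1%0d%0aDELE%201"))
```

A check like this only decodes one level of escaping, so it is a starting point rather than a complete defense.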

The “curl SSL CBC IV vulnerability” is about how curl inadvertently disables a security measure in OpenSSL and thus weakens the security for some aspects of SSL 3.0 and TLS 1.0 connections.

Changes

We have a bunch of new changes added to curl and libcurl that some users might like:

  • curl has this ability to run a set of “extra commands” for a couple of protocols when doing a transfer – we call them “quote” operations. A while ago we introduced a way to mark commands within a series of quote commands as not being important if they fail and that the rest of the commands should be sent anyway. We mark such commands with a ‘*’-prefix. Starting now, we support that ‘*’-prefix for SFTP operations as well!
  • CURLOPT_DNS_SERVERS is a brand new option that allows programs to set which DNS server(s) libcurl should use to resolve host names. This function only works if libcurl was built to use a resolver backend that allows it to change DNS servers. That currently means nothing else but c-ares.
  • Now supports nettle for crypto functions. libcurl has long supported both OpenSSL and gcrypt backends for some of the crypto functions it needs. gcrypt made perfect sense when libcurl was built to use a GnuTLS that in turn used gcrypt, but since GnuTLS recently changed to using nettle by default, the newly added nettle support will remove the need for an extra crypto library being linked for some users.
  • CURLOPT_INTERFACE was modified to allow “magic prefixes” for the application to tell that it uses an interface and not a host name and vice versa. The previous way would always test for both, which could lead to accidental (and slow) name resolves when the interface name isn’t currently present etc.
  • Active FTP sessions with the multi interface are now done much more non-blocking than before. Previously the multi interface would block while waiting for the server to connect back but it no longer does. A new option called CURLOPT_ACCEPTTIMEOUT_MS was added to allow programs to set how long libcurl should wait for the server to connect back.
  • Coming in from the Debian packaging guys, the configure script now features a new option called --enable-versioned-symbols that does exactly what it is called: it enables versioned symbols in the output libcurl.
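
The ‘*’-prefix rule for quote commands from the first item in the list can be sketched like this. It is a toy model in Python of the rule as described, not libcurl code: the function name and the `send` callable are hypothetical, and the sketch simply stops the series when a mandatory command fails.

```python
def run_quote_commands(commands, send):
    """Send quote commands in order and return the list actually sent.

    `send` is a hypothetical callable that takes the command text and
    returns True on success and False on failure.
    """
    sent = []
    for command in commands:
        optional = command.startswith("*")
        text = command[1:] if optional else command
        sent.append(text)
        if not send(text) and not optional:
            break  # a failing mandatory command stops the series
    return sent
```

With a `send` that fails on "rm old.txt", the series ["*rm old.txt", "rename a b"] still reaches the rename, while ["rm old.txt", "rename a b"] stops after the first command.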

Join the SPDY library development

Back in October I posted about my intentions to work on getting curl support for SPDY to be based on libspdy. I also got in touch with Thomas, the primary author of libspdy and owner of libspdy.org.

Unfortunately, he was ill already then and he was ill when I communicated with him what I wanted to see happen and I also posted a patch etc to him. He mentioned to me (in a private email) a lot of work they’ve done on the code in a private branch and he invited me to get access to that code to speed up development and allow me to use their code.

I never got any response on my eager “yes, please let me in!” mail and I’ve since mailed him twice over the period of the latest months and as there have been no responses I’ve decided to slowly ramp up my activities on my side while hoping he will soon get back.

I’ve started today by setting up the spdy-library mailing list. I hope to attract fellow interested hackers to join me on this. The goal is quite simply to make a libspdy that works for us. It is to be C89 code that is portable with an API that “makes sense”. I don’t know yet if we will work on libspdy as it currently looks, if Thomas’ team will push their updated work soon or if going with my current spindly fork off github is the way. I hope to get help to decide this!

Join the effort by simply adding yourself to the mailing list and participating in the discussions: http://cool.haxx.se/cgi-bin/mailman/listinfo/spdy-library.

And a wiki on github.

Update: I’ve created a hub collecting all related info and pointers over at spindly.haxx.se.

Welcome!

Rosetta stone

How to figure out if a program uses curl? I get mails from users of it since the curl license is included somewhere and it includes my email address and very often that is the only address available…

To: Daniel Stenberg
Subject: Rosetta Stone Question

I am trying to install Rosetta Stone on my Mac but I am having trouble. The ReadMe says to contact the author, and this email was in the license info. Am I to understand that you are the author?

I don’t know exactly what Rosetta Stone is, but I guess it is the language learning software at www.rosettastone.com

Update

September 8, 2022. It is still alive!

getaddrinfo with round robin DNS and happy eyeballs

This is not news. These are only facts that still seem to be unknown to many people, so I just want to help document them to educate the world. I’ll dance around the subject first a bit by providing the full background info…

round robin basics

Round robin DNS has long been the way to get some rough and cheap load-balancing, spreading out visitors over multiple hosts when they try to use a single host/service with static content. By setting up an A entry in a DNS zone to resolve to multiple IP addresses, clients would get different results in a semi-random manner and thus hit different servers at different times:

server  IN  A  192.168.0.1
server  IN  A  10.0.0.1
server  IN  A  127.0.0.1

For example, if you’re a small open source project it makes a perfect way to feature a distributed service that appears with a single name but is hosted by multiple distributed independent servers across the Internet. It is also used by high profile web servers, like for example www.google.com and www.yahoo.com.

host name resolving

If you’re an old-school hacker, if you learned to do socket and TCP/IP programming from the original Stevens’ books and if you were brought up on BSD unix you learned that you resolve host names with gethostbyname() and friends. This is a POSIX and single unix specification that’s been around since basically forever. When calling gethostbyname() on a given round robin host name, the function returns an array of addresses. That list of addresses will be in a seemingly random order. If an application just iterates over the list and connects to them in the order as received, the round robin concept works perfectly well.

but gethostbyname wasn’t good enough

gethostbyname() is really IPv4-focused. The mere whisper of IPv6 makes it break down and cry. It had to be replaced by something better. Enter getaddrinfo(), also POSIX (and defined in RFC 3493 and again updated in RFC 5014). This is the modern function that supports IPv6 and more. It is the shiny thing the world needed!

not a drop-in replacement

So the (good parts of the) world replaced all calls to gethostbyname() with calls to getaddrinfo() and everything now supported IPv6 and things were all dandy and fine? Not exactly. Because there were subtleties involved. Like in which order these functions return addresses. In 2003 the IETF guys had shipped RFC 3484 detailing Default Address Selection for Internet Protocol version 6, and using that as guideline most (all?) implementations were now changed to return the list of addresses in that order. It would then become a list of hosts in “preferred” order. Suddenly applications would iterate over both IPv4 and IPv6 addresses and do it in an order that would be clever from an IPv6 upgrade-path perspective.
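
The “resolve, then walk the list in the order received” pattern can be sketched as follows. This is a minimal illustration in Python, whose socket.getaddrinfo() wraps the C function; the function name and the timeout value are my own choices for the example.

```python
import socket


def connect_in_order(host, port, timeout=5.0):
    """Try each address getaddrinfo() returns, in the order returned."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # first address that accepts the connection wins
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses returned for %s" % host)
```

Whatever order the resolver hands back is the order the servers get tried in, which is exactly why the sorting done by getaddrinfo() matters.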

no round robin with getaddrinfo

So, back to the good old way to do round robin DNS: multiple addresses (be it IPv4 or IPv6 or both). With the new ideas of how to return addresses this load balancing way no longer works. Now getaddrinfo() returns basically the same order on every invocation. I noticed this back in 2005 and posted a question on the glibc hackers mailing list: http://www.cygwin.com/ml/libc-alpha/2005-11/msg00028.html As you can see, my question was delightfully ignored and nobody ever responded. The order seems to be dictated mostly by the above mentioned RFCs and the local /etc/gai.conf file, but neither is helpful if getting decent round robin is your aim. Others have noticed this flaw as well and some have fought passionately, arguing that this is a bad thing, while of course there’s an opposite side with people claiming it is the right behavior and that doing round robin DNS like this was a bad idea to start with anyway. The impact on a large number of common utilities is simply that when they go IPv6-enabled, they also at the same time go round-robin-DNS disabled.

no decent fix

Since getaddrinfo() has now worked like this for almost a decade, we can forget about “fixing” it. Since gai.conf needs local edits to provide a different function response it is not an answer. But perhaps worse is that, since getaddrinfo() is now made to return the addresses in a sort of order of preference, it is hard to “glue on” a layer on top that simply shuffles the returned results. Such a shuffle would need to take IP versions and more into account. And it would become application-specific and thus would have to be applied to one program at a time. The popular browsers seem less affected by this getaddrinfo drawback. My guess is that because they’ve already worked on making asynchronous name resolves so that name resolving doesn’t lock up their processes, they have taken different approaches and thus have their own code for this. In curl’s case, it can be built with c-ares as a resolver backend even when supporting IPv6, and c-ares does not offer the sort feature of getaddrinfo and thus in these cases curl will work with round robin DNSes much more like it did when it used gethostbyname.
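
As a rough idea of what such a glued-on shuffle might look like, here is a sketch in Python that only shuffles within each consecutive run of the same address family, so the IPv6-before-IPv4 (or the reverse) ordering survives. It deliberately ignores every other preference rule the resolver applied, which is the “and more” part above. The function name is made up for the example.

```python
import itertools
import random
import socket


def shuffle_within_families(addrinfos, rng=None):
    """Shuffle getaddrinfo()-style tuples without mixing address families.

    Each entry is expected to carry its address family as the first item.
    `rng` may be a random.Random instance; the random module is the default.
    """
    rng = rng or random
    shuffled = []
    for _family, group in itertools.groupby(addrinfos, key=lambda ai: ai[0]):
        run = list(group)
        rng.shuffle(run)  # randomize only inside this family's run
        shuffled.extend(run)
    return shuffled


if __name__ == "__main__":
    for entry in shuffle_within_families(
        socket.getaddrinfo("localhost", 80, type=socket.SOCK_STREAM)
    ):
        print(entry[0], entry[4])
```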

alternatives

The downside with all alternatives I’m aware of is that they aren’t just taking advantage of plain DNS. In order to duck the problems I’ve mentioned, you can instead tweak your DNS server to respond differently to different users. That way you can either just randomly respond with different addresses in a round robin fashion, or you can try to make it more clever with things such as PowerDNS’s geobackend feature. Of course we all know that A) geoip is crude and often wrong and B) your real-world geography does not match your network topology.

happy eyeballs

During this period, another connection-related issue has surfaced: IPv6 connections are often handled as a second option on dual-stacked machines, and IPv6 is mostly present in dual stacks these days. This sadly punishes early adopters of IPv6 (yes, unfortunately IPv6 adopters must still be considered early) since those services will then be slower than the older IPv4-only ones.

There seems to be a general consensus on what the way to overcome this problem is: the Happy Eyeballs approach. In short (and simplified) it recommends that we try both (or all) options at once, and the fastest to respond wins and gets to be used. This requires that we resolve A and AAAA names at once, and if we get responses to both, we connect() to both the IPv4 and IPv6 addresses and see which one is the fastest to connect.

This of course is not just a matter of replacing a function or two anymore. To implement this approach you need to do something completely new. For example, just doing getaddrinfo() + looping over addresses and trying connect() won’t work at all. You would basically either start two threads and do the IPv4-only route in one and do the IPv6 route in the other, or you would have to issue non-blocking resolver calls to do A and AAAA resolves in parallel in the same thread and when the first response arrives you fire off a non-blocking connect() …
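
The threaded variant of the connect race could be sketched roughly like this in Python. It is a simplification under stated assumptions: it takes already-resolved addresses (so the parallel A and AAAA resolving is left out), it gives no head start to either family, and the function names are invented for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def _try_connect(family, sockaddr, timeout):
    """Blocking connect to one address; returns the connected socket."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def _close_if_connected(future):
    """Drop the socket of an attempt that succeeded after the winner."""
    if future.exception() is None:
        future.result().close()


def race_connect(candidates, timeout=5.0):
    """Connect to all (family, sockaddr) candidates at once; first one wins."""
    if not candidates:
        raise OSError("no addresses to try")
    pool = ThreadPoolExecutor(max_workers=len(candidates))
    futures = [
        pool.submit(_try_connect, family, sockaddr, timeout)
        for family, sockaddr in candidates
    ]
    last_error = None
    try:
        for future in as_completed(futures):
            try:
                winner = future.result()
            except OSError as exc:
                last_error = exc
                continue
            # Clean up any slower attempts once they finish.
            for other in futures:
                if other is not future:
                    other.add_done_callback(_close_if_connected)
            return winner
        raise last_error
    finally:
        pool.shutdown(wait=False)
```

A caller would feed it one IPv4 and one IPv6 address for the same host and use whichever socket comes back.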

My point being that introducing Happy Eyeballs in your good old socket app will require some rather major remodeling no matter what. Doing this will most likely also affect how your application deals with round robin DNS so now you have a chance to reconsider your choices and code!

Top-3 curl bugs in 2011

This is a continuation of my little top-3 things in curl during 2011 which started with the top-3 changes 2011.

The changelog on the curl site lists 150 bugs fixed in the seven releases of the year. The most important fixes in my view were…

Bug-fix 1: handle HTTP redirects to //hostname/path

Following redirects is one of the fundamentals of HTTP user agents and one of the primary things people use curl and libcurl for is to mimic browsers to do automatic stuff on the web. Therefore it was even more embarrassing to realize that libcurl didn’t properly support the relative redirect when the Location: header doesn’t include the protocol but does include the host name. It basically means that the protocol shall remain (in reality that means HTTP or HTTPS) but it should move over to the new host and path. All browsers support this since ages ago. Since November 15th 2011, libcurl does too!
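
For reference, this is how that kind of Location: value resolves, shown with Python’s standard urljoin rather than libcurl’s own code. The helper name and URLs are made up for the example.

```python
from urllib.parse import urljoin


def resolve_redirect(current_url, location):
    """Resolve a Location: header value against the URL that was requested."""
    return urljoin(current_url, location)


if __name__ == "__main__":
    # The scheme is kept; host and path come from the Location: value.
    print(resolve_redirect("https://www.example.com/a/b", "//other.example.org/path"))
```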

Bug-fix 2: inappropriate GSSAPI delegation

We had one security vulnerability announced in 2011 and this was it. I won’t try to blame someone else for this mistake, but there are some corners of curl and libcurl I’m not personally very familiar with and I would say the GSS stuff is one of those. In fact, even the actual GSS and GSSAPI technologies are murky areas as far as my knowledge reaches so I was not at all aware of this feature or that we even made use of it… Of course it also turns out that there’s a certain amount of existing applications that need it so we now have that ability in the library again if enabled by an option.

Bug-fix 3: multi interface, connect fail continue to next IP

One of those silly bugs nobody would expect us to have at this point. It turned out the code for the multi interface didn’t properly move on to try the next IP in case a connect() failed and the host name had resolved to a number of addresses to try. A long term goal of mine is to remodel the internals of libcurl to always use the multi interface code and I would just wrap that interface with some glue logic to offer the easy interface. For that to work (and for lots of other reasons of course), the multi interface simply must work for all of these things.

Additionally, this is another of those things that are hard to test for in the test suite as it would involve trickery on IP or TCP level and that’s not easy to accomplish in a portable manner.

Top-3 curl changes in 2011

At the end of the year I thought it would be interesting to have a look back over the past twelve months and see what the biggest changes in the curl project were and what the most important bug-fixes were etc. I’ve turned it into a little mini series of blog posts. The top-3s. First out, the top-3 changes.

Top-3 changes

In total I counted 29 notable changes brought during 2011. The most significant ones in my view are:

Change 1 – fancier protocol support in proxy strings

This may sound trivial, and the code certainly is, but with this change suddenly a lot of applications that use libcurl got better proxy support without having to do anything at all. Previously the application would have to set what protocol type the proxy it would use is, even though libcurl has supported having the proxy specified as an environment variable for ages.

Having only the proxy name was useful but limiting. With this new change you can specify proxy type or rather proxy protocol by prefixing the proxy name like “socks4://magic-proxy.example.com” if you want a SOCKS4 proxy. libcurl now supports socks4, socks4a, socks5 or socks5h used as prefixes as well as http.
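
Here is a small sketch in Python of how such a prefixed proxy string splits into its parts. It is not libcurl’s parser; the function name is invented, and treating a string without any prefix as an HTTP proxy is an assumption made for the example.

```python
from urllib.parse import urlsplit

# The prefixes listed above.
KNOWN_PROXY_SCHEMES = ("http", "socks4", "socks4a", "socks5", "socks5h")


def split_proxy(proxy):
    """Return (scheme, host, port) for a proxy string; port may be None."""
    if "://" not in proxy:
        proxy = "http://" + proxy  # assumption: no prefix means HTTP
    parts = urlsplit(proxy)
    if parts.scheme not in KNOWN_PROXY_SCHEMES:
        raise ValueError("unsupported proxy prefix: %s" % parts.scheme)
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    print(split_proxy("socks4://magic-proxy.example.com"))
    print(split_proxy("proxy.example.com:8080"))
```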

Change 2 – allow sending “empty” HTTP headers

Another minor change in code but possibly larger impact in usefulness for applications. We introduced a way for applications to change internal headers and to add new ones ages ago. We (or rather I) then made the choice that if you’d provide a header with only the name and a colon, that is with no contents on the right side, it would delete the internal header. That was not a clever move as later on people have wanted to add “blank” or “empty” headers that look exactly like that, but libcurl has then refused to.

There have been some more or less hackish work-arounds to trick libcurl into allowing an empty header, but now finally we introduce a nice and clean way for applications to pass in these kinds of empty headers:

Pass in an empty header that instead of a colon has a semicolon! This is an otherwise illegal header that wouldn’t make sense, but libcurl will use that as a trigger that an empty header should be used and it will then replace the semicolon with a colon and things will be fine.
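
The three cases can be summed up in a few lines of Python. This is only a model of the rules as described here, not the library’s implementation, and the function name and action labels are made up.

```python
def interpret_custom_header(line):
    """Classify one application-supplied header line.

    Returns (action, name, value) where action is "set", "remove"
    or "send_empty".
    """
    if ":" not in line and line.endswith(";"):
        # "Name;" means: send "Name:" with nothing after the colon.
        return ("send_empty", line[:-1].strip(), "")
    name, colon, value = line.partition(":")
    if not colon:
        raise ValueError("not a header line: %r" % line)
    value = value.strip()
    if not value:
        # "Name:" with no contents means: drop the internal header.
        return ("remove", name.strip(), None)
    return ("set", name.strip(), value)
```

So "Accept:" removes the internal header, "X-Custom;" sends an empty X-Custom header, and "X-Custom: yes" sets it as usual.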

Change 3 – Added support for cyassl and axTLS

Proving that libcurl moves forward and into more and more markets, the number of supported SSL libraries grew to 7 this year. The all new cyassl backend replaced the previous yassl backend, which was using cyassl’s former OpenSSL emulation layer. Now we’re using the native and pure API and things are much cleaner. The possibly smallest available TLS library, axTLS, also got support.

Not all backends and not all SSL libraries are the same or support the same set of features, but then libcurl is used in many different scenarios and use cases and this way we offer more options to more users to craft libcurl for their particular needs. Our internal SSL backend API has managed quite well and proves to have been a worthy change. Adding support for yet another SSL library within libcurl is actually not a lot of work.

Change 4 – TLS-SRP

You may think number 4 of a top-3 list is weird, but I couldn’t cut it off here! =) TLS-SRP has been waiting in the shadows for so long and all of a sudden two of the major SSL libraries have support for it in released versions and libcurl got support for using these features in both libraries during 2011.

curlers rest on Sundays and during July

We now run gitstats on the curl git repository daily, which provides fun graphs.

We have almost 11 years of source code history covered and I personally have done some ~68% of all commits. Given this long history it is fun to see some very clear trends. Like this first one: look at the distribution of commits per weekday over the entire period. The number of commits done during weekends is significantly lower than during the work week, and the Sunday number is clearly even lower than Saturday’s:

[Graph: commits per day of week]

Similarly, we can see how the activity is spread out over calendar months. This shows an obvious correlation to the slower periods in my life, which means that July is vacation times and the numbers show it:

[Graph: commits per month of year]

My five ADSL modems

I previously blogged when my network hardware died. Here’s the recap and continuation of that story and how things evolved…

One day my ADSL modem could no longer get sync, I couldn’t send data and my (landline) phone was dead. My phone is connected into the ADSL modem through which it does IP telephony. Other times this has happened I could just switch off the modem for 10 seconds and then back on again, and it would work for another 6 months or a year or so.

I’ve had ADSL at roughly 12mbit working flawlessly for several years so this was an unexpected breakage.

On 14 sep 16:16 I called my operator’s (Bredbandsbolaget) support about the issue when the modem hadn’t been able to get contact for a whole day – I was suspecting some kind of glitch in the service from the other end. The support person said that I had a “very old modem” and they immediately decided to send me a new modem by mail that would fix my problems.

At 16 sep 18:51 I called support again. I received modem #2 and installed it this day. The modem, Xavi Technologies X5258-P2, is a much fancier model than what I had been using for the last couple of years – the new one had 4 Ethernet ports and wifi. Not that I really care about that cruft as I want to use my own wifi router anyway to get better control of things.

When I plugged in modem #2 I noticed that it lit up the ‘phone’ LED at once (which normally would only be on if I use the phone) and while internet data seemed to work, the phone did not. When I called support again to ask about this, they decided it was a broken modem they had sent me and would send me a replacement at once.

A few days later I got modem #3 and installed it. I also got the joy of sending back two ADSL modems.

3 oct 20:25 – I called the support again. Modem #3 hung occasionally and I wanted to get their help to fix the problem. The support guy I talked to claimed this sometimes happens if a wifi router is too close to the modem and advised me to put my ADSL modem and wifi router further apart. It sounded like a suspicious analysis and theory to me, as why would the modem completely hang from this and if it did, why would it keep on running for days at times after a reboot? The support person also revealed that he had detailed logs going back a few weeks at least where he could see my ADSL modem power recycles and he could also see “bad CRC” counters going up before my restarts. I moved my devices two meters apart.

A little side-story: the modem has wifi support, but as I run my own wifi router behind it I don’t want the modem’s wifi. I noticed it ran on a different channel than my regular one so it wasn’t an immediate concern. It did however turn out that in order to switch it off I had to configure that with a Windows program and in order to install that program I had to enter a username and password that I didn’t have. Asking support for the credentials, they instead offered to simply disable the wifi from their end instead. That was fine by me, but again showed what fancy controls they have over these things.

For a week or so my connection actually was better and I actually thought my suspicions about the fishy advice were wrong. But no. It turned out I was only lucky for a few days as then it started hanging again every few days. It would stop transferring data in/out, and the “phone” LED would blink slowly. How on earth could a device like this hang in any circumstance? I’ve been an embedded developer all my professional life, I know hanging is the worst possible thing. A much better but still ugly way to resolve a problem without any obvious way out would be to reboot. A reboot would’ve been annoying as well, but far from as annoying as this.

Now, after all, I have a fiber installation coming “soon” so I figured I could possibly just shut up and endure this ADSL mess and it will go away or at least change drastically once I get my new connection…

But eventually it got too tedious, also partly because my kids and my wife also found it annoying and troubling – I had to give up the enduring. The fiber installation also seemed to be delayed. Who knows how long I was supposed to remain on ADSL.

So, on 5 dec 18:38 I was back on the phone with the support people and complained about the hangs I frequently get with modem #3. The guy listened to me explaining the issue, he checked the reboot logs from his side and swiftly decided he would send me a new modem. He decided to send a modem of a different brand this time to see if this made things work better in my end.

On dec 8th I got modem #4. A different model this time compared to #2 and #3. It was now a Zyxel P-2601. I got home from work at 18:15, had a quick dinner and then I connected the new equipment. Would this really be the end of my troubles? Anticipation!

– Oh harsh reality, how thee can be rough and cold.

This modem can’t be powered on. If I flip the power switch and turn it on, all the LEDs switch on but as soon as my finger leaves the power-on toggle again the modem turns itself off… At 18:52 I tried to call support, but a voice claimed they had “internal systems problems” so I gave up.

12:45 on Friday Dec 9th I called again and reported my broken modem and the friendly support woman was a bit surprised I had gotten a broken device as she said “straight from the factory”. She even expressed some sympathy about the replacement unit, modem #5, not being able to reach me until Monday.

On Monday the 12th I got an invoice wanting to charge me 500 SEK for one of the broken modems they claimed I never sent back so I had to call customer service again and have them not do that. (I find 500 SEK for a broken ADSL modem quite a hefty charge when that’s basically the price for a completely new and working unit…)

December 13, modem #5 arrived and I connected it. It didn’t work at once but the phone worked which gave me a clue, so I connected a laptop directly to the ADSL modem and when I then tried to use a browser on that network I reached an admin interface web server and by using that I could switch the modem over to “bridge mode”. It turned out the default setting for this device is to function as a DHCP server and all sorts of other funny things that I didn’t want it to do.

At the time of this writing, number five has been running without problems for 72 hours.

I’m interviewed by foss-magasin

Claes at foss-magasin.se asked a bunch of questions about me, my commitments within the FOSS community and related matters recently over email. This Swedish interview just now went public: Daniel Stenberg cURL, Rockbox och FOSS-Sthlm (dead link).

For my international friends who don’t understand Swedish: I am quite happy with the questions and with being allowed to answer them at this length etc, so I am considering doing a full translation of it and posting it at a later date.
