Tag Archives: cURL and libcurl

getaddrinfo with round robin DNS and happy eyeballs

This is not news. These are only facts that still seem to be unknown to many people, so I just want to help document them and educate the world. I’ll dance around the subject a bit first by providing the full background info…

round robin basics

Round robin DNS has long been a way to get rough and cheap load balancing, spreading visitors over multiple hosts when they use a single host name/service with static content. By setting up an A entry in a DNS zone that resolves to multiple IP addresses, clients get the results in a semi-random order and thus hit different servers at different times:

server  IN  A  192.168.0.1
server  IN  A  10.0.0.1
server  IN  A  127.0.0.1

For example, if you’re a small open source project, it is a perfect way to offer a distributed service that appears under a single name but is hosted by multiple independent servers across the Internet. It is also used by high profile web sites, for example www.google.com and www.yahoo.com.

host name resolving

If you’re an old-school hacker, if you learned socket and TCP/IP programming from the original Stevens books and if you were brought up on BSD unix, you learned to resolve host names with gethostbyname() and friends. It is specified in POSIX and the Single Unix Specification and has been around basically forever. When calling gethostbyname() on a given round robin host name, the function returns an array of addresses, and that list of addresses will be in a seemingly random order. If an application just iterates over the list and connects to the addresses in the order received, the round robin concept works perfectly well.
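As a refresher, a minimal sketch of that old style could look like the following. The host name is of course just an example and error handling is trimmed:

/* old-style: resolve with gethostbyname() and try the addresses
   in exactly the order the resolver returned them */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
  struct hostent *he = gethostbyname("server.example.com");
  int i;
  if(!he)
    return 1;
  for(i = 0; he->h_addr_list[i]; i++) {
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(80);
    memcpy(&sa.sin_addr, he->h_addr_list[i], he->h_length);
    if(connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
      printf("connected to address #%d\n", i);
      close(fd);
      return 0;
    }
    close(fd);
  }
  return 1;
}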

but gethostbyname wasn’t good enough

gethostbyname() is really IPv4-focused. The mere whisper of IPv6 makes it break down and cry. It had to be replaced by something better. Enter getaddrinfo(), also POSIX (defined in RFC 3493 and extended in RFC 5014). This is the modern function that supports IPv6 and more. It is the shiny thing the world needed!
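A hedged sketch of the modern equivalent. The addresses now arrive as a linked list of structs instead of an array, and one socket call per candidate handles both IP versions (connect_any is just my name for it):

/* modern style: getaddrinfo() hands back a linked list that
   covers both IPv4 and IPv6; walk it in the returned order */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int connect_any(const char *host, const char *port)
{
  struct addrinfo hints, *res, *ai;
  int fd = -1;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      /* IPv4 and IPv6 alike */
  hints.ai_socktype = SOCK_STREAM;
  if(getaddrinfo(host, port, &hints, &res))
    return -1;
  for(ai = res; ai; ai = ai->ai_next) {
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if(fd < 0)
      continue;
    if(connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
      break;                        /* connected */
    close(fd);
    fd = -1;
  }
  freeaddrinfo(res);
  return fd;
}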

not a drop-in replacement

So the (good parts of the) world replaced all calls to gethostbyname() with calls to getaddrinfo() and everything now supported IPv6 and things were all dandy and fine? Not exactly, because there were subtleties involved, like the order in which these functions return addresses. In 2003 the IETF had shipped RFC 3484, detailing Default Address Selection for Internet Protocol version 6, and using that as guideline most (all?) implementations were changed to return the list of addresses in that order. It thereby became a list of addresses in “preferred” order. Suddenly applications would iterate over both IPv4 and IPv6 addresses, and do it in an order that is clever from an IPv6 upgrade-path perspective.

no round robin with getaddrinfo

So, back to the good old way to do round robin DNS: multiple addresses (be it IPv4 or IPv6 or both). With the new ideas of how to return addresses, this load balancing approach no longer works: getaddrinfo() now returns basically the same order on every invocation. I noticed this back in 2005 and posted a question on the glibc hackers mailing list: http://www.cygwin.com/ml/libc-alpha/2005-11/msg00028.html As you can see, my question was delightfully ignored and nobody ever responded. The order seems to be dictated mostly by the above mentioned RFCs and the local /etc/gai.conf file, but neither is helpful if decent round robin is your aim. Others have noticed this flaw as well, and some have argued passionately that this is a bad thing, while of course there’s an opposite side with people claiming it is the right behavior and that doing round robin DNS like this was a bad idea to begin with. The impact on a large number of common utilities is simply that when they go IPv6-enabled, they at the same time go round-robin-DNS disabled.
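For reference, this is the kind of static tweak /etc/gai.conf allows on glibc systems. The classic example makes getaddrinfo() sort IPv4 results first, which still does nothing for round robin:

# /etc/gai.conf
# raise the precedence of IPv4-mapped addresses so that
# getaddrinfo() returns IPv4 addresses before IPv6 ones
precedence ::ffff:0:0/96  100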

no decent fix

Since getaddrinfo() has now worked like this for almost a decade, we can forget about “fixing” it. Since gai.conf needs local edits to provide a different response, it is not an answer either. But perhaps worse: since getaddrinfo() is made to return the addresses in a sort of preference order, it is hard to glue a layer on top that simply shuffles the returned results. Such a shuffle would need to take IP versions and more into account, and it would be application-specific and thus have to be applied to one program at a time. The popular browsers seem less affected by this getaddrinfo drawback. My guess is that because they’ve already worked on asynchronous name resolving, so that name lookups don’t lock up their processes, they have taken different approaches and have their own code for this. In curl’s case, it can be built with c-ares as resolver backend even when supporting IPv6, and since c-ares does not offer getaddrinfo’s sorting, curl then works with round robin DNS much more like it did when it used gethostbyname().
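For illustration only, a rough sketch of what such an application-specific glue layer could look like. It shuffles the returned entries but lets every list position keep its address family, so the preference order between IP versions survives (the function name is mine and the shuffle is deliberately crude):

/* crude shuffle of a getaddrinfo() result: entries may only swap
   with entries of the same address family, so the IPv4/IPv6
   preference pattern of the list stays intact.
   free the result with freeaddrinfo() on the returned head. */
#include <netdb.h>
#include <stdlib.h>

struct addrinfo *shuffle_addrinfo(struct addrinfo *list)
{
  struct addrinfo *node[64], *ai;
  int n = 0, i, j;
  for(ai = list; ai && n < 64; ai = ai->ai_next)
    node[n++] = ai;
  for(i = n - 1; i > 0; i--) {
    j = rand() % (i + 1);
    if(node[i]->ai_family == node[j]->ai_family) {
      struct addrinfo *tmp = node[i];
      node[i] = node[j];
      node[j] = tmp;
    }
  }
  for(i = 0; i + 1 < n; i++)
    node[i]->ai_next = node[i + 1];
  if(n)
    node[n - 1]->ai_next = NULL;
  return n ? node[0] : NULL;
}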

alternatives

The downside with all alternatives I’m aware of is that they don’t just take advantage of plain DNS. To duck the problems I’ve mentioned, you can instead tweak your DNS server to respond differently to different users. That way you can either just respond with different addresses in a round robin fashion, or you can try to be cleverer with things such as PowerDNS’s geobackend feature. Of course we all know that A) geoip is crude and often wrong, and B) your real-world geography does not match your network topology.

happy eyeballs

During this period, another connection related issue has surfaced: IPv6 connections are often tried only as a second option on dual-stacked machines, and IPv6 is mostly deployed in dual stacks these days. This sadly punishes early adopters of IPv6 (yes, IPv6 adopters must unfortunately still be considered early), since those services will then be slower than the older IPv4-only ones.

There seems to be a general consensus on the way to overcome this problem: the Happy Eyeballs approach. In short (and simplified), it recommends that we try both (or all) options at once, and the fastest to respond wins and gets used. This requires that we resolve A and AAAA names at once, and if we get responses to both, we connect() to both the IPv4 and the IPv6 addresses and see which connect completes first.

This of course is no longer just a matter of replacing a function or two. To implement this approach you need to do something completely new; for example, just doing getaddrinfo() and looping over the addresses with connect() won’t work at all. You would basically either start two threads and do the IPv4-only route in one and the IPv6 route in the other, or you would issue non-blocking resolver calls to do the A and AAAA resolves in parallel in the same thread, and when the first response arrives you fire off a non-blocking connect() …
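To give a taste of it, here is a heavily simplified sketch of just the connect() race, assuming the two resolves are already done. A real implementation also needs timeouts, a short delay before starting the second attempt, and proper error handling:

/* race one IPv6 and one IPv4 candidate: start both connects
   non-blocking, let select() tell us which one completes first.
   simplified sketch - no timeouts, minimal error handling */
#include <netdb.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>

int race_connect(struct addrinfo *v6, struct addrinfo *v4)
{
  struct addrinfo *cand[2] = { v6, v4 };
  int fd[2] = { -1, -1 }, i, maxfd = -1, best = -1;
  fd_set wfds;
  FD_ZERO(&wfds);
  for(i = 0; i < 2; i++) {
    if(!cand[i])
      continue;
    fd[i] = socket(cand[i]->ai_family, SOCK_STREAM, 0);
    if(fd[i] < 0)
      continue;
    fcntl(fd[i], F_SETFL, O_NONBLOCK);
    connect(fd[i], cand[i]->ai_addr, cand[i]->ai_addrlen); /* EINPROGRESS */
    FD_SET(fd[i], &wfds);
    if(fd[i] > maxfd)
      maxfd = fd[i];
  }
  /* a socket turns writable when its connect attempt finishes */
  if(maxfd >= 0 && select(maxfd + 1, NULL, &wfds, NULL, NULL) > 0) {
    for(i = 0; i < 2; i++) {
      int err = 0;
      socklen_t len = sizeof(err);
      if(fd[i] < 0 || !FD_ISSET(fd[i], &wfds) || best >= 0)
        continue;
      getsockopt(fd[i], SOL_SOCKET, SO_ERROR, &err, &len);
      if(!err)
        best = fd[i];               /* this one won the race */
    }
  }
  for(i = 0; i < 2; i++)
    if(fd[i] >= 0 && fd[i] != best)
      close(fd[i]);
  return best;
}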

My point being that introducing Happy Eyeballs in your good old socket app will require some rather major remodeling no matter what. Doing this will most likely also affect how your application handles round robin DNS, so now you have a chance to reconsider your choices and code!

Top-3 curl bugs in 2011

This is a continuation of my little top-3 things in curl during 2011 which started with the top-3 changes 2011.

The changelog on the curl site lists 150 bugs fixed in the seven releases of the year. The most important fixes in my view were…

Bug-fix 1: handle HTTP redirects to //hostname/path

Following redirects is one of the fundamentals of HTTP user agents, and one of the primary things people use curl and libcurl for is to mimic browsers and do automatic stuff on the web. Therefore it was all the more embarrassing to realize that libcurl didn’t properly support the relative redirect where the Location: header includes the host name but not the protocol. It basically means that the protocol shall remain (in reality that means HTTP or HTTPS) but the request should move over to the new host and path. All browsers have supported this for ages. Since November 15th 2011, libcurl does too!
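To illustrate with made-up host names: if a request to https://www.example.com/ gets this response, the new target keeps the scheme and becomes https://other.example.com/dir/page.html:

HTTP/1.1 302 Found
Location: //other.example.com/dir/page.html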

Bug-fix 2: inappropriate GSSAPI delegation

We had one security vulnerability announced in 2011 and this was it. I won’t try to blame someone else for this mistake, but there are some corners of curl and libcurl I’m not personally very familiar with and I would say the GSS stuff is one of those. In fact, even the actual GSS and GSSAPI technologies are murky areas as far as my knowledge reaches, so I was not at all aware of this feature or that we even made use of it… Of course it also turns out that a certain amount of existing applications need it, so we now have that ability in the library again, if enabled by an option.
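If I read our docs right, an application that needs delegation opts back in with something like this (check the libcurl documentation for the exact symbols):

/* re-enable GSS credential delegation, now off by default */
curl_easy_setopt(curl, CURLOPT_GSSAPI_DELEGATION, CURLGSSAPI_DELEGATION_FLAG);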

Bug-fix 3: multi interface, connect fail continue to next IP

One of those silly bugs nobody would expect us to have at this point. It turned out the code for the multi interface didn’t properly move on to try the next IP when a connect() failed and the host name had resolved to a number of addresses to try. A long term goal of mine is to remodel the internals of libcurl to always use the multi interface code and just wrap that interface with some glue logic to offer the easy interface. For that to work (and for lots of other reasons of course), the multi interface simply must handle all of these things.

Additionally, this is another of those things that are hard to test for in the test suite, as it would involve trickery at the IP or TCP level and that’s not easy to accomplish in a portable manner.

Top-3 curl changes in 2011

At the end of the year I thought it would be interesting to look back over the past twelve months and see what the biggest changes and the most important bug-fixes in the curl project were. I’ve turned it into a little mini-series of blog posts: the top-3s. First out, the top-3 changes.

Top-3 changes

In total I counted 29 notable changes introduced during 2011. The most significant ones in my view are:

Change 1 – fancier protocol support in proxy strings

This may sound trivial, and the code certainly is, but with this change a lot of applications that use libcurl suddenly got better proxy support without having to do anything at all. Previously the application had to tell libcurl what protocol type the proxy uses, even though libcurl has supported getting the proxy specified with an environment variable for ages.

Having only the proxy name was useful but limiting. With this change you can specify the proxy type, or rather the proxy protocol, by prefixing the proxy name, like “socks4://magic-proxy.example.com” if you want a SOCKS4 proxy. libcurl now supports socks4, socks4a, socks5 and socks5h as prefixes, as well as http.
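In practice it works like this, both on the command line and through libcurl (the proxy host name is made up):

$ curl --proxy socks5h://proxy.example.com:1080 https://example.com/

/* or from a libcurl-using application */
curl_easy_setopt(curl, CURLOPT_PROXY, "socks4a://proxy.example.com:1080");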

Change 2 – allow sending “empty” HTTP headers

Another minor change in code, but with possibly larger impact and usefulness for applications. We introduced a way for applications to change internal headers and add new ones ages ago. We (or rather I) then made the choice that if you provide a header with only the name and a colon, that is with no contents on the right side, it deletes the internal header. That was not a clever move, as people have later wanted to add “blank” or “empty” headers that look exactly like that, but libcurl has refused to send them.

There have been some more or less hackish work-arounds to trick libcurl into allowing an empty header, but now finally we introduce a nice and clean way for applications to pass in these kinds of empty headers:

Pass in an empty header that has a semicolon instead of the colon! This is an otherwise illegal header that wouldn’t make sense, but libcurl uses it as a trigger that an empty header should be sent: it replaces the semicolon with a colon and things will be fine.
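In libcurl terms, with a hypothetical header name:

/* "X-Empty;" is sent as "X-Empty:" with no value, while the
   colon-only form still means: remove this header entirely */
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "X-Empty;");
headers = curl_slist_append(headers, "Accept:");
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);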

Change 3 – Added support for cyassl and axTLS

Proving that libcurl moves forward and into more and more markets, the number of supported SSL libraries grew to 7 this year. The all-new cyassl backend replaced the previous yassl backend, which used cyassl’s old OpenSSL emulation layer; now we use the native and pure cyassl API and things are much cleaner. axTLS, possibly the smallest available TLS library, also got support.

Not all backends and SSL libraries are the same or support the same set of features, but then libcurl is used in many different scenarios and use cases, and this way we offer more options to more users to craft libcurl for their particular needs. Our internal SSL backend API has managed quite well and has proven a worthy change: adding support for yet another SSL library within libcurl is actually not a lot of work.

Change 4 – TLS-SRP

You may think number 4 on a top-3 list is weird, but I couldn’t cut it off here! =) TLS-SRP has been waiting in the shadows for so long, and all of a sudden two of the major SSL libraries support it in released versions, and libcurl got support for using it with both libraries during 2011.


curlers rest on Sundays and during July

We now run gitstats on the curl git repository daily, and it provides fun graphs.

We have almost 11 years of source code history covered and I have personally done some ~68% of all commits. Given this long history it is fun to see some very clear trends, like this first one: the distribution of commits per weekday over the entire period. The number of commits done during weekends is significantly lower than during the work week, and the Sunday count is clearly even lower than Saturday’s:

[graph: commits by day of week]

Similarly, we can see how the activity is spread out over the calendar months. This shows an obvious correlation to the slower periods in my life: July is vacation time, and the numbers show it:

[graph: commits by month of year]

I’m interviewed by foss-magasin


Claes at foss-magasin.se recently asked me a bunch of questions over email about me, my commitments within the FOSS community and related matters. This Swedish interview has just gone public: Daniel Stenberg cURL, Rockbox och FOSS-Sthlm (dead link).

For my international friends who don’t understand Swedish: I am quite happy with the questions and with being allowed to answer them at this length, so I am considering doing a full translation and posting it at a later date.

A special thank you from Google

Believe me, this kind of seemingly random act of kindness warms up even the most seasoned and cold hacker’s heart. Look at this excerpt from a mail I received:

From: Stephanie T <…@google.com>
Subject: 2 Googlers would like to recognize your hard work on cURL

Hello Daniel,

As you know, we here in Google’s Open Source Programs Office are always interested in learning about new projects and people in the open source community. To that end, we asked our co-workers to help us by nominating people outside of Google that they thought were doing great things in the world of open source. This information helps us determine which organizations to fund and which projects to select for the Google Summer of Code and related programs.

Chris C and John M (Googlers) believe your work on cURL deserves to be recognized. To show our appreciation for this work, we would like to send you a special thank you gift and 2- $175 USD gift codes (totaling $350 USD) to the Google online store so you can pick out tee shirts or other gifts for yourself and loved ones.

[snip]

Thanks a lot!

(The names of the persons in the mail have been shortened by me to reduce their exposure here.)

Apple’s modified CA cert handling and curl

I tweeted about finding a change in Apple’s version of curl that I hadn’t seen any public patch for. Apple otherwise hosts a whole slew of curl patches which they never discuss with us, but still make public so we can see what they did.

I was trying to help out a fellow curl user on IRC (we’re in #curl on freenode, come see us) who was trying to understand some funny effects of running curl against an HTTPS site, and he showed me the output from a “curl -v” log. The verbose log was curiously different from mine (same curl version, built by myself on Linux). My conclusion was that something was different in the Apple version.

The user’s log said:

* About to connect() to host.example.com port 443 (#0)
*   Trying 1.2.3.4... connected
* Connected to host.example.com (1.2.3.4) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):

… while my command against the same site said:

* About to connect() to host.example.com port 443 (#0)
*   Trying 1.2.3.4... connected
* Connected to host.example.com (1.2.3.4) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* SSLv3, TLS handshake, Client hello (1):

(The “certificate verify locations” lines are the part my output showed that wasn’t in the Mac version; the real host name and IP have been changed.)

It seems I was wrong however.

The output above is only shown when libcurl sets the CA cert path with OpenSSL, and apparently the Mac version doesn’t. Somehow they get the CA certs loaded into libcurl differently.

So OK, maybe they didn’t modify curl, but they certainly changed how curl uses CA certs, and they did this by modifying OpenSSL: their version of OpenSSL clearly now defaults to using their CA cert bundle. The end result for me is still the same though: I have no idea how CA certs work with curl on Mac, which leaves me in the unfortunate situation where I can’t help fellow curl users when they have CA cert problems on a Mac.

It also leaves me very curious about what --cacert does exactly in the Mac version of curl.

OpenSSL is patched. Apparently it now works so that if the “normal” X.509 validation fails and the TrustEvaluationAgent (TEA) is enabled, it will attempt to use TEA to validate the certificate. The Apple source code to read for this is x509_vfy_apple.c in their patched OpenSSL tree. It is also possible to skip the TEA verification by setting an environment variable, so that we can still have curl on Mac act “as default” with a command line like:

$ env OPENSSL_X509_TEA_DISABLE=1 curl https://www.example.com/

Finally: yes, curl is released under an MIT license, so they’re perfectly allowed to do all of this. I know it, and I chose the MIT license fully aware that any company can take the code, modify it and never return any changes. I’m not arguing against anyone’s right to do this with curl.

Thank you, friendly anonymous helper for helping me straighten out my findings!

Three out of one hundred

If I’m not part of the solution, I’m part of the problem and I don’t want to be part of the problem. More specifically, I’m talking about female presence in tech and in particular in open source projects.

I’ve been an open source and free software hacker, contributor and maintainer for almost 20 years. I’m the perfect stereotype too: a white, hetero, 40+ year old male living in a suburb of a west European city. (I just lack a beard.) I’ve done more than 20,000 commits in public open source code repositories. In the projects I maintain and have a leading role in (for the sake of this argument I’ll limit the discussion to curl, libssh2 and c-ares), we’re certainly no better than the ordinary male-dominated open source project. We’re basically only men (boys?) talking to other men, and virtually all the documentation, design and coding is done by male contributors.

Sure, we have female contributors in all these projects, but for example in the curl case we have over 850 named contributors and, while I’m certainly not sure who is a woman and who is not when I get contributions, only around 10 names in the list are typically western female names. Let’s say there are 20, or even 30. Out of a total of 850 the proportions are devastating no matter what: it might be up to 3%. Three. THREE. I know women are under-represented in technology in general and in open source in particular, but I think 3% is even lower than the already bad open source average. (Although some reports claim the share of female developers in FOSS is as low as just above 1%; geekfeminism says 1-5%.)

Numbers

Three percent. (In a project that’s been alive and kicking for thirteen years…) At this level, after this long a time, there’s already a bad precedent and it of course doesn’t make it easier to change now. It is also three percent when we count all contributors alike; if we counted the number of women in leading roles in these projects, the share would be even smaller.

It could be worth noting that we don’t really have any recent reliable stats for the “real world” female share either. Most sources I find on the Internet, and that people quote in talks, tend to repeat old numbers that were extracted using debatable means and questions. The comparisons of female participation in FOSS vs commercial software that I’ve seen repeated many times are very often based on stats that are really not comparable. If someone has reliable and somewhat fresh data, please point it out to me!

Ghosh, R. A.; Glott, R.; Krieger, B.; Robles, G. 2002. Free/Libre and Open Source Software: Survey and Study. Part IV: Survey of Developers. Maastricht: International Institute of Infonomics/Merit.

A design problem of “the system”

I would blame “the system”. I’m working in embedded systems professionally as a consultant and contract developer, and I’ve worked as a professional developer for some 20 years. In my niche there aren’t even 10% female developers. A while ago I went through my past assignments to find the last female developer I’ve worked with in a project, physically located in the same office. The last time I had a fellow developer at work who was female was in early 2007. I’ve worked in 17 (seventeen!) projects since then, without even once having a single female developer colleague. I usually work in smaller projects with like 5-10 people, so one woman in 18 projects makes it something like one out of 130 or so. I’m not saying this is a number to draw any conclusions from, since it’s just me and my guesstimates. It does however hint that the problem reaches far beyond “just” FOSS. It is a tech problem. Engineering? Software? Embedded software? Software development? I don’t know, but I know it is present both in my professional life and in my volunteer open source work.

Geekfeminism says the share is 10-30% in the “tech industry”. My experience says the share gets smaller and smaller the closer to “the metal” and low level programming you get, but I don’t have any explanation for that.

Fixing the problems

What are we (I) doing wrong? Am I at fault? Is the way I talk, or the way we run these projects, in some subtle – or obvious – way unfriendly or downright hostile to women? What can or should we change in these projects to make us less hostile? The sad reality is that I don’t think we have any such fatal flaws in our projects that create the obstacles. I don’t think many women ever show up near enough the projects to even get mistreated in the first place.

I have a son and I have a daughter. They’re both still young and unaware of these kinds of differences and problems. I hope I will be able to motivate and push and raise them equally. I don’t want to live in a world where my daughter will have a hard time getting into tech just because she’s a girl.

libspdy

SPDY is a neat new protocol and possible contender to replace HTTP – at least in some areas and for some use cases. SPDY has been invented and developed mostly by Google engineers.

SPDY allows better usage of fewer TCP connections (since it sends multiple logical streams over a single physical TCP connection) and it helps clients overcome problems with TCP (like how a new connection starts slowly) while at the same time reducing latency and bandwidth requirements. Very similar to how channels are handled over an SSH connection.

Chrome of course already supports SPDY and Firefox has some early experimental support being worked on.

Of course there are legitimate criticisms against SPDY too, including how it makes caching proxies impossible (because everything goes over SSL), how it makes debugging a lot harder with its compressed headers, how it is impossible to extract just a single header from the stream due to the compression approach, and how the compression state buffers make each individual stream use more memory than a plain old HTTP (plain TCP) one.

We can expect SPDY<=>HTTP gateways to appear so that nobody gets locked into either side of these protocols.

SPDY will provide faster transfers, and libcurl is used for speed reasons in many cases. To me, it makes perfect sense to have libcurl try to use SPDY instead of HTTP, exactly like the browsers are starting to do, so that libcurl-using applications get their contents transferred faster.

My thinking is that we introduce some new magic option(s) that make libcurl use SPDY. For normal easy interface transfers it will keep using a single connection for each new SPDY transfer, but if you use the multi interface and enable pipelining, libcurl will instead do multiple transfers over the same single SPDY connection (as long as you talk to the same server and port etc). From an application’s stand-point it shouldn’t make any difference, apart from being faster than otherwise. Just like we want it!

Implementation wise, I would like to use a reliable and efficient third-party library for the actual SPDY implementation, and if none exists we make one and run it independently. I found libspdy, but I had some concerns about it (no mailing list, looks like a one-man project, not C89 compliant, no API docs etc). I mailed the libspdy author, hoping we’d sort out my doubts so that I could base my continued work on that library.

After some time Thomas Roth, the primary libspdy author, responded, and during our subsequent email exchange my faith in this library and its direction has been restored. Not only did he fix the C89 compliance pretty quickly, he is also promising rather big changes that are pending to be committed within a week or so.

Comforted by what I’ve learned from Thomas, I’ll wait for his upcoming changes, join the soon-to-be-created mailing list for the libspdy project and contribute ideas and efforts to help shape it into the fine SPDY library we all want. I can only encourage other SPDY library interested persons to do the same!

Updated: Join the SPDY library development

curl meetup at Fosdem 2012

The FOSDEM 2012 dates were recently revealed (4-5 February 2012).


I’d be happy to arrange a get-together for libcurl hackers at FOSDEM this year. To me, Brussels, Belgium seems central enough in Europe to be able to attract a bunch of us:

  • libcurl application users/authors
  • libcurl binding hackers
  • libcurl contributors
  • … and everyone else who’s doing related activities or who just is interested

Potential subjects to discuss at such a meeting:

  • what’s the most important stuff libcurl still lacks?
  • what’s the least documented/understood parts of libcurl?
  • are there shared problems several/many libcurl bindings have to solve?
  • can we improve how we work/develop libcurl and bindings?
  • what kind of beer is best at a curl meetup?
  • [fill in your own curl related subject]

I would like at least 4-5 people to voice interest for it to be worthwhile for me to actually try to arrange anything. Please speak up on the libcurl mailing list, tweet me or mail me privately! The more people that are interested, the more planning and stuff we’ll do for it.