Tag Archives: cURL and libcurl

curl 7.19.1

Trying hard to maintain the bimonthly release schedule we’ve kept up for quite some time now, we proudly announce the release of curl and libcurl 7.19.1.

This release includes at least 24 bug fixes and the following changes:

The Open Source Census Report

I’d never heard of the Open Source Census until I stumbled over a mention of their recent report somewhere. Their mission is to get “enterprises” to install their little client, which scans computers for open source products and reports the findings back to a central server.

Anyway, their current database consists of a “mere” 2300 machines scanned but that equals a total of 314,000 open source installations. 768 different packages are identified. The top-10 found products are:

  1. firefox 84.4%
  2. zlib 65.75%
  3. xerces 61.24%
  4. wget 61.12%
  5. xalan 58.19%
  6. prototype 57.03%
  7. activation 53.01%
  8. javamail 50.15%
  9. openssl 46.45%
  10. docbook-xml 46.27%

Ok, as an open source hacker and a geek, there are two things we need to do here: 1) find out how our own projects rank among the others and 2) find out how the scanning is done and thus how good it is. Thankfully all this is possible since the entire data set is downloadable for free and the client is fully open source.

find out how our own projects rank

“curl” was found on 18.19% of all computers. That makes it #81 on the list, just below virtualbox and wireshark, but immediately above jstl and busybox. This includes “All Versions” of all tools, and in curl’s case that was 22 different versions!

I found no other project I do anything noticeable in. Subversion is at #44.

how the scanning is done

It’s quite simple. It scans for file names based on a file name pattern and then it pattern matches contents of those files. It also extracts version numbers for the files using those regex patterns. You can see the full set of patterns/rules in the XML file straight off their source code repository: project-rules.xml.
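To make the two-step matching concrete, here is a minimal sketch in C using POSIX fnmatch() and regexec(). It is not the census client’s code: the struct rule type, the match_rule() function, and the glob and regex in the usage note are all hypothetical and are not taken from project-rules.xml.

```c
#include <assert.h>
#include <fnmatch.h>
#include <regex.h>
#include <string.h>

/* Hypothetical rule: a file name glob plus a content regex whose
   first capture group is the version number. */
struct rule {
  const char *name_glob;
  const char *content_regex;
};

/* Returns 1 and stores the version string in 'version' when both the
   file name and the contents match the rule, otherwise returns 0. */
static int match_rule(const struct rule *r, const char *filename,
                      const char *contents, char *version,
                      size_t versionlen)
{
  regex_t re;
  regmatch_t m[2];
  int found = 0;

  assert(r && filename && contents && version);

  /* step 1: the file name must match the glob pattern */
  if(fnmatch(r->name_glob, filename, 0) != 0)
    return 0;

  /* step 2: the contents must match, and group 1 holds the version */
  if(regcomp(&re, r->content_regex, REG_EXTENDED) != 0)
    return 0;
  if(regexec(&re, contents, 2, m, 0) == 0 && m[1].rm_so >= 0) {
    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if(len < versionlen) {
      memcpy(version, contents + m[1].rm_so, len);
      version[len] = '\0';
      found = 1;
    }
  }
  regfree(&re);
  return found;
}
```

With a made-up rule such as { "curl*", "curl ([0-9]+\\.[0-9]+\\.[0-9]+)" }, a file named curl whose contents include “curl 7.18.2” would yield the version 7.18.2, while a file named wget would be skipped at the first step. Note that regexec() works on NUL-terminated strings, so a real scanner for binary files would need to search the raw bytes in some other way.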

how good is it

With such specific patterns for binary contents, they of course need special human treatment for many versions, and that is error-prone. That could explain why no installation of the latest curl version (7.19.0) was reported. It will also cause renamed tools to remain undetected.

In my particular case I would of course also like to know how much libcurl is used, but they don’t seem to check for that (I found several projects besides the curl tool that I know use libcurl).

All this said, I didn’t actually try out the client myself so I haven’t verified it for real.

ohloh vs statcvs

I’ve played a bit with statcvs lately and I generated reports for the curl repository. It turned out rather interesting (well, assuming you’re a statistics geek such as me) especially in comparison to the data and stats ohloh.net presents for the same code:

[the images have been lost in time, like tears in rain]

Executive summary:

  • I’ve done 82% of all code changes.
  • We seem to grow at roughly the same pace (both number of code lines and number of files) over the last years.
  • The lines of code per file count seems rather fixed.

Oh, that initial big bump in late 1999/early 2000 was due to a lot of “wrong” files, such as configure and config.guess, being committed and subsequently removed. It is a bit annoying to have there as it ruins the data somewhat, but I’ve not managed to fool statcvs into ignoring that part…

strcasecmp in Turkish

A friendly user submitted the (lib)curl bug report #2154627 which identified a problem with our URL parser. It doesn’t treat “file://” as a known protocol if the locale in use is Turkish.

This was the beginning of a minor world-moving revelation for me. Of course this is already known to mankind and I’m just behind, but really: lots of my fellow hacker friends had no idea either.

So “file” and “FILE” are not the same word case insensitively in Turkish because ‘i’ is not the lowercase version of ‘I’. In Turkish, the lowercase of ‘I’ is the dotless ‘ı’ and the uppercase of ‘i’ is the dotted ‘İ’.

Back to strcasecmp: POSIX pretty much makes the function useless by saying that “The results are unspecified in other locales [than POSIX]”.

I’m a bit annoyed by this fact, as now I have to introduce and use my own function (which thus cannot use tolower() or toupper() since they are also affected by the locale). The strings in our code are clearly “English” strings, so file and FILE truly are the same string when compared case insensitively…
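A minimal sketch of what such a locale-independent comparison could look like. This is not the function that ended up in libcurl: the names ascii_toupper() and ascii_strequal() are made up for illustration, and the code assumes an ASCII-based character set where ‘a’ to ‘z’ are contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Map ASCII 'a'..'z' to 'A'..'Z' without consulting the locale.
   Every other byte value is returned unchanged. */
static unsigned char ascii_toupper(unsigned char c)
{
  return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Hypothetical helper: returns 1 if the two strings are equal when
   compared ASCII case insensitively, 0 otherwise. */
static int ascii_strequal(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  while(*a && *b) {
    if(ascii_toupper((unsigned char)*a) !=
       ascii_toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  /* equal only if both strings ended at the same position */
  return *a == *b;
}
```

Since nothing here calls tolower(), toupper() or strcasecmp(), ascii_strequal("file", "FILE") gives the same answer no matter which locale the application has selected with setlocale().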

Another curl scan shows work to do

The nice guys at Coverity did a new scan on curl (the 7.19.0 source code) and they dug up a bunch of new flaws. The previous version they checked was 7.16.1, released some 20 months before. The new findings are not only because of how the code has changed in the meantime; it seems their scanner has improved a bit since the last time as well!

Here’s a sample view of how libcurl might dereference a NULL pointer with a step-by-step explanation of the conditions that lead to the flaw:

They identify 22 flaws and I found it interesting to compare the top list of bad functions as reported by Coverity with the complexity list I showed the other day. First we need to ignore the 9 flaws Coverity found in the ‘curl’ tool code (i.e. not within the library). Then the 10 remaining functions with flaws marked by Coverity are:

  • Curl_getinfo (4 flaws, all the other ones have one each)
  • Curl_cookie_add (present in the complexity top-10 table)
  • FormAdd (present in the complexity top-10 table)
  • parsedate
  • ftp_parse_url_path
  • tftp_do
  • resolve_server
  • curl_easy_pause
  • add_closure
  • Curl_connect

See? Only two of them were present in that list. The Coverity tool does in fact also count the complexity for each function, and while it doesn’t exactly match the values pmccabe shows, they seem to agree in general about which functions are the most complex.

Ok, now let’s go work on fixing all these problems…

Curl Cyclomatic Complexity

I was at the OWASP Sweden meeting last night and spoke about open source and security. One of the other speakers present was Simon Josefsson, who in his talk showed a nice table listing functions in his project sorted by “complexity”. Functions above a certain score are then considered “high risk” as they are hard to read and follow and thus may be subject to security problems.

Being the kind man he is, Simon already hosts a page with a Curl Cyclomatic Complexity Report, nicely identifying a bunch of functions whose complexity we should really consider reducing. The top-10 “bad” functions are:

Function            Score  Statements  Lines  Source file
ssh_statemach_act     254         880   1582  lib/ssh.c
Curl_http             204         395    886  lib/http.c
readwrite_headers     129         269    709  lib/transfer.c
Curl_cookie_add       118         247    502  lib/cookie.c
FormAdd               105         210    421  lib/formdata.c
dprintf_formatf        92         233    395  lib/mprintf.c
multi_runsingle        94         251    606  lib/multi.c
Curl_proxyCONNECT      74         212    443  lib/http.c
readwrite_data         73         127    319  lib/transfer.c
ftp_state_use_port     60         195    387  lib/ftp.c

I intend to use this as an indication of which functions within libcurl to work on. My plan is primarily to break down each of these functions into smaller ones to make them easier to read and follow. It would be cool to get every single function below 50. But I’m not sure that’s feasible or even really a good idea.
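As a rough illustration of that kind of break-down, and not actual libcurl code, the sketch below splits a small port-parsing task into helpers. All three function names are hypothetical. Complexity tools count, roughly, one per function plus one per branch point, so moving the loops and checks into helpers leaves each function with a low score of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if 's' is non-empty and consists
   only of ASCII digits, 0 otherwise. */
static int all_digits(const char *s)
{
  if(!*s)
    return 0;
  for(; *s; s++) {
    if(*s < '0' || *s > '9')
      return 0;
  }
  return 1;
}

/* Hypothetical helper: converts a digit string to a number, returning
   limit + 1 as soon as the value exceeds 'limit' to avoid overflow. */
static long to_number(const char *s, long limit)
{
  long n = 0;
  for(; *s; s++) {
    n = n * 10 + (*s - '0');
    if(n > limit)
      return limit + 1;
  }
  return n;
}

/* The top-level function now reads as a short sequence of checks.
   Returns 1 and stores the port on success, 0 on invalid input. */
static int parse_port(const char *s, int *port)
{
  long n;
  assert(s != NULL && port != NULL);
  if(!all_digits(s))
    return 0;
  n = to_number(s, 65535);
  if(n < 1 || n > 65535)
    return 0;
  *port = (int)n;
  return 1;
}
```

Each helper can be read and tested on its own, which is the point of the exercise. The total amount of branching in the program does not go down; it is only spread over smaller units.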

Security and Open Source

OWASP Sweden is arranging an event on October 6th in Stockholm, Sweden, to talk about security in the open source process.

I will be there giving a talk about security in open source projects, in particular how we work with security in the curl project. If you think of anything particular you would like me to address or include, feel free to give me a clue before the event!

curl and libcurl 7.19.0

With almost 40 described bug fixes curl and libcurl 7.19.0 come flying with a range of new things, including the following:

  • curl_off_t gets its size/typedef somewhat differently than before. This may cause an ABI change for you. See lib/README.curl_off_t for a full explanation.
  • Added CURLINFO_PRIMARY_IP
  • Added CURLOPT_CRLFILE and CURLE_SSL_CRL_BADFILE
  • Added CURLOPT_ISSUERCERT and CURLE_SSL_ISSUER_ERROR
  • curl’s option parser for boolean options reworked
  • Added --remote-name-all
  • Now builds for the INTEGRITY operating system
  • Added CURLINFO_APPCONNECT_TIME
  • Added test selection by key word in runtests.pl
  • the curl tool’s -w option supports the %{ssl_verify_result} variable
  • Added CURLOPT_ADDRESS_SCOPE and scope parsing of the URL according to RFC4007
  • Support --append on SFTP uploads (not with OpenSSH, though)
  • Added curlbuild.h and curlrules.h to the external library interface

We’ve worked really hard to get this to be a really solid and fine release. I hope it’ll show.

Getting cacerts for your tools

As the primary curl author, I’m finding the comments here interesting. That blog entry “Teaching wget About Root Certificates” is about how you can get cacerts for wget by downloading them from curl’s web site, and people quickly point out how getting cacerts from an untrusted third party place of course is an ideal situation for an MITM “attack”.

Of course you can’t trust any files off a HTTP site or a HTTPS site without a “trusted” certificate, but thinking that the curl project would run one of those just to let random people load PEM files from our site seems a bit weird. Thus, we also provide the scripts we do all this with so that you can run them yourself with whatever input data you need, preferably something you trust. The more paranoid you are, the harder that gets of course.

On Fedora, curl does come with ca certs (at least I’m told recent Fedoras do) and even if it doesn’t, you can actually point curl to use whatever cacert you like. Since most default installs of curl use OpenSSL like wget does, you could tell curl to use the same cacert your wget install uses.

This last thing gets a little more complicated when one of the two gets compiled with an SSL library that doesn’t easily support PEM (read: NSS), but in the case of curl in recent Fedora they build it with NSS plus an additional patch that allows it to still read PEM files.