Tag Archives: PHP

libcurl claimed to be dangerous

On October 24th, my twitter feed suddenly got more activity than usual when suddenly there’s a mention of a newly(?) published paper:

The most dangerous code in the world: validating SSL certificates in non-browser software

Within the twelve page document they discuss flaws in various APIs and other certificate checking software, and for libcurl they say:

Internally, it uses OpenSSL to verify the chain of trust and verifies the hostname itself. This functionality is controlled by parameters CURLOPT_SSL_VERIFYPEER (default value: true) and CURLOPT_SSL_VERIFYHOST (default value: 2). This interface is almost perversely bad. The VERIFYPEER parameter is a boolean, while a similar-looking VERIFYHOST parameter is an integer.

(The fact that libcurl supports no less than nine(!) different SSL library backends seems to have been ignored but is irrelevant.)

The final part is their focus. It is an integer option but it looks like it could be similar to the VERIFYPEER option which could be considered a boolean option – but note that there is no boolean options at all in libcurl, those are all “long” values. They go on to explain:

Well-intentioned developers not only routinely misunderstand these parameters, but often set CURLOPT_SSL_VERIFY HOST to TRUE, thereby changing it to 1 and thus accidentally isabling hostname verification with disastrous consequences

They back up their claim with some snippets from PHP programs showing wrong use in chapter 7.

What did the authors do to try to fix the problem before posting rude comments in a report? Nothing. At. All. They could’ve emailed, tweeted or posted a bug report or patch but none of that happened.

They also only post examples of the bad use made by PHP code. The PHP code uses the PHP/CURL binding and a change could easily be done in the PHP binding. I don’t know PHP internals, but perhaps the option could be made to not accept a boolean value instead of a numerical there.

We’re now discussing this topic on the libcurl mailing list. If you have ideas or suggestions or just comments, feel free to join in!

Oh, and I feel that my recent blog post on the non-verifying users seems related and relevant.

I will also call the majority of all these suddenly appearing complainers on this API to be mostly hypocrites since the API has been established and working like this for over a 10 (ten!) years and not a single person has objected to it before. Joining up on the “bandwagon” now and calling the API stupid or silly is… well, I’d call it “non-intelligent behavior”. In libcurl we take a stable and solid API and ABI very seriously. We simply do not break API nor ABI unless forced brutally into a corner we can’t escape otherwise. Therefore we have kept this API to keep existing applications functional.

Update: the discussion thread on the topic from the PHP-DEV list. Thanks to Jan Ehrhardt.

Second update: we shipped libcurl 7.28.1 on November 20 2012, and it no longer accepts the value 1 to VERIFYHOST, but will instead cause curl_easy_setopt() return an error and use the default value (which is 2). This will prevent applications to accidentally be insecure due to use of 1.

SSL verification still often disabled

SSL padlockBack in 2002 I realized that having libcurl not do SSL server verification by default basically meant that everyone writing libcurl apps would inherit that flaw, simply because most people always just let the defaults remain unless they really have to read up on what something does and then modify them. If things work, things will just remain. So when we shipped libcurl 7.10 on the first of October that year, libcurl started verifying server certs by default.

Fast forward about ten years.

Surely SSL clients everywhere now do the right thing?

One day a couple of months ago, I was referred to this bug report for the pyssl module in Python which identifies that it doesn’t verify server certs by default! The default SSL handler in Python doesn’t verify the certificate properly. It makes all python programs that use this without special attention vulnerable for man in the middle attacks.

So let’s look at the state of another popular language: PHP. A plain standard PHP program opens a ssl:// or tls:// stream. Unless the author of said program knows and understands these things, it too runs without verifying server certs. If a program instead decides to use the PHP/CURL binding for HTTPS or similar, it will use libcurl’s default which verifies it (as I explained above).

But not everything is gloomy. Some parts of our community have decided to do the right thing:

I was told (and proven) that Ruby now does the right thing, but I don’t know how recent that is and thus how many older Ruby programs that suffer.

The same problem existed with perl’s major HTTPS using module, the LWP, for a very long time. The perl camp however already modified LWP to do verification by default with the release of libwww-perl 6.00, released in March 2011.

Side-note: in the curl project we make it easy for everyone on the Internet to use Firefox’s excellent CA cert bundle to verify server certs by providing the Firefox CA cert collection converted to PEM – the preferred format for OpenSSL, GnuTLS and others.

Conclusion:

Even today, lots and lots of applications and scripts will remain insecure – even though they probably think they’re fairly safe when they switch to a HTTPS or SSL using protocol –  and might be subject for man-in-the-middle attacks without even being able to spot it. I think it is pretty sad.

curl performance

Benchmarks, speed, comparisons, performance. I get a lot of questions on how curl and libcurl compare against other tools or libraries, and I rarely have any specific answers as I personally basically never use or test any other tools or libraries!

This text will instead be more elaborate on how we work on libcurl, why I believe libcurl will remain the fastest alternative.

cURL

libcurl is low-level

libcurl is written in C and uses the native function calls of the operating system to perform its network operations. It features a lot of features, but when it comes to plain sending and receiving data the code paths are very short and the loops can’t be shortened or sped up by any significant amount. Based on these facts, I am confident that for simple single stream transfers you really cannot write a file transfer library that runs faster. (But yes, I believe other similarly low-level style libraries can reach the same speeds.)

When adding more complicated test cases, like doing SSL or perhaps many connections that need to be kept persistent between transfers, then of course libraries can start to differ. libcurl uses SSL libraries natively so if they are fast, so will libcurl’s SSL handling be and vice versa. Of course we also strive to provide features such as connection pooling, SSL session id reuse, DNS caching and more to make the normal and frequent use cases as fast as we possibly can. What takes time when using libcurl should be the underlying network operations, not the tricks libcurl adds to them.

event-based is the way to grow

If you plan on making an app that uses more than just a few connections, libcurl can of course still do the heavy lifting for you. You should consider taking precautions already when you do your design and make sure that you can use an event-based concept and avoid relying on select or poll for the socket handling. Using libcurl’s multi_socket API, you can go up and beyond tens of thousands of connections and still reach maximum performance. And this using basically all of the protocols libcurl supports, this is not limited to a small subset. (There are unfortunate exceptions, like for example “file://” URLs but there are completely technical reasons for this.)

Very few file transfer libraries have this direct support for event-based operations. I’ve read reports of apps that have gone up to and beyond 70,000 connections on the same host using libcurl like this. The fact that TCP only has a 16 bit field in the protocol header for “source port” of course forces users who want to try this stunt to use more than one interface as source address.

And before you ask: you cannot grow a client to that amount using any other technique than event-based using many connections in the same thread, as basically no other approach scales as well.

When handling very many connections, the mere “juggling” of the connections take time and can be done in good or bad ways. It would be interesting to one day measure exactly how good libcurl is at this.

Binding Benchmark and Comparisons

We are aware of something like 40 different bindings for libcurl to make it possible to use from just about any language you like. Lots, if not just every, language also tend to have its own native version of a transfer or at least HTTP library. For many languages, the native version is the one that is most preferred and most used in books, articles and promoted on the internet. Rarely can the native versions compete with the libcurl based ones in actual transfer performance. Because of what I mentioned above: libcurl does little extra when transferring stuff but the actual and raw transfer. Of course there still needs to be additional glue and logic to make libcurl work fine with the different languages’ own unique environments, but it is still often proven that it doesn’t make the speed gain get lost or become invisible. I’ll illustrate below with some sample environments.

Ruby comparisons

Paul Dix is a Ruby guy and he’s done a lot of work with HTTP libraries and Ruby, and he’s also done some benchmarks on libcurl-based Ruby libraries. They show that the tools built on top of libcurl run significantly faster than the native versions.

Perl comparisons

“Ivan” wrote up a benchmarking script that performs a number of transfers using three different mechanisms available to perl hackers. One of them being the official libcurl perl binding (WWW::Curl), one of them being the perl standard one called LWP. The results leaves no room for doubts: the libcurl-based version is significantly faster than the “native” alternatives.

PHP comparisons

The PHP binding for libcurl, PHP/CURL, is a popular one. In PHP the situation is possibly a bit different as they don’t have a native library that is nearly as feature complete as the libcurl binding, but they do have a native version for doing things like getting HTTP data etc. This function has been compared against PHP/CURL many times, for example Ricky’s comparisons, and Alix Axel’s comparisons. They all show that the libcurl-based alternative is faster. Exactly how much faster is of course depending on a lot of factors but I’m not going into such specific details here and now.

We miss more benchmarks!

I wish I knew about more benchmarks and comparisons of speed. If you know of others, or if you get inspired enough to write up and publish any after reading my rant here, please let me know! Not only is it fun and ego-boosting to see our project win, but I also want to learn from them and see where we’re lacking and if anyone beats us in a test, it’d be great to see what we could do to improve.

The curl and the PHP

We have a sort of symbiosis between the curl project and the PHP project, at least we in the curl project get a lot of people learning about curl the first time when they hack PHP. This happens to the extent that to a lot of people, curl is but the name of a PHP extension.

So while we can thank the PHP project for referring us a bunch of users that might not otherwise have found us, there is also quite some “friction” or perhaps better called “disagreements” between our projects and how we (don’t) interact.PHP logo

name

CURL vs libcurl vs cURL. We only ever use the funny casing cURL when referring to the cURL project. The cURL project produces curl and libcurl. curl is a command line tool and libcurl is a file transfer library.

The PHP team provides and distributes an extension they call CURL which is a libcurl binding for PHP. This naming causes a great deal of confusion to PHP users who go to the curl site only to find that it isn’t at all devoted to (just) the PHP extension but instead there’s mostly a lot of other curl stuff there!

I’ve discussed this naming issue with the PHP team on several occasions but they don’t agree with me that this causes confusion, and even if it would cause confusion they seem to be of the opinion that it doesn’t matter since the PHP users should find all their info about CURL and related matters on the PHP site and thus it doesn’t matter what the curl site shows or not. (Or something similar to that, I really don’t mean to put words into their mouths so you better ask them about this to get their real and unaltered view – see my link to an old conversation for some info.)

I tend to call it PHP/CURL just to make sure it is clear that we’re talking about the binding. This of course also confuse users since that’s not what it is called in the PHP documentation…

irony

PHP themselves recognize the problem of related projects borrowing the name, so they forbid derivate projects to include “PHP” in their names. Clearly stated in paragraph 4 of their license.

versions

The binary build of PHP for windows have libcurl built in statically with the curl extension code, so people can’t easily replace the libcurl version used by PHP. And in general, Windows people using open source are much less likely to ever build anything on their own in my experience.

PHP 5.2.6 was released on May 1st 2008 still has libcurl 7.16.0 built into the Windows version. That libcurl version was released in October 2006 and right now we have released eight (8) releases after that one. All of them including many bug fixes. This is more than slightly annoying.

support

This isn’t anyone’s fault but… there really aren’t many PHP people who are involved or care about the libcurl binding so those who have PHP/CURL problems tend to ask questions on the curl-and-php mailing list and in the #curl IRC channel but there aren’t any PHP insiders around in those areas to answer PHP questions…

development

Is it just my imagination or isn’t there a lot of PHP users that have asked for the same features in the PHP libcurl binding for a long time by now, but really very few actually step forward and make a difference? So these features remain unfixed and not added. This is even “just” a binding, nothing of the really hard work is done in the binding itself… It might just be me and my head, but the ratio for doers/plain users in the PHP world seems to be exceptionally low in comparison to many other open source areas I see. Of course this is tainted by me only really seing the PHP/CURL side of the PHP world.

future

I have no reason to expect anything to change, nor do I know how I can make anything of this change on my own so I assume things will just continue working exactly like this in the future as well…

WordPress quirks and edits

There’s no secret I’ve had my share of gripes with WordPress and here comes two more:

I can’t upload images at the moment! I run the “plain” wordpress package in Debian testing and when I try to upload an image using the fancy new ajax way in 2.5, it just sits there for a while and it seems it receives the file but I don’t get the UI up that I believe I should get when the upload is completed… so I can’t confirm the upload etc so it instead it gets discarded!

I’m suffering a bit from trackback spam so I installed a plugin named Trackback Validator to help me reduce the manual work of denying them. It seems to work rather well so far in that I now no longer have to mark very many comments (trackbacks appear as comments within WordPress) at all, but the annoying part is that even though the validator unvalidates the trackbacks I still get information mails sent out to me about them! I’ve now also enabled the Akismet plugin so let’s see what happens. Of course simply disabling trackbacks is an option that I’ll use if this doesn’t work good enough.

A funny side-effect with installing and enabling Akismet was that all of a suddent I could access comments previously marked as spam, and thus I could undo the damages from my accidental mark-as-spam-hiccup the other day!

While playing around with plugins, I also installed a gravatar plugin that shows gravatar-images for users on comments, and I installed a plugin that will automatically set my timezone correctly even when DST changes – which WordPress can’t do by itself!

Then all of a sudden when I poked around (too much) I managed to somehow ruin the background image I use a the top of all pages on my blog. Somewhat I got a gradient there instead, which indeed is what the theme supports (the theme I use is of course a standard one but I have done some minor edits of it). Took me a while to manage to get rid of the gradient and get back image back… I had to resort to editing the PHP file for the theme!