xCurl

It is often said that Imitation is the Sincerest Form of Flattery.

Also, remember libcrurl? That was the name of the thing Google once planned to do: reimplement the libcurl API on top of their Chrome networking library. Flattery or not, it never went anywhere.

The other day I received an email asking me about details regarding something called xCurl. Not having a clue what that was, a quick search soon had me enlightened.

xCurl is, using their own words, a Microsoft Game Development Kit (GDK) compliant implementation of the libCurl API.

A Frankencurl

The article I link to above describes how xCurl differs from libcurl:

xCurl differs from libCurl in that xCurl is implemented on top of WinHttp and automatically follows all the extra Microsoft Game Development Kit (GDK) requirements and best practices. While libCurl itself doesn’t meet the security requirements for the Microsoft Game Development Kit (GDK) console platform, use xCurl to maintain your same libCurl HTTP implementation across all platforms while changing only one header include and library linkage.

I don’t know anything about WinHttp, but since that is an HTTP API I can only presume that making libcurl use that instead of plain old sockets has to mean a pretty large surgery and code change. I also have no idea what the mentioned “security requirements” might be. I’m not very familiar with Windows internals nor with their game development setups.

The article then goes on to describe with some detail exactly which libcurl options that work, and which don’t and what libcurl build options that were used when xCurl was built. No DoH, no proxy support, no cookies etc.

The provided functionality is certainly a very stripped down and limited version of the libcurl API. A fun detail is that they quite bluntly just link to the libcurl API documentation to describe how xCurl works. It is easy and convenient of course, and it will certainly make xCurl “forced” to stick to the libcurl behavior

With large invasive changes of this kind we can certainly hope that the team making it has invested time and spent serious effort on additional testing, both before release and ongoing.

Source code?

I have not been able to figure out how to download xCurl in any form, and since I can’t find the source code I cannot really get a grip of exactly how much and how invasive Microsoft has patched this. They have not been in touch or communicated about this work of theirs to anyone in the curl project.

Therefore, I also cannot say which libcurl version this is based on – as there is no telling of that on the page describing xCurl.

The email that triggered me to crawl down this rabbit hole included a copyright file that seems to originate from an xCurl package, and that includes the curl license. The curl license file has the specific detail that it shows the copyright year range at the top and this file said

Copyright (c) 1996 - 2020, Daniel Stenberg, daniel@haxx.se, and many contributors, see the THANKS file.

It might indicate that they use a libcurl from a few years back. Only might, because it is quite common among users of libcurl to “forget” (sometimes I’m sure on purpose) to update this copyright range even when they otherwise upgrade the source code. This makes the year range a rather weak evidence of the actual age of the libcurl code this is based on.

Updates

curl (including libcurl) ships a new version at least once every eight weeks. We merge bugfixes at a rate of around three bugfixes per day. Keeping a heavily modified libcurl version in sync with the latest curl releases is hard work.

Of course, since they deliberately limit the scope of the functionality of their clone, lots of upstream changes in curl will not affect xCurl users.

License

curl is licensed under… the curl license! It is an MIT license that I was unclever enough to slightly modify many years ago. The changes are enough for organizations such as SPDX to consider it a separate one: curl. I normally still say that curl is MIT licensed because the changes are minuscule and do not change the spirit of the license.

The curl license of course allows Microsoft or anyone else do to this kind of stunt and they don’t even have to provide the source code for their changes or the final product and they don’t have to ask or tell anyone:

Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

I once picked this license for curl exactly because it allows this. Sure it might sometimes then make people do things in secret that they never contribute back, and we miss out on possible improvements because of that, but I think the more important property is that no company feels scared or restricted as to when and where they can use this code. A license designed for maximum adoption.

I have always had the belief that it is our relentless update scheme and never-ending flood of bugfixes that is what will keep users wanting to use the real thing and avoid maintaining long-running external patches. There will of course always be exceptions to that.

Follow-up

Forensics done by users who installed this indicate that this xCurl is based on libcurl 7.69..x. We removed a define from the headers in 7.70.0 (CURL_VERSION_ESNI) that this package still has. It also has the CURLOPT_MAIL_RCPT_ALLLOWFAILS define, added in 7.69.0.

curl 7.69.1 was released on March 11, 2020. It has 40 known vulnerabilities, and we have logged 3,566 bugfixes since then. Of course not all of any of those affect xCurl.

“you have hacked into my devices”

I’ve shown you email examples many times before. Today I received this. I don’t know this person. Clearly a troubled individual. I suspect she found my name and address somewhere and then managed to put me somewhere in the middle of the conspiracy against her.

The entire mail is written in a single paragraph and the typos are saved as they were written. It is a little hard to penetrate, but here it is:

From: Lindsay

Thank you for making it so easy for me to see that you have hacked into 3 of my very own devices throughout the year. I’m going to be holding onto all of my finds that have your name all over it and not by me because I have absolutely no reason to hack my own belongings. I will be adding this to stuff I have already for my attorney. You won’t find anything on my brand new tablet that you all have so kindly broken into and have violated my rights but have put much stress on myself as well. Maybe if you would have came and talked to me instead of hacking everything I own and fallow me to the point of a panic attack because I suffer from PTSD I might have helped you. I cannot help what my boyfriend does and doesn’t do but one ting I was told by the bank is that they would not let me talk for him so I can’t get involved. He has had his car up for pickup for months but I’m guessing that the reason they won’t pick the car up in the street right where it has sat for months waiting is because I’ve probably see every single driver that has or had fallowed me. My stress is so terrible that when I tell him to call the bank over and over again he does and doesn’t get anywhere and because of my stress over this he gets mad and beats me or choaks me. I have no where to go at the moment and I’m not going to sleep on the streets either. So if you can kindly tell the repo truck to pick up the black suv at his dad’s house in the street the bank can give them the number it would be great so I no longer have to deal with people thinking that they know the whole story. But really I am suffering horrificly. I’m not a mean person but imagine not knowing anything about what’s going on with your spouse and then finding out they didn’t pay the car payment and so being embarrassed about it try to pay for it yourself and they say no I have it only to find out that he did it for a second time and his dad actually was supposed to pay the entire thing off but instead he went down hill really fast and seeing the same exact people every day everywhere you go and you tell your spouse and they don’t believe you and start calling you wicked names like mine has and then from there every time my ptsd got worse from it happening over and over again and he says you’re a liar and he’s indenial about it and because I don’t agree with him so I get punched I get choked and now an broken with absolutely no one but God on my side.how would you feel if it was being done to you and people following you and your so angry that alls you do is yell at people anymore and come off as a mean person when I am not? I don’t own his car that he surrendered I don’t pay his bills he told me to drive it and that’s it.i trusted a liar and an abuser. I need someone to help other than my mom my attorney and eventually the news if everyone wants to be cruel to me I’m going to the news for people taking my pictures stalking me naibors across the street watching and on each side of the house and the school behind. It isn’t at all what you all think it is I want someone to help get the suv picked up not stalked. How would you feel if 5 cities were watching every single move? I am the victim all the way around and not one nabor has ever really taken the time to get to know me. I’m not at all a mean person but this is not my weight to carry. I have everyone on camera and I will have street footage pulled and from each store or gestation I go to. I don’t go anywhere anymore from this and I’m the one asking for help. Their was one guy who was trying to help me get in touch with the tow truck guy and I haven’t seen him since and his name is Antonio. He was going to help me. I have been trying to to the right thing from the start and yet you all took pleasure in doing rotten mean things to me and laughing about it. I want one person to come help me since I can’t talk with the bank to get his suv picked up and I won’t press charges on the person that helps nor onthe tow truck guy either.

I have not replied.

URL parser performance

URLs is a dear subject of mine on this blog, as readers might have noticed.

“URL” is this mythical concept of a string that identifies a resource online and yet there is no established standard for its syntax. There are instead multiple ones out of which one is on purpose “moving” so it never actually makes up its mind but instead keeps changing.

This then leads to there being basically no two URL parsers that treat URLs the same, to the extent that mixing parsers is considered a security risk.

The standards

The browsers have established their WHATWG URL Specification as a “living document”, saying how browsers should parse URLs, gradually taking steps away from the earlier established RFC 3986 and RFC 3987 attempts.

The WHATWG standard keeps changing and the world that tries to stick to RFC 3986 still needs to sometimes get and adapt to WHATWG influences in order to interoperate with the browser-centric part of the web. It leaves URL parsing and “URL syntax” everywhere in a sorry state.

curl

In the curl project we decided in 2018 to help mitigate the mixed URL parser problem by adding a URL parser API so that applications that use libcurl can use the same parser for all its URL parser needs and thus avoid the dangerous mixing part.

The libcurl API for this purpose is designed to let users parse URLs, to extract individual components, to set/change individual components and finally to extract a normalized URL if wanted. Including some URL encoding/decoding and IDN support.

trurl

Thanks to the availability and functionality of the public libcurl URL API, we could build and ship the separate trurl tool earlier this year.

Ada

Some time ago I was made aware of an effort to (primarily) write a new URL parser for node js – although the parser is stand-alone and can be used by anyone else who wants to: The Ada URL Parser. The two primary developers behind this effort, Yagiz Nizipli and Daniel Lemire figured out that node does a large amount of URL parsing so by speeding up this parser alone it would apparently have a general performance impact.

Ada is C++ project designed to parse WHATWG URLs and the first time I was in contact with Yagiz he of course mentioned how much faster their parser is compared to curl’s.

You can also see them reproduce and talk about these numbers on this node js conference presentation.

Benchmarks

Everyone who ever tried to write code faster than some other code has found themselves in a position where they need to compare. To benchmark one code set against the other. Benchmarking is an art that is close to statistics and marketing: very hard to do without letting your own biases or presumptions affect the outcome.

Speed vs the rest

After I first spoke with Yagiz, I did go back to the libcurl code to see what obvious mistakes I had done and what low hanging fruit there was to pick in order to speed things up a little. I found a few flaws that maybe did a minor difference, but in my view there are several other properties of the API that is actually more important than sheer speed:

  • non-breaking API and ABI
  • readable and maintainable code
  • sensible and consistent API
  • error codes that help users understand what the problem is

Of course, there is also the thing that if you first figure out how to parse a URL the fastest way, maybe you can work out a smoother API that works better with that parsing approach. That’s not how I went about when creating the libcurl API.

If we can maintain those properties mentioned above, I still want the parser to run as fast as possible. There is no point in being slower than necessary.

URLs vs URLs

Ada parses WHATWG URLs and libcurl parses RFC 3986 URLs. They parse URLs differently and provide different feature sets. They are not interchangeable.

In Ada’s benchmarks they have ignored the parser differences. Throw the parsers against each other, and according to all their public data since early 2023 their parser is 7-8 times faster.

700% faster really?

So how on earth can you make such a simple thing as URL parsing 700% faster? It never sat right with me when they claimed those numbers but since I had not compared them myself I trusted them. After all, they should be fairly easy to compare and they seemed clueful enough.

Until recently when I decided to reproduce their claims and see how much their numbers depends on their specific choices of URLs to parse. It taught me something.

Reproduce the numbers

In my tests, their parser is fast. It is clearly faster than the libcurl parser, and I too of course ignored the parser results since they would not be comparable anyway.

In my tests on my development machine, Ada is 1.25 – 1.8 times faster than libcurl. There is no doubt Ada is faster, just far away from the enormous difference they claim. How come?

  1. You use the input data that most favorably shows a difference
  2. You run the benchmark on a hardware for which your parser has magic hardware acceleration

I run a decently modern 13th gen Intel Core-I7 i7-13700K CPU in my development machine. It’s really fast, especially on single-thread stuff like this. On my machine, the Ada parser can parse more URLs/second than even the Ada people themselves claim, which just tells us they used slower machines to test on. Nothing wrong with that.

The Ada parser has code that is using platform specific instructions on some environments and the benchmark they decide to use when boasting about their parser was done on such a platform. An Apple m1 CPU to be specific. In most aspects except performance per watt, not a speed monster CPU.

In itself this is not wrong, but maybe a little misleading as this is far from clearly communicated.

I have a script, urlgen. that generates URLs in as many combinations as possible so that the parser’s every corner and angle are suitably exercised and verified. Many of those combination therefor illegal in subtle ways. This is the set of URLs I have thrown at the curl parser mostly, which then also might explain why this test data is the set that makes Ada least favorable (at 1.26 x the libcurl speed). Again: their parser is faster, no doubt. I have not found a test case that does not show it running faster than libcurl’s parser.

A small part of the explanation of how they are faster is of course that they do not provide the result, the individual components, in their own separately allocated strings.

Here’s a separate detailed document how I compared.

More mistakes

They also repeatably insist curl does not handle International Domain Names (IDN) correctly, which I simply cannot understand and I have not got any explanation for. curl has handled IDN since 2004. I’m guessing a mistake, an old bug or that they used a curl build without IDN support.

Size

I would think a primary argument against using Ada vs libcurl’s parser is its size and code. Not that I believe that there are many situations where users are actually selecting between these two.

Ada header and source files are 22,774 lines of C++

libcurl URL API header and source files are 2,103 lines of C.

Comparing the code sizes like this is a little unfair since Ada has its own IDN management code included, which libcurl does not, and that part comes with several huge tables and more.

Improving libcurl?

I am sure there is more that can be done to speed up the libcurl URL parser, but there is also the case of diminishing returns. I think it is pretty fast already. On Ada’s test case using 100K URLs from Wikipedia, libcurl parses them at an average of 178 nanoseconds per URL on my machine. More than 5.6 million real world URLs parsed per second per core.

This, while also storing each URL component in a separate allocation after each parse, and also returning an error code that helps identifying the problem if the URL fails to parse. With an established and well-documented API that has been working since 2018 .

The hardware specific magic Ada uses can possibly be used by libcurl too. Maybe someone can try that out one day.

I think we have other areas in libcurl where work and effort are better spent right now.

curl on 100 operating systems

In a recent pull-request for curl, I clarified to the contributor that their change would only be accepted and merged into curl’s git code repository if they made sure that the change was done in a way so that it did not break (testing) for and on legacy platforms.

In that thread, I could almost feel how the contributor squirmed as this requirement made their work harder. Not by much, but harder no less.

I insisted that since curl at that point (and still does) already supports 32 bit time_t types, changes in this area should maintain that functionality. Even if 32 bit time_t is of limited use already and will be even more limited as we rush toward the year 2038. Quite a large number of legacy platforms are still stuck on the 32 bit version.

Why do I care so much about old legacy crap?

Nobody asked me exactly that using those words. I am paraphrasing what I suspect some contributors think at times when I ask them to do additional changes to pull requests. To make their changes complete.

It is not so much about the legacy systems. It is much more about sticking to our promises and not breaking things if we don’t have to.

Partly stability and promises

In the curl project we work relentlessly to maintain ABI and API stability and compatibility. You can upgrade your libcurl using application from the mid 2000s to the latest libcurl – without recompiling the application – and it still works the same. You can run your unmodified scripts you wrote in the early 2000s with the latest curl release today – and it is almost guaranteed that it works exactly the same way as it did back then.

This is more than a party trick and a snappy line to use in the sales brochures.

This is the very core of curl and libcurl and a foundational principle of what we ship: you can trust us. You can lean on us. Your application’s Internet transfer needs are in safe hands and you can be sure that even if we occasionally ship bugs, we provide updates that you can switch over to without the normal kinds of upgrade pains software so often comes with. In a never-ending fashion.

Also of course. Why break something that is already working fine?

Partly user numbers don’t matter

Users do matter, but what I mean in this subtitle is that the number of users on a particular platform is rarely a reason or motivator for working on supporting it and making things work there. That is not how things tend to work.

What matters is who is doing the work and if the work is getting done. If we have contributors around that keep making sure curl works on a certain platform, then curl will keep running on that platform even if they are said to have very few users. Those users don’t maintain the curl code. Maintainers do.

A platform does not truly die in curl land until necessary code for it is no longer maintained – and in many cases the unmaintained code can remain functional for years. It might also take a long time until we actually find out that curl no longer works on a particular platform.

On the opposite side it can be hard to maintain a platform even if it has large amount of users if there are not enough maintainers around who are willing and knowledgeable to work on issues specific to that platform.

Partly this is how curl can be everywhere

Precisely because we keep this strong focus on building, working and running everywhere, even sometimes with rather funny and weird configurations, is an explanation to how curl and libcurl has ended up in so many different operating systems, run on so many CPU architectures and is installed in so many things. We make sure it builds and runs. And keeps doing so.

And really. Countless users and companies insist on sticking to ancient, niche or legacy platforms and there is nothing we can do about that. If we don’t have to break functionality for them, having them stick to relying on curl for transfers is oftentimes much better security-wise than almost all other (often homegrown) alternatives.

We still deprecate things

In spite of the fancy words I just used above, we do remove support for things every now and then in curl. Mostly in the terms of dropping support for specific 3rd party libraries as they dwindle away and fall off like leaves in the fall, but also in other areas.

The key is to deprecate things slowly, with care and with an open communication. This ensures that everyone (who wants to know) is aware that it is happening and can prepare, or object if the proposal seems unreasonable.

If no user can detect a changed behavior, then it is not changed.

curl is made for its users. If users want it to keep doing something, then it shall do so.

The world changes

Internet protocols and versions come and go over time.

If you bring up your curl command lines from 2002, most of them probably fail to work. Not because of curl, but because the host names and the URLs used back then no longer work.

A huge reason why a curl command line written in 2002 will not work today exactly as it was written back then is the transition from HTTP to HTTPS that has happened since then. If the site actually used TLS (or SSL) back in 2002 (which certainly was not the norm), it used a TLS protocol version that nowadays is deemed insecure and modern TLS libraries (and curl) will refuse to connect to it if it has not been updated.

That is also the reason that if you actually have a saved curl executable from 2002 somewhere and manage to run that today, it will fail to connect to modern HTTPS sites. Because of changes in the transport protocol layers, not because of changes in curl.

Credits

Top image by Sepp from Pixabay

Discussion

Hacker news

curl coasters

Durable, sturdy, washable and good-looking curl cheat sheet coasters. What more can you possibly need?

Buy them here

Tim Westermann is the creator who makes, sells and distributes these beauties. My only involvement has been to help Tim with the textual content on them.

There is no money going to me or the curl project as part of this setup. But you might become a better curl user while saving your desk at the same time.

I have a set myself. I love them. Note that they are printed on both sides – with different content.

They do not only look like PCBs, they are high-quality printed circuit boards (PCBs).

Dimensions: 10 cm x 10 cm [3.94″ x 3.94″]
Thickness: 1.6mm [0.04″]
Weight: 30g

mastering libcurl

On November 16 2023, I will do a multi-hour tutorial video on how to use libcurl. How to master it. As a follow-up to my previous video class called mastering the curl command line (at 13,000+ views right now).

This event will run as a live-stream webinar combination. You can opt to join via zoom or twitch. Zoom users can ask questions using voice, twitch viewers can do the same using the chat. For attending via Zoom you need to register, for twitch you can just show up.

It starts at 18:00 CET (09:00 PST, 17:00 UTC)

This session is part one of the planned two-part series. The amount of content is simply too much for me to deliver in a single sitting with intact quality. The second part will happen the following Monday, on November 20. Same time.

Both sessions will be recorded and will be made available for browsing after the fact.

At the time of me putting this blog post up the presentation and therefore agenda is not complete. I have also not yet figured out exactly how to do the split between the two episodes. Below you will find the planned topics that will be covered over the two episodes. Details will of course change in the final presentation.

The idea is that very little about developing with libcurl should be left out. This is the most thorough, most advanced, most in-depth libcurl video you ever saw. And it will be shock-full with source code examples. All examples that are shown in the video, are also provided stand-alone for easy browsing, copy and paste and later reading in a dedicated mastering libcurl GitHub repository.

Part one

The slides for part one.

Part two

The slides for part two.

Mastering libcurl

The project

  • Reminders about the curl project

getting it

  • installing
  • building
  • debugging
  • free support
  • paid support

API and ABI

  • compatibility
  • versions
  • the API is for C
  • header files
  • compiling libcurl programs
  • documentation

Architecture

  • C89
  • backends
  • everything is non-blocking

API fundamentals

  • content ignorant / URLs / callbacks
  • basic by default
  • global init
  • easy handles
  • easy options
  • curl_easy_setopt
  • curl_easy_perform
  • curl_easy_cleanup
  • write callback
  • multi handle
  • multi perform
  • curl_multi_info_read
  • curl_multi_cleanup
  • a multi example
  • caches
  • curl_multi_socket_action
  • curl_easy_getinfo

Puppeteering

  • verbose
    debug function
    tracing
  • curl_version / curl_version_info
  • persistent connections
  • multiplexing
  • Downloads
  • Storing downloads
  • Compression
  • Multiple downloads
  • Maximum file size
  • Resuming and ranges
  • Uploads
  • Multiple uploads
  • Transfer controls
  • Stop slow transfers
  • Rate limiting
  • Connections
  • Name resolve tricks
  • Connection timeout
  • Network interface
  • Local port number
  • Keep alive
  • Timeouts
  • Authentication
  • .netrc
  • return codes
  • –libcurl
  • post transfer meta-data
  • caches
  • some words on threads
  • error handling with libcurl

Share API

  • sharing data between easy handles

TLS

  • enable TLS
  • ciphers
  • verifying server certificates
    custom checks
  • client certificates
  • on TLS backends
  • SSLKEYLOGFILE

Proxies

  • Proxy type
  • HTTP proxy
  • SOCKS proxy (tor)
  • Authentication
  • HTTPS proxy
  • Proxy environment variables
  • Proxy headers

HTTP

  • Ranges
  • HTTP versions
  • Conditionals
  • HTTP POST
    data with callback
  • Multipart formpost
  • Redirects
  • Modify the HTTP request
  • HTTP PUT
  • Cookies
  • Alternative Services
  • HSTS
  • HTTP/2
  • HTTP/3

HTTP header API

  • get specific header field after transfer
  • iterate over many headers

URL API

  • Parse a URL
  • extract components
  • update components
  • URL encoding/decoding
  • IDN encoding/decoding
  • redirects

WebSocket

  • just a quickie, I did a separate websocket video recently
    https://youtu.be/NLIhd0wYO24