Category Archives: Open Source

Open Source, Free Software, and similar

curl + hackerone = TRUE

There seems to be no end to updated posts about bug bounties in the curl project these days. Not long ago I mentioned the then new program that sadly enough was cancelled only a few months after its birth.

Now we are back with a new and refreshed bug bounty program! The curl bug bounty program reborn.

This new program, which hopefully will manage to survive a while, is setup in cooperation with the major bug bounty player out there: hackerone.

Basic rules

If you find or suspect a security related issue in curl or libcurl, report it! (and don’t speak about it in public at all until an agreed future date.)

You’re entitled to ask for a bounty for all and every valid and confirmed security problem that wasn’t already reported and that exists in the latest public release.

The curl security team will then assess the report and the problem and will then reward money depending on bug severity and other details.

Where does the money come from?

We intend to use funds and money from wherever we can. The Hackerone Internet Bug Bounty program helps us, donations collected over at opencollective will be used as well as dedicated company sponsorships.

We will of course also greatly appreciate any direct sponsorships from companies for this program. You can help curl getting even better by adding funds to the bounty program and help us reward hard-working researchers.

Why bounties at all?

We compete for the security researchers’ time and attention with other projects, both open and proprietary. The projects that can help put food on these researchers’ tables might have a better chance of getting them to use their tools, time, skills and fingers to find our problems instead of someone else’s.

Finding and disclosing security problems can be very time and resource consuming. We want to make it less likely that people give up their attempts before they find anything. We can help full and part time security engineers sustain their livelihood by paying for the fruits of their labor. At least a little bit.

Only released code?

The state of the code repository in git is not subject for bounties. We need to allow developers to do mistakes and to experiment a little in the git repository, while we expect and want every actual public release to be free from security vulnerabilities.

So yes, the obvious downside with this is that someone could spot an issue in git and decide not to report it since it doesn’t give any money and hope that the flaw will linger around and ship in the release – and then reported it and claim reward money. I think we just have to trust that this will not be a standard practice and if we in fact notice that someone tries to exploit the bounty in this manner, we can consider counter-measures then.

How about money for the patches?

There’s of course always a discussion as to why we should pay anyone for bugs and then why just pay for reported security problems and not for heroes who authored the code in the first place and neither for the good people who write the patches to fix the reported issues. Those are valid questions and we would of course rather pay every contributor a lot of money, but we don’t have the funds for that. And getting funding for this kind of dedicated bug bounties seem to be doable, where as a generic pay contributors fund is trickier both to attract money but it is also really hard to distribute in an open project of curl’s nature.

How much money?

At the start of this program the award amounts are as following. We reward up to this amount of money for vulnerabilities of the following security levels:

Critical: 2,000 USD
High: 1,500 USD
Medium: 1,000 USD
Low: 500 USD

Depending on how things go, how fast we drain the fund and how much companies help us refill, the amounts may change over time.

Found a security flaw?

Report it!

Test servers for curl

curl supports some twenty-three protocols (depending on exactly how you count).

In order to properly test and verify curl’s implementations of each of these protocols, we have a test suite. In the test suite we have a set of handcrafted servers that speak the server-side of these protocols. The more used a protocol is, the more important it is to have it thoroughly tested.

We believe in having test servers that are “stupid” and that offer buttons, levers and thresholds for us to control and manipulate how they act and how they respond for testing purposes. The control of what to send should be dictated as much as possible by the test case description file. If we want a server to send back a slightly broken protocol sequence to check how curl supports that, the server must be open for this.

In order to do this with a large degree of freedom and without restrictions, we’ve found that using “real” server software for this purpose is usually not good enough. Testing the broken and bad cases are typically not easily done then. Actual server software tries hard to do the right thing and obey standards and protocols, while we rather don’t want the server to make any decisions by itself at all but just send exactly the bytes we ask it to. Simply put.

Of course we don’t always get what we want and some of these protocols are fairly complicated which offer challenges in sticking to this policy all the way. Then we need to be pragmatic and go with what’s available and what we can make work. Having test cases run against a real server is still better than no test cases at all.

Now SOCKS

“SOCKS is an Internet protocol that exchanges network packets between a client and server through a proxy server. Practically, a SOCKS server proxies TCP connections to an arbitrary IP address, and provides a means for UDP packets to be forwarded.

(according to Wikipedia)

Recently we fixed a bug in how curl sends credentials to a SOCKS5 proxy as it turned out the protocol itself only supports user name and password length of 255 bytes each, while curl normally has no such limits and could pass on credentials with virtually infinite lengths. OK, that was silly and we fixed the bug. Now curl will properly return an error if you try such long credentials with your SOCKS5 proxy.

As a general rule, fixing a bug should mean adding at least one new test case, right? Up to this time we had been testing the curl SOCKS support by firing up an ssh client and having that setup a SOCKS proxy that connects to the other test servers.

curl -> ssh with SOCKS proxy -> test server

Since this setup doesn’t support SOCKS5 authentication, it turned out complicated to add a test case to verify that this bug was actually fixed.

This test problem was fixed by the introduction of a newly written SOCKS proxy server dedicated for the curl test suite (which I simply named socksd). It does the basic SOCKS4 and SOCKS5 protocol logic and also supports a range of commands to control how it behaves and what it allows so that we can now write test cases against this server and ask the server to misbehave or otherwise require fun things so that we can make really sure curl supports those cases as well.

It also has the additional bonus that it works without ssh being present so it will be able to run on more systems and thus the SOCKS code in curl will now be tested more widely than before.

curl -> socksd -> test server

Going forward, we should also be able to create even more SOCKS tests with this and make sure to get even better SOCKS test coverage.

no more global dns cache in curl

In January 2002, we added support for a global DNS cache in libcurl. All transfers set to use it would share and use the same global cache.

We rather quickly realized that having a global cache without locking was error-prone and not really advisable, so already in March 2004 we added comments in the header file suggesting that users should not use this option.

It remained in the code and time passed.

In the autumn of 2018, another fourteen years later, we finally addressed the issue when we announced a plan for this options deprecation. We announced a date for when it would become deprecated and disabled in code (7.62.0), and then six months later if no major incidents or outcries would occur, we said we would delete the code completely.

That time has now arrived. All code supporting a global DNS cache in curl has been removed. Any libcurl-using program that sets this option from now on will simply not get a global cache and instead proceed with the default handle-oriented cache, and the documentation is updated to clearly indicate that this is the case. This change will ship in curl 7.65.0 due to be released in May 2019 (merged in this commit).

If a program still uses this option, the only really noticeable effect should be a slightly worse name resolving performance, assuming the global cache had any point previously.

Programs that want to continue to have a DNS cache shared between multiple handles should use the share interface, which allows shared DNS cache and more – with locking. This API has been offered by libcurl since 2003.


curl says bye bye to pipelining

HTTP/1.1 Pipelining is the protocol feature where the client sends off a second HTTP/1.1 request already before the answer to the previous request has arrived (completely) from the server. It is defined in the original HTTP/1.1 spec and is a way to avoid waiting times. To reduce latency.

HTTP/1.1 Pipelining was badly supported by curl for a long time in the sense that we had a series of known bugs and it was a fragile feature without enough tests. Also, pipelining is fairly tricky to debug due to the timing sensitivity so very often enabling debug outputs or similar completely changes the nature of the behavior and things are not reproducing anymore!

HTTP pipelining was never enabled by default by the large desktop browsers due to all the issues with it, like broken server implementations and the likes. Both Firefox and Chrome dropped pipelining support entirely since a long time back now. curl did in fact over time become more and more lonely in supporting pipelining.

The bad state of HTTP pipelining was a primary driving factor behind HTTP/2 and its multiplexing feature. HTTP/2 multiplexing is truly and really “pipelining done right”. It is way more solid, practical and solves the use case in a better way with better performance and fewer downsides and problems. (curl enables multiplexing by default since 7.62.0.)

In 2019, pipelining should be abandoned and HTTP/2 should be used instead.

Starting with this commit, to be shipped in release 7.65.0, curl no longer has any code that supports HTTP/1.1 pipelining. It has been disabled in the code since 7.62.0 already so applications and users that use a recent version already should not notice any difference.

Pipelining was always offered on a best-effort basis and there was never any guarantee that requests would actually be pipelined, so we can remove this feature entirely without breaking API or ABI promises. Applications that ask libcurl to use pipelining can still do that, it just won’t have any effect.

curl up 2019 is over

(I will update this blog post with more links to videos and PDFs to presentations as they get published, so come back later in case your favorite isn’t linked already.)

The third curl developers conference, curl up 2019, is how history. We gathered in the lovely Charles University in central Prague where we sat down in an excellent class room. After the HTTP symposium on the Friday, we spent the weekend to dive in deeper in protocols and curl details.

I started off the Saturday by The state of the curl project (youtube). An overview of how we’re doing right now in terms of stats, graphs and numbers from different aspects and then something about what we’ve done the last year and a quick look at what’s not do good and what we could work on going forward.

James Fuller took the next session and his Newbie guide to contributing to libcurl presentation. Things to consider and general best practices to that could make your first steps into the project more likely to be pleasant!

Long term curl hacker Dan Fandrich (also known as “Daniel two” out of the three Daniels we have among our top committers) followed up with Writing an effective curl test where the detailed what different tests we have in curl, what they’re for and a little about how to write such tests.

Sign seen at the curl up dinner reception Friday night

After that I was back behind the desk in the classroom that we used for this event and I talked The Deprecation of legacy crap (Youtube). How and why we are removing things, some things we are removing and will soon remove and finally a little explainer on our new concept and handling of “experimental” features.

Igor Chubin then explained his new protect for us: curlator: a framework for console services (Youtube). It’s a way and tooling that makes it easier to provide access to shell and console oriented services over the web, using curl.

Me again. Governance, money in the curl project and someone offering commercial support (Youtube) was a presentation about how we intend for the project to join a legal entity SFC, and a little about money we have, what to spend it on and how I feel it is good to keep the project separate from any commercial support ventures any of us might do!

While the list above might seems like more than enough, the day wasn’t over. Christian Schmitz also did his presentation on Using SSL root certificate from Mac/Windows.

Our local hero organizer James Fuller then spoiled us completely when we got around to have dinner at a monastery with beer brewing monks and excellent food. Good food, good company and curl related dinner subjects. That’s almost heaven defined!

Sunday

Daylight saving time morning and you could tell. I’m sure it was not at all related to the beers from the night before…

James Fuller fired off the day by talking to us about Curlpipe (github), a DSL for building http execution pipelines.

The class room we used for the curl up presentations and discussions during Saturday and Sunday.

Robin Marx then put in the next gear and entertained us another hour with a protocol deep dive titled HTTP/3 (QUIC): the details (slides). For me personally this was a exactly what I needed as Robin clearly has kept up with more details and specifics in the QUIC and HTTP/3 protocols specifications than I’ve managed and his talk help the rest of the room get at least little bit more in sync with current development.

Jakub Nesetril and Lukáš Linhart from Apiary then talked us through what they’re doing and thinking around web based APIs and how they and their customers use curl: Real World curl usage at Apiary.

Then I was up again and I got to explain to my fellow curl hackers about HTTP/3 in curl. Internal architecture, 3rd party libs and APIs.

Jakub Klímek explained to us in very clear terms about current and existing problems in his talk IRIs and IDNs: Problems of non-ASCII countries. Some of the problems involve curl and while most of them have their clear explanations, I think we have to lessons to learn from this: URLs are still as messy and undocumented as ever before and that we might have some issues to fix in this area in curl.

To bring my fellow up to speed on the details of the new API introduced the last year I then made a presentation called The new URL API.

Clearly overdoing it for a single weekend, I then got the honors of doing the last presentation of curl up 2019 and for an audience that were about to die from exhaustion I talked Internals. A walk-through of the architecture and what libcurl does when doing a transfer.

Summary

I ended up doing seven presentations during this single weekend. Not all of them stellar or delivered with elegance but I hope they were still valuable to some. I did not steal someone else’s time slot as I would gladly have given up time if we had other speakers wanted to say something. Let’s aim for more non-Daniel talkers next time!

A weekend like this is such a boost for inspiration, for morale and for my ego. All the friendly faces with the encouraging and appreciating comments will keep me going for a long time after this.

Thank you to our awesome and lovely event sponsors – shown in the curl up logo below! Without you, this sort of happening would not happen.

curl up 2020

I will of course want to see another curl up next year. There are no plans yet and we don’t know where to host. I think it is valuable to move it around but I think it is even more valuable that we have a friend on the ground in that particular city to help us out. Once this year’s event has sunken in properly and a month or two has passed, the case for and organization of next year’s conference will commence. Stay tuned, and if you want to help hosting us do let me know!


curl goes 180

The 180th public curl release is a patch release: 7.64.1. There’s been 49 days since 7.64.0 shipped. The first release since our 21st birthday last week. (Full changelog.)

Numbers

the 180th release
2 changes
49 days (total: 7,677)

116 bug fixes (total: 5,029)
184 commits (total: 24,111)
0 new public libcurl functions (total: 80)
2 new curl_easy_setopt() options (total: 267)

1 new curl command line option (total: 221)
49 contributors, 25 new (total: 1,929)
25 authors, 10 new (total: 669)
0 security fixes (total: 87)

News!

This is a patch release but we still managed to introduce some fun news in this version. We ship brand new alt-svc support which we encourage keen and curious users to enable in their builds and test out. We strongly discourage anyone from using that feature in production as we reserve ourselves the right to change it before removing the EXPERIMENTAL label. As mentioned in the blog post linked above, alt-svc is the official way to bootstrap into HTTP/3 so this is a fundamental stepping stone for supporting that protocol version in a future curl.

We also introduced brand new support for the Amiga-specific TLS backend AmiSSL, which is a port of OpenSSL to that platform.

Bug-fixes

With over a hundred bug-fixes landed in this period there are a lot to choose from, but some of the most most fun and important ones from my point of view include the following.

connection check crash

This was a rather bad regression that occasionally caused crashes when libcurl would scan its connection cache for a live connection to reuse. Most likely to trigger with the Schannel backend.

connection sharing crash

The example source code that uses a shared connection cache among many threads was another crash regression. It turned out a thread could accidentally get hold of a connection already in private use by another thread…

“Expire in…” logs removed

Having the harmless but annoying text there was a mistake to begin with. It was a debug-only line that accidentally was pushed and not discovered in time. It’s history now.

curl -M manual removed

The tutorial-like manual piece that was previously included in the -M (or –manual) built-in command documentation, is no longer included. The output shown is now just the curl.1 man page. The reason for this is that the tutorial has gone a bit stale and there is now better updated and better explained documentation elsewhere. Primarily perhaps in everything curl. The online version of that document will eventually also be removed.

TLS terminology cleanups

We now refer to the Windows TLS backend as “Schannel” and the Apple macOS one as “Secure Transport” in all curl code and documentation. Those are the official names and those are the names people in general know them as. No more use of the former names that sometimes made people confused.

Shaving off bytes and mallocs

We rearranged the layout of a few structs and changed to using bitfields instead of booleans and more. This way, we managed to shrink two of the primary internal structs by 5% and 11% with no functionality change or loss.

Similarly, we removed a few mallocs, even in the common code path, so now the number of allocs for my regular test download of 4GB data over a localhost HTTP server claims fewer allocs than ever before.

Next?

We estimate that there will be a 7.65.0 release to ship 56 days from now. Then we will remove some deprecated features, perhaps add something new and quite surely fix a whole bunch of more bugs. Who know what fun we will come up with at curl up this coming weekend?

Keep reporting. Keep posting pull-requests. We love them and you!

Brand new sticker shipment for curl up from our beloved sticker sponsor!


Happy 21st, curl!

Another year has passed. The curl project is now 21 years old.

I think we can now say that it is a grown-up in most aspects. What have we accomplished in the project in these 21 years?

We’ve done 179 releases. Number 180 is just a week away.

We estimate that there are now roughly 6 billion curl installations world-wide. In phones, computers, TVs, cars, video games etc. With 4 billion internet users, that’s like 1.5 curl installation per Internet connected human on earth

669 persons have authored patches that was merged.

The curl source code now consists of 160,000 lines of code made in over 24,000 commits.

1,927 persons have helped out so far. With code, bug reports, advice, help and more.

The curl repository also hosts 429 man pages with a total of 36,900 lines of documentation. That count doesn’t even include the separate project Everything curl which is a dedicated book on curl with an additional 10,165 lines.

In this time we have logged more than 4,900 bug-fixes, out of which 87 were security related problems.

We keep doing more and more CI builds, auto-builds, fuzzing and static code analyzing on our code day-to-day and non-stop. Each commit is now built and tested in over 50 different builds and environments and are checked by at least four different static code analyzers, spending upwards 20-25 CPU hours per commit.

We have had 2 curl developer conferences, with the third curl up about to happen this coming weekend in Prague, Czech Republic.

The curl project was created by me and I’m still the lead developer. Up until today, almost 60% of the commits in the project have my name on them. I have done most commits per month in the project every single month since August 2015, and in 186 months out of the 232 months for which we have logged data.

Looking for the Refresh header

The other day someone filed a bug on curl that we don’t support redirects with the Refresh header. This took me down a rabbit hole of Refresh header research and I’ve returned to share with you what I learned down there.

tl;dr Refresh is not a standard HTTP header.

As you know, an HTTP redirect is specified to use a 3xx response code and a Location: header to point out the new URL (I use the term URL here but you know what I mean). This has been the case since RFC 1945 (HTTP/1.0). According to an old mail from Roy T Fielding (dated June 1996), Refresh “didn’t make it” into that spec. That was the first “real” HTTP specification. (And the HTTP we used before 1.0 didn’t even have headers!)

The little detail that it never made it into the 1.0 spec or any later one, doesn’t seem to have affected the browsers. Still today, browsers keep supporting the Refresh header as a sort of Location: replacement even though it seems to never have been present in a HTTP spec.

In good company

curl is not the only HTTP library that doesn’t support this non-standard header. The popular python library requests apparently doesn’t according to this bug from 2017, and another bug was filed about it already back in 2011 but it was just closed as “old” in 2014.

I’ve found no support in wget or wget2 either for this header.

I didn’t do any further extensive search for other toolkits’ support, but it seems that the browsers are fairly alone in supporting this header.

How common is the the Refresh header?

I decided to make an attempt to figure out, and for this venture I used the Rapid7 data trove. The method that data is collected with may not be the best – it scans the IPv4 address range and sends a HTTP request to each TCP port 80, setting the IP address in the Host: header. The result of that scan is 52+ million HTTP responses from different and current HTTP origins. (Exactly 52254873 responses in my 59GB data dump, dated end of February 2019).

Results from my scans

  • Location is used in 18.49% of the responses
  • Refresh is used in 0.01738% of the responses (exactly 9080 responses featured them)
  • Location is thus used 1064 times more often than Refresh
  • In 35% of the cases when Refresh is used, Location is also used
  • curl thus handles 99.9939% of the redirects in this test

Additional notes

  • When Refresh is the only redirect header, the response code is usually 200 (with 404 being the second most)
  • When both headers are used, the response code is almost always 30x
  • When both are used, it is common to redirect to the same target and it is also common for the Refresh header value to only contain a number (for the number of seconds until “refresh”).

Refresh from HTML content

Redirects can also be done by meta tags in HTML and sending the refresh that way, but I have not investigated how common as that isn’t strictly speaking HTTP so it is outside of my research (and interest) here.

In use, not documented, not in the spec

Just another undocumented corner of the web.

When I posted about these findings on the HTTPbis mailing list, it was pointed out that WHATWG mentions this header in their iana page. I say mention because calling that documenting would be a stretch…

It is not at all clear exactly what the header is supposed to do and it is not documented anywhere. It’s not exactly a redirect, but almost?

Will/should curl support it?

A decision hasn’t been made about it yet. With such a very low use frequency and since we’ve managed fine without support for it so long, maybe we can just maintain the situation and instead argue that we should just completely deprecate this header use from the web?

Updates

After this post first went live, I got some further feedback and data that are relevant and interesting.

  • Yoav Wiess created a patch for Chrome to count how often they see this header used in real life.
  • Eric Lawrence pointed out that IE had several incompatibilities in its Refresh parser back in the day.
  • Boris pointed out (in the comments below) the WHATWG documented steps for handling the header.
  • The use of <meta> tag refresh in contents is fairly high. The Chrome counter says almost 4% of page loads!