Tag Archives: cURL and libcurl

Scalable application layer transfers

At FSCONS 2010 I had the pleasure to do a talk about how to make your client-side networking applications really scale when upping the number of simultaneous connections. Including some details that libcurl will support you all the way!

My talk was named “Scalable application layer transfers” and the slides from it is available online. See below. Hopefully the video recording of it will appear later and I’ll post a  follow-up then. A little extra bonus material as background would be my poll vs select vs event-based article.

As I mentioned in a previous post, the room was shock full when I started preparing my equipment for the talk since the session before me was a keynote, but by the time I actually starter presenting there were only the limited set of hardcore geeks left.

In the FSCONS program there were several talks over the weekend about women in FOSS and so on, while I on the other hand certainly only contributed to enforcing the stereotypes by being white, male, middle-aged, very techy and I delivered my two speeches for audiences in which I believe not a single woman attended. Whether I am part of the problem or the solution we can discuss in a separate post later on… 🙂

Living With Open Source

.SEAs a session during the Internetdagarna conference (orginized by .SE), Björn Stenberg, Daniel Melin and I joined up to talk about open source with the title “Living With Open Source” (“Att Leva med Öppen Källkod” in the language of the brave: Swedish) on October 27. We did a 90 minute session split up between the three of us. The session was in Swedish and it was recorded so I expect that it will be made available online soon for those who are curious but didn’t attend.

Bjorn Stenberg during "att leva med Öppen kallkod"

Björn (on the picture above) started off by talking about how to work with Open Source as a user when using Open Source components. How to deal with changes, sending upstream, the cost of keeping changes private etc.

Talare - Att leva med öppen källkodDaniel Melin continued and talked about open source licensing. It is quite clearly an area that people find tricky and mysterious, judging from the many questions that followed. I think large parts of the audience wasn’t very advanced or well versed into open source details so then of course there is a lot to learn and to talk about. I think we all felt that we tried to cover quite a lot that together with the questions was hard to fit within the given time.

I ended our triplet by talking about open source from a producer’s viewpoint, how we view things in a typical open source project and I used a lot of details and factual points from the cURL project.

The audience consisted of perhaps 50 people. We had a rather nerdy subject and we had tough competition from five other parallel sessions, with some of them featuring Internet and other local celebrities.

Over all, I think we did good. The idea that held our three talks together I think was fine, we kept the schedule pretty good, the audience seemed to enjoy it and I had a great time. And we got a really nice lunch afterwards!

git, patents, meego and android

dotse-logoAt this Tuesday afternoon, almost 100 people apparently managed to escape work and attend foss-sthlm’s fourth meeting. This time graciously sponsored by .SE who stood for the facilities, the coffee etc. Thank you .SE! Yours truly did his best to make it happen and to make sure we had a variety of talks by skilled people and I think we did good this time as well! This meeting took place at the same time the big Internetdagarna conference had their 6(!) parallel tracks in the building just next to ours, so it was also rather fierce competition for attention.

Robin with git

Robin Rosenberg started off the sessions by telling us about git and related dives into JGit, EGit, gerrit, code reviews and Eclipse. Robin is a core developer in the EGit/JGit projects. While I think I know at least a little git already, Robin provided an overlook over several different things in a good way. (It should be noted that Robin was called in very late in the game due to another talker having to drop out.)

As a side-note, I will never cease to be amazed by the habit in Java land to re-implement everything in “pure Java” instead of simply wrapping around the existing code/tools and leveraging what already exists and is stable…

Jonas with patents

Jonas Bosson spoke about the dangers with software patents and how they are not good, they’re hindering innovation and cost a lot of money for everyone involved. He also pledged the audience to join FFII to help the cause. You can tell Jonas is quite committed to this subject and really believes in this! And quite frankly, I don’t think a lot of people in this surrounding would argue against him…

Andreas with MeegoMeeGo

Andreas Jakl, just arrived from a rainy Helsinki, then told us (in English while all the other talks were in Swedish) about how to develop stuff with Qt for Symbian, Meego or desktop using the same tools. He showed us the latest fancy GUI builder they have called Qt Quick and how they use QML to do fancy things in a fast manner. He also managed to show us the code running in simulator and on device. Quite impressive. Andreas is a very good speaker and did a very complete session. As a bonus point, he used ‘haxx.se’ as test site for demonstrating his little demo build and of course you can’t help loving him more then? 😉

Johan with Android

Android

Johan Nilsson started off just after the coffee break with educating us how you can do push stuff from your server applications to your mobile device. How it works and how to control that in various way. Johan’s presentation was into details, in a way at least I really appreciated it, and his (hand drawn on paper then scanned) graphics used in the presentation were stunning! The fact that Johan sneaked in a couple of curl command lines of course gave him bonus points in my mind! 😉

Henrik with FribidFribid

Henrik Nordström took the stage and briefed us on the status of the Fribid project, which is a very Swedish-centric project that works on implementing full Linux support for “bankid” which is a electronic identification system established by a consortium of Swedish banks and is used by a wide range of authorities and organizations these days. The existing Linux client is poor (and hard to get working right), closed source, saves data encrypted with private hidden keys etc.

Food, Talk, Tablets

We_Tab-140-Motiv_4-3

In the restaurant after the seminaries, we gathered in a basement with beer in our glasses and chili on our plates and there was lots of open source and foss talks and we had a great time and good drinks. Two attendees brought their new tablets, which made us able to play with them and compare. the Android Samsung Galaxy Tab and the German Meego based WeTab.

Samsung Galaxy TabTo me there really wasn’t any competition. The Galaxy Tab is a slick, fast and nice device that feels like a big Android phone and it’s really usable and I could possibly see myself using it for emails and browsing. It was a while since I tried an Ipad but it gave about the same speed impression.

The WeTab however, even if it runs a modified Meego that isn’t “original” and that might suffer from bugs and what not, was a rough UI that looked far too much like my regular X Window system put in a touch device. For example, and I think this is telling, you scroll a web page down by using the right-side scroll bar and not by touching the screen in the middle and dragging it down like you’d do on IOS or Android. In fact, when I dragged down the scroll-bar like that I found it far too easy to accidentally then press one of the buttons that are always present immediately to the right of the scrollbar. Of course, the Galaxy Tab is a smaller device and also much more expensive, so perhaps those factors will bring a few users to WeTab then still.

I don’t think I’ll get a tablet anytime soon though, I just don’t see when I would use it.

Summary

I didn’t do any particular talk this time, but I felt we had a lot of good content and I can always blurb another time anyway. I really really like that we so far have managed to get lots of different speakers and I hope that we can continue to get many new speakers before we have to recycle.

It was a great afternoon and evening. All the good people and encouraging words inspire me to keep up my work and efforts on this, and I’m now aiming towards another meeting in the early 2011.

I will do another post later on when the videos from these talks go online.

curl: ten years of more code and contributors

It feels like I’ve been doing curl forever, while in fact it is “only” in its early teens. I decided to dig up some numbers on how the development have been within the project over the last decade. How have things changed during the 10 most recent years.

To spice up the numbers, I generated some graphs based on them and to then make the graphs nice and presentable I moved them all over to a single graph using my super gimp powers.

Bugs, Linus of code and contributors over time in curl

Click the image to get a full resolution version. But even the small one shows the data I wanted to illustrate: we gain contributors in roughly the same speed as we grow in lines of code. And at the same time we get roughly the same amount of bug reports over the years, apparently independently from the amount of code and contributors! Note that I separate the bug fixed bars from the bug report bars because bug fixed is the amount of bugfixes mentioned in release notes while the bug reports is the count in the web based bug tracker. As seen we fixed a lot more bugs than we get submitted in the bug tracker.

I should add that the reason the green contributor line starts out a little slow and gets a speed bump after a while, is that I changed my way of working at that point and got much better at tracking exactly all contributors. The general angle on the curve for the last 4-5 years is however what I think is the interesting part of it. How it is basically the same angle as the source code increase.

The bug report counter is merely taken from our bug tracker at sourceforge, which is a very inexact count as a very large amount of bugs are reported on the mailing lists only.

Data from the curl release table, tells that during these 10 years we’ve done 77 releases in which we fixed 1414 bugs. That’s 18.4 bug fixes per release and one release roughly every 47 days on average. 141 bug fixes per year on average.

To see how this development has changed over time I decided to compare those numbers against those for the most recent 2.5 years. During this most recent 25% of the period we’ve done releases every 60 days on average but counting 155 bug fixes per year. Which made that the average number of bug fixes per release have gone up to 26; one bugfix every 2.3 days.

A more negative interpretation on this could be that we’re only capable of a certain amount of bug fixes per time so no matter how much code we get we fix bugs at roughly the same rate. The fact that we don’t get any increasing amount of bug reports of course speaks against this theory.

What do you see in a future curl?

cURLDuring the first few years of the cURL project I used to sum up something about the past year and what I expected of the year to come at every anniversary. I stopped that at some point as I felt I didn’t have much new to contribute with, it mostly got repetitive over the years. I’ve never been the guy with the eyes fixed at the horizon and a grand plan of what to do in a future or work towards on a long-term basis. I’m more the guy with my feet firmly on the ground solving today’s problems right now as good as possible so that it stays stable and fine tomorrow.

So here follows some thoughts and reasoning around stuff that may and may not be the future of curl and libcurl. I’m as always putting my trust in you my friends to help me out with the details…

Protocols

curl’s wide protocol support has surprised even me. While I think the amount of protocols that can be supported without bending over backwards far too much is shrinking rapidly, I also think that we’ve learned that there are always more protocols that can be supported and that people will like to use with curl and libcurl. I expect more to come.

Specific protocols that are on the watch list include:

  • Gopher – I just had to mention it just because curl doesn’t currently support it, it was there once but got yanked out when we found out it didn’t work and hadn’t been doing so for a very long time without anyone noticing! The support is about to come back though thanks to a patch being worked on right now.
  • SPDY is the Google HTTP replacement (experimental) protocol that if it turns out successful seems like a very important piece for curl to grok.
  • Websockets will come to stay in your browser and it will happen soon. To remain a really powerful web scraper tool in the future with more and more sites switching to using Websockets for parts of its functionality, I believe at least some level of support might be required.
  • SCTP. Long-shot.This transport layer protocol was standardized in 2007 but really has not taken off in any significant way, even though it features a lot of benefits compared to TCP in many aspects. The now dead HTTP over SCTP internet-draft shows that HTTP can indeed be moved over to use it and it would solve a bunch of the same problems that SPDY does (and MPTCP). SCTP also has its own share of problems that hamper its adoption, primarily the lack of support in middle-boxes like NATs and routers.

Do you think there’s any particular other protocol we should support in a future?

Stability

I’m not sure if curl is particular unstable. I don’t think it suffers from any unusual amounts of bugs or anything, but I figure a decent list of stuff to consider for the future is if we make things within the project in a good way. Do we use the proper procedures so that we reach stability and produce a good product? Should we fix, change or replace certain ways or practices so that our outputs get less flaws?

I suspect these questions are hard for someone as involved and inside the project as myself. Possibly it is also hard for someone coming from the outside to convince us old-timers about anything like that too…

Interface

The current libcurl API was basically introduced when libcurl was first made a reality back in the year 2000. While the multi interface and the subsequent multi_socket API were added later on, they are both heavily influenced and affected by the previous easy interface.

Maybe there’s a much better API we should have that would make it easier to make a better library and easier to write better application using this library? It would require some serious brain-storming to come up with something, or perhaps there’s someone out there who already have thought something out?

Contributors

We’re but a few people who push commits to the master branch. How should we proceed to attract more? Should we work differently, perhaps have sub-maintainers for parts of the code etc? Can we work differently or make anything better to encourage more people to send bug reports and/or patches?

We do in fact have a very low level to entry already so I’m not aware of many additional things we can carve out to streamline this further.

Sponsors

As friends of me know, I always feel a little bit envy for projects and developers who manage to get corporate sponsors or otherwise enough paid support contracts to make them able to work on their pet projects full-time or at least part-time. I certainly have never reached that level for more than just a few months at a time. I’m not sure this is very important, as lack of funding doesn’t stop us it just slows us down and really, in an open source project like ours there aren’t really many hard deadlines. If it doesn’t get done now, it’ll get done later if enough people want it to happen.

Getting funding isn’t always easy either, since there may very well appear a company that wants to pay for a feature to get added but we don’t agree that it is a feature suitable for the project…

Organization

At times I think of the long-term future of our little project. Like what’s gonna happen the day I want to give up my chieftainship or if a company would like to step up and sponsor the project, but won’t be able to because there’s no actual legal entity for the project to sponsor or pay.

Would it make sense to have a curl organization? We could (I presume) easily join an umbrella organization like ASF. Or perhaps even more suitable the Software Freedom Conservancy. I will certainly not advocate setting up our own non-profit or anything like that.

I don’t think FSF or GNU will ever be a serious alternative since a pretty solid design goal of mine has been to avoid the *GPL licenses for curl to keep it attractive for commercial interests in a higher degree than (L)GPL licensed projects are. (This is not a license debate, so please don’t lecture me about the existence of GPL’ed commercial stuff.) I will not change license.

Feedback

One thing I’m sure of though, is that we will continue to listen to our users and the general curl and libcurl community about what to work on and what isn’t good and what we should do next. Please tell me your opinions and views on these matters!

Websockets right now

Lots of sites today have JavaScript running that connects to the site and keeps the connection open for a long time (or just doing very frequent checks if there is updated info to get). This to such an wide extent that people have started working on a better way. A way for scripts in browsers to connect to a site in a TCP-like manner to exchange messages back and forth, something that doesn’t suffer from the problems HTTP provides when (ab)used for this purpose.

Websockets is the name of the technology that has been lifted out from the HTML5 spec, taken to the IETF and is being worked on there to produce a network protocol that is basically a message-based low level protocol over TCP, designed to allow browsers to do long-living connections to servers instead of using “long-polling HTTP”, Ajax polling or other more or less ugly tricks.

Unfortunately, the name is used for both the on-the-wire protocol as well as the JavaScript API so there’s room for confusion when you read or hear this term, but I’m completely clueless about JavaScript so you can rest assured that I will only refer to the actual network protocol when I speak of Websockets!

I’m tracking the Hybi mailing list closely, and I’m summarizing this post on some of the issues that’s been debated lately. It is not an attempt to cover it all. There is a lot more to be said, both now and in the future.

The first versions of Websockets that were implemented by browsers (Chrome) was the -75 version, as written by Ian Hickson as editor and the main man behind it as it came directly from the WHATWG team (the group of browser-people that work on the HTML5 spec).

He also published a -76 version (that was implemented by Firefox) before it was taken over by the IETF’s Hybi working group, who published that same document as its -00 draft.

There are representatives for a lot of server software and from browsers like Opera, Firefox, Safari and Chrome on the list (no Internet Explorer people seem to be around).

Some Problems

The 76/00 version broke compatibility with the previous version and it also isn’t properly HTTP compatible that is wanted and needed by many.

The way Ian and WHATWG have previously worked on this (and the html5 spec?) is mostly by Ian leading and being the benevolent dictator of what’s good, what’s right and what goes into the spec. The collision with IETF’s way of working that include “everyone can have a say” and “rough consensus” has been harsh and a reason for a lot of arguments and long threads on the hybi mailing list, and I will throw out a guess that it will continue to be the source of “flames” even in the future.

Ian maintains – for example – that a hard requirement on the protocol is that it needs to be “trivial to implement for amateurs”, while basically everyone else have come to the conclusion that this requirement is mostly silly and should be reworded to “be simple” or similar, that avoids the word “amateur” completely or the implying that amateurs wouldn’t be able to follow specs etc.

Right now

(This is early August 2010, things and facts in this protocol are likely to change, potentially a lot, over time so if you read this later on you need to take the date of this writing into account.)

There was a Hybi meeting at the recent IETF 78 meeting in Maastricht where a range of issues got discussed and some of the controversial subjects did actually get quite convincing consensus – and some of those didn’t at all match the previous requirements or the existing -00 spec.

Almost in parallel, Ian suggested a schedule for work ahead that most people expressed a liking in: provide a version of the protocol that can be done in 4 weeks for browsers to adjust to right now, and then work on an updated version to be shipped in 6 months with more of the unknowns detailed.

There’s a very lively debate going on about the need for chunking, the need for multiplexing multiple streams over a single TCP connection and there was a very very long debate going on regarding if the protocol should send data length-prefixed or if data should use a sentinel that marks the end of it. The length-prefix approach (favored by yours truly) seem to have finally won. Exactly how the framing is to be done, what parts that are going to be core protocol and what to leave for extensions and how extensions should work are not yet settled. I think everyone wants the protocol to be “as simple as possible” but everyone has their own mind setup on what amount of features that “the simplest” possible form of the protocol has.

The subject on exactly how the handshake is going to be done isn’t quite agreed to either by my reading, even if there’s a way defined in the -00 spec. The primary connection method will be using a HTTP Upgrade: header and thus connect to the server’s port 80 and then upgrade the protocol from HTTP over to Websockets. A procedure that is documented and supported by HTTP, but there’s been a discussion on how exactly that should be done so that cross-protocol requests can be avoided to the furthest possible extent.

We’ve seen up and beyond 50 mails per day on the hybi mailing list during some of the busiest days the last couple of weeks. I don’t see any signs of it cooling down short-term, so I think the 4 weeks goal seems a bit too ambitious at this point but I’m not ruling it out. Especially since the working procedure with Ian at the wheel is not “IETF-ish” but it’s not controlled entirely by Ian either.

Future

We are likely to see a future world with a lot of client-side applications connecting back to the server. The amount of TCP connections for a single browser is likely to increase, a lot. In fact, even a single web application within a single tab is likely to do multiple websocket connections when they re-use components and widgets from several others.

It is also likely that libcurl will get a websocket implementation in the future to do raw websocket transfers with, as it most likely will be needed in a future to properly emulate a browser/client.

I hope to keep tracking the development, occasionally express my own opinion on the matter (on the list and here), but mostly stay on top of what’s going on so that I can feed that knowledge to my friends and make sure that the curl project keeps up with the time.

chinese-socket

(The pictured socket might not be a websocket, but it is a pretty remarkably designed power socket that is commonly seen in China, and as you can see it accepts and works with many of the world’s different power plug designs.)

curl performance

Benchmarks, speed, comparisons, performance. I get a lot of questions on how curl and libcurl compare against other tools or libraries, and I rarely have any specific answers as I personally basically never use or test any other tools or libraries!

This text will instead be more elaborate on how we work on libcurl, why I believe libcurl will remain the fastest alternative.

cURL

libcurl is low-level

libcurl is written in C and uses the native function calls of the operating system to perform its network operations. It features a lot of features, but when it comes to plain sending and receiving data the code paths are very short and the loops can’t be shortened or sped up by any significant amount. Based on these facts, I am confident that for simple single stream transfers you really cannot write a file transfer library that runs faster. (But yes, I believe other similarly low-level style libraries can reach the same speeds.)

When adding more complicated test cases, like doing SSL or perhaps many connections that need to be kept persistent between transfers, then of course libraries can start to differ. libcurl uses SSL libraries natively so if they are fast, so will libcurl’s SSL handling be and vice versa. Of course we also strive to provide features such as connection pooling, SSL session id reuse, DNS caching and more to make the normal and frequent use cases as fast as we possibly can. What takes time when using libcurl should be the underlying network operations, not the tricks libcurl adds to them.

event-based is the way to grow

If you plan on making an app that uses more than just a few connections, libcurl can of course still do the heavy lifting for you. You should consider taking precautions already when you do your design and make sure that you can use an event-based concept and avoid relying on select or poll for the socket handling. Using libcurl’s multi_socket API, you can go up and beyond tens of thousands of connections and still reach maximum performance. And this using basically all of the protocols libcurl supports, this is not limited to a small subset. (There are unfortunate exceptions, like for example “file://” URLs but there are completely technical reasons for this.)

Very few file transfer libraries have this direct support for event-based operations. I’ve read reports of apps that have gone up to and beyond 70,000 connections on the same host using libcurl like this. The fact that TCP only has a 16 bit field in the protocol header for “source port” of course forces users who want to try this stunt to use more than one interface as source address.

And before you ask: you cannot grow a client to that amount using any other technique than event-based using many connections in the same thread, as basically no other approach scales as well.

When handling very many connections, the mere “juggling” of the connections take time and can be done in good or bad ways. It would be interesting to one day measure exactly how good libcurl is at this.

Binding Benchmark and Comparisons

We are aware of something like 40 different bindings for libcurl to make it possible to use from just about any language you like. Lots, if not just every, language also tend to have its own native version of a transfer or at least HTTP library. For many languages, the native version is the one that is most preferred and most used in books, articles and promoted on the internet. Rarely can the native versions compete with the libcurl based ones in actual transfer performance. Because of what I mentioned above: libcurl does little extra when transferring stuff but the actual and raw transfer. Of course there still needs to be additional glue and logic to make libcurl work fine with the different languages’ own unique environments, but it is still often proven that it doesn’t make the speed gain get lost or become invisible. I’ll illustrate below with some sample environments.

Ruby comparisons

Paul Dix is a Ruby guy and he’s done a lot of work with HTTP libraries and Ruby, and he’s also done some benchmarks on libcurl-based Ruby libraries. They show that the tools built on top of libcurl run significantly faster than the native versions.

Perl comparisons

“Ivan” wrote up a benchmarking script that performs a number of transfers using three different mechanisms available to perl hackers. One of them being the official libcurl perl binding (WWW::Curl), one of them being the perl standard one called LWP. The results leaves no room for doubts: the libcurl-based version is significantly faster than the “native” alternatives.

PHP comparisons

The PHP binding for libcurl, PHP/CURL, is a popular one. In PHP the situation is possibly a bit different as they don’t have a native library that is nearly as feature complete as the libcurl binding, but they do have a native version for doing things like getting HTTP data etc. This function has been compared against PHP/CURL many times, for example Ricky’s comparisons, and Alix Axel’s comparisons. They all show that the libcurl-based alternative is faster. Exactly how much faster is of course depending on a lot of factors but I’m not going into such specific details here and now.

We miss more benchmarks!

I wish I knew about more benchmarks and comparisons of speed. If you know of others, or if you get inspired enough to write up and publish any after reading my rant here, please let me know! Not only is it fun and ego-boosting to see our project win, but I also want to learn from them and see where we’re lacking and if anyone beats us in a test, it’d be great to see what we could do to improve.

I’ll talk at FSCONS 2010

Recently I was informed that I got two talks accepted to the FSCONS 2010 conference, to be held in the beginning of November 2010.

My talks will be about the Future and current state of internet transport protocols (TCP, HTTP, SPDY, WebSockets, SCTP and more) and on High performance multi-protocol applications with libcurl, which will educate the audience on how to use libcurl when doing high performance clients with potentially a very large number of simultaneous transfers. A somewhat clueful reader will of course spot that these two talks have a lot in common, and yeah they do reveal a lot of what I do and what I like and what I poke on these days. I hope I’ll be able to put the light on some things not everyone is already perfectly aware of.

The talks will be held in English, and if the past FSCONS conferences tell anything, my talks will be video filmed and become available online afterward for the world to see if you have a funeral or something to attend to that prevents you from actually attending in person.

If you have thoughts, questions or anything on these topics that you would like to get answered in my talk, feel free to bring them up and I’ll see what I can do.

(If those fine guys and gals at FSCONS ever settled for a logo, or had one I could link to, I would’ve shown one of them right here.)

C-ares, now and ahead!

The project c-ares started many years ago (March 2004) when I decided to fork the existing ares project to get the changes done that I deemed necessary – and the original project owner didn’t want them.

I did my original work on c-ares back then primarily to get a good asynchronous name resolver for libcurl so that we would get around the limitation of having to do the name resolves totally synchronously as the libc interfaces mandate. Of course, c-ares was and is more than just name resolving and not too surprisingly, there have popped up other projects that are now using c-ares.

I’m maintaining a bunch of open source projects, and c-ares was never one that I felt a lot of love for, it was mostly a project that I needed to get done and when things worked the way I wanted them I found myself having ended up as maintainer for yet another project. I’ve repeatedly mentioned on the c-ares mailing list that I don’t really have time to maintain it and that I’d rather step down and let someone else “take over”.

After having said this for over 4 years, I’ve come to accept that even though c-ares has many users out there, and even seems to be appreciated by companies and open source projects, there just isn’t any particular big desire to help out in our project. I find it very hard to just “give up” a functional project, so I linger and do my best to give it the efforts and love it needs. I very much need and want help to maintain and develop c-ares. I’m not doing a very good job with it right now.

Threaded name resolves competes

I once thought we would be able to make c-ares capable of becoming a true drop-in replacement for the native system name resolver functions, but over the years with c-ares I’ve learned that the dusty corners of name resolving in unix and Linux have so many features and fancy stuff that c-ares is still a long way from that. It has also made me turn around somewhat and I’ve reconsidered that perhaps using a threaded native resolver is the better way for libcurl to do asynchronous name resolves. That way we don’t need any half-baked implementations of the resolver. Of course it comes at the price of a new thread for each name resolve, which turns really nasty of you grow the number of connections just a tad bit, but still most libcurl-using applications today hardly use more than just a few (say less than a hundred) simultaneous transfers.

Future!

I don’t think the future has any radical changes or drastically new stuff in the pipe for c-ares. I think we should keep polishing off bugs and add the small functions and features that we’re missing. I believe we’re not yet parsing all records we could do, to a convenient format.

As usual, a project is not about how much we can add but about how much we can avoid adding and how much we can remain true to our core objectives. I wish the growing popularity will make more people join the project and then not only to through a single patch at us, but to also hand around a while and help us somewhat more.

Hopefully we will one day be able to use c-ares instead of a typical libc-based name resolver and yet resolve the same names.

Join us and help us give c-ares a better future!

c-ares

Webscrapers unite!

The term web scraping has come to stay. It refers to getting stuff off web sites in an automated fashion. Other terms occasionally used for include spidering, web harvesting and more. I’ve used the term HTTP scripting for it.

The art is in many cases to act more or less like a browser with the single purpose of making the server not detect that it is a program in the other end.

This is something I’ve done a lot over the years while developing curl, and I’ve answered countless of questions on the matter. There are several good books (all of them having at least parts of them covering curl).

Today, we’ve setup a little web site over at webscrapers.haxx.se and an associated mailing list, in an attempt to gather people who are working in this field. Whether you do it for fun or profit doesn’t matter and while we can certainly discuss what tools you should or shouldn’t use for scraping, we certainly don’t limit which to discuss on the list as long as the subject is somewhat relevant.

So, if you’re into scraping or you want to learn more on how to automate your web bots, come on over and join the list and let’s see where this might take us!