Tag Archives: wget

Another wget reference was Bourne

Back in 2013, it came to light that Wget was used to copy the files Private Manning was convicted of having leaked. Around that time, the EFF made and distributed stickers saying “wget is not a crime”.

Weirdly enough, it was hard to find a high resolution version of that image today but I’m showing you a version of it on the right side here.

In the 2016 movie Jason Bourne, Swedish actress Alicia Vikander is seen working on her laptop at around 1:16:30 into the movie, and there’s a single visible sticker on that laptop. Yep, it is for sure the same EFF sticker. There’s even a very brief glimpse of the top of the red EFF dot below the word “crime”.

[Screenshot from Jason Bourne: the EFF sticker on the laptop]

Also recall the wget occurrence in The Social Network.

My first 20 years of HTTP

During the autumn of 1996 I took my first swim in the ocean known as HTTP. Twenty years ago now.

I had previously worked with writing an IRC bot in C, and IRC is a pretty simple text based protocol over TCP so I could use some experiences from that when I started to look into HTTP. That IRC bot was my first real application distributed to the world that was using TCP/IP. It was portable to most unixes and Amiga and it was open source.

1996 was the year the movie Independence Day premiered and the single hit song that plagued the world more than any other that year was called Macarena. AOL, Webcrawler and Netscape were the most popular websites on the Internet. There were fewer than 300,000 web sites on the Internet (compared to some 900 million today).

I decided I should spice up the bot and make it offer a currency exchange rate service so that people who were chatting could ask the bot what 200 SEK is when converted to USD or what 50 AUD might be in DEM. – Right, there was no Euro currency yet back then!

I simply had to fetch the currency rates at a regular interval and keep them on the same server that ran the bot. I just needed a little tool to download the rates over HTTP. How hard can that be? I googled around (this was before Google existed so that was not the search engine I could use!) and found a tool named ‘httpget’ that did pretty much what I wanted. It truly was tiny – a few hundred lines of code.
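
For the curious: the core of such a tool really is small. Here is a rough sketch in C of what a minimal HTTP/1.0 GET can look like (not the actual httpget source, which is long gone from my archives, and with a made-up host name and path), just to show the level of effort involved:

    /* Minimal HTTP/1.0 GET sketch: connect, send a request, dump the
       response (headers + body) to stdout until the server closes.
       Hypothetical host and path; error handling kept to a minimum. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    int main(void)
    {
      const char *host = "www.example.com";  /* made-up server */
      const char *request =
        "GET /rates.txt HTTP/1.0\r\n"        /* made-up path */
        "\r\n";
      struct addrinfo hints, *res;
      char buf[4096];
      ssize_t n;
      int fd;

      memset(&hints, 0, sizeof(hints));
      hints.ai_socktype = SOCK_STREAM;
      if(getaddrinfo(host, "80", &hints, &res) != 0)
        return 1;

      fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
      if(fd == -1 || connect(fd, res->ai_addr, res->ai_addrlen) != 0)
        return 1;

      /* send the request, then pass everything the server returns to
         stdout until it closes the connection */
      send(fd, request, strlen(request), 0);
      while((n = recv(fd, buf, sizeof(buf), 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

      close(fd);
      freeaddrinfo(res);
      return 0;
    }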

I don’t have an exact date saved or recorded for when this happened, only the general time frame. You know, we had no smartphones, no Google calendar and no digital cameras. I sported my first mobile phone back then, the sexy Nokia 1610 – shown in the picture on the right here.

The HTTP/1.0 RFC had just recently come out. RFC 1945, published in May 1996, was the first real spec ever published for HTTP, but I was blissfully unaware of the youth of the standard and plunged into my little project. The spec says:

HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification reflects common usage of the protocol referred to as "HTTP/1.0". This specification describes the features that seem to be consistently implemented in most HTTP/1.0 clients and servers.

Many years later, I learned that wget already existed back when I first searched for an HTTP tool to use. I can’t recall finding it in my searches, and if I had found it, maybe history would’ve taken a different turn for me. Or maybe I found it and discarded it for a reason I can’t remember now.

I wasn’t the original author of httpget; Rafael Sagula was. But I started contributing fixes and changes and soon I was the maintainer of it. Unfortunately I’ve lost my emails and source code history from those earliest years so I cannot easily show my first steps. Even the oldest changelogs show that we very soon got help and contributions from users.

The earliest saved code archive I have from those days is from after we had added support for Gopher and FTP and renamed the tool ‘urlget’. urlget-3.5.zip was released on January 20, 1998, which was thus more than a year after my involvement in httpget started.

The original httpget/urlget/curl code was stored in CVS and it was licensed under the GPL. I did most of the early development on SunOS and Solaris machines as my first experiments with Linux didn’t start until 97/98 something.

[Picture: a SPARCstation IPC]

The first project web page that I know is saved on archive.org is from December 1998, and by then the project had already been renamed curl. Roughly two years after the start of the journey.

RFC 2068 was the first HTTP/1.1 spec. It was released as early as January 1997, so not that long after the 1.0 spec shipped. In our project, however, we stuck with HTTP/1.0 for a few years longer and it wasn’t until February 2001 that we first started doing HTTP/1.1 requests, first shipped in curl 7.7. By then the follow-up spec to HTTP/1.1, RFC 2616, had already been published as well.
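
On the wire, the difference is mostly about the request format and connection handling: HTTP/1.1 made the Host: header mandatory and persistent connections the default. A little illustration (the host and path are of course made up):

    #include <stdio.h>

    /* Illustration only, for a hypothetical URL http://example.com/file.txt */

    /* HTTP/1.0: Host: is optional and the connection closes after the response */
    static const char http10_request[] =
      "GET /file.txt HTTP/1.0\r\n"
      "\r\n";

    /* HTTP/1.1: Host: is mandatory and the connection stays open by default,
       so several requests can reuse the same TCP connection */
    static const char http11_request[] =
      "GET /file.txt HTTP/1.1\r\n"
      "Host: example.com\r\n"
      "\r\n";

    int main(void)
    {
      printf("%s%s", http10_request, http11_request);
      return 0;
    }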

The IETF working group called HTTPbis was started in 2007 to once again refresh the HTTP/1.1 spec, but it took a while until someone pointed this out to me and I realized that I too could join in there and do my part. Up until that point, I had not really considered that little me could actually participate in the protocol work and bring my views and ideas to the table. This is when I learned about the IETF and how it works.

I posted my first emails on that list in the spring of 2008. The 75th IETF meeting in the summer of 2009 was held in Stockholm, which for me, still working on HTTP only as a spare-time project, was very fortunate and good timing. I could meet a lot of my HTTP heroes and HTTPbis participants in real life for the first time.

I have participated in the HTTPbis group ever since then, trying to uphold the views and standpoints of a command line tool and HTTP library – which often are not the same as the web browser representatives’ way of looking at things. Since Mozilla employed me in 2014, I am of course now also in the “web browser camp” to some extent, but I remain a protocol puritan as curl remains my first “child”.

Removing the PowerShell curl alias?

PowerShell is a spiced up command line shell made by Microsoft. According to some people, it is a really useful and good shell alternative.

Quite a long time ago already, we got bug reports from confused users who couldn’t use curl from their PowerShell prompts, and it didn’t take long until we figured out that Microsoft had added aliases for both curl and wget. The aliases made the shell invoke its own command, “Invoke-WebRequest”, whenever curl or wget was entered – Invoke-WebRequest being PowerShell’s own version of a command line tool for fiddling with URLs.

Invoke-WebRequest is of course nowhere near similar to either curl or wget, and it doesn’t support any of their command line options or anything. The aliases really don’t help users. No user who wants the actual curl or wget is helped by these aliases, and users who don’t know about the real curl and wget won’t use the aliases. They were and remain pointless. But they’ve remained a thorn in my side ever since, me knowing that they are there, confusing users every now and then – not me personally, since I’m not really a Windows guy.

Fast forward to modern days: Microsoft released PowerShell as open source on GitHub yesterday. Without much further ado, I filed a pull request asking for the aliases to be removed. It is a minuscule, 4-line patch. It took way longer to git clone the repo than to make the actual patch and submit the pull request!

It took 34 minutes for them to close the pull request:

“Those aliases have existed for multiple releases, so removing them would be a breaking change.”

To be honest, I didn’t expect them to merge it easily. I figure they added those aliases for a reason back in the day and it seems unlikely that I as an outsider would just make them change that decision just like this out of the blue.

But the story didn’t end there. Obviously more Microsoft people gave the PR some attention and more comments were added. Like this:

“You bring up a great point. We added a number of aliases for Unix commands but if someone has installed those commands on Windows, those aliases screw them up.

We need to fix this.”

So, maybe it will trigger a change anyway? The story is ongoing…

The curl and wget war

“To be honest, I often use wget to download files”

… some people tell me in a lowered voice, as if they were revealing one of their deepest family secrets to me. This is usually done with a slightly scared and a little ashamed look in their eyes – yet still intrigued, like it took some effort to say it straight to my face. How will I respond to that!?

I enjoy maintaining the notion that there is a “war” between curl and wget. Like the classic emacs vs vi or KDE vs GNOME wars. That we’re two rivals competing for some awesome prize, both teams glaring at each other and throwing the occasional insult over the wall at the competing team. Mostly because people believe it and I sort of like the image it projects in my brain. So I keep making jokes about it when I can.

[Image: the Monty Python “I’ll taunt you a second time” scene]

In reality though, where some of us spend our lives, there is no such war. There’s no conflict or backstabbing going on. We’re quite simply two open source projects busy doing our own things and we’ve both been doing it for almost two decades. I consider the current wget maintainer, Giuseppe, a friend and I’m friends with the two former maintainers as well.

We have more things in common than what separates us. We’re like members of the fairly exclusive HTTP/FTP command line tool club that doesn’t have that many members.

We don’t have a lot of developer overlap; there are only a few occasional contributors sending patches to both projects, and I’m one of them. The curl tool has some functional overlap with wget, but really, I strongly recommend everyone to always use the best tool for the job and the tool they prefer. If wget does the job, use it. If it does the job better than curl, then switch to wget.

There’s been a line in the curl FAQ for over 15 years: “Never, during curl’s development, have we intended curl to replace wget or compete on its market.” and it tells the truth. We are believers in the Unix philosophy that each tool does what it does best and you get your job done best by combining the right set of tools. In the curl project we make one command line tool and we make it as good as we can, but we still urge our users to use the best tool for the job, even when that means not using our tool.

All this said, there are plenty of things, protocols and features that curl supports that you cannot find in wget and that wget doesn’t do. I’ve detailed some differences in my curl vs wget document. Some things that both can do are much easier to do with curl, or offer you more control or power than their wget counterparts. Those are the things you should use curl for. Use the best tool for the job.

What takes the most effort in the curl project (and frankly what gets used by the largest number of users in the world) is the making of the libcurl transfer library, to which there is no counterpart in the wget project. Writing a stable multi-platform library with a sensible and solid API is much harder and a lot more work than writing a command line tool.

OK, I’ll stop tip-toeing and answer the question you really wanted answered while enduring all this text up until this point:

When do you suggest I use wget instead of curl?

For me, wget is for recursive gets and for doing more persistent and patient retries when continuing transfers over really bad connections and networks. But then you really must take my bias into account and ignore anything I say, because I live and breathe the curl life.

Why curl defaults to stdout

(Recap: I founded the curl project, I am still the lead developer and maintainer)

When asking curl to get a URL, it’ll send the output to stdout by default. You can of course easily change this behavior with options, or by just using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you invoke the command on a shell prompt, you’ll get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made already back in 1997 or so, when I worked on the predecessors that would later turn into curl.

On Unix systems there’s a common mantra that “everything is a file” but also, in effect, that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another. Of course I wanted curl to work as well as the other components and I wanted it to blend in with the rest. I wanted curl to feel like cat, but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then, once I had made that decision and released curl for the first time on March 20, 1998, the call was made. The default was set. I will not change a default and hurt millions of users. I’d rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

As I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk. I get that: if you type the full commands by hand you’ll use about three keystrokes fewer to write “wget” instead of “curl -O”, but on the other hand, if this is an operation you do often and you care that much about saving key presses, I would suggest you make an even shorter alias anyway, and then the number of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true but I read it more like someone trying to come up with a reason when in reality there’s actually another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or one that is licensed under the GPL (which wget is) or simply that wget is what they’re used to and they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget and I seriously and honestly think it has made both our projects better, and I like that users can throw arguments in our faces like “but X can do Y”, where X can be either curl or wget depending on which camp you talk to. I also really like wget as a tool and I am an occasional user of it, just like most Unix users. I contribute to the wget project as well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as the former ones.

daniel.haxx.se episode 8

Today I hesitated to make my new weekly video episode. I looked at the viewer numbers and how they have basically dwindled over the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating whether I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewer numbers at all and just do what I think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

TALKS

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Reducing the Public Suffix pain?

Let me introduce you to what I consider one of the worst hacks we have in current and modern internet protocols: the Public Suffix List (PSL). This is a list (maintained by Mozilla) of domains that have some kind of administrative setup or arrangement that makes their sub-domains independent. For example, you can’t be allowed to set cookies for “*.com” because .com is a TLD under which domains are registered independently. But the same thing goes for “*.co.uk”, and there’s no hint anywhere about this – except for the Public Suffix List. Then, take that simple little example and extrapolate to a domain system that grows with several new TLDs every month, and more. The PSL is now several thousand entries long.

And cookies aren’t the only thing this is used for. Another really common, and perhaps even more important, use case is wildcard matches in TLS server certificates. You should not be allowed to buy and use a cert for “*.co.uk”, but you can for “*.yourcompany.co.uk”…

Not really official but still…

If you read the cookie RFC or the spec for how to do TLS wildcard certificate matching, you won’t find any line making it crystal clear that the Public Suffix List is what you must use, and I’m sure different browsers solve this slightly differently. But in practice, and most unfortunately (if you ask me), you must either use the list or make your own to be fully compliant with how the web works in 2014.

curl, wget and the PSL

In curl and libcurl, we have so far not taken the PSL into account, which is by choice, since I’ve not had any decent way to handle it and there are lots of embedded and other use cases that simply can’t cope with such a large PSL chunk.

Wget hasn’t had any PSL awareness either, but in recent weeks this has been brought up on the wget list and given more attention. Work has been initiated to do something about it, which has led to…

libpsl

Tim Rühsen took the baton and started the libpsl project and its associated mailing list, as a foundation for something Wget can use to become PSL-aware.

I’ve mostly cheered the effort so far and said that I wouldn’t mind building on this to enhance curl in the future, if it just gets a suitable (liberal enough) license, and it seems to be heading in that direction. For curl’s sake, I would like a conditional dependency on this, so that people without particular size restrictions can use it, while people in more embedded and special-purpose situations can continue to build without PSL support.
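
To give an idea of what PSL awareness could look like in an application, here is a minimal sketch using libpsl’s C API as I read it; treat the exact function names as my assumption rather than a promise, and check the libpsl documentation before relying on them:

    /* Sketch of PSL lookups with libpsl. Function names are based on my
       reading of the libpsl headers; verify against the library docs. */
    #include <stdio.h>
    #include <libpsl.h>

    int main(void)
    {
      /* the built-in list is compiled into the library itself */
      const psl_ctx_t *psl = psl_builtin();

      /* 1 means the name is a public suffix: no cookies for it and no
         wildcard certs covering it */
      printf("co.uk: %d\n", psl_is_public_suffix(psl, "co.uk"));
      printf("yourcompany.co.uk: %d\n",
             psl_is_public_suffix(psl, "yourcompany.co.uk"));

      /* would a cookie with domain .co.uk be acceptable for this host? */
      printf("cookie domain ok: %d\n",
             psl_is_cookie_domain_acceptable(psl, "www.yourcompany.co.uk",
                                             ".co.uk"));
      return 0;
    }

The appeal for curl would be exactly this kind of small, self-contained check, hidden behind a build-time option.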

If you’re interested in helping out in curl and libcurl in this area, feel most welcome!

dbound

Meanwhile, the IETF has set up a new mailing list called dbound for discussions around PSL and similar issues and it seems very timely!

Is there a case for a unified SSL front?

There are many, sorry, very many, different SSL libraries today that various programs may want to use. In the open source world at least, it is more and more common that programs using SSL (and other crypto) offer options to build with at least either OpenSSL or GnuTLS, and very often they also offer an optional build with NSS and possibly a few other SSL libraries.

In the curl project we just added support for library number nine. In the libcurl source code we have an internal API that each SSL library backend must provide, and all the libcurl source code internally uses only that single, fixed API to do SSL and crypto operations, without even knowing which backend library is actually providing the functionality. I talked about libcurl’s internal SSL API before, and I asked about this on the libcurl list back in Feb 2011.

So, a common problem should be able to find a common solution. What if we fixed this in a way that many projects could re-use? What if one project’s ability to select from 9 different provider libraries could be leveraged by others? A single SSL front with a simplified API that still provides the functionality most “simple” SSL-using applications need?
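
To make the idea a little more concrete, I imagine the API taking roughly this shape: a struct of function pointers that each backend fills in, much in the spirit of libcurl’s internal SSL API. This is only a concept sketch and every name in it is made up:

    /* Concept sketch of a unified SSL front: one vtable per backend
       (OpenSSL, GnuTLS, NSS, ...) and applications only call through it.
       All names are invented for illustration. */
    #include <stddef.h>
    #include <sys/types.h>

    typedef struct ssl_session ssl_session;  /* opaque, defined per backend */

    struct ssl_backend {
      const char *name;                      /* "openssl", "gnutls", ... */
      int  (*init)(void);                    /* global library init */
      void (*cleanup)(void);                 /* global library cleanup */

      /* perform a TLS handshake on an already connected TCP socket,
         verifying the given host name */
      ssl_session *(*handshake)(int sockfd, const char *hostname);

      ssize_t (*send)(ssl_session *s, const void *buf, size_t len);
      ssize_t (*recv)(ssl_session *s, void *buf, size_t len);
      void (*close)(ssl_session *s);         /* shutdown and free the session */
    };

    /* each supported library implements exactly one of these... */
    extern const struct ssl_backend ssl_backend_openssl;
    extern const struct ssl_backend ssl_backend_gnutls;

    /* ...and the application picks one and never touches the underlying
       SSL library directly */

The hard part is of course not the struct itself but deciding exactly which operations belong in it, which is pretty much problem number two in the list further down.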

Marc Hörsken and I have discussed this a bit, and pidgin/libpurple came up as a possible contender that could use such a single SSL API. I’ve also talked about it with Claes Jakobsson and Magnus Hagander for PostgreSQL, and I know from before that wget certainly could use it. When I’ve given my talks on the seven SSL libraries of libcurl, I’ve been approached by several people who have expressed a desire to see such an externalized API, and I remember the guys from CyaSSL among those. I’m sure there will be a few other interested parties as well if this takes off.

What remains to be answered is if it is possible to make it reality in a decent way.

Upsides:

  1. it will reduce the libcurl code base by 10K or more lines of C code, and it should mean a decreased amount of “own” code for all projects that decide to use this single SSL library
  2. it will allow other projects to use one out of many SSL libraries, hopefully benefiting end users as a result
  3. it should get more people involved in the code as more projects would use it, hopefully ending up in better tested and more polished source code

There are several downsides with a unified library, from the angle of curl/libcurl and myself:

  1. it will undoubtedly lead to more code being added and implemented that curl/libcurl won’t need or use, thus growing the code base and the final binary
  2. the API of the unified library won’t be able to be as tightly integrated with the libcurl internals as the current one is, so there will be some added code and logic needed
  3. it will require that we produce a lot of documentation for all the functions and structs in the API, which takes time and effort
  4. the above-mentioned points will take time away from my other projects (hopefully to the benefit of others, but still)

Some problems that would have to be dealt with:

  1. how to deal with libraries that don’t provide functionality that the single SSL API does
  2. how to draw the line of what functionality to offer in the API, as the individual libraries will always provide richer and more complete APIs
  3. What is actually needed to get the work on this started for real? And if we do, what would we call the project…

PS, I know the term is more accurately “TLS” these days but somehow people (me included) seem to have gotten stuck with the word SSL to cover both SSL and TLS…

News flash! Tech terms used almost correctly!

OK, The Social Network isn’t a new movie by any means at this point, but I happened to see it the other day. I’ll leave aside the entire story and whatever facts it did or didn’t portray in a correct manner.

But I did spot several at least basic technical terms used in the beginning that struck me as amazingly correctly used! The movie character Mark actually uses wget to download images (at about 10:05 into the movie), and as you can see in my first screenshot, the initial keystrokes we get to see on the command line actually resemble a correct wget command line. You can click on these images to get slightly larger versions of the pics. I’m sorry I couldn’t get any higher quality ones, but I figure the point is still the same!

[Screenshot: the wget command line typed in The Social Network]

After having invoked wget, he gets, as is explained, many pictures downloaded, and what do you know: the screen output actually looks like it could’ve come from a wget that has downloaded a couple of files:

[Screenshot: the wget download output shown in The Social Network]

He also mentioned the terms ‘Apache’, ‘emacs’ and ‘perl scripts’ in complete and correct sentences.

Where is the world heading?!

Update: Hrvoje Niksic, the founder of wget, helped out with some additional observations:

The options looked right to me, something like -r -A.jpg …

I was wondering about the historical accuracy of the progress bar, but it checks out. The movie takes place about a year and a half after the release of Wget 1.8, which added the feature. The department that takes care of these things did a good job. :)

Bjarni got the award 2010

The Nordic Free Software Award 2010 was given to the Icelandic hacker Bjarni Rúnar Einarsson.

The formal handing over of the prize was done during the social event at FSCONS 2010, with hundreds of free software hackers attending and a lot of joy. Bjarni was also immediately invited to participate in the NFSA jury for next year, in an attempt to start a tradition of getting former winners onto the jury.

[Picture: the Nordic Free Software Award handover]

I’m happy to say that I served on the jury for the award this year. We were a bunch of Nordic free software enthusiasts, including several previous winners. This year’s winner, Bjarni Rúnar, was selected through a nomination process in which we received, I believe, 11 names, followed by a vote within the jury.

I did the press release draft and Karsten from FSFE polished it into something much better. I think it will go out early this week and I am now even mentioned as the press contact for Sweden regarding the award. The FSFE posted their announcement, with my last name spelled wrong…

The social event then went on with lots of free software talks with cool people from the entire Nordic region, and I certainly met a whole bunch of friendly hackers I didn’t know before. It was also great fun to run into Giuseppe, the current wget maintainer.

(The picture might just be a fake.)