All posts by Daniel Stenberg

Absorbing 1,000 emails per day

Some people say email is dead. Some people say there are “email killers” and bring up a bunch of chat and instant messaging services. I think those people communicate far too little to understand how email can scale.

I receive up to around 1,000 emails per day. I average on a little less but I do have spikes way above.

Why do I get a thousand emails?

Primarily because I participate on a lot of mailing lists. I run a handful of open source projects myself, each with at least one list. I follow a bunch more projects; more mailing lists. We have a whole set of mailing lists at work (Mozilla) and I participate and follow several groups in the IETF. Lists and lists. I discuss things with friends on a few private mailing lists. I get notifications from services about things that happen (commits, bugs submitted, builds that break, things that need to get looked at). Mails, mails and mails.

Don’t get me wrong. I prefer email to web forums and stuff because email allows me to participate in literally hundreds of communities from a single spot in an asynchronous manner. That’s a good thing. I would not be able to do the same thing if I had to use one of those “email killers” or web forums.

Unwanted email

I unsubscribe from lists that I grow tired from. I stamp down on spam really hard and I run aggressive filters and blacklists that actually make me receive rather few spam emails these days, percentage wise. There are nowadays about 3,000 emails per month addressed to me that my mail server accepts that are then classified as spam by spamassassin. I used to receive a lot more before we started using better blacklists. (During some periods in the past I received well over a thousand spam emails per day.) Only 2-3 emails per day out of those spam emails fail to get marked as spam correctly and subsequently show up in my inbox.

Flood management

My solution to handling this steady high paced stream of incoming data is prioritization and putting things in different bins. Different inboxes.

  1. Filter incoming email. Save the email into its corresponding mailbox. At this very moment, I have about 30 named inboxes that I read. I read them in order, top to bottom as they’re sorted in roughly importance order (to me).
  2. Mails that don’t match an existing mailing list or topic that get stored into the 28 “topic boxes” run into another check: is the sender a known “friend” ? That’s a loose term I use, but basically means that the mail is from an email address that I have had conversations with before or that I know or trust etc. Mails from “friends” get the honor of getting put in mailbox 0. The primary one. If the mail comes from someone not listed as friend, it’ll end up in my “suspect” mailbox. That’s mailbox 1.
  3. Some of the emails get the honor of getting forwarded to a cloud email service for which I have an app in my phone so that I can get a sense of important mail that arrive. But I basically never respond to email using my phone or using a web interface.
  4. I also use the “spam level” in spams to save them in different spam boxes. The mailbox receiving the highest spam level emails is just erased at random intervals without ever being read (unless I’m tracking down a problem or something) and the “normal” spam mailbox I only check every once in a while just to make sure my filters are not hiding real mails in there.

Reading

I monitor my incoming mails pretty frequently all through the day – every day. My wife calls me obsessed and maybe I am. But I find it much easier to handle the emails a little at a time rather than to wait and have it pile up to huge lumps to deal with.

I receive mail at my own server and I read/write my email using Alpine, a text based mail client that really excels at allowing me to plow through vast amounts of email in a short time – something I can’t say that any UI or web based mail client I’ve tried has managed to do at a similar degree.

A snapshot from my mailbox from a while ago looked like this, with names and some topics blurred out. This is ‘INBOX’, which is the main and highest prioritized one for me.

alpine screenshot

I have my mail client to automatically go to the next inbox when I’m done reading this one. That makes me read them in prio order. I start with the INBOX one where supposedly the most important email arrives, then I check the “suspect” one and then I go down the topic inboxes one by one (my mail client moves on to the next one automatically). Until either I get overwhelmed and just return to the main box for now or I finish them all up.

I tend to try to deal with mails immediately, or I mark them as ‘important’ and store them in the main mailbox so that I can find them again easily and quickly.

I try to only keep mails around in my mailbox that concern ongoing topics, discussions or current matters of concern. Everything else should get stored away. It is hard work to maintain the number of emails there at a low number. As you all know.

Writing email

I averaged at less than 200 emails written per month during 2015. That’s 6-7 per day.

That makes over 150 received emails for every email sent.

fcurl is fread and friends for URLs

This whole family of functions, fopen, fread, fwrite, fgets, fclose and more are defined in the C standard since C89. You can’t really call yourself a C programmer without knowing them and probably even using them in at least a few places.

The charm with these is that they’re standard, they’re easy to use and they’re available everywhere where there’s a C compiler.

A basic example that just reads a file from disk and writes it to stdout could look like this:

FILE *file;

file = fopen("hello.txt", "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fread(buffer, sizeof(buffer),
                1, file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fclose(file);
}

Imagine you’d like to switch this example, or one of your actual real world programs that use the fopen() family of functions to read or write files, and instead read and write files from and to the Internet instead using your favorite Internet protocols. How would you do that without having to change your code a lot and do a major refactoring job?

Enter fcurl

I’ve started to work on a library that provides a look-alike API with matching functions and behaviors, but that allows fopen() to instead specify a URL instead of a file name. I call it fcurl. (Much inspired by the libcurl example fopen.c, which I wrote the first version of already back in 2002!)

It is of course open source and is powered by libcurl.

The project is in its early infancy. I think it would be interesting to try it out and I’ve mentioned the idea to a few people that have shown interest. I really can’t make this happen all on my own anyway so while I’ve created a first embryo, it will take some time before it gets truly useful. Help from others would be greatly appreciated of course.

Using this API, a version of the above example that instead reads data from a HTTPS site instead of a local file could look like:

FCURL *file;

file = fcurl_open("https://daniel.haxx.se/",
                  "r");
if(file) {
  char buffer [256];
  while(1) {
    size_t rc = fcurl_read(buffer,         
                           sizeof(buffer), 1, 
                           file);
    if(rc > 0)
      fwrite(buffer, rc, 1, stdout);
    else
      break;
  }
  fcurl_close(file);
}

And it could even actually also read a local file using the file:// sheme.

Drop-in replacement

The idea here is to make the alternative functions have new names but as far as possible accept the same input arguments, return the same return codes and so on.

If we do it right, you could possibly even convert an existing program with just a set of #defines at the top without even having to change the code!

Something like this:

#define FILE FCURL
#define fopen(x,y) fcurl_open(x, y)
#define fclose(x) fcurl_close(x)

I think it is worth considering a way to provide an official macro set like that for those who’d like to switch easily (?) and quickly.

Fun things to consider

1. for non-scheme input, use normal fopen?

An interesting take is probably to make fcurl_open() treat input specified without a “scheme://” to be a local file, and then passed to fopen() instead under the hood. That would then enable even more code to switch to fcurl since all the existing use cases with local file names would just continue to work.

2. LD_PRELOAD

An interesting area of deeper research around this could be to provide a way to LD_PRELOAD replacements for the functions so that not even any source code would need be changed and already built existing binaries could be given this functionality.

3. fopencookie

There’s also the GNU libc’s fopencookie concept to figure out if that is something for fcurl to support/use. BSD and OS X have something similar called funopen.

4. merge in official libcurl

If this turns out useful, appreciated and good. We could consider moving the API in under the curl project’s umbrella and possibly eventually even making it part of the actual libcurl. But hey, we’re far away from that and I’m not saying that is even the best idea…

Your input is valuable

Please file issues or pull-requests. Let’s see where we can take this!

HTTP/2 in April 2016

On April 12 I had the pleasure of doing another talk in the Google Tech Talk series arranged in the Google Stockholm offices. I had given it the title “HTTP/2 is upon us, and here’s what you need to know about it.” in the invitation.

The room seated 70 persons but we had the amazing amount of over 300 people in the waiting line who unfortunately didn’t manage to get a seat. To those, and to anyone else who cares, here’s the video recording of the event.

If you’ve seen me talk about HTTP/2 before, you might notice that I’ve refreshed the material somewhat since before.

decent durable defect density displayed

Here’s an encouraging graph from our regular Coverity scans of the curl source code, showing that we’ve maintained a fairly low “defect density” over the last two years, staying way below the average density level.
defect density over timeClick the image to view it slightly larger.

Defect density is simply the number of found problems per 1,000 lines of code. As a little (and probably unfair) comparison, right now when curl is flat on 0, Firefox is at 0.47, c-ares at 0.12 and libssh2 at 0.21.

Coverity is still the primary static code analyzer for C code that I’m aware of. None of the flaws Coverity picked up in curl during the last two years were detected by clang-analyzer for example.

A thousand curl forks

a fork

The curl repository on github has now been forked 1,000 times. Or actually, there are 1,000 forks kept alive as the counter is actually decreased when people remove their forks again. curl has had its primary git repository on github since March 22, 2010. A little more than two days between every newly created fork.

1000-forks

If you’re not used to the github model: a fork is typically made to get yourself your own copy of someone’s source tree so that you can make changes to that and publish your own version of the source tree without having to get the changes you’ve done merged into the original repository that you forked off from. But it is also the most common way to offer changes back to github based projects:  send a request that a particular change in your version of the source tree should get merged into the mother project. A so called Pull Request.

Trivia: The term “fork” when meaning “to divide in branches, go separate ways” has been used in the English language since the 14th century.

I’m only aware of one actual separate line of development that is a true fork of libcurl that I believe is still being maintained: libgnurl.

curl is 18 years old tomorrow

Another notch on the wall as we’ve reached the esteemed age of 18 years in the cURL project. 9 releases were shipped since our last birthday and we managed to fix no less than a total of 457 bugs in that time.

18notches

On this single day in history…

20,000 persons will be visiting the web site, transferring over 4GB of data.

1.3 bug fixes will get pushed to the git repository (out of the 3 commits made)

300 git clones are made of the curl source tree, by 100 unique users.

4000 curl source archives will be downloaded from the curl web site

8 mails get posted on the curl mailing lists (at least one of them will be posted by me).

I will spend roughly 2 hours on curl related work. Mostly answering mail, bug reports and debugging, but also maintaining infrastructure, poke on the web site and if lucky, actually spending a few minutes writing new code.

Every human in the connected world will use at least one service,  tool or application that runs curl.

Happy birthday to us all!

POWERMASTR 10: KOM OK

My phone just lighted up. POWERMASTER 10 told me something. It said “POWERMASTR 10: KOM OK”.

SMS conversation screenshot

Over the last few months, I’ve received almost 30 weird text messages from a “POWERMASTER 10”, originating from a Swedish phone number in a number range reserved for “devices”. Yeps, I’m showing the actual number below in the screenshot because I think it doesn’t matter and if for the unlikely event that the owner of +467190005601245 would see this, he/she might want to change his/her alarm config.

Powermaster 10 is probably a house alarm control panel made by Visonic. It is also clearly localized and sends messages in Swedish.

As this habit has been going on for months already, one can only suspect that the user hasn’t really found the SMS feedback to be a really valuable feature. It also makes me wonder what the feedback it sends really means.

The upside of this story is that you seem to be a very happy person when you have one of these control panels, as this picture from their booklet shows. Alarm systems, control panels, text messages. Why wouldn’t you laugh?!

powermaster10-laughing

Edit: I contacted Telenor about this after my initial blog post but they simply refused to do anything since I’m not the customer and they just didn’t want to understand that I only wanted them to tell their customer that they’re doing something wrong. These messages kept on coming to me with irregular intervals until July 2018.

Update, September 8, 2020: I got another text today (it’s been silent since September 26, 2019). The Swedish text in this message translates to “battery error”.

August 15 2021. Still happening.

Turn many pictures into a movie

Challenge: you have 90 pictures of various sizes, taken in different formats and shapes. Using all sorts strange file names. Make a movie out of all of them, with the images using the correct aspect ratio. And add music. Use only command line tools on Linux.

Solution: this is a solution, you can most likely solve this in 22 other ways as well. And by posting it here, I can find it myself if I ever want to do the same stunt again…

#!/bin/sh

j=0
# convert options
pic="-resize 1920x1080 -background black -gravity center -extent 1920x1080"

# loop over the images
for i in `ls *jpg | sort -R`; do
 echo "Convert $i"
 convert $pic $i "pic-$j.jpg"
 j=`expr $j + 1`
done

# now generate the movie
mp3="file.mp3"
echo "make movie"
ffmpeg -framerate 3 -i pic-%d.jpg -i $mp3 -acodec copy -c:v libx264 -r 30 -pix_fmt yuv420p -s 1920x1080 -shortest out.mp4

Explained

This is a shell script.

The ‘pic’ variable holds command line options for the ImageMagick ‘convert‘ tool. It resizes each picture to 1920×1080 while maintaining aspect ratio and if the pic gets smaller, it is centered and gets a black border.

The loop goes through all files matching *,jpg, randomizes the order with ‘sort’ and then runs ‘convert’ on them one by one and calls the output files pic-[number].jpg where number is increased by one for each image.

Once all images have the correct and same size, ‘ffmpeg‘ is invoked. It is told to produce a movie with 3 photos per second, how to find all the images, to include an mp3 file into the output and to stop encoding when one of the streams ends – this assumes the playing time of the mp3 file is longer than the total time the images are shown so the movie stops when we run out of images to show.

Result

The ‘out.mp4’ file, uploaded to youtube could then look like this:

(music by Bensound.com)

Summers are for HTTP

stockholm castle and ship
Stockholm City, as photographed by Michael Caven

In July 2015, 40-something HTTP implementers and experts of the world gathered in the city of Münster, Germany, to discuss nitty gritty details about the HTTP protocol during four intense days. Representatives for major browsers, other well used HTTP tools and the most popular HTTP servers were present. We discussed topics like how HTTP/2 had done so far, what we thought we should fix going forward and even some early blue sky talk about what people could potentially see being subjects to address in a future HTTP/3 protocol.

You can relive the 2015 version somewhat from my daily blog entries from then that include a bunch of details of what we discussed: day one, two, three and four.

http workshopThe HTTP Workshop was much appreciated by the attendees and it is now about to be repeated. In the summer of 2016, the HTTP Workshop is again taking place in Europe, but this time as a three-day event slightly further up north: in the capital of Sweden and my home town: Stockholm. During 25-27 July 2016, we intend to again dig in deep.

If you feel this is something for you, then please head over to the workshop site and submit your proposal and show your willingness to attend. This year, I’m also joining the Program Committee and I’ve signed up for arranging some of the local stuff required for this to work out logistically.

The HTTP Workshop 2015 was one of my favorite events of last year. I’m now eagerly looking forward to this year’s version. It’ll be great to meet you here!

Stockholm
The city of Stockholm in summer sunshine