Tag Archives: spamassassin

DMARC helped me ditch gmail

I’ve been a gmail user for many years (maybe ten). Especially since the introduction of smart phones it has been a really convenient system to read email on the go. I rarely respond to email from my phone but I’ve done that occasionally too and it has worked adequately.

All this time I’ve used my own domain and email address and simply forwarded a subset of my email over to gmail, and I had gmail setup so that when I emailed out from it, it would use my own email address and not the @gmail.com one. Nothing fancy, just convenient. The gmail spam filter is also pretty decent so it helped me to filter off some amount of garbage too.

It was fine until DMARC

However, with the rise of DMARC over the recent years and with Google insisting on getting on that bandwagon, it has turned out to be really hard to keep forwarding email to gmail (since gmail considers forwarded emails using such headers fraudulent and it rejects them). So a fair amount of email simply never showed up in my gmail inbox (and instead caused the senders to get a bounce from a gmail address they didn’t even know I had).

I finally gave up and decided gmail doesn’t work for this sort of basic email setup anymore. DMARC and its siblings have quite simply made it impossible to work with emails this way, a way that has been functional for decades (I used similar approaches already back in the mid 90s on my first few jobs).

Similarly, DMARC has turned out to be a pain for mailing lists since they too forward email in a similar fashion and this causes the DMARC police to go berserk. Luckily, recent versions of mailman has options that makes it rewrite the From:-lines from senders that send emails from domains that have strict DMARC policies. That mitigates most of the problems for mailman lists. I love the title of this old mail on the subject: “Yahoo breaks every mailing list in the world including the IETF’s

I’m sure DMARC works for the providers in the sence that they block huge amounts of spam and fake users and that’s what it was designed for. The fact that it also makes ordinary old-school mail forwards really difficult and forces mailing list admins all over to upgrade mailman or just keep getting rejects since they use mailing list software that lacks the proper features, that’s probably all totally ignored. DMARC was as designed: it reduces spam at the big providers’ systems. Mission accomplished. The fact that they at the same time made world wide Internet email a lot less useful is probably not something they care about.

It’s done

gmail can read mails from remote inboxes, but it doesn’t support IMAP (only POP3) so simply switching to such a method wouldn’t even work. I just refuse to enable POP3 anywhere again.

Of course it isn’t an irreversible decision, but I’ve stopped the forward to gmail, cleared the inbox there and instead I’ve switched to Aqua mail on Android. It seems fairly feature complete and snappy. It isn’t quite as fancy and cool as the gmail client, but hopefully it will do its job.

The biggest drawback I’ve felt after a couple of weeks is the gmail spam filter. I do run spamassassin on my server and it catches the large bulk of all spams, but having the gmail spam system on top of that was able to block more silliness from my phone than spamassassin does alone.

Absorbing 1,000 emails per day

Some people say email is dead. Some people say there are “email killers” and bring up a bunch of chat and instant messaging services. I think those people communicate far too little to understand how email can scale.

I receive up to around 1,000 emails per day. I average on a little less but I do have spikes way above.

Why do I get a thousand emails?

Primarily because I participate on a lot of mailing lists. I run a handful of open source projects myself, each with at least one list. I follow a bunch more projects; more mailing lists. We have a whole set of mailing lists at work (Mozilla) and I participate and follow several groups in the IETF. Lists and lists. I discuss things with friends on a few private mailing lists. I get notifications from services about things that happen (commits, bugs submitted, builds that break, things that need to get looked at). Mails, mails and mails.

Don’t get me wrong. I prefer email to web forums and stuff because email allows me to participate in literally hundreds of communities from a single spot in an asynchronous manner. That’s a good thing. I would not be able to do the same thing if I had to use one of those “email killers” or web forums.

Unwanted email

I unsubscribe from lists that I grow tired from. I stamp down on spam really hard and I run aggressive filters and blacklists that actually make me receive rather few spam emails these days, percentage wise. There are nowadays about 3,000 emails per month addressed to me that my mail server accepts that are then classified as spam by spamassassin. I used to receive a lot more before we started using better blacklists. (During some periods in the past I received well over a thousand spam emails per day.) Only 2-3 emails per day out of those spam emails fail to get marked as spam correctly and subsequently show up in my inbox.

Flood management

My solution to handling this steady high paced stream of incoming data is prioritization and putting things in different bins. Different inboxes.

  1. Filter incoming email. Save the email into its corresponding mailbox. At this very moment, I have about 30 named inboxes that I read. I read them in order, top to bottom as they’re sorted in roughly importance order (to me).
  2. Mails that don’t match an existing mailing list or topic that get stored into the 28 “topic boxes” run into another check: is the sender a known “friend” ? That’s a loose term I use, but basically means that the mail is from an email address that I have had conversations with before or that I know or trust etc. Mails from “friends” get the honor of getting put in mailbox 0. The primary one. If the mail comes from someone not listed as friend, it’ll end up in my “suspect” mailbox. That’s mailbox 1.
  3. Some of the emails get the honor of getting forwarded to a cloud email service for which I have an app in my phone so that I can get a sense of important mail that arrive. But I basically never respond to email using my phone or using a web interface.
  4. I also use the “spam level” in spams to save them in different spam boxes. The mailbox receiving the highest spam level emails is just erased at random intervals without ever being read (unless I’m tracking down a problem or something) and the “normal” spam mailbox I only check every once in a while just to make sure my filters are not hiding real mails in there.

Reading

I monitor my incoming mails pretty frequently all through the day – every day. My wife calls me obsessed and maybe I am. But I find it much easier to handle the emails a little at a time rather than to wait and have it pile up to huge lumps to deal with.

I receive mail at my own server and I read/write my email using Alpine, a text based mail client that really excels at allowing me to plow through vast amounts of email in a short time – something I can’t say that any UI or web based mail client I’ve tried has managed to do at a similar degree.

A snapshot from my mailbox from a while ago looked like this, with names and some topics blurred out. This is ‘INBOX’, which is the main and highest prioritized one for me.

alpine screenshot

I have my mail client to automatically go to the next inbox when I’m done reading this one. That makes me read them in prio order. I start with the INBOX one where supposedly the most important email arrives, then I check the “suspect” one and then I go down the topic inboxes one by one (my mail client moves on to the next one automatically). Until either I get overwhelmed and just return to the main box for now or I finish them all up.

I tend to try to deal with mails immediately, or I mark them as ‘important’ and store them in the main mailbox so that I can find them again easily and quickly.

I try to only keep mails around in my mailbox that concern ongoing topics, discussions or current matters of concern. Everything else should get stored away. It is hard work to maintain the number of emails there at a low number. As you all know.

Writing email

I averaged at less than 200 emails written per month during 2015. That’s 6-7 per day.

That makes over 150 received emails for every email sent.

My best spam rules right now

I’ve already before mentioned my antispam setup, but today I just ran a little check on my “hispam” mailbox (the spams with so high spam points that I never even bother to check them for false positives), 43MB of 7900+ spams (received during ~40 hours), to see which ones of my own handicrafted rules that get triggered the most. I use a set of 40+ custom spamassassin rules to help it trigger more mails as spam, since some of the very short mails seem to be hard to catch otherwise, and some of the mails are in many ways looking like mail I would normally get.

Anyway, my top-10 rules are:

  1. 1624 6.0 DS_BODY_DRUGBRAND      BODY: mentions drug brand
  2. 1428 6.0 DS_SUBJECT_DRUGBRAND   Subject mentions drug brand
  3. 828 6.0 DS_FROM_HAXX     spoofed haxx.se address
  4. 769 4.0 DS_BODY_DISCOUNT    BODY: mentions percent discount
  5. 745 4.0 DS_SUBJECT_DISCOUNT   subject mentions percent discount
  6. 415 2.1 DS_TO_OWNER   To contains -owner
  7. 200 6.0 DS_BODY_NODOCTOR  BODY: mentions “no doctor”
  8. 195 2.0 DS_MAILER_THEBAT  sent with the bat
  9. 189 6.0 DS_BODY_DESIGNBRANDS  BODY: mentions designer brand(s)
  10. 158 3.0 DS_BODY_REPLICAS  BODY: speaks of replicas

The first number is number of hits. The second is the “spam points” I assign a match. Then there’s the name of the rule and my description for it. The “spam points” can best be seen relative to the other rules, as what makes a single mail a spam in the end involves multiple factors that aren’t shown here.

My Antispam Measures

I get a fair share of spam. I have something like 10 working private email addresses, I’m listed as recipient in numerous email aliases and they all end up in the same physical mailbox where I read them. I’ve also had my existing emails for many years and I’ve shown and used them publicly on the internet all the time. I’m a major spam email target now. A good day I get just 2000 spams, but bad days I’ve been well over 13000 spam emails.A can with spam

My biggest friends in this combat are: spamassassin and procmail.

I’ll describe how I have things setup, not as much as to inspire others but more to be able to get feedback from you on how I can or perhaps should improve my setup to get an even better email life.

  • I consider all mails with spam points >= 3 to be spam. I’ve also tweaked my spamassassin user_prefs to be harsher on (pure) HTML mail and a few other rules, and I’ve added a couple of my own rules to catch spams that previously did slip through a little too easy.
  • First, I filter out mail from trusted mailing lists that have their own antispam measures.
  • I catch what appears to be bounces (I have a huge regex) and if it looks like a bounce to an address I don’t send email from I nuke it immediately (and those could be a true bounce are saved in a dedicated mbox)
  • I have a white-list system that marks all incoming mails from previously marked friends as coming from a friend.
  • Mails from non-friends are passed through spamassassin. Those with spam points higher than N are put in the ‘hispam’ folder – of course with the intention that these are very very very unlikely to every have any false positives and can almost surely be deleted without check. N is currently 10 but I ponder on lowering it somewhat. Spams with less points than N are put in the ‘spam’ folder, and I need to check that before I kill it because it happens that I get occasional false positives that end up there.
  • So, mails that aren’t from friends (or from a trusted mailing list) and aren’t marked as spam are then stored in the ‘suspicious’ mailbox
  • Mails from friends or from trusted lists go directly into my mailbox, or into a dedicated mailbox (for lists with somewhat high traffic volumes).
  • Oh, a little additional detail: I “mark” my own outgoing mails with an additional custom header with no point whatsoever but to be able to detect when someone/something sends me mail using my own address…

My weakest point in all this right now is the fact that I don’t spam-check white-listed mails at all, so spams that are sent to me using my friends’ email addresses go through and annoy me.

BTW, I did use bogofilter in the past and for a while I actually ran both in parallel (both trained with rougly the same spam/ham boxes for the Bayes stuff) but quite heavily testing I performed at that time (a few years ago) showed that spamassissin caught a lot more spams than bogofilter, while bogofilter only caught a few extra so I dropped it then.