My best spam rules right now

I’ve mentioned my antispam setup before, but today I ran a little check on my “hispam” mailbox (the spams with such high spam points that I never even bother to check them for false positives), 43MB of 7900+ spams (received during ~40 hours), to see which of my own handcrafted rules get triggered the most. I use a set of 40+ custom spamassassin rules to help it flag more mails as spam, since some of the very short mails seem to be hard to catch otherwise, and some of the mails look in many ways like mail I would normally get.

Anyway, my top-10 rules are:

   1. 1624 6.0 DS_BODY_DRUGBRAND     BODY: mentions drug brand
   2. 1428 6.0 DS_SUBJECT_DRUGBRAND  Subject mentions drug brand
   3.  828 6.0 DS_FROM_HAXX          spoofed address
   4.  769 4.0 DS_BODY_DISCOUNT      BODY: mentions percent discount
   5.  745 4.0 DS_SUBJECT_DISCOUNT   subject mentions percent discount
   6.  415 2.1 DS_TO_OWNER           To contains -owner
   7.  200 6.0 DS_BODY_NODOCTOR      BODY: mentions “no doctor”
   8.  195 2.0 DS_MAILER_THEBAT      sent with the bat
   9.  189 6.0 DS_BODY_DESIGNBRANDS  BODY: mentions designer brand(s)
  10.  158 3.0 DS_BODY_REPLICAS      BODY: speaks of replicas

The first number is the number of hits. The second is the “spam points” I assign to a match. Then there’s the name of the rule and my description for it. The “spam points” are best read relative to the other rules, as what makes a single mail a spam in the end involves multiple factors that aren’t shown here.
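For the curious, here is a minimal sketch of how a tally like the one above could be produced. It is not the script I actually ran. It assumes that spamassassin has added an X-Spam-Status header with a tests=RULE1,RULE2,... list to each mail, and that the custom rules share the DS_ prefix seen in the list; the function names are made up for this example.

```python
import mailbox
import re
from collections import Counter


def rules_hit(status_header, prefix="DS_"):
    """Return the rule names with the given prefix from one X-Spam-Status value."""
    # The tests= list is often folded over several lines, so glue it back together.
    unfolded = re.sub(r",\s+", ",", status_header)
    match = re.search(r"tests=(\S+)", unfolded)
    if not match:
        return []
    return [name for name in match.group(1).split(",") if name.startswith(prefix)]


def top_rules(status_headers, n=10):
    """Count in how many mails each rule fired and return the n most frequent."""
    counts = Counter()
    for header in status_headers:
        # set() so that a rule is counted at most once per mail
        counts.update(set(rules_hit(header)))
    return counts.most_common(n)


def status_headers(mbox_path):
    """Yield the X-Spam-Status value of every mail in an mbox file."""
    for message in mailbox.mbox(mbox_path, create=False):
        value = message.get("X-Spam-Status")
        if value:
            yield str(value)
```

Something like top_rules(status_headers("hispam")) would then give (rule, hits) pairs, with "hispam" standing in for wherever the mbox file actually lives. The spam points per rule are not in that header, so they would have to come from the rule definitions.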

Mail turned unreliable

I’ve always been proud of my ability to read and respond to email in a swift and reliable manner. I read and write emails every day, and most days I read mails more or less immediately as they land in my inbox.

However, over the past year or so I’ve noticed that I’m no longer a reliable mail recipient. The amount of spam I get has made me tighten the screws so hard that I get my share of false positives: the kind of mails that I need to rescue from my spam bin, as they will otherwise suffer death by delete. But how many do I miss? How often do I lose legitimate mails?

On some of the mailing lists I participate in, the spammers have started to send posts with my email address in the From: field (circumventing the subscribers-only limitation), which has forced me to set my own mails as moderated to prevent spam from getting posted… 🙁

alpine in, pine out

As one of the last living dinosaurs on the planet still using text-based email clients, I realized that pine has been replaced by alpine and I upgraded to that. When doing some reading up on the subject, I noticed that there’s another old grumpy guy still using this client. I’m not sure exactly what that says…

Anyway, the upside of this switch is that this client is now distributed under a proper open source license (Apache license 2.0), as that’s what I’ve been getting in my face from mutt users for years when I’ve explained what I use! (I mean the complaint that pine wasn’t proper open source)

My Antispam Measures

I get a fair share of spam. I have something like 10 working private email addresses, I’m listed as a recipient in numerous email aliases and they all end up in the same physical mailbox where I read them. I’ve also had my existing addresses for many years and I’ve shown and used them publicly on the internet all the time. I’m a major spam email target now. On a good day I get just 2000 spams, but on bad days I’ve received well over 13000 spam emails.

My biggest friends in this combat are: spamassassin and procmail.

I’ll describe how I have things set up, not so much to inspire others as to get feedback from you on how I can or perhaps should improve my setup to get an even better email life.

  • I consider all mails with spam points >= 3 to be spam. I’ve also tweaked my spamassassin user_prefs to be harsher on (pure) HTML mail and a few other rules, and I’ve added a couple of my own rules to catch spams that previously slipped through a little too easily.
  • First, I filter out mail from trusted mailing lists that have their own antispam measures.
  • I catch what appear to be bounces (I have a huge regex) and if it looks like a bounce to an address I don’t send email from, I nuke it immediately (and those that could be true bounces are saved in a dedicated mbox).
  • I have a white-list system that marks all incoming mails from previously marked friends as coming from a friend.
  • Mails from non-friends are passed through spamassassin. Those with spam points higher than N are put in the ‘hispam’ folder, with the intention that these are very very very unlikely to ever contain any false positives and can almost surely be deleted without a check. N is currently 10 but I’m pondering lowering it somewhat. Spams with fewer points than N are put in the ‘spam’ folder, and I need to check that before I kill it because I get occasional false positives that end up there.
  • So, mails that aren’t from friends (or from a trusted mailing list) and aren’t marked as spam are then stored in the ‘suspicious’ mailbox
  • Mails from friends or from trusted lists go directly into my mailbox, or into a dedicated mailbox (for lists with somewhat high traffic volumes).
  • Oh, a little additional detail: I “mark” my own outgoing mails with an additional custom header with no point whatsoever but to be able to detect when someone/something sends me mail using my own address…
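The order of the steps above can be sketched roughly like this. It is a toy model, not my actual procmail recipes: the Mail fields, the folder names “inbox”, “bounces” and “discard”, and the route function are invented for illustration, and the dedicated mailboxes for high-traffic lists are folded into “inbox” to keep it short. Only the thresholds (3 and N = 10) and the ‘hispam’, ‘spam’ and ‘suspicious’ names come from the description.

```python
from dataclasses import dataclass

SPAM_THRESHOLD = 3.0     # "spam points >= 3" counts as spam
HISPAM_THRESHOLD = 10.0  # N: above this, mail goes to 'hispam'


@dataclass
class Mail:
    sender: str
    from_trusted_list: bool = False          # list with its own antispam measures
    looks_like_bounce: bool = False          # matched the (huge) bounce regex
    bounce_to_sending_address: bool = False  # bounce for an address I do send from
    spam_points: float = 0.0                 # score assigned by spamassassin


def route(mail, friends):
    """Return the folder a mail ends up in, checking in the order listed above."""
    if mail.from_trusted_list:
        return "inbox"
    if mail.looks_like_bounce:
        # Possible true bounces are kept; the rest is nuked immediately.
        return "bounces" if mail.bounce_to_sending_address else "discard"
    if mail.sender.lower() in friends:
        # White-listed senders are never spam-checked.
        return "inbox"
    if mail.spam_points > HISPAM_THRESHOLD:
        return "hispam"
    if mail.spam_points >= SPAM_THRESHOLD:
        return "spam"
    return "suspicious"
```

Note that the friends check comes before the score is even looked at, so a mail that uses a white-listed address lands in the inbox no matter how spammy it is. A mail with exactly N points is treated as ‘spam’ here, which is a guess on the boundary case.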

My weakest point in all this right now is the fact that I don’t spam-check white-listed mails at all, so spams that are sent to me using my friends’ email addresses go through and annoy me.

BTW, I did use bogofilter in the past and for a while I actually ran both in parallel (both trained with roughly the same spam/ham boxes for the Bayes stuff), but the quite heavy testing I performed at that time (a few years ago) showed that spamassassin caught a lot more spams than bogofilter, while bogofilter only caught a few extra, so I dropped it then.

Shootout cancelled for now

Yeah, I wasn’t thinking clearly when I started this test when I did, as I then had mail servers taken down and replaced and what not, and all that extra bouncing-around of mails will no doubt affect what does or doesn’t end up at my (spamassassin) end. So I’ll just stop the test right now, and once everything is settled again I may restart it. If the mood and energy for it return!

The spamassassin gmail shootout

I’ve seen and heard so many people saying good things about gmail’s spam filter, and yet the few times I’ve redirected some of my (fairly large) mail feed through gmail I’ve not been that impressed. So, I decided I’d fire off a test. I forward a good deal of my mail both to my local spamassassin-protected mailbox and to my gmail account, and I take detailed notes about what happens. It’ll be fun.

gmail hiccups

gmail is fancy and offers lots of space

gmail is often praised these days by people all over, and yeah it is a neat web app and the amount of disk space they offer for this free service is daunting!

I do however have several arguments against gmail that keep me from using it myself for anything that is critical.

gmail blocks zip files

The main and major complaint can be phrased like this:

(reason: 552 5.7.0 Illegal Attachment p9si2809195fkb)

That’s the exact message gmail includes in the reject mail when I try to mail my own account with a zip file attached. The zip file itself is perfectly harmless (and contains source code).

I actually get completely legitimate zip files from people every now and then, perhaps even once per week or so, and having gmail reject these mails without even properly explaining why to the user is quite a show-stopper!

gmail spam filter is inferior

The other issue I have with gmail is its annoying spam filter. This too is often claimed to be one of the better things about gmail, and given that they have millions of users and can do pretty detailed statistics on received mails, they do have the opportunity to make a decent filter.

But since the spam filter is one huge you-have-no-choice-but-our-way, there’s no way for me to alter configs, tweak it for my specific spams or make it deal better with the false positives that it picks. And I’ve had it catch far more false positives than my regular spamassassin filter does on my main mail account, even though I probably get a thousand times more mail and spam on that account.