Nine strategies that have defeated spammers

On this page, I give you information about the way I have filtered out most of the spam from my mailbox. You are free to use my experience to develop your own anti-spam strategy. I am not willing to enter into correspondence about these strategies. If you can use the information below as is, then you are welcome to do so. If you can't, then you can always consider learning how to (but not from me; I don't have time).

It's important for you to understand that I don't just mean that I delete the spam before it gets to me - in fact, because of the steps I have taken, spammers have actually stopped sending it.

Please note that I am using a broad-brush approach in the notes that follow. If I gave every little detail, then some spammer somewhere would cotton on to what I am doing and work around it. I don't want that, so I am not giving every little detail here. I am a professional analyst/programmer. I know how to write programs, and I am writing primarily for people who can also write programs. The fact that I haven't given every last detail won't matter to such folk - the last details may well vary in any case, depending on how they choose to implement the solution.

Strategy 1: Block known spam domains

This does not require much explanation: some ISPs do so little to stop spammers that they may as well be spammers themselves. Once you know which they are, just block them.
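The page leaves the blocking mechanism open. As a minimal Python sketch, assuming you keep your own list of refused domains (`BLOCKED_DOMAINS` and the example domains are hypothetical), a check on the sender's domain might look like this. It also matches parent domains, so mail from any subdomain of a blocked ISP is caught.

```python
from email.utils import parseaddr

# Hypothetical block list: replace with the domains you have decided to refuse.
BLOCKED_DOMAINS = {"bulkmail.test", "spam-friendly-isp.test"}

def sender_domain_blocked(from_header):
    """Return True if the sender's domain, or any parent domain, is blocked."""
    _name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    labels = domain.split(".")
    # Try "mail.bulkmail.test", then "bulkmail.test", then "test".
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))
```

Checking only the From header is a simplification: that header is easy to forge, so a fuller filter would also look at the relay hosts, as in Strategy 4.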

Strategy 2: Don't post your email address anywhere unless you actually want spam, or have taken some other steps accordingly

Spammers harvest email addresses from web pages and any other place you write them down. There are ways to put your email address on a page so that it can't be harvested - the easiest way is to use a graphic image of some text that spells out the address, rather than using text itself. If you must post your email address somewhere public, then get another email address that can be protected against spam in some other way (see Strategy 9 below for an example).

Strategy 3: Filter email based on content

Using MS Outlook, one can write macros to process emails as they are received. I developed several routines that did this and learned as they went, building up a log of how many times they had seen each word, and whether the context in which the word appeared was spam or legitimate. An extract from my logs appears at the end of this section. Each row shows two counts followed by a word (or, more accurately, a set of characters): the first count is how many times that word has been seen in a spam context, and the second is how many times it has been seen in a legitimate context. The last figure is a spam score. This is a stretched percentage scale, which runs from -100% to +100% rather than from 0 to 100%; the formula is just (s - l)/(s + l) * 100%, where s is the spam count and l is the legitimate count. A word seen only in spam scores +100%, a word seen only in legitimate mail scores -100%, and a word seen equally often in both scores 0%.

So, this table tells you that I have seen spams (before I started bouncing them!) that contained the combination "&#" a total of 5067 times, always in a spam context - those would be foreign language spams using double-byte character sets. Similarly, a dollar sign is a very good indicator of spam, since it isn't the currency in my country. Only 8 times has it been in legitimate emails, whereas 1167 times it was in spams.

You'll note that a few words, a very few, get negative scores. Those are the rare cases when a word is more common in legitimate emails than in spams. One of these, naturally enough, is my name, which friends know and spammers don't. (It is also the name of other people who are victims of the same spam as me, which is why it gets 66 spam hits.)
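To make the scale concrete, here is a short Python sketch of the formula, applied to three rows from the log extract (`spam_score` is a name chosen for illustration, not one from the author's macros):

```python
def spam_score(spam_count, legit_count):
    """Stretched percentage: -100 means always legitimate, +100 always spam."""
    return (spam_count - legit_count) / (spam_count + legit_count) * 100

# Rows taken from the log extract: (word, spam count, legitimate count).
for word, s, l in [("&#", 5067, 0), ("$", 1167, 8), ("CHRIS", 66, 672)]:
    print(f"{word:6} {spam_score(s, l):11.6f}%")
```

The printed scores match the table's 100.000000%, 98.638298% and -82.113821%.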

My method is simple - look at every word in an incoming mail. Work out the percentage figure for it. Discard anything between -75% and +75%. That just leaves the words that are almost always spam words or almost always legitimate words. Then just count them up, and calculate if it is a spam or a legitimate email.
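The counting step is only described in outline, so the following Python sketch fills in one possible reading. The ±75% cut-off comes from the text; skipping words that are not in the log and deciding by a simple majority of strong spam words over strong legitimate words are assumptions, and the sketch does not try to reproduce the numeric spam weighting the author's macros produced.

```python
def score(spam, legit):
    """Stretched percentage (s - l) / (s + l) * 100, from -100 to +100."""
    return (spam - legit) / (spam + legit) * 100

def classify(words, counts, cutoff=75.0):
    """Return (spam_hits, legit_hits, verdict) for a list of words.

    counts maps WORD -> (spam_count, legit_count), as in the log extract.
    Words scoring strictly between -cutoff and +cutoff are discarded.
    """
    spam_hits = legit_hits = 0
    for word in words:
        if word not in counts:
            continue  # never seen before: no evidence either way (assumption)
        pct = score(*counts[word])
        if pct >= cutoff:
            spam_hits += 1
        elif pct <= -cutoff:
            legit_hits += 1
    # Assumed decision rule: majority vote, ties count as legitimate.
    verdict = "spam" if spam_hits > legit_hits else "legitimate"
    return spam_hits, legit_hits, verdict
```

For example, with counts for CLICK, MONEY, YOU and CHRIS taken from the table, the words "CLICK HERE YOU MONEY" give two strong spam words (YOU, at about 50%, is discarded), while "CHRIS YOU" gives one strong legitimate word.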

In my scheme, when I am analysing the emails, I disregard numbers and convert any accented characters to their non-accented equivalents, to frustrate spammers who think it is ¢lëvé® to change such characters. I also discount most punctuation characters, and I count the frequency of HTML comments compared with the frequency of words, to catch out spammers who put comments in the middle of words to fox pattern matching. At the end, it all evaluates to a spam weighting. On my system, legitimate emails tend to get spam weightings from zero to about 50, while spam messages get weightings from 100 upwards; the worst one I ever got was a three-page spam that ended up rated at 4823!
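The clean-up steps can be sketched in Python as follows. Accent stripping uses the standard `unicodedata` module. The `LOOKALIKES` table for symbols such as ¢ and ®, the choice of $, %, & and &# as the symbols kept as tokens, and the reading of "comments" as HTML `<!-- -->` comments are all assumptions for illustration.

```python
import re
import unicodedata

# Hypothetical look-alike map: the text only says accented characters are
# flattened; mapping symbols such as ¢ and ® to letters is an extra assumption.
LOOKALIKES = {"¢": "C", "®": "R", "©": "C"}

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Tokens: the "&#" sequence, a few kept symbols, or runs of letters.
# Digits and all other punctuation are simply never matched.
TOKEN_RE = re.compile(r"&#|[$%&]|[^\W\d_]+")

def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def normalise(raw):
    """Return (tokens, comments_per_token) for a raw message body."""
    comments = len(COMMENT_RE.findall(raw))
    text = COMMENT_RE.sub("", raw)  # rejoins words split by comments
    text = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    tokens = TOKEN_RE.findall(strip_accents(text).upper())
    ratio = comments / len(tokens) if tokens else 0.0
    return tokens, ratio
```

With these assumptions, "¢lëvé®" normalises to the single token CLEVER, and "Vi<!-- x -->agra for $10" yields VIAGRA, FOR and $ with one comment per three tokens.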

SPAM    LEGIT   WORD  SCORE
0005558 0001852 YOU 50.013495%
0005122 0001587 OF 52.690416%
0004261 0001982 A 36.504885%
0005067 0000000 &# 100.000000%
0003434 0001151 IN 49.792803%
0003681 0000720 YOUR 67.280164%
0003245 0001128 FOR 48.410702%
0003163 0000768 THIS 60.925973%
0002546 0001109 IS 39.316005%
0002454 0000769 COM 52.280484%
0001932 0000773 I 42.846580%
0001953 0000704 ON 47.007904%
0001600 0000937 THAT 26.133228%
0001821 0000566 WITH 52.576456%
0001558 0000794 IT 32.482993%
0001382 0000948 HAVE 18.626609%
0001589 0000648 BE 42.065266%
0001655 0000431 OR 58.676894%
0001554 0000472 ARE 53.405726%
0001452 0000526 WE 46.814965%
0001317 0000466 WILL 47.728547%
0001165 0000587 IF 32.990868%
0001289 0000424 AS 50.496205%
0001429 0000259 OUR 69.312796%
0001251 0000408 NOT 50.813743%
0001392 0000233 EMAIL 71.323077%
0001049 0000563 AT 30.148883%
0001158 0000322 BY 56.486486%
0001064 0000385 ALL 46.859903%
0001083 0000328 CAN 53.508150%
0001011 0000335 MY 50.222883%
0001057 0000213 HTTP 66.456693%
0001110 0000091 HERE 84.845962%
0001167 0000008 $ 98.638298%
0001162 0000006 CLICK 98.972603%
0000910 0000203 MAIL 63.522013%
0000804 0000309 PLEASE 44.474394%
0000969 0000115 NO 78.782288%
0000741 0000317 WAS 40.075614%
0000955 0000062 E 87.807276%
0000916 0000087 FREE 82.652044%
0000855 0000104 MORE 78.310740%
0000712 0000233 DO 50.687831%
0000792 0000137 OUT 70.505920%
0000751 0000176 WWW 62.028047%
0000734 0000186 GET 59.565217%
0000670 0000249 AN 45.810664%
0000577 0000342 ME 25.571273%
0000733 0000174 NOW 61.631753%
0000628 0000278 ONE 38.631347%
0000620 0000275 ANY 38.547486%
0000820 0000072 LIST 83.856502%
0000647 0000227 MARCH 48.054920%
0000626 0000230 JUST 46.261682%
0000623 0000195 US 52.322738%
0000534 0000277 NET 31.689273%
0000507 0000271 SO 30.334190%
0000739 0000019 % 94.986807%
0000505 0000246 UP 34.487350%
0000381 0000360 BUT 2.834008%
0000066 0000672 CHRIS -82.113821%
0000491 0000246 MESSAGE 33.242877%
0000578 0000150 ONLY 58.791209%
0000461 0000261 HAS 27.700831%
0000417 0000299 ORDER 16.480447%
0000624 0000084 ADDRESS 76.271186%
0000549 0000119 NEW 64.371257%
0000398 0000266 ABOUT 19.879518%
0000579 0000079 INFORMATION 75.987842%
0000434 0000213 TIME 34.157651%
0000389 0000250 BEEN 21.752739%
0000609 0000003 MONEY 99.019608%
0000341 0000240 WOULD 17.383821%
0000457 0000119 LIKE 58.680556%
0000553 0000019 RECEIVE 93.356643%
0000325 0000233 THERE 16.487455%
0000549 0000002 REMOVE 99.274047%
0000320 0000225 THEY 17.431193%
0000351 0000183 WHEN 31.460674%
0000488 0000041 NAME 84.499055%
0000413 0000116 FEBRUARY 56.143667%
0000415 0000112 HOW 57.495256%
0000461 0000062 S 76.290631%
0000346 0000165 NEED 35.420744%
0000338 0000167 SEE 33.861386%
0000229 0000271 KNOW -8.400000%
0000419 0000075 WANT 69.635628%
0000332 0000160 MAY 34.959350%
0000399 0000089 MAKE 63.524590%
0000288 0000192 USE 20.000000%
0000387 0000091 SEND 61.924686%
0000264 0000213 AM 10.691824%
0000373 0000100 OVER 57.716702%
0000326 0000143 WHAT 39.019190%
0000350 0000105 HOME 53.846154%
0000375 0000078 THESE 65.562914%
0000395 0000057 APRIL 74.778761%
0000345 0000104 THEIR 53.674833%
0000347 0000100 PEOPLE 55.257271%
0000278 0000165 WHICH 25.507901%
0000304 0000137 WHO 37.868481%
0000265 0000169 SOME 22.119816%
0000379 0000052 & 75.870070%
0000268 0000163 SERVICE 24.361949%
0000120 0000306 RE -43.661972%
0000399 0000025 SPAM 88.207547%
0000224 0000198 HE 6.161137%
0000282 0000134 WORK 35.576923%
0000233 0000181 HAD 12.560386%
0000309 0000104 OTHER 49.636804%
0000357 0000044 LINK 78.054863%

Strategy 4: Filter emails based on routing 

Plenty of spammers use email servers in places like Russia, Argentina, China, Brazil and so on. I just don't know anybody in these places, so why should I get email from them? Obvious, when you think about it!
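Routing information lives in the Received headers that each relay adds. A minimal Python sketch, assuming you already have a list of address ranges for the countries you want to refuse (the `BLOCKED_NETWORKS` entries below are reserved documentation ranges used as placeholders, not real country allocations), could pull the bracketed IPv4 addresses out of those headers and test them:

```python
import email
import ipaddress
import re

# Placeholder ranges (reserved for documentation); a real list would come from
# a country-to-address-range table, which is not shown here.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.0/24")]

# Many mail servers record the connecting host's address in square brackets,
# e.g. "from mx.example (mx.example [203.0.113.7])"; a convention, not a rule.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def relay_addresses(raw_message):
    """Collect the IPv4 addresses found in the message's Received headers."""
    msg = email.message_from_string(raw_message)
    found = []
    for header in msg.get_all("Received", []):
        for text in BRACKETED_IPV4.findall(str(header)):
            try:
                found.append(ipaddress.ip_address(text))
            except ValueError:
                pass  # e.g. "999.1.1.1" looks like an address but is not one
    return found

def routed_through_blocked(raw_message):
    """True if any relay address falls inside a blocked network."""
    return any(addr in net
               for addr in relay_addresses(raw_message)
               for net in BLOCKED_NETWORKS)
```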

Strategy 5: Filter emails based on other people in the distribution list 

This only works with those spammers who are too greedy to send out messages addressed to only one person. (Fortunately, that's most of them!) This is also obvious, when you think about it. Most people sending legitimate emails send them either to one person or to a group of people who tend to know each other. For any message that comes to me, I check the other names in the "to" and "cc" lists. If there are lots of names, and I don't have any of them in my address book, then it's a spam.
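A Python sketch of this check, assuming the address book is available as a set of addresses; the threshold of five other recipients is a hypothetical stand-in for the text's "lots of names":

```python
import email
from email.utils import getaddresses

def is_bulk_spam(raw_message, address_book, my_address, min_others=5):
    """True if many other people are addressed and none of them is known."""
    msg = email.message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    others = {addr.lower() for _name, addr in getaddresses(fields)
              if addr and addr.lower() != my_address.lower()}
    known = {a.lower() for a in address_book}
    return len(others) >= min_others and not (others & known)
```

A single recognised name among the recipients is enough to let the message through, which matches the reasoning that legitimate group mail goes to people who know each other.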

Strategy 6: Filter emails that contain hypertext links 

I discourage HTML emails anyway, but one dead giveaway that an email is spam is a link whose visible text does not match its target. Friends sometimes send emails with links in, but in those, the text always matches the target.
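One way to compare link text with link target is the standard library's `html.parser`. This Python sketch treats any link whose visible text differs from its `href` (ignoring case, surrounding spaces and a trailing slash) as a mismatch; that normalisation rule is an assumption, and a practical filter might need to tolerate more harmless differences.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

def _norm(url):
    return url.strip().lower().rstrip("/")

def has_mismatched_link(html):
    """True if any link's visible text differs from its target."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return any(_norm(href) != _norm(text) for href, text in parser.links)
```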

Strategy 7: Filter emails that contain scripts 

No friend of mine has ever sent me an email containing scripting language. Plenty of spammers have.
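Spotting scripts can be as blunt as a pattern search. This Python sketch flags `<script>` elements, `javascript:` URLs and `on...=` event-handler attributes inside tags; the pattern list is an assumption and is not exhaustive.

```python
import re

SCRIPT_RE = re.compile(
    r"<\s*script\b"            # a <script> element
    r"|javascript\s*:"         # a javascript: URL
    r"|<[^>]*\son[a-z]+\s*=",  # an event handler such as onload= inside a tag
    re.IGNORECASE,
)

def contains_script(html):
    """True if the HTML body appears to contain any scripting."""
    return bool(SCRIPT_RE.search(html))
```

Restricting the event-handler pattern to text inside a tag avoids flagging ordinary prose such as "online = fun".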

Strategy 8: Send a bounce message back to the originator of any spam 

If the person who sends you spam gets a bounce message to say that your address does not exist, then they will delete you from their circulation list. The more spams they have been sending you, the quicker this will happen. My most persistent spammers all stopped inside a fortnight! Of course, not every spammer checks for bounces. But those that don't are unlikely to be running a cost-effective operation anyway, so they tend to be short-lived annoyances. Warning: a bounce message is not the same as an email asking to be removed. You should never send those. Never, not ever, no way, no how.
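The page does not say how the bounces were generated. As a rough Python sketch, assuming a plain-text imitation is enough (a genuine delivery failure report uses the RFC 3464 multipart/report format, which this does not attempt), the standard `email` package can build the message. The `MAILER-DAEMON@example.org` sender is a placeholder, and actually sending the message, for example with `smtplib`, is left out.

```python
import email
from email.message import EmailMessage
from email.utils import parseaddr

def build_bounce(raw_spam, my_address, daemon="MAILER-DAEMON@example.org"):
    """Build a plain-text 'user unknown' notice addressed to the spam's sender."""
    original = email.message_from_string(raw_spam)
    # Prefer the envelope sender recorded in Return-Path; fall back to From.
    _name, sender = parseaddr(original.get("Return-Path") or original.get("From", ""))
    bounce = EmailMessage()
    bounce["From"] = daemon
    bounce["To"] = sender
    bounce["Subject"] = "Undeliverable: " + original.get("Subject", "")
    bounce.set_content(
        "Delivery to the following recipient failed permanently:\n\n"
        f"    {my_address}\n\n"
        "550 5.1.1 User unknown\n"  # SMTP "mailbox does not exist" status
    )
    return bounce
```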

Strategy 9: How to use a Hotmail inbox for an (almost) spam-free experience

a) Decide on a magic word. (My magic word is squirrel; make yours something else.) Then publish your Hotmail address, but tell people to put your magic word in the subject line of any email they send you.

b) Set up three Hotmail filters, in this order:

    -    if the subject line contains <magic word> deliver mail to inbox

    -    if the from contains "@" delete email

    -    if the from does not contain "@" delete email

End result, all messages containing your magic word are delivered into your inbox. Everything else is deleted as soon as it arrives. The only spam you see is spam that happens to include your magic word in the subject - so, here's a tip: "viagra" and "money" would not be good magic words.
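The scheme depends on the filters being tried in the order listed, with the first match deciding what happens to a message; otherwise the second filter would delete mail that the first had just delivered. The Python sketch below models that behaviour. `MAGIC_WORD`, the dictionary message format and the case-insensitive match are illustrative assumptions, not a description of how Hotmail itself works.

```python
MAGIC_WORD = "wombat"  # placeholder: pick your own magic word

# Filters in the order they are set up; the first one that matches decides
# what happens to the message (an assumption about how filters are applied).
FILTERS = [
    (lambda msg: MAGIC_WORD in msg["subject"].lower(), "inbox"),
    (lambda msg: "@" in msg["from"], "delete"),
    (lambda msg: "@" not in msg["from"], "delete"),
]

def route(msg):
    """Return the action of the first filter that matches msg."""
    for matches, action in FILTERS:
        if matches(msg):
            return action
    return "inbox"  # never reached: the last two filters between them match everything
```

Because the last two filters together match every possible sender, anything without the magic word is deleted, whatever its From line says.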

Remember - the spammers just want to get you to part with money. The more you can do to make their operation less cost-effective, the sooner they will go broke.

Have fun, and happy spam-hunting!

This page is a part of Chris Tolley's web-site. Latest update: Tuesday, June 10, 2003 03:48

 ©1996 to 2003: Christopher J. Tolley