Self-professed "spam queen" Laura Betterly was interviewed by the WSJ. I really can't believe that there is that much money to be made in spam, but it figures when one can break even at a response rate as low as 0.001%. Anyway, it's silly how she thinks it's fine to abuse resources that are meant for public access to send people material they never wanted to begin with (hence the 0.001% response rate). Furthermore, she goes on to say that this job allows her to spend time with her children and enjoy her life. Yeah, but what about the IT guys and ISPs whose lives she makes hell? What about the people who get this garbage and have to sort through it? Don't these people have lives they want to enjoy? Somehow she also seems to harbour the idea that anyone who has ever forgotten to uncheck the checkbox to opt out of third-party offers is eligible to get any and all offers. Ironically, even offers for anti-spam software.
Furthermore, Ms. Betterly needs to come to terms with the fact that she is stealing bandwidth. A 40K message going through a conservative 8 hops to get to its recipient uses 320K worth of bandwidth along the way. Multiply this by, say, 2,500,000 possible recipients and you've got 762.94GB of data being transferred, and this doesn't count any sort of bounce-backs. The concept of junk mail in the traditional postal sense doesn't transfer over to the email world, because with postal mail an immediate upfront transport cost is exacted upon the sender. This is not so with email, and it shouldn't be so with email. She is abusing a medium of communication for her monetary benefit, and yes, it is abuse; just read the comments from her server admin to hear first-hand what methods they use to send mass emails and how they manage to avoid filters.
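For anyone who wants to check my math, here is a quick sketch of the calculation in Python. The function name is made up for illustration, and it assumes binary units (1 GB = 1024 × 1024 KB), which is what the 762.94GB figure works out to.

```python
# Back-of-the-envelope bandwidth cost of one spam run.
# Assumption: binary units, so 1 GB = 1024 * 1024 KB.

def spam_bandwidth_gb(message_kb, hops, recipients):
    """Total data moved across all hops for all recipients, in GB."""
    per_message_kb = message_kb * hops      # 40K * 8 hops = 320K
    total_kb = per_message_kb * recipients  # every recipient costs that much
    return total_kb / 1024 / 1024           # KB -> MB -> GB


total_gb = spam_bandwidth_gb(message_kb=40, hops=8, recipients=2_500_000)
print(f"{total_gb:.2f} GB")
```

Bounce-backs aren't counted here either, so the real number would be higher.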
With all that in mind, I'm sooo happy to say that there is a new defense against spam coming that's very, very promising, not to mention smart, called a Bayesian filter. Currently I'm the happy user of POPFile, which is a very simple Perl script implementing Bayes' theorem (formal definition). POPFile acts like a proxy between your mail program and your mail server, classifying every mail you get into as many different categories as you've defined. Personally, I just want it to separate spam from non-spam; I've got regular mail program filters to categorize the rest. The way Bayesian filters work is that you give one a lot of your regular non-spam messages and let it build a word frequency table out of them, and then you give it a bunch of spam and let it build a second word frequency table out of that. These filters go through the entire message, from the headers to the actual body including HTML, as opposed to just the headers like regular filters do. The more of each type of email you give it, the more accurate it will be (paper explaining this). When one of these filters sees a new message, it looks up the words in the message in the existing word tables to see whether each word appears more often in spam or in normal mail. Then it combines those per-word probabilities into a total probability for the message and classifies it as either spam or normal mail. This is how it works for me, but Bayesian filters can be taught to classify messages into as many categories as you'd like.
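To make the word-table idea concrete, here is a minimal sketch of a naive Bayes classifier in Python. This is not POPFile's actual code (POPFile is Perl, and I haven't copied its internals). The class name, the tokenizer, and the add-one smoothing are all my own assumptions for illustration, and the training messages are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase words (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class BayesFilter:
    """Toy naive Bayes filter: one word frequency table per category."""

    def __init__(self, categories=("spam", "normal")):
        self.word_counts = {c: Counter() for c in categories}
        self.message_counts = {c: 0 for c in categories}

    def train(self, category, text):
        """Add one known message to the word table for its category."""
        self.word_counts[category].update(tokenize(text))
        self.message_counts[category] += 1

    def scores(self, text):
        """Return a log-probability score for each category."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_messages = sum(self.message_counts.values())
        result = {}
        for category, counts in self.word_counts.items():
            # Prior: how much of the training mail was in this category
            # (add-one smoothing so an untrained category is not log(0)).
            log_prob = math.log(
                (self.message_counts[category] + 1)
                / (total_messages + len(self.word_counts))
            )
            total_words = sum(counts.values())
            for word in tokenize(text):
                # Add-one smoothing: an unseen word lowers the score
                # instead of zeroing out the whole message.
                log_prob += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            result[category] = log_prob
        return result

    def classify(self, text):
        """Pick the category with the highest score."""
        s = self.scores(text)
        return max(s, key=s.get)


# Made-up training data, just to show the flow.
f = BayesFilter()
f.train("spam", "buy cheap pills now free offer")
f.train("spam", "free money click now")
f.train("normal", "lunch meeting tomorrow at noon")
f.train("normal", "project notes attached for review")

print(f.classify("free pills offer"))
print(f.classify("meeting notes for tomorrow"))
```

Working in log-probabilities avoids multiplying a long chain of tiny numbers together. Adding a third category is just a matter of passing more names to `BayesFilter` and training on them.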
These Bayesian filters are a totally different idea from the group-based approach that filters like SpamNet, SpamAssassin and Brightmail use. Those systems work on the concept of block lists. For SpamNet, users can directly contribute addresses to the list; I'm not too clear on how SpamAssassin and Brightmail develop their block lists. There is a fundamental flaw in this system, because potentially friendly addresses could be blocked if enough people in the community get spam from them. For example, an entire domain like mail.com could be blocked because enough people who contribute to the community block list feel that @mail.com addresses send too much spam. Bayesian filters, by contrast, are by nature trained specifically for each and every user.
Scary thing is Microsoft owns a general patent on all probabilistic email filters. Oddly MS has owned this patent since 1998, but even today Outlook (all of the different versions) doesn't seem to have any robust junk mail filtering. Apple on the other hand is already using a probabilistic algorithm in Mail.app that is now shipping with Jaguar. Mozilla's mail app will have its very own version of a Bayesian filter soon as well. I wonder when Eudora will respond with something along these lines, not to mention MS and their gamut of Outlooks.
When Bayesian or other types of probabilistic/learning mail filters become common, what will spammers do? Based on the trend I've personally seen, they will just make their messages more and more innocuous, to the point where a spam message will actually be a very subtle, almost subliminal, sales pitch. Hmm, I wonder how effective that will be. I guess as long as they are 0.001% effective, it won't matter.
In the meantime though, I raise my champagne in a toast to a wonderfully spam-free Inbox.
Update: Godpimp Frank pointed me to this very useful page for webmasters who don't want email harvesters to siphon addresses from their pages.
Rant:
It sucks when you have someone on your buddy list that you don't want to talk to, but they catch your attention like free money. You know you shouldn't talk to them, but you can't take them off your list. And then when they're no longer online, you sort of feel sad. Yeah, I'm weird. Sosumi.
Tee hee Quoteses:
"If he gets a girlfriend, he is going to cease to be." - Daniel H.
"...those motherfucking average bears!" - Juan R.
P.S. If any of yous want to learn how to set up POPFile on your computers, contact me and I'd be more than happy to help.
Posted by Mr. Keyur at November 13, 2002 11:06 AM