Tag Archives: spam

My interview on Radio 4 about spam, 3 years ago

The interview (MP3 file) is here.

Just over 3 years ago, I was interviewed by Eddie Mair for the iPM programme on Radio 4 about spam.

This happened because I used to have a spam archive, collecting every spam email I had received since c.1997. I don’t exactly know how the interview came about; I was just contacted on the morning of the interview by a researcher from Radio 4. But being a big fan of the BBC, of course I was happy to come in to London and talk to them.

I expected to be in a studio discussion, perhaps with others, but that wasn’t quite how it happened. About an hour after I arrived, I was shuffled into a small booth and connected up to Eddie, who was “on the line” in some other London studio. Nevertheless, we had a conversation about spam and other things. What was finally broadcast was heavily edited and condensed down to about 4 minutes, from an original which must have been 10 minutes or more.

To this day I still collect every email I receive, spam or otherwise, but I no longer put them online because that caused all sorts of problems. My total mail archive is 11 gigabytes, of which 7.5 gigabytes is classified as spam (note this is just a “du” of various compressed files). The earliest spam is from Oct 2 1997 (an advert for “Teeth Bleaching and Whitening”), and the latest is almost certainly from about 1 minute ago, whenever you are reading this posting.


Filed under Uncategorized

What’s the scam here?

Six different users used the WordPress “like” button on this post: three from the same IP in Seattle and three from the same IP in Germany. All the user names are spammy.

But I don’t get what the spam/scam is. They don’t get links back as far as I can tell.



Half-baked ideas: reputation system for IP addresses

For other half-baked ideas, see my ideas tag.

I’m an obstinate log watcher. Watching web server logfiles in particular gives me a fascinating insight into how the bottom-feeders on the internet work: comment spammers, email harvesters, crap search engines and the like.

As a pretty random example, a single spammer (or, more likely, an “illegal spam botnet”) just tried to fill in the comment form on one particular website I run 26 times in roughly 90 minutes. If you still hold any of the common myths about how sophisticated spammers are, read on.

Myth: spammers promote a particular website. Reality: spammers are still able to register huge numbers of random domains, and use very complex multi-step redirection.

Myth: spammers must operate from a limited set of IP addresses. Reality: spammers have access to virtually unlimited numbers of IP addresses.

Myth: each attack comes from a single IP address. Reality: attacks jump from IP addresses separated around the world, and those attacks are coordinated and look just like a single multi-step transaction, complete with correct cookies which must be passed between the hosts using a higher “back end” layer.

Myth: spambots don’t run Javascript, download images or solve captchas. Reality: …

The jury is still out on the last one. Certainly it’s not common, but a significant subset of comment spam does appear to come from real browsers, which run Javascript, download images and solve captchas. However, I believe much or all of this must come from real people operating from sweatshops in countries with very low wages. That’s hard to tell just from looking at logfiles.

Each of the 26 completed transactions I saw involved multiple HTTP requests, and every single HTTP request came from a different IP address. But each completed transaction had a consistent cookie. In some cases the IP addresses were separated by half the earth, yet the HTTP requests followed each other at sub-second intervals, indicating a sophisticated second-level operation coordinating it all. Each request contained URLs for 4 websites, generated using random characters, and only some of these sites resolve.
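The pattern described above (one cookie, many IPs, sub-second gaps) is the kind of thing you can pull out of a logfile mechanically. What follows is only a rough sketch in Python, not the tooling I actually used: the `Request` record, its field names and the `max_gap` threshold are all made up for illustration, and it assumes the log has already been parsed into records that include the session cookie.

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    """One parsed log entry (hypothetical layout, not a real log format)."""
    timestamp: float  # seconds since the epoch
    ip: str
    cookie: str       # session cookie value
    path: str


def distributed_sessions(requests, max_gap=1.0):
    """Return {cookie: [ips]} for sessions where every request came from a
    different IP and consecutive requests were at most max_gap seconds apart."""
    by_cookie = defaultdict(list)
    for req in requests:
        by_cookie[req.cookie].append(req)

    flagged = {}
    for cookie, reqs in by_cookie.items():
        if len(reqs) < 2:
            continue  # a single request tells us nothing
        reqs.sort(key=lambda r: r.timestamp)
        ips = [r.ip for r in reqs]
        gaps = [b.timestamp - a.timestamp for a, b in zip(reqs, reqs[1:])]
        if len(set(ips)) == len(ips) and max(gaps) <= max_gap:
            flagged[cookie] = ips
    return flagged
```

Fed a day's worth of parsed requests, this returns only the cookies whose transactions hopped between IPs on every step, which is roughly how the 26 transactions above stood out.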

So on to the half-baked idea.

Why don’t we have a proper, distributed reputation system for IP addresses?

A spammer can’t source an HTTP request from just any IP address, so they need to take over some grandma’s Windows PC, or someone’s web server, or persuade people to route some bogus AS. Every time an honest website owner (like me!) sees a bad IP, they register it.

Of course, spammers themselves will try to game the system, but they will do so from their own random IP addresses. We need to make sure that their “votes” count for less, and a reputation system should be able to decide this (e.g. bad IP votes for bad IP? those votes count negatively).
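To make the “votes count negatively” idea a bit more concrete, here is a toy sketch in Python. It is every bit as half-baked as the idea itself: the function name, the seeded set of `trusted` reporters, the fixed number of rounds and the clamping to the range -1 to 1 are all my assumptions, not a description of any existing system.

```python
def score_ips(reports, trusted, rounds=5):
    """Toy reputation scoring (illustrative assumptions throughout).

    reports: iterable of (reporter_ip, reported_ip) pairs, each meaning
             "reporter says reported is bad".
    trusted: set of IPs seeded with a fixed reputation of +1.0.
    Returns a dict mapping each IP to a reputation between -1.0 and 1.0.
    """
    reports = list(reports)
    ips = {ip for pair in reports for ip in pair} | set(trusted)
    rep = {ip: (1.0 if ip in trusted else 0.0) for ip in ips}

    for _ in range(rounds):
        new = {}
        for ip in ips:
            if ip in trusted:
                new[ip] = 1.0  # seeds never move
                continue
            # Each "bad" vote is weighted by the reporter's current
            # reputation, so a vote from a bad IP counts negatively.
            total = sum(rep[reporter] for reporter, target in reports
                        if target == ip)
            new[ip] = max(-1.0, min(1.0, -total))
        rep = new
    return rep
```

For example, if a trusted site reports IP X, and X then reports some honest site, X ends up at -1.0 and its report works in the honest site's favour. The obvious hole is that a known-bad IP could “report” its own friends to boost them, so a real system would need something smarter than simply flipping the sign.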

If grandma tries to post a good comment, her IP may well cause that comment to be rejected. Good thing! She needs to clean up her (Windows) PC.

And what about ISPs who rotate IP addresses between good and bad customers? Those ISPs need to police their users and make sure they clean up their Windows PCs, or force the users on to better operating systems that don’t allow these exploits.

Note: there are people classifying IPs now, e.g. Project Honey Pot and Stop Forum Spam, but these guys don’t implement a reputation system and in some cases have nasty licensing terms which make the data that we provide for free into proprietary databases. No thanks.

