Spam protection sucks - and it will get worse


Do you have a website with a contact form that sends you emails? Do you use captchas to protect yourself from spam? Then I have bad news for you! Let me elaborate.

Captchas

Ever since I started hosting my websites, I have had one problem: how do I protect myself from spam? The obvious choice would be a captcha. Captchas are basically small riddles that only a human should be able to solve. The reasoning behind them goes along these lines: because almost every other website uses some form of captcha as spam protection, it must be the best solution.

The most widely used service for this is Google reCAPTCHA. It shows you a puzzle in which you have to select storefronts or traffic lights. This helps Google train its machine learning algorithms while wasting people's time. But I do not want Google services on my website, and I did not find any good captchas from alternative vendors. So captchas were out of the question.

Captchas also have one fundamental problem: they are a short-term solution to a long-term problem. Captchas rely on the assumption that there are problems that are easy for a human to solve but nearly impossible for a machine. These problems are further constrained by the context in which humans live. Someone who has never seen a traffic light in their life won't be able to select one and therefore won't be able to solve the captcha.
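To make the challenge-response mechanics concrete, here is a minimal sketch of a self-hosted, text-based captcha in Python. It assumes a stateless design: the server signs the expected answer with an HMAC and a hypothetical `SECRET_KEY`, so it does not need to store challenges. The names `make_challenge` and `verify` are made up for illustration. Note that the riddle here (adding two digits) is trivial for a program to solve, which is exactly the weakness described above. Commercial captchas use harder perception tasks, but the basic flow of issuing a challenge and checking the response on the server is broadly similar.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(answer: str, expires_at: int) -> str:
    """HMAC over the expected answer and its expiry time."""
    message = f"{answer}|{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_challenge(ttl_seconds: int = 300) -> tuple[str, str, int]:
    """Return (question, token, expires_at); the answer itself is never sent to the client."""
    a, b = secrets.randbelow(10), secrets.randbelow(10)
    expires_at = int(time.time()) + ttl_seconds
    question = f"What is {a} plus {b}?"
    return question, _sign(str(a + b), expires_at), expires_at


def verify(response: str, token: str, expires_at: int) -> bool:
    """Accept the response only if it matches the signed answer and has not expired."""
    if time.time() > expires_at:
        return False
    # Tampering with expires_at changes the signature, so it cannot be extended.
    expected = _sign(response.strip(), expires_at)
    return hmac.compare_digest(expected, token)
```

A token stays valid until it expires, so a real deployment would also need to prevent replay, for example by remembering tokens that have already been used.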

Problems used for captchas mostly rely on pattern matching and recognition of some sort, be it object detection, text detection, or something else. At the same time, there is a lot of research going on in machine learning and artificial intelligence, and we are on our way to building autonomous cars. These cars rely on recognizing objects in their surroundings, which means computers get better at pattern matching and recognition every day. We humans, on the other hand, stagnate, which leaves ever smaller margins between humans and computers in the skills relevant for solving captchas. At some point in the future, computers will be able to solve the same captchas we humans do. At that point, captchas will become useless.

So captchas aren't a good way to protect yourself from spam, at least not in the long term. They could, however, be part of a solution.

What other options do we have?

Fuzzing and obscurity on the website level

Fuzzing and obscurity on the website level is the low-hanging-fruit approach. Because software perceives websites differently than humans do, it is possible to trick spam software into filling out fields that a human user would not. This is done by including fields that are hidden from the user in one way or another but visible to software. Araweb has a nice article with things you can do.
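A minimal sketch of the hidden-field ("honeypot") idea, assuming a Python backend that receives the submitted form as a dictionary. The field name `website` and the functions `is_probably_bot` and `handle_submission` are hypothetical. Hiding the field with an inline `display:none` is only one option; more sophisticated bots may check for exactly that, so moving the field off-screen with a CSS class is a common variant.

```python
from typing import Mapping

# Hypothetical name for the honeypot field. It is rendered in the form but hidden
# from human visitors with CSS, for example:
#   <input type="text" name="website" tabindex="-1" autocomplete="off"
#          style="display:none">
HONEYPOT_FIELD = "website"


def is_probably_bot(form: Mapping[str, str]) -> bool:
    """Flag a submission if the field that humans never see was filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def handle_submission(form: Mapping[str, str]) -> str:
    if is_probably_bot(form):
        # Drop the message silently so the bot gets no signal that it was caught.
        return "discarded"
    return "forwarded"
```

Dropping the message silently instead of showing an error is a design choice: it gives the spam software no feedback to adapt to. The catch is that browser autofill or assistive technology can occasionally fill such a field, so a real human might be discarded; `autocomplete="off"` and `tabindex="-1"` reduce that risk but don't eliminate it.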

I think this is a good approach for now, and it works without including third-party JavaScript and the like. But this approach will also suffer from diminishing returns over time, depending on how fast spammers improve their software. It will help you get rid of the simple attacks and weed out the worst of it, but it won't get rid of all the spam.

On to the next option for spam protection from web forms…

Filtering

The third option is filtering after the form has been submitted. This role is usually filled by your email spam filter. It analyzes the messages and decides, using hand-written rules or a machine learning classifier, whether a message is spam or not.
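For the machine learning side, a classic technique is a naive Bayes classifier over word counts. The sketch below is a toy multinomial naive Bayes with add-one smoothing, not how any particular mail server implements its filter. The class name `NaiveBayesSpamFilter` and the `"spam"`/`"ham"` labels are my own choices, and the classifier needs at least one training message per label before it can classify anything.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Add one hand-labelled message ("spam" or "ham") to the counts."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: fraction of training messages that carry this label.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps a word unseen in one class from zeroing it out.
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")
```

The training data would come from messages you have already sorted by hand. Comparing the two log-scores directly treats both kinds of mistake as equally bad; in practice you would probably bias the decision toward ham, since a false positive costs you a real message.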

But filtering is not perfect. Some spam slips through, and some legitimate messages are flagged as spam (false positives). Filtering is therefore necessary, but not sufficient.

Filtering will advance with further research in AI and machine learning, but it suffers from the same fundamental problem as captchas: as spam gets more sophisticated, it will become harder to separate spam from legitimate messages.

Conclusion

In the end, I think the best approach in the short term is a defense-in-depth strategy that combines all of the solutions mentioned above. When doing this, we always have to ensure that users aren't obstructed by useless, time-consuming captchas. In the long term, all of the solutions above will fail, some sooner than others. That will also be the time when we have to start working on the root cause of spam: the spammers themselves.
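Here is a rough sketch of what such a defense-in-depth pipeline could look like in Python. Each layer is a function that either rejects a submission with a reason or lets it pass. The layers, the hidden `website` field and the phrase blocklist are all hypothetical stand-ins for the techniques discussed above, not a finished spam filter. The captcha layer only rejects explicit failures, so visitors who were never shown a captcha are not held up by it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Submission:
    fields: dict[str, str]
    # None means no captcha was shown to this visitor.
    captcha_passed: Optional[bool] = None


# Each layer returns a rejection reason, or None to let the submission pass.
Layer = Callable[[Submission], Optional[str]]

# Hypothetical blocklist; a real filter would be trained or tuned on actual traffic.
BLOCKED_PHRASES = ("cheap pills", "casino bonus")


def honeypot_layer(sub: Submission) -> Optional[str]:
    # "website" is a hypothetical hidden field that humans never see.
    return "honeypot filled" if sub.fields.get("website", "").strip() else None


def captcha_layer(sub: Submission) -> Optional[str]:
    # Reject only on an explicit failure, so visitors without a captcha pass through.
    return "captcha failed" if sub.captcha_passed is False else None


def content_filter_layer(sub: Submission) -> Optional[str]:
    text = sub.fields.get("message", "").lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in text:
            return f"blocked phrase: {phrase}"
    return None


DEFAULT_LAYERS: tuple[Layer, ...] = (honeypot_layer, captcha_layer, content_filter_layer)


def check_submission(sub: Submission, layers: tuple[Layer, ...] = DEFAULT_LAYERS) -> Optional[str]:
    """Run the layers in order; return the first rejection reason, or None if all pass."""
    for layer in layers:
        reason = layer(sub)
        if reason is not None:
            return reason
    return None
```

Ordering the layers from cheapest to most expensive is a deliberate choice: obvious bot traffic is rejected early, before any heavier filtering runs. Adding or removing a layer only means changing the tuple.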