What is a CAPTCHA? In short, a CAPTCHA (hereinafter referred to as captcha, because I hate typing words in caps!) is a device which allows you to eliminate, or at least reduce, spam on forms that the public can post to, e.g. comment forms on blogs, quick contact forms on your website, etc. Captcha stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. Quite a mouthful, isn’t it? Essentially, captchas use an image that is usually heavily distorted in order to stop automated systems from posting to the forms mentioned above. These automated systems, otherwise known as spambots, are not able to read the text in the image and type it into the text field as humans are required to do, and therefore their comment or email is never submitted.

So why is having a good, strong captcha important? Well, so these spambots don’t render the comments section on your blog useless with links for Viagra, links to malware sites, or worse yet, links to sites which contain less than work-friendly content! Or if it’s an email form, you certainly do not want to receive 900 emails a day with these exact same links and other garbage in them, now do you?

Great, now you understand the importance of captchas, but you should also understand how they really work. During my tenure here at Trademark, I’ve installed 3 separate styles of captcha on nearly all of our clients’ numerous websites. We’ll take a look at those:

The first captcha we used here has since been dubbed “The Crayola Captcha”, owing to its large size and squiggly text. Yeah, it’s kind of ugly. It’s also pretty large. But for the most part, it was easy enough for a human to read, and hard enough for a robot to read (oh yes, robots can read!). Number of spam messages or other unwanted content that got through this captcha? Zero.

Now we have the second edition of the Trademark Captcha. This captcha came about after endless mocking and hatred for the Crayola Captcha by a select…well, only one person :) At any rate, the Crayola Captcha was slowly erased from humankind and in place of it was the, well, we don’t have a name for this one, so let’s call it Captcha2. This captcha is slightly smaller, uses a nicer font, and is overall easier for everybody to read. Given how easy this captcha is to use, what was our total number of spam messages that came through it? Zero again. Score.

Now we move on to Captcha3, or what I am hereby dubbing SpamMagnet©. This captcha came about after more incessant mocking and hatred for Captcha2, because it’s too big and ugly and hard to read, apparently. So here we are. A gorgeous captcha that is easy for users to read, and at long last, it’s small! It’s like going from a big bulky SUV to a nice compact Maserati, complete with 1000W stereo system. Everything was great, everyone was happy, except those 2 poor captchas from the distant past. Who cares, we don’t need them, we have SpamMagnet! We rejoiced in our excellence. For about an hour. Then we got spam on a contact form with this captcha on it. Then another. …and another. Pretty soon, the SpamMagnet lived up to its name, and any form we put it on nearly instantly had spam seeping through its megahertz.

Why? Because it was too easy to read. That’s why. Spambots all over the world, armed with OCR software, had been trying to crack our previous two captchas on several of our higher profile sites. Once they saw this new captcha and how easy it was to crack, the spam came flowing in.

So what do we do now? We go back to Captcha2. Or we invent a new captcha. Moral of the story is, prettiest does not necessarily mean most functional. We had a beautiful captcha that cost us bytes and bytes of bandwidth (yes, that’s sarcasm, but spam is annoying no matter what). You need to find a happy medium between how easy your captcha is for users to read and how easy it is for spambots to read. Unless you enjoy spam. In which case, I’ll send you the code for SpamMagnet!

2 responses to “CAPTCHA images and your website”

Posted by Eric

Thanks for commenting Zac,
There’s no doubt in my mind that the Crayola Captcha was easy to read. Only a handful of people out of thousands complained about its ease of use. But the guy that signs our paychecks decided it was too ugly, and therefore we changed to Captcha2. The Crayola Captcha was indeed ugly, but effective, regardless of case or whether the text was actually a real word. All of the captchas I’ve built are based on a string of possible characters, with characters such as 1, l, i, etc. stripped out so that there is no way to confuse such characters with another that looks like it.
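For readers who want to see the idea, here is a minimal Python sketch of building captcha text from a character pool with the look-alikes stripped out. It is an illustration only, not the code used on these sites: the names `AMBIGUOUS`, `ALPHABET`, and `make_captcha_text` are hypothetical, and everything in the excluded set beyond 1, l, and i is this sketch’s own assumption.

```python
import secrets
import string

# Look-alike characters to leave out of the pool. The comment above only
# names 1, l and i; the others (I, 0, O, o) are an assumption of this sketch.
AMBIGUOUS = set("1liI0Oo")

# Every letter and digit that survives the filter.
ALPHABET = "".join(
    ch for ch in string.ascii_letters + string.digits if ch not in AMBIGUOUS
)


def make_captcha_text(length: int = 6) -> str:
    """Return a random string drawn only from the unambiguous pool."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # The text would then be rendered into a distorted image elsewhere.
    print(make_captcha_text())
```

Rendering the text as a distorted image is a separate step and is not shown here.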

Math questions as captchas are, in my experience, about as crackable as a static word inside a p tag. I’ve been urged to try them out, and they nearly always fail. And you’re right about brute force attacks: most people don’t want to be bothered with answering 521671+314229/99 * (6296+302-619), squared, so the question ends up being a simple math problem with a grand total of 100 possible answers, which could be brute forced in about 1/1,000,000 of a second on any PII processor with 4 MB of RAM.
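To make the brute force point concrete, here is a rough Python sketch. It assumes a hypothetical question generator whose answers fall between 0 and 99, and it assumes the form lets a bot keep retrying against the same question, which a real form may not allow. `make_math_captcha` and `brute_force` are made-up names for illustration.

```python
import random


def make_math_captcha(rng: random.Random):
    """Build a hypothetical 'a plus b' question whose answer is 0 to 99."""
    a = rng.randint(0, 49)
    b = rng.randint(0, 50)
    question = f"What is {a} plus {b}?"

    def check(guess: int) -> bool:
        # Stand-in for the server-side comparison against the stored answer.
        return guess == a + b

    return question, check


def brute_force(check, answer_space=range(100)):
    """Try every possible answer in order; return (answer, attempts)."""
    attempts = 0
    for guess in answer_space:
        attempts += 1
        if check(guess):
            return guess, attempts
    return None, attempts


if __name__ == "__main__":
    question, check = make_math_captcha(random.Random())
    answer, attempts = brute_force(check)
    print(f"{question} -> {answer} after {attempts} attempts")
```

With only 100 candidates the loop never needs more than 100 tries, which is why a small answer space offers so little protection on its own.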

I wish we were that creative and ironic though, Zac. The reason we don’t use any kind of captcha on our blogs (or our clients’ blogs, for that matter) is that we use a service called Akismet, which stops spam in its tracks without the use of a captcha or other security device. Go check it out, especially if you have a blog!

Posted on October 17, 2008 at 3:17 pm

Posted by Zac

As long as the “Crayola” captcha results in a real word (as yours shows “insane”), it’s very easy for a human to determine the answer. For example, if I couldn’t quite tell if the first letter was an “l” or an “i,” the second choice is obvious since there’s no English word “lnsane.” It’s when the captcha is simply a string of random letters (or letters plus numbers) that people have problems — too many letters look like other letters if they’re distorted enough, and also depending upon the font in use.

I’ve also seen math questions employed, such as, “What is four plus three (type the numerical answer in the box below).” At first glance this would seem to be very effective, especially since the addends are spelled out rather than expressed in numbers. But I also wonder if a brute-force attack might defeat it, since I don’t recall seeing anything beyond a single digit answer. Thoughts?

I also found it curious that my ability to post a comment here was unrestricted by the use of a captcha. Irony in play?

Posted on October 17, 2008 at 2:54 pm
