Are CAPTCHAs enough?

We have all seen CAPTCHAs on the various web sites we visit. Something that had never occurred to me, though, is that there seems to be a whole industry built around circumventing or cracking them. Read the CAPTCHA topic on Wikipedia or do a Google search on ‘CAPTCHA cracking’ and you will find many articles on the topic.

The methods used to crack them automatically range from simple OCR software to neural networks. I have also read some articles that mention that they can be cracked with JavaScript.

For those CAPTCHAs that are too complex to crack automatically, there is always the problem of human intervention: people can simply be put to work solving them by hand.

We did some research and there doesn’t seem to be one foolproof way to prevent your CAPTCHAs from being cracked. Your best bet is to use a combination of deterrents to make it as difficult as possible for potential ‘crackers’. The more obstacles you put in their way, the more customized the cracking mechanism must become, and this should discourage at least some of them.

Here is a (by no means exhaustive) list of potential solutions that could be implemented:

  • If your free solution is not working well, try out a commercial CAPTCHA tool, which will probably be more secure.
  • Use images (as text can be read using OCR). Ask the user to pick out or count something in a set of images. Examples are KittenAuth and Asirra. Other options that use animations exist as well. The obvious disadvantage of this method is that it cannot be used by visually impaired users.
  • Check devices rather than users. Iovation.com provides a device reputation service that checks if a device is known to be used for fraud or spam. The advantage of this method is that it draws on a centralised device database: spam or fraud detected on any site using the service will therefore result in a recommendation to block that device.
  • Use random field names such as GUIDs. The mapping of each field to its GUID equivalent is kept in the session. This means that bots must be altered specifically for your site to support the ever-changing field IDs.
  • Simpler spam bots will rarely set a user agent (HTTP_USER_AGENT) or a referring page (HTTP_REFERER), so a post that is missing either one is suspect. You should also check that the referrer is the page where your form is located.
  • Have a hidden field with a random name (e.g. a GUID) and a random value (also a GUID) that must be echoed back to the site in the post. This would make manual human intervention more difficult.
  • Use the same process as mentioned above, but with a cookie value. Note that some users don’t have cookie support enabled, so this option may not be useful in all cases.
  • Bots and automated processes are normally much faster than a human could ever be, so timing the user post-back can be a useful indication of whether the page was submitted by a bot or a human. Encrypt the time into a hidden field on the form. Store the encryption key in the session so that the client side never knows it. When the value is posted back, decrypt the time and check how long the post-back took. If it took less than n seconds, you can assume it is a bot.
  • Add an IP address check, e.g. if repeated invalid registration attempts come from IP x.x.x.x, block that IP. A manual verification by the site administrator could be required to unblock it. This will hinder brute force cracking attempts.
  • Record the IP addresses of users when registrations are done. If an IP is used to register more than a set number of accounts, a manual verification by the site administrator must be done to unblock the accounts.
  • Some automated mechanisms do not support scripting, so check that script is running on the client. For example: when the registration page is requested, the answer to a calculation is stored in the session, e.g. 8+8=16. Then have a simple JavaScript perform the same calculation when the page is rendered in the browser and assign the result to a hidden field. Compare the answers on the server side when the page is posted back. A disadvantage is that some browsers do not support JavaScript or have it disabled.
  • Most bots fill in all form fields. Add a decoy field and hide it using CSS. If it is filled in, then there is a good chance that it is a bot.
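
To make a few of these ideas more concrete, here is a rough Python sketch that combines three of them: random field names kept in the session, the timed post-back and the hidden decoy field. It is only an illustration under some assumptions. The session is a plain dict standing in for whatever your web framework provides, the names new_form, looks_like_bot, REAL_FIELDS and MIN_SECONDS are made up for this example, and the timestamp is signed with an HMAC rather than encrypted (the time is visible to the client but cannot be altered without the key, which stays in the session).

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

MIN_SECONDS = 3                      # assumed threshold; tune it for your form
REAL_FIELDS = ("username", "email")  # hypothetical form fields


def _sign(key: bytes, message: str) -> str:
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def _now(now: Optional[float]) -> float:
    return time.time() if now is None else now


def new_form(session: dict, now: Optional[float] = None) -> str:
    """Prepare per-form secrets in the session; return the hidden token value."""
    session["key"] = secrets.token_bytes(32)
    # Random alias for every real field, so bots cannot hard-code the names.
    session["fields"] = {name: secrets.token_hex(16) for name in REAL_FIELDS}
    # Decoy field: render it in the form, hide it with CSS.
    session["honeypot"] = secrets.token_hex(16)
    issued = str(int(_now(now)))
    return f"{issued}:{_sign(session['key'], issued)}"


def looks_like_bot(session: dict, posted: dict, now: Optional[float] = None) -> bool:
    """Return True if the posted form trips any of the three checks."""
    # 1. A human never sees the decoy field, so it should come back empty.
    if posted.get(session["honeypot"], ""):
        return True
    # 2. Every real field must arrive under its random alias.
    if any(alias not in posted for alias in session["fields"].values()):
        return True
    # 3. The timestamp must be untampered and old enough.
    issued, _, signature = posted.get("token", "").partition(":")
    expected = _sign(session["key"], issued)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return True
    return _now(now) - int(issued) < MIN_SECONDS
```

You would call new_form when rendering the page, write the returned token into a hidden field named token, and call looks_like_bot with the posted values on post-back. None of these checks will stop a bot written specifically for your site, but each one raises the amount of customization needed.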

Try those out and let us know which ones work for you. And if you have any additions, we’d love to hear them.

Categories: Development

Hennie Grobler

Technical Architect

I'm an architect at SWAT where I fulfill a range of duties, from being involved in development to R&D projects to assisting with the technical aspects of MIH mergers and acquisitions. My past experience is mostly with SOA related projects, but I have since moved to mostly working with web and related technologies. I also try to keep up with the latest mobile trends and any gadgetry that may be out and about. And of course everything to do with gaming.