<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dominique Stender &#187; automation</title>
	<atom:link href="http://www.st-webdevelopment.com/tag/automation/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.st-webdevelopment.com</link>
	<description>Good software is only the beginning</description>
	<lastBuildDate>Sun, 26 Feb 2012 16:24:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>The CAPTCHA arms race</title>
		<link>http://www.st-webdevelopment.com/general/2010/01/captcha-arms-race/</link>
		<comments>http://www.st-webdevelopment.com/general/2010/01/captcha-arms-race/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 13:50:25 +0000</pubDate>
		<dc:creator>Dominique</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[CAPTCHA]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[reCAPTCHA]]></category>
		<category><![CDATA[SPAM]]></category>
		<category><![CDATA[text recognition]]></category>

		<guid isPermaLink="false">http://www.st-webdevelopment.com/?p=262</guid>
		<description><![CDATA[A discussion on various CAPTCHA methodologies and their success rate. Inspired by a paper by Jonathan Wilkins where he describes how the famous reCAPTCHA algorithm was broken.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-272" title="captcha" src="http://www.st-webdevelopment.com/wp-content/uploads/2009/12/captcha.jpg" alt="captcha" width="190" height="400" />CAPTCHAs... we have all seen them. <a title="The Wikipedia on CAPTCHAs" href="http://en.wikipedia.org/wiki/CAPTCHA" target="_blank">CAPTCHA</a> means <em>Completely Automated Public Turing test to tell Computers and Humans Apart</em> and is a family of techniques to make sure a user (typically on a website) is indeed a human being and not a program trying to act like one.</p>
<p>When you leave a comment on this blog you are asked to type in two words that are displayed as a distorted graphic. Most bulletin boards and free mail providers ask you to do the same before they allow you to create an account.</p>
<h4>CAPTCHA 101</h4>
<p>The reason behind this is the same most of the time: preventing SPAM. Spammers use forums, blog comments and contact forms to post their ads. They use bots (quite similar to the bots that update the search index on Google, Yahoo and all other search websites) to automate that process.</p>
<p>So the idea of CAPTCHAs is to present a task to a website visitor that is difficult to solve for a machine, but easy to solve for a human. The graphical CAPTCHA is the most commonly used one.</p>
<p>There are other CAPTCHA variants, such as audio-based or image-recognition-based CAPTCHAs. I've even seen a simple math question used as a CAPTCHA.<span id="more-262"></span></p>
<h4>The arms race</h4>
<p>In December 2009 <a title="Website of Jonathan Wilkins" href="http://bitland.net" target="_blank" class="broken_link">Jonathan Wilkins</a> announced that Google's most prevalent CAPTCHA method, reCAPTCHA, had been broken. It is now possible to identify the words presented by reCAPTCHA with an accuracy of around 20%. For spammers that is good enough. With a 20% success rate, on average every fifth attempt results in a successfully placed ad. Mr. Wilkins argues that even a success rate of 1% is good enough, since the resources used by spammers are often not their own, so using them is free (think botnets).</p>
<p>This is bad news for everyone. I really hope Google updates its reCAPTCHA algorithm to a variant that is harder for machines to solve. For the record, I also use reCAPTCHA here on this blog.</p>
<p>Update: As of December 31st 2009 reCAPTCHA seems to have been updated. Google responded quite quickly. So far I can already say that the number of spam posts I get on this blog has dropped drastically, although it has not stopped completely. Anyway, thanks for the quick response, Google!</p>
<p>This situation is critical because spammers do not need to be particularly good at breaking CAPTCHAs. If one out of five CAPTCHAs can be broken and spammers can still make a living out of that, the CAPTCHA itself is useless in the sense that SPAM will enter your system.</p>
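<p>To put the 20% and 1% figures quoted from Mr. Wilkins into numbers, here is a minimal Python sketch. It assumes that every attempt is independent and succeeds with a fixed probability, which is a simplification, and the function names are made up for this illustration.</p>

```python
def chance_of_at_least_one_success(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - success_rate) ** attempts


def expected_attempts_per_success(success_rate: float) -> float:
    """Average number of tries needed for one success (geometric distribution)."""
    return 1.0 / success_rate


if __name__ == "__main__":
    # 20% and 1% are the two success rates mentioned in the post.
    for rate in (0.20, 0.01):
        print(
            f"rate {rate:.0%}: "
            f"about {expected_attempts_per_success(rate):.0f} tries per placed ad, "
            f"{chance_of_at_least_one_success(rate, 100):.1%} chance "
            f"of at least one hit in 100 tries"
        )
```

<p>Under these assumptions even a 1% solver needs only about 100 attempts per placed ad, which costs next to nothing on borrowed machines.</p>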
<p>Jonathan Wilkins has a <a title="Paper on Strong CAPTCHA Guidelines by Jonathan Wilkins" href="http://bitland.net/captcha.pdf" target="_blank" class="broken_link">.pdf paper</a> in which he gives guidelines for the creation of strong CAPTCHAs. It is a really interesting read even if you're not directly involved with CAPTCHA development.</p>
<h4>Which is the best solution?</h4>
<p>Well, I guess your mileage may vary.</p>
<p>For now I will stick to <a title="The Wikipedia on the reCAPTCHA project" href="http://en.wikipedia.org/wiki/ReCAPTCHA" target="_blank">reCAPTCHA</a> [<a title="the official homepage of the reCAPTCHA project" href="http://recaptcha.net/" target="_blank">official homepage</a>] although it is broken and I need to remove a few unapproved comments every day. I like <a title="The idea behind reCAPTCHA" href="http://recaptcha.net/learnmore.html" target="_blank">the idea behind the project</a> so I'm willing to accept the minor annoyance that it currently imposes.</p>
<p>Breaking text-recognition CAPTCHAs such as reCAPTCHA requires strong OCR solutions and, to my personal surprise, that is still a field that needs a lot of improvement. So even if reCAPTCHA becomes too cumbersome for me, I'll stick with another visual CAPTCHA method.</p>
<p>Jonathan Wilkins does not recommend audio CAPTCHAs because he argues that the field of speech recognition is more advanced than that of OCR. Security aside, I don't want my visitors to have to do something unfamiliar, and listening to an audio file to fill out a form certainly is.</p>
<p>I like the idea of asking a simple question, such as "What color is an orange?" or "What is 3+5?". I'm not sure about the security, though. The latter can be <a title="Google calculate" href="http://www.google.com/search?q=What+is+3%2B5%3F" target="_blank">automatically solved</a> by Google itself, for example. However, I'm halfway convinced that this approach has a bright future.</p>
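<p>To illustrate the security concern with math questions, here is a small Python sketch of how little a bot needs in order to answer one. The regular expression and the function name are assumptions made for this example and are not taken from any real spam tool.</p>

```python
import operator
import re

# Operators a bot would plausibly need for "simple math" questions.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

# Matches e.g. "What is 3+5?" or "what is 12 - 4".
QUESTION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_math_question(question: str):
    """Return the answer to a two-operand arithmetic question, or None."""
    match = QUESTION.search(question)
    if match is None:
        return None
    left, symbol, right = match.groups()
    return OPERATORS[symbol](int(left), int(right))


print(solve_math_question("What is 3+5?"))  # 8
print(solve_math_question("What color is an orange?"))  # None
```

<p>The color question is not answered by this sketch, but a lookup table of common questions would probably take care of that as well.</p>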
<h4>Promising examples of what might be next</h4>
<p><strong>SQUIGL-PIX</strong><br />
On the reCAPTCHA website you can find a link to the <a title="SQUIGL-PIX CAPTCHA solution" href="http://server251.theory.cs.cmu.edu/cgi-bin/sq-pix" target="_blank">SQUIGL-PIX</a> project, apparently the latest project by the reCAPTCHA guys. It presents you with three images and asks you to outline a certain object. The CAPTCHA is solved only if you choose the correct image and then outline the object correctly.</p>
<p>Give it a try. It is fun, easy (for us) and I sure hope it is hard for machines.</p>
<p><strong>CAPTCHA The Dog</strong><br />
Another interesting approach is <a title="Captcha the dog website" href="http://www.captchathedog.com" target="_blank">Captcha The Dog</a>. You are presented with nine images total and have to pick the one that shows a dog while all others show a cat. You have to pick the single dog several times (from different picture sets) and click 'ok' once there are only cats.</p>
<p>The idea is brilliant and the basic reasoning behind it is the same that makes SQUIGL-PIX good: Object recognition instead of text recognition.</p>
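<p>Object recognition only helps if blind guessing does not pay off either. The following Python sketch estimates how often uniformly random clicks would pass a pick-one-image test. The nine images per round come from the description above; the number of rounds is an assumption, since "several" is all that is known here, and the function name is made up for this example.</p>

```python
from fractions import Fraction


def blind_guess_rate(choices_per_round: int, rounds: int) -> Fraction:
    """Chance that uniformly random picks are right in every round."""
    return Fraction(1, choices_per_round) ** rounds


if __name__ == "__main__":
    # The round counts below are assumptions, not the real service's settings.
    for rounds in (1, 2, 3):
        rate = blind_guess_rate(9, rounds)
        print(f"{rounds} round(s): 1 in {rate.denominator} blind guesses passes")
```

<p>With nine images, a single round can be passed by blind guessing about 11% of the time, far above the 1% that Mr. Wilkins considers good enough for spammers, which is presumably why several rounds are required.</p>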
<p>Captcha The Dog goes one step further and allows you to use your own set of images, which makes it less financially attractive for anyone to break <em>your individual</em> set of images. There is even a <a title="Captcha The Dog WordPress plugin" href="http://wordpress.org/extend/plugins/captchathedog/" target="_blank">WordPress plugin available</a>, but I have to give a <strong>warning</strong>: According to the installation page the plugin requires allow_url_fopen and allow_url_include both to be active. Sounds like trading one evil for another. Remote file inclusion, anyone? Too bad, the idea is great.</p>
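<p>If you want to check whether a server already has the two PHP settings from the warning above switched on, a rough Python sketch like the following could scan a php.ini file. The sample text is hypothetical and not taken from the plugin's documentation, the function name is made up, and the parsing is deliberately simplistic (no sections, no overrides from other configuration files).</p>

```python
RISKY_DIRECTIVES = ("allow_url_fopen", "allow_url_include")
# Spellings that php.ini accepts for a boolean "on".
ENABLED_VALUES = {"1", "on", "yes", "true"}


def enabled_risky_directives(ini_text: str) -> list:
    """Return the risky directives that the given php.ini text switches on."""
    found = []
    for raw_line in ini_text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop ';' comments
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in RISKY_DIRECTIVES and value.strip('"').lower() in ENABLED_VALUES:
            found.append(name)
    return found


SAMPLE_INI = """
; hypothetical excerpt, not taken from the plugin's documentation
allow_url_fopen = On
allow_url_include = On
"""

print(enabled_risky_directives(SAMPLE_INI))  # ['allow_url_fopen', 'allow_url_include']
```

<p>Of the two, allow_url_include is the more worrying one, because it lets include and require load code from a remote URL.</p>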
<p><strong>3D image rotation</strong><br />
The third approach I'd like to present is proposed by <a title="Blog by Taylor Hayward" href="http://taylorhayward.posterous.com/" target="_blank">Taylor Hayward</a> and apparently does not have a name. It asks you to identify an object appearing in <a title="3d images for CAPTCHAs" href="http://taylorhayward.posterous.com/3d-images-as-a-captcha" target="_blank">two rotated 3D renders</a>. You are presented with one control image and a set of nine randomly rotated images, one of which is the (rotated) control image. I found it hard to imagine, so go see the blog; it'll be much clearer.</p>
<p>Once again, the method relies on object recognition. Great idea.</p>
<h4>"Do not try this at home"</h4>
<p>When you google for "<a title="google search for &quot;php captcha library&quot;" href="http://www.google.com/search?q=php+captcha+library" target="_blank">php captcha library</a>" you find literally thousands of home-grown 'solutions' for secure CAPTCHAs. Once again I can only urge you to read <a title="Paper on Strong CAPTCHA Guidelines by Jonathan Wilkins" href="http://bitland.net/captcha.pdf" target="_blank" class="broken_link">Jonathan Wilkins' paper</a> on secure CAPTCHAs before you use one of them. The bottom line probably is that they will not work very well for you, because their authors obfuscate the letters in ways that pose little or no difficulty for OCR software.</p>
<p>Just assume that if Google's CAPTCHA solution can be broken, yours will be broken too once the incentives to try are high enough for the bad guys.</p>
<p>That being said, developing your own approach while sticking to Jonathan's guidelines will most likely be an interesting spare-time project.</p>
<h4>Possible benefits</h4>
<p>Believe it or not, I really see possible benefits coming out of this arms race. As the spammers' tactics to solve CAPTCHAs improve, the good guys are forced to improve their generation of CAPTCHAs. After some time, the newer CAPTCHAs will also be broken. The cycle continues.</p>
<p>Naturally, the only way for the good guys to verify that a new version of a CAPTCHA is indeed more secure than the old one is to think like the bad guys and try to break their own CAPTCHAs.</p>
<p>This might (and I believe it surely will) lead to better OCR software, better audio recognition and in general a higher standard in 'intelligent' algorithms that are able to solve everyday problems.</p>
<p>Audio recognition might help the deaf to read what people say by means other than lip reading. A universal translator (famous from Star Trek) is not completely out of reach either, although it is still really far away.</p>
<p>Text recognition is in high demand on mobile devices. If a program can identify highly distorted characters from a CAPTCHA, I'm sure the same ideas can be applied to reading handwriting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.st-webdevelopment.com/general/2010/01/captcha-arms-race/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

