I need to write a captcha service for the integration software I am working on. After some thought, I realize I don’t fully understand how captcha works technically (I do understand how it works functionally), and so I haven’t been able to make some design decisions. A few things that bother me are:
- Should I keep a session for each user? (i.e. remember their IP, domain, etc)
- Should I regenerate a passphrase on fail? (I know that sites like Google and Digg do it)
- Every call will hit the database. I am not sure whether this will impact performance on the server, but I will consider using something like memcached. However, I can’t think of any way to avoid hitting the db or cache, because you need to first read, then validate, then update.
- Do I need an expiry time for the captcha? say 15 mins?
If the answer to 1 is yes, then I think the logic becomes complex because I need to check things like:
Has this passphrase been validated before? Has it expired? Is it from the same IP? etc.
And if I need to remember the IP and validate against it, what do I do after too many invalid requests? Do I block them?
So I am thinking captcha should work this way, the simple way:
Sort of stateless, which means each captcha generated will only survive 2 requests: the initial request and the subsequent request. The result will be either failed or passed. If failed, then create a new one.
I would appreciate it if someone could make some suggestions or explain how a proper captcha works. Thanks.
Update:
I need to explain the functional requirement a bit:
Terms:
- customer is someone else out in the www
- my service includes: a captcha service and another service, both of which the customer can access via http requests.
Workflow:
- customer makes request to captcha service
- captcha service generates a token and a passphrase and saves them to the db
- customer makes an http request to captcha web to retrieve the image
- customer makes a request to our other service and passes in the passphrase
- our other service will use the passphrase to validate against our captcha service
etc…
Also, I am wondering whether step 3 is necessary, or whether I should just return the image stream in step 2.
That depends on the server-side web programming language you’re using. Most of them offer built-in ways to manage the session: in PHP, for example, call session_start() and access $_SESSION, and in JSP/Servlet you can get it via HttpServletRequest#getSession(). As you didn’t mention which one you’re using, I can’t give a more specific or detailed answer. All I can suggest is to consult the docs, tutorials, or books for the language in question.
You don’t need to remember the IP. Just setting a key/token in the session is enough, and the session in turn is usually already backed by a cookie, so you could in theory also just use a cookie for this if you intend to homegrow it all (note: do NOT put the answer in the cookie, just some unique key to identify the client!).
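As a rough sketch of the "unique key in the cookie, answer on the server" idea, and assuming Python only because no language was mentioned: the store `_answers`, the cookie name `captcha_key`, and both function names below are made up for illustration. A real setup would use the framework's session store, a database, or a cache rather than a module-level dict.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory store: opaque key -> expected answer.
# Assumption for illustration; swap in your session store, db, or cache.
_answers = {}


def issue_captcha(answer: str) -> str:
    """Keep the answer server-side and return a Set-Cookie value holding only the key."""
    key = secrets.token_urlsafe(16)  # unguessable identifier, reveals nothing about the answer
    _answers[key] = answer
    cookie = SimpleCookie()
    cookie["captcha_key"] = key
    cookie["captcha_key"]["httponly"] = True
    cookie["captcha_key"]["path"] = "/"
    return cookie["captcha_key"].OutputString()


def lookup_answer(cookie_header: str):
    """Resolve the client's cookie back to the stored answer, or None if unknown."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("captcha_key")
    return _answers.get(morsel.value) if morsel else None
```

The client only ever sees the random key, so reading or tampering with the cookie tells a bot nothing about the expected answer.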
Certainly you should regenerate it on failure. Otherwise it’s easy for bots to brute-force the captcha.
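To illustrate why a failed attempt should burn the challenge, here is a minimal Python sketch of a single-use check. It is only one way to do it: `_pending`, `new_challenge`, and `verify` are hypothetical names, the dict stands in for your db or cache, and the 15-minute lifetime is simply the figure floated in the question.

```python
import hmac
import secrets
import time

TTL_SECONDS = 15 * 60  # assumed lifetime, taken from the "15 mins" in the question
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # arbitrary choice, skips look-alike characters

# Hypothetical store: token -> (passphrase, expires_at). Stand-in for db/cache.
_pending = {}


def new_challenge(now=None):
    """Create a fresh token/passphrase pair and remember it until it expires."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(16)
    passphrase = "".join(secrets.choice(ALPHABET) for _ in range(6))
    _pending[token] = (passphrase, now + TTL_SECONDS)
    return token, passphrase


def verify(token, attempt, now=None):
    """Check one attempt. The challenge is removed whether the attempt passes or fails."""
    now = time.time() if now is None else now
    entry = _pending.pop(token, None)  # single use: a second guess needs a new challenge
    if entry is None:
        return False
    passphrase, expires_at = entry
    if now > expires_at:
        return False
    return hmac.compare_digest(passphrase.encode(), attempt.encode())
```

Because `verify` removes the entry before comparing, a bot gets exactly one guess per generated image; on failure the caller would call `new_challenge` again and serve a new image.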
That said, is there any reason that you don’t use an existing captcha API which you could just plug in, such as reCAPTCHA?