I am currently logging every “failure” on my site (login, signup, etc.) to a database so I can monitor what is giving my users a hard time, or which IPs/users are doing suspicious things.
However, I find that I only really need the data for about a week or so since I check it every day and, at most, need to see the activity from the past week.
I was thinking that perhaps I should try to save some of the load my database is taking from all this logging and place the data in something like memcached or couchdb. However, I’m not sure how I could query the data into result sets.
How could you use a key-value store or document database to monitor logs and track relations between activity? And is it even worth adding another data store to the server, or should I just let the database keep handling it? I mention memcached and CouchDB because both can run with very little RAM if needed (unlike MongoDB and Redis).
Let me give an example. IP 0.0.0.0 failed login 37 times in 3 hours (each attempt recorded); it also failed to reset a password for a valid email 84 times in 2 hours. Thanks to my logs I can now research (and block) this bot. On the other hand, I see that alongside the 5827 registered users there were 2188 failed registration attempts. This tells me that something is wrong with my signup form, causing many people to fail it at least once.
Again, the bounty is for a working example of using key-value or document store to log data.
Just write to a log file and analyse it offline. Logging is a solved problem, and writing a line of text to a file on disk is about as cheap, IO and CPU-wise, as you can possibly get. Log rotation is also a solved problem, and there’s really no point in reinventing that wheel.
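As a rough sketch of what that can look like, here is one way to append a line per failure and let the standard library handle rotation. The question does not say what language the site runs on, so Python is used purely for illustration; the logger name, the tab-separated line layout, and the helper names make_failure_logger and log_failure are all assumptions, not anything from the original setup. A system tool such as logrotate would do the same job outside the application.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_failure_logger(path, keep_days=7):
    """Return a logger that appends one line per failure and rotates daily.

    keep_days=7 mirrors the "about a week" retention from the question.
    """
    logger = logging.getLogger("site.failures")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these lines out of the root logger
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = TimedRotatingFileHandler(
            path, when="midnight", backupCount=keep_days, encoding="utf-8"
        )
        # Assumed layout: timestamp, event, ip, detail, separated by tabs.
        handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
        logger.addHandler(handler)
    return logger


def log_failure(logger, event, ip, detail=""):
    """Write a single tab-separated failure record."""
    logger.info("%s\t%s\t%s", event, ip, detail)
```

A call such as log_failure(logger, "login_failed", ip, "bad password") then costs one buffered file write, and files older than keep_days rotations are deleted automatically.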
Once the log data is on disk, you can copy it off to another machine for parsing and analysis using whatever toolkit you want, and if you want to use a document store, that’s the place to introduce it. There’s no need to burden your front-facing production machines with that job.
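The kind of analysis described in the question (failures per IP, failures per form) needs very little tooling once the file has been copied off. The sketch below assumes each log line is tab-separated as timestamp, event, ip, detail; that layout, the event names, and the function names are illustrative assumptions rather than a fixed format.

```python
from collections import Counter


def parse_line(line):
    """Split one tab-separated log line into (timestamp, event, ip, detail)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        return None  # malformed or empty line, skip it
    detail = parts[3] if len(parts) > 3 else ""
    return parts[0], parts[1], parts[2], detail


def summarize(lines):
    """Count failures per event type and per (ip, event) pair."""
    by_event = Counter()
    by_ip_event = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        _, event, ip, _ = record
        by_event[event] += 1
        by_ip_event[(ip, event)] += 1
    return by_event, by_ip_event


def suspicious(by_ip_event, threshold):
    """Return (ip, event, count) rows at or above threshold, highest first."""
    rows = [(ip, event, n) for (ip, event), n in by_ip_event.items() if n >= threshold]
    return sorted(rows, key=lambda row: -row[2])
```

Feeding it an open file, for example summarize(open("failures.log")), gives the per-form totals that would reveal a broken signup form, and suspicious(by_ip_event, 30) would surface an IP like the one in the question. If a document store is wanted, the parsed records could be loaded into it at this stage instead of being counted in memory.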