I’m working on a site analysis project where users will be able to record and view their site traffic reports through my API (like Google Analytics).
The problem is, I’m not sure how I should set up the database structure.
I already have some tables set up for user management purposes:
User table: || userID || userName || datReg ||
Account information table: || accountInfoID || userID || fName || lName || emailAddress ||
So I was thinking I could do something like:
Site analysis table: || analyID || userID || visitorIP || visitorCountry || pageviewCount || pageviewData ||
But would that be scalable? I mean, with that structure there could be tens of thousands of rows inserted every day, so would that not result in it being very slow after a few months?
With the idea above, I would run a query similar to this, for each unique visit:
INSERT INTO siteAnaly (userID,visitorIP,visitorCountry,pageviewCount,pageviewData) VALUES ("the account holder's user ID","the visitor's IP","the visitor's country","the visitor's page view count","a JSON array of the visitor's pageview URIs")
and then, on every pageview, the row inserted by the query above would be updated, incrementing pageviewCount and appending to pageviewData.
The other idea I had (which you may think is stupid) was to have a new table for every user, named with the user’s ID.
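To make the insert-then-update flow concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the helper names create_schema and record_pageview, and the choice to treat one (userID, visitorIP) pair as one visit are all assumptions made for illustration; a real setup would need its own definition of a visit and would likely use a different database engine.

```python
import json
import sqlite3


def create_schema(conn):
    """Create the siteAnaly table from the question (column types are assumed)."""
    conn.execute(
        """
        CREATE TABLE siteAnaly (
            analyID        INTEGER PRIMARY KEY AUTOINCREMENT,
            userID         INTEGER NOT NULL,
            visitorIP      TEXT    NOT NULL,
            visitorCountry TEXT,
            pageviewCount  INTEGER NOT NULL,
            pageviewData   TEXT    NOT NULL
        )
        """
    )


def record_pageview(conn, user_id, visitor_ip, country, uri):
    """Insert a row on the first pageview of a visit, update it afterwards.

    Assumption: a visit is identified by (userID, visitorIP) alone.
    """
    row = conn.execute(
        "SELECT analyID, pageviewData FROM siteAnaly "
        "WHERE userID = ? AND visitorIP = ?",
        (user_id, visitor_ip),
    ).fetchone()

    if row is None:
        # First pageview: create the row with a one-element JSON array.
        conn.execute(
            "INSERT INTO siteAnaly "
            "(userID, visitorIP, visitorCountry, pageviewCount, pageviewData) "
            "VALUES (?, ?, ?, ?, ?)",
            (user_id, visitor_ip, country, 1, json.dumps([uri])),
        )
    else:
        # Later pageviews: bump the counter and append the URI to the array.
        analy_id, data = row
        uris = json.loads(data)
        uris.append(uri)
        conn.execute(
            "UPDATE siteAnaly "
            "SET pageviewCount = pageviewCount + 1, pageviewData = ? "
            "WHERE analyID = ?",
            (json.dumps(uris), analy_id),
        )
    conn.commit()
```

Note that every pageview here costs a read plus a write, and the JSON array has to be rewritten in full each time, which is part of why this layout gets more expensive as visits grow longer.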
What do you think is the best approach to take with a project like this?
It will indeed have a lot of data coming in. What you will need to do in this case is split the data across different tables, and at some point even different databases, to make sure you don’t clutter your main data source. You will rarely need to query large portions of data that have not been processed, so your goal is to:
I went to a good conference and posted a review of it on my blog; you might want to read it:
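As an illustration only, one way to picture that kind of split is a raw table that takes every hit and a much smaller processed table that reports read from. The sketch below uses Python's sqlite3 module; the table names rawHits and dailyStats, their columns, and the helper names create_split_schema and roll_up are hypothetical and do not come from this thread.

```python
import sqlite3


def create_split_schema(conn):
    """Create a raw table for incoming hits and a processed summary table."""
    conn.executescript(
        """
        CREATE TABLE rawHits (
            hitID     INTEGER PRIMARY KEY,
            userID    INTEGER NOT NULL,
            visitorIP TEXT    NOT NULL,
            uri       TEXT    NOT NULL,
            hitDate   TEXT    NOT NULL      -- 'YYYY-MM-DD'
        );
        CREATE TABLE dailyStats (
            userID         INTEGER NOT NULL,
            statDate       TEXT    NOT NULL,
            pageviews      INTEGER NOT NULL,
            uniqueVisitors INTEGER NOT NULL,
            PRIMARY KEY (userID, statDate)
        );
        """
    )


def roll_up(conn, day):
    """Summarise one completed day of raw hits, then purge those raw rows.

    Assumption: this only runs once the day is over, so no further hits
    for that date arrive after the summary has been written.
    """
    with conn:  # one transaction: summary and purge succeed or fail together
        conn.execute(
            """
            INSERT OR REPLACE INTO dailyStats
                (userID, statDate, pageviews, uniqueVisitors)
            SELECT userID, hitDate, COUNT(*), COUNT(DISTINCT visitorIP)
            FROM rawHits
            WHERE hitDate = ?
            GROUP BY userID, hitDate
            """,
            (day,),
        )
        conn.execute("DELETE FROM rawHits WHERE hitDate = ?", (day,))
```

With a layout like this, reports read from dailyStats, which grows by one row per account per day, while rawHits stays small because processed rows are removed.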
http://crazycoders.net/2012/03/confoo-2012-continous-data-processing/
Good luck