I am working on a project that tracks users on a website by logging every hit across the site. Whenever a user hits a URL, I create a record for it in the database and tag it with some tags.
Every URL is called a ‘resource’ in my database, and a resource can have multiple tags. A visitor is connected to a resource when they visit its URL, and I also store the date of the visit.
What I want to do is find the resources with the right tags that have been viewed within a given period, for example today or this month.
The query I am currently building is here:
SELECT r.resource_id, r.resource_url
FROM resource r
JOIN visitor_resource vt ON vt.resource_id = r.resource_id
JOIN resource_tags rt ON rt.resource_id = vt.resource_id
JOIN tags t ON t.tag_id = rt.tag_id AND t.tag_name = '42'
GROUP BY r.resource_id
To give you an idea of the structure, here is an overview:
tracking database structure http://kaspergrubbe.dk/db-overview.png
So basically I need to count the visitor_resource rows for each resource within a given month, using visitor_resource.last_visited, and get the 5 most visited resources.
How should I approach this?
The query above also seems very slow without query caching. I suspect this is because t.tag_name is a varchar column without an index. Is there any way to speed it up other than adding that index?
Thanks.
You’ve left out any criteria based on the date, so you should add that and see how the performance changes. Also, if you’re looking for a count, then you should add that as well. MySQL supports the LIMIT clause (as opposed to TOP), so use that to limit the result to the 5 most visited resources. With everything together it will probably look something like this:
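A minimal sketch of such a query, assuming pluralised table names (Resources, Visitor_Resources, Resource_Tags, Tags), the column names used in the question, and placeholder date literals that are purely illustrative:

```sql
-- Sketch only: table names are assumed plural; adjust them to the real schema.
SELECT
    r.resource_id,
    r.resource_url,
    COUNT(*) AS visit_count              -- rows in Visitor_Resources per resource
FROM Resources r
JOIN Visitor_Resources vr ON vr.resource_id = r.resource_id
JOIN Resource_Tags rt     ON rt.resource_id = r.resource_id
JOIN Tags t               ON t.tag_id = rt.tag_id
                         AND t.tag_name = '42'
WHERE vr.last_visited >= '2010-01-01'    -- hypothetical start date (inclusive)
  AND vr.last_visited <  '2010-02-01'    -- hypothetical end date (exclusive)
GROUP BY r.resource_id, r.resource_url
ORDER BY visit_count DESC
LIMIT 5;
```

The count is only meaningful if a resource carries a given tag at most once; otherwise the join to Resource_Tags would multiply the rows. The half-open date range (>= start, < end) covers the whole last day of the period even when last_visited includes a time part.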
Sorry, I don’t do a lot of MySQL these days, so I don’t know exactly how the start and end date parameters should be written.
Unless your Tags table is very large, an index on tag_name probably won’t matter much. An index on Visitor_Resources.last_visited might be a good idea, though.
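For illustration, such an index could be created roughly like this. The index name is hypothetical and the table name assumes the plural form used above:

```sql
-- Hypothetical index name; the table name should match the real schema.
CREATE INDEX idx_visitor_resources_last_visited
    ON Visitor_Resources (last_visited);
```

Prefixing the SELECT with EXPLAIN before and after adding the index should show whether MySQL actually uses it for the date filter.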
Also, I changed your table names in the query to be more consistent. Personally I like plural names, but singular names are OK too. Whichever you choose, pick one and stick to it.