I’m about to implement a feature on my website that recommends content to users based on the content they already have in their library (a la Last.fm). A single table holds all the records of the content they have added, so a row might look something like:
--------------------
| userid | content |
--------------------
| 28     | a       |
--------------------
When I want to recommend some content for a user, I use one query to get all the user IDs that have content 'a' in their library. Then, out of those user IDs, I run another query that finds the next most common content among those users (e.g. 'b'), and show that to the user.
What worries me is the bigger picture. Say the table eventually holds something like 500,000 rows: will that make MySQL's responses very slow, or am I underestimating MySQL here?
You will not know this until you’ve tested it, so start prototyping.
Typically, 500,000 rows is next to nothing. I start to worry a bit when my tables reach 50 million rows, because purging old data then takes a while, though querying the data is still fast.
But this all depends on the kinds of queries you need. Queries spanning all 50 million rows would indeed be very slow; queries touching only 50k of those 50 million are fast.
For your problem, you need to measure your queries, then tune the queries, the tables and indexes, and MySQL itself.
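As a rough illustration of the measure-then-tune loop, the sketch below fills a table with synthetic rows, inspects the query plan, adds an index, and checks the plan again. It runs on Python's built-in sqlite3 module purely so it is self-contained. The table name, index name, and row counts are all made up. On MySQL the analogous tools are EXPLAIN and CREATE INDEX, and the plan output will look different.

```python
import sqlite3
import time

QUERY = "SELECT userid FROM library WHERE content = ?"

def build_table(conn, users=2000, items=200, per_user=10):
    """Fill a hypothetical library table with deterministic synthetic rows."""
    conn.execute("CREATE TABLE library (userid INTEGER, content INTEGER)")
    rows = [(u, (u * 7 + k * 13) % items) for u in range(users) for k in range(per_user)]
    conn.executemany("INSERT INTO library VALUES (?, ?)", rows)

def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is column 3)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

def timed(conn, sql, params=()):
    """Run the query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
build_table(conn)

# Measure first: without an index the lookup has to scan the whole table.
plan_before = plan(conn, QUERY, (42,))
rows_before, secs_before = timed(conn, QUERY, (42,))

# Tune: an index on (content, userid) lets the lookup seek straight to the matches.
conn.execute("CREATE INDEX idx_library_content ON library (content, userid)")
plan_after = plan(conn, QUERY, (42,))
rows_after, secs_after = timed(conn, QUERY, (42,))

print("before:", plan_before, f"({secs_before:.6f}s)")
print("after: ", plan_after, f"({secs_after:.6f}s)")
```

Timings on a table this small are noisy, so the change in the plan is the more reliable signal. Real tuning should be done against MySQL itself with data volumes close to what you expect in production.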