Users table
user_id
pic_url
name
friends table
auto_id
userid
friendid
status
actions table
auto_id
userid
type
subject
body
datetime
I want to make a friend stream of updates thats shows updates, could be a blog post, status change, anything but should only show the ones that are from a friend of the logged in user
Here is what I came up with but my userbase is very large so performance is a must, is there a better way of doing this? Please show me
SELECT u.user_id, u.pic_url, u.name, a.auto_id, a.userid, a.type, a.subject, a.body, a.datetime
FROM actions AS a
LEFT JOIN users AS u ON u.auto_id=a.userid
LEFT JOIN friends AS f ON f.userid=a.userid
WHERE f.friendid=1 //1 would be my user ID
AND f.status=active
Please help I don’t think this is correct.
Lets say there is 50,000 users and my userid is #1 and I am friends with 20,000 users, it should return all entries in the actions table that is published by a user I am friends with, also need to modify to include actions from myself
I have heard some people talki about using some sort of hash table for faster lookups would something like that be possible here?
Thanks for any help
It’s called an index, and you should add one to each column that you’re planning on using a JOIN (or matching with an explicit constraint like
>, >=, =, <=, <or anIN ()clause which matches only items in a stated list). This way the database server can jump right to the correct entries in the index, rather than having to do a brute force search through all of the table rows. It’s exactly like an index in a book. If you wanted to find the pages in a book in which the name “Knuth” appears, you have two choices. If the book has an index, you look in the index and hope the name is there. If the book does not have an index, you’ll just have to read through the whole thing yourself, and that will take much longer.If you care about ordering/sorting (or doing any kind of relative numeric/string comparison), it should be a sorted index. Otherwise it can be a hashtable index, which is faster for tables with lots of rows, but carries no sorting information. These types of details are likely to have different syntaxes/options depending on which type of database server software is used.** (see note below)
Note that primary keys already have an automatically generated index, so you don’t have to add one yourself. Note also that if you have a multiple-column primary key, e.g. (State, City, Zipcode) then there will effectively be indices on the leftmost subsets of the primary key, e.g. you get an index on State, and (State,City), and (State,City,Zipcode) for free, but if you want to JOIN on Zipcode or City or (City,Zipcode) then you need to create your own indices in addition to those provided by the primary key.
In your case, it looks like you should have indices on these columns (I’ve *-ed the columns I am assuming are already primary keys). Unless you have any significance on the numerical order of your user IDs, those would be good candidates for hashtable indices.
**For MySQL, you add a clause to the CREATE INDEX statement that says USING HASH for a hashtable index, or USING BTREE (for a sorted index)… ignore RTREEs as those are for spatial data. Note also that MySQL doesn’t allow HASH indices on the common storage engines InnoDB and MyISAM. Really large datasets that need high-performance probably need to have data mirrored on an in-memory table with a HASH index. With 50,000 rows you probably don’t need to worry about it; a BTREE’s search time is O(log n) whereas HASH is O(1) and there’s probably not that much difference. BTREEs are very wide and are designed not to be deep; to require a single additional comparison in the search step, you may need to increase the # of rows by a factor of 10 or 100.