I have a table friends with columns id, u1, u2 and fewer than about 500,000 rows on a single MySQL server,
and I want to take userA and userB and check whether they have any mutual friends.
Is it faster to do
select u2 from friends where u1 = userA and u2 IN (select u2 from friends where u1 = userB)
than to run a shortest path algorithm on a graph (on one server)?
What is the standard way that big networks like LinkedIn and Facebook handle this?
Thanks!
If the friends table is indexed on both u1 and u2, the SQL query just takes the intersection of two subsets and is pretty fast, because the indexing has already been done. If you do the computation in memory instead, the time depends on whether you have prebuilt indexes: if you do, you'll be faster because there is no database connection overhead. If building the index counts toward the computation time and the database is warmed up (all data in memory), you can lose to the database.
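As a rough illustration of the indexed intersection, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The index name idx_friends_u1_u2, the sample rows, and the helper mutual_friends are assumptions made for the example, and it assumes each friendship is stored in both directions.

```python
import sqlite3

# SQLite stands in for MySQL here; the table layout follows the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (id INTEGER PRIMARY KEY, u1 INTEGER, u2 INTEGER)")

# Composite index (hypothetical name): a lookup by u1 can read the matching
# u2 values from the index alone.
conn.execute("CREATE INDEX idx_friends_u1_u2 ON friends (u1, u2)")

# Hypothetical sample data; each friendship is stored in both directions.
pairs = [(1, 3), (1, 4), (2, 4), (2, 5)]
rows = pairs + [(b, a) for a, b in pairs]
conn.executemany("INSERT INTO friends (u1, u2) VALUES (?, ?)", rows)


def mutual_friends(conn, user_a, user_b):
    """Return the sorted ids of users who are friends of both user_a and user_b."""
    cur = conn.execute(
        "SELECT a.u2 FROM friends AS a "
        "JOIN friends AS b ON a.u2 = b.u2 "
        "WHERE a.u1 = ? AND b.u1 = ? "
        "ORDER BY a.u2",
        (user_a, user_b),
    )
    return [row[0] for row in cur]


print(mutual_friends(conn, 1, 2))
```

The self-join expresses the same intersection as the IN subquery in the question. Which form MySQL executes faster depends on the version and the optimizer, so it is worth checking with EXPLAIN on the real table.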
I'm talking about indexing rather than a shortest path algorithm, because a shortest path algorithm computes more than you need.
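For the in-memory case, a prebuilt index could look something like the sketch below: a dictionary mapping each user to the set of their friends, so the mutual-friend check becomes a single set intersection. The names build_index and common_friends and the sample edges are made up for the example, and friendships are assumed to be symmetric.

```python
from collections import defaultdict


def build_index(edges):
    """Build the in-memory index: user id -> set of friend ids."""
    index = defaultdict(set)
    for u1, u2 in edges:
        # Assumption: friendship is symmetric, so record both directions.
        index[u1].add(u2)
        index[u2].add(u1)
    return index


def common_friends(index, user_a, user_b):
    """Return the set of users who are friends of both user_a and user_b."""
    # .get avoids creating empty entries for unknown users.
    return index.get(user_a, set()) & index.get(user_b, set())


# Hypothetical sample data.
edges = [(1, 3), (1, 4), (2, 4), (2, 5)]
index = build_index(edges)
print(common_friends(index, 1, 2))
```

Building the index touches every row once, which is the cost mentioned above. Once it exists, each check only looks at the two users' own friend sets, with no graph traversal at all.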