I’m a database newbie, and I have been wondering about a few things in databases. For example, I saw how Facebook stores a friend relationship (see: https://developers.facebook.com/docs/reference/fql/friend). There are only two columns: the first user’s ID and the second user’s ID. Well, that’s okay.
According to Wikipedia, Facebook has around 1 billion active users, so that friend relationship table may hold about 100 billion rows. Yet searching through that table is super fast (e.g. seeing my list of friends). I wonder how they manage that kind of speed.
Is it because Facebook has some magic back end? Or is it the magic of databases? Can I do it too (have millions of users and search through the database within seconds) using PHP and MySQL?
(I may be asking a silly question, but it has been bothering me for a while, so please leave an answer.)
The database makes use of indexes. This way it can quickly find the data related to a given userID.
Depending on the index structure, space occupation, etc., there is a gain: instead of searching N rows, it searches, for instance, log(N) rows. A search by dichotomy (binary search) of 100 billion rows would take about log2(10^11) ≈ 36.5 steps.
Instead of searching 10^11 rows, only about 37 rows have to be examined.
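To make the arithmetic concrete, here is a small Python sketch. It assumes the 100 billion row figure from the question and an idealized search that halves the remaining range at every step; a real database index is organized differently, so treat the numbers as an order of magnitude only.

```python
import math

# Assumed table size: about 100 billion friendship rows (figure from the question).
rows = 100_000_000_000

# Scanning without an index: in the worst case every row is examined.
linear_steps = rows

# Search by dichotomy on sorted data: each step halves the remaining range,
# so the number of steps grows with log2 of the row count.
binary_steps = math.ceil(math.log2(rows))

print(f"linear scan: up to {linear_steps:,} rows")
print(f"dichotomy:   about {binary_steps} rows")
```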
In the case you mention, friends, each user may have several friends, so the same user1 value appears in several rows, such as (user1, userX) and (user1, userY),
meaning user1 is friends with userX, userY, etc.
That is, you don’t have a unique index per user, but you do have a unique index per couple of users.
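In MySQL that would be a unique index on (user1, user2), meaning the couple (user1, user2) appears only once in the table. The syntax would be along the lines of CREATE UNIQUE INDEX friendsindex ON friends (user1, user2), friendsindex being the index name and friends being the table. Or, as you said, you can declare the table’s primary key to be (user1, user2) (primary key values are unique within a table).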
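The unique index on the couple of users can be tried out with a short script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server), and the table and index names friends and friendsindex are the illustrative ones used in this answer, not Facebook's actual schema.

```python
import sqlite3

# Throwaway in-memory database; SQLite stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user1 INTEGER NOT NULL, user2 INTEGER NOT NULL)")

# One index entry per couple (user1, user2); the same couple cannot appear twice.
conn.execute("CREATE UNIQUE INDEX friendsindex ON friends (user1, user2)")

conn.executemany(
    "INSERT INTO friends (user1, user2) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3)],
)

# Inserting an existing couple again violates the unique index.
try:
    conn.execute("INSERT INTO friends (user1, user2) VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Listing the friends of user 1 filters on the leading column of the index.
friends_of_1 = [
    row[0]
    for row in conn.execute(
        "SELECT user2 FROM friends WHERE user1 = ? ORDER BY user2", (1,)
    )
]

# Ask SQLite how it would run the lookup; the plan should mention friendsindex.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT user2 FROM friends WHERE user1 = 1"
).fetchall()

print(duplicate_rejected, friends_of_1)
for row in plan:
    print(row[-1])
```

Because user1 is the leading column of the index, the lookup by user1 can use the index instead of scanning the whole table. MySQL offers a similar check through its EXPLAIN statement.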
---
The strategy to win a game that consists of finding the exact price of a given object is based on the same principle. Say the price is between 1 and 10000. You name a price, and the host answers + (higher) or - (lower). You have to find the price in as few tries as possible. E.g. the price is 6000. You could start from 1 and give every price up to 6000 (i.e. 6000 tries), but you could also proceed by dichotomy: you divide the remaining range by 2 at each iteration. Instead of 6000 tries, you can find the price in about 13 tries (log2(10000) ≈ 13.3), and never more than 14.
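The price game can be simulated in a few lines of Python. This is a minimal sketch under the rules described above (a price between 1 and 10000, and a host who only answers higher or lower); the function names are made up for illustration.

```python
def tries_by_dichotomy(price, low=1, high=10000):
    """Count the guesses needed when the remaining range is halved each time."""
    tries = 0
    while low <= high:
        guess = (low + high) // 2
        tries += 1
        if guess == price:
            return tries
        if guess < price:
            low = guess + 1    # host answers "+": the price is higher
        else:
            high = guess - 1   # host answers "-": the price is lower
    raise ValueError("price is outside the announced range")


def tries_one_by_one(price, low=1):
    """Count the guesses needed when naming low, low + 1, low + 2, ... in order."""
    return price - low + 1


print("one by one:", tries_one_by_one(6000))
print("dichotomy: ", tries_by_dichotomy(6000))

# Worst case over every possible price in the range.
worst_case = max(tries_by_dichotomy(p) for p in range(1, 10001))
print("worst case:", worst_case)
```

The worst case depends only on the size of the range, not on where the price happens to sit, which is why the logarithm is taken of 10000 rather than of 6000.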
---
About logarithms
For instance, how do you find x in 2^x = 1024? The answer is x = log2(1024), meaning the logarithm of 1024 in base 2 (answer: 10). In our story, a 1024-row table with an index based on a balanced binary tree would need about 10 tries (max) to find the right element (instead of 1024 max).
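One way to read the base-2 logarithm is as the number of times a quantity can be halved before a single element is left. The sketch below checks this for 1024 with a hypothetical helper named halvings, next to Python's math.log2.

```python
import math


def halvings(n):
    """Count how many times n can be halved before a single element remains."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


print(halvings(1024), math.log2(1024))
```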