I am new to distributed computing and was wondering how a page-ranking algorithm works across multiple machines. For example:
- When do they decide that data should be replicated (if it is needed at all)?
- If data is not copied, do they ask servers at other locations to give them the result?
- Or do they send “modules” to different servers (say, one part of a HUGE-HUGE linked graph to one server and another part to another server) and then combine the results they receive?
- When I search for something, how does it fetch pages from my country (you know, search pages from <insert country> only)?
This is not homework. Just a question I had. I welcome all ideas, even if they are very general or very detailed or do not answer all of my questions.
Right now I know next to nothing; my hope is to know something after going through the answers.
There are three pillars: MapReduce, the Google File System (GFS), and BigTable.
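To make the MapReduce part a bit more concrete, here is a minimal single-machine sketch of one way a PageRank iteration can be phrased as a map step and a reduce step. It is a toy illustration, not Google's actual implementation: the function names, the tiny `graph`, and the damping factor of 0.85 are all assumptions made for the example. In a real cluster the map calls would run on whichever machines hold each slice of the graph, and the framework would do the grouping ("shuffle") over the network.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the answer


def map_phase(rank, out_links):
    """Emit (target, contribution) pairs for one page.

    Each call only needs that page's own rank and links, so different
    pages can be processed on different machines.
    """
    if not out_links:
        return []  # this sketch simply drops the rank of pages with no links
    share = rank / len(out_links)
    return [(target, share) for target in out_links]


def reduce_phase(contributions, num_pages):
    """Combine every contribution sent to one page into its new rank."""
    return (1 - DAMPING) / num_pages + DAMPING * sum(contributions)


def pagerank_iteration(graph, ranks):
    # "Shuffle": group mapper output by target page, the job a
    # MapReduce framework performs between the two phases.
    grouped = defaultdict(list)
    for page, out_links in graph.items():
        for target, share in map_phase(ranks[page], out_links):
            grouped[target].append(share)
    num_pages = len(graph)
    return {page: reduce_phase(grouped[page], num_pages) for page in graph}


def pagerank(graph, iterations=20):
    # Start with rank spread evenly, then repeat the map/reduce round.
    ranks = {page: 1 / len(graph) for page in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks)
    return ranks


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

So, roughly in terms of the question: the graph is split across servers, each server computes contributions for its own slice, and the partial results are combined per page in the reduce step. Storing and replicating the slices themselves is the kind of job a distributed file system such as GFS handles.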