I was recently asked this question in an interview. Let's suppose I have 2000 servers. I want to transfer a 5 GB file to all these servers from a centralized server. Come up with an efficient algorithm.
My response:
I will use Perl/Python to scp the file from the centralized server to the first server.
In parallel, I will also start sending the file to the other servers. Doing it one by one seems very inefficient, so doing it in parallel should speed things up.
Is there a better way to do this?
Sure, you would use some sort of script, since you don't want to do that manually.
But instead of sending the file from one server to all the others, you would start by sending it to k servers. As soon as these k servers have received the file (let's say at time t), they can start distributing it too, so after approx. time 2*t roughly k^2 servers already have the file, instead of 2*k if the central server did all the sending itself in batches of k. After time 3*t roughly k^3 servers have the file, and so on. You continue like this until every server has its copy.
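To get a feel for the numbers, here is a minimal sketch that counts rounds for both approaches. It assumes every server that already holds the file (the central one included) sends to k new servers per round, and that every round takes the same time t; the function names are made up for illustration.

```python
import math


def tree_rounds(num_servers: int, k: int) -> int:
    """Rounds until num_servers targets hold the file when every holder
    (the central server included) sends to k new servers per round."""
    holders = 1  # only the central server has the file at the start
    rounds = 0
    while holders - 1 < num_servers:  # holders - 1 = targets served so far
        holders += holders * k        # each holder feeds k new servers
        rounds += 1
    return rounds


def central_only_rounds(num_servers: int, k: int) -> int:
    """Rounds when only the central server sends, k servers at a time."""
    return math.ceil(num_servers / k)


if __name__ == "__main__":
    for k in (1, 3, 10):
        print(k, tree_rounds(2000, k), central_only_rounds(2000, k))
```

In practice a round with a larger k is not free, since the sender's bandwidth is shared between the k transfers, so the best k depends on the network.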
To make the whole process a bit faster still, you could also split the file into chunks, so that a server can start redistributing it before it has received the whole file (you will end up with something like BitTorrent).
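As a rough illustration of why chunking helps, here is a back-of-the-envelope sketch. It assumes a simplified model: the file passes through `depth` hops, one hop moves the whole file in time t, and each server forwards a chunk as soon as it has it. The function name and the model are assumptions for illustration, not a measurement.

```python
def pipelined_time(depth: int, chunks: int, t: float = 1.0) -> float:
    """Estimated time until the last hop has the whole file.

    The first chunk needs `depth` chunk-times to reach the last hop, and
    the remaining chunks - 1 arrive one chunk-time apart behind it.
    """
    chunk_time = t / chunks
    return (depth + chunks - 1) * chunk_time


if __name__ == "__main__":
    print(pipelined_time(depth=4, chunks=1))    # no chunking: depth * t
    print(pipelined_time(depth=4, chunks=100))  # close to a single t
```

With one chunk this reduces to depth * t, the unchunked case; with many chunks the total approaches t regardless of depth.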