As the title says, what I would like to accomplish is given a package(usually the size may vary between 500Mb and 1Gb), I would like to copy over something around 40 servers at the same time(concurrently), I’ve been using a script that run a copy at the time, therefore I’m considering these possibilities:
1- Multiprocess library and create a single process for each copy function so that, they can run concurrently;
-although I think I might end up having an I/O bottleneck, and process cannot share the same data.
2-I m not using a single internet connection, but a huge corporate WAN.
Can anyone tell me whether is there any other more effective way(faster) to achieve the same thing? Or some other way to solve it?(I can run this task from a 2 core workstation).
Assume your machines have 1Gbit connections. You’ll get 800Mbit/s if you’re lucky/work at it, and it’ll take ~10s to copy each 1GByte and 6-7 minutes to update those machines. If that’s good enough, the only thing you need to do is work on using the 1Gbit efficiently to hit that target (what are you seeing from your current scripts ? OK 1Gbit may be ambitous on WAN, but you can do a similar analysis). Multiprocessing might or might not help here… but it’s not going to magically get you more bandwidth.
If it’s not good enough, I’d either consider:
go P2P (see miku;s answer), so as soon as one machine has a bit of
the data it can share it with other machines using it’s own
bandwidth. How much this helps depends to some extent on your
network topology (existence of other bottleneck points).
Look into multicast, if the network is enough under your control that you can get the stuff routed appropriately (this seems pretty
unlikely for WAN, but maybe one day in an IPv6 wonderland…).
Instead of copying the same data 40 times (assuming it is the same
each time), you just broadcast it once and all the receivers pick it
up simultaneously. Multicast UDP isn’t reliable (intended more for
IPTV I think) but there have been attempts to build reliable file
transfer tools using multicast tech e.g OpenPGM and MS’s
own implementation.