I have a fairly large amount of data (~30 GB, split into ~100 files) that I’d like to transfer between S3 and EC2. When I fire up the EC2 instances I’d like to copy the data from S3 to the EC2 local disks as quickly as I can, and when I’m done processing I’d like to copy the results back to S3.
I’m looking for a tool that’ll do a fast / parallel copy of the data back and forth. I have several scripts hacked up, including one that does a decent job, so I’m not looking for pointers to basic libraries; I’m looking for something fast and reliable.
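For concreteness, the kind of parallel copy described here might be sketched as below. This is only an illustration, not the actual script: it assumes Python with boto3, and the names `parallel_transfer` and `download_bucket_keys`, the worker count, and the flat destination layout are all made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_transfer(items, transfer_one, max_workers=16):
    """Run transfer_one(item) for every item on a thread pool.

    Returns (succeeded, failed): a list of items that completed and a
    dict mapping each failed item to the exception it raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                future.result()
                succeeded.append(item)
            except Exception as exc:  # keep going; failures are reported at the end
                failed[item] = exc
    return succeeded, failed


def download_bucket_keys(bucket, keys, dest_dir, max_workers=16):
    """Download the given S3 keys into dest_dir in parallel (assumes boto3)."""
    import os
    import boto3  # imported lazily so parallel_transfer has no AWS dependency

    s3 = boto3.client("s3")  # one low-level client shared by the worker threads

    def fetch(key):
        # Hypothetical layout: every object lands directly in dest_dir.
        target = os.path.join(dest_dir, os.path.basename(key))
        s3.download_file(bucket, key, target)

    return parallel_transfer(keys, fetch, max_workers=max_workers)
```

The copy back to S3 would follow the same pattern with `upload_file` in place of `download_file`. Returning the failed items rather than raising on the first error makes it possible to retry only the files that did not make it.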
I think you might be better off using Elastic Block Store (EBS) to hold your files instead of S3. An EBS volume is akin to a ‘drive’ that can be attached to and mounted on your EC2 instance, so you don’t have to copy the data each time. The volume persists independently of the instance, which lets you keep your data between EC2 instances without reading from or writing to S3 on every run.
http://aws.amazon.com/ebs/
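As a rough sketch of what attaching such a volume could look like from code, assuming Python with boto3 and a volume that already exists: the function name `attach_data_volume`, the default device name, and the IDs in the usage comment are placeholders for illustration.

```python
def attach_data_volume(ec2, volume_id, instance_id, device="/dev/sdf"):
    """Attach an existing EBS volume to an instance and wait until it is in use.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the device name that was requested for the attachment.
    """
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    # Block until EC2 reports the volume as in use.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return device


# Example usage (placeholder IDs, requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   attach_data_volume(ec2, "vol-0123456789abcdef0", "i-0123456789abcdef0")
```

After attaching, the volume still has to be mounted from inside the instance before the files are visible. The volume and the instance must be in the same Availability Zone, and on some instance types the device shows up under a different name than the one requested.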