I have many data files (let’s call them input_files) that are stored in Amazon S3.
I would like to start about 15 independent Amazon EC2 Linux instances. Each instance should load the input_files stored in S3 and process them independently.
I’d like all 15 instances to write to the same output file.
Upon completion, this output file will be saved in S3.
Two questions:
(1) Is it possible for Amazon EC2 Linux instances to connect to S3 and read data from it?
(2) How can I arrange for all 15 independent EC2 instances to write to the same output file? Can I keep this file in S3 and have all instances write to it?
(1) Yes. You can access S3 from anywhere on the internet, including from EC2 instances, using the public S3 API.
(2) It sounds like you are describing a database. S3 is simply a file store: you don’t write to files on S3, you save whole files to S3, so 15 instances cannot append to one shared S3 file.
You should probably look into some type of database for the shared output instead.
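If you would rather stay within S3, one possible workaround (my assumption, not something the setup above requires) is to have each instance read its inputs, save its own output file under a common prefix, and let a final step concatenate those parts into the single output file. A minimal sketch in Python, assuming the boto3 library; the bucket name, key names, and helper function names are all hypothetical:

```python
def read_s3_object(s3_client, bucket, key):
    """Download one S3 object and return its contents as bytes."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def part_key(prefix, worker_id):
    """Build a distinct key per instance so no two instances overwrite each other."""
    return f"{prefix}part-{worker_id:05d}"


def save_part(s3_client, bucket, prefix, worker_id, data):
    """Save one instance's complete output as its own S3 object."""
    key = part_key(prefix, worker_id)
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return key


def merge_parts(s3_client, bucket, prefix, output_key):
    """Concatenate all part objects under `prefix` into one output object.

    Keep `output_key` outside `prefix`, or a second run would merge the
    previous output as well. A single list_objects_v2 call returns at most
    1000 keys, which is enough for about 15 parts.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = sorted(obj["Key"] for obj in response.get("Contents", []))
    merged = b"".join(read_s3_object(s3_client, bucket, k) for k in keys)
    s3_client.put_object(Bucket=bucket, Key=output_key, Body=merged)
    return keys


def make_s3_client():
    """Create a real S3 client; credentials come from the instance's IAM role or environment."""
    import boto3  # imported lazily so the helpers above work with any compatible client
    return boto3.client("s3")

# Usage on instance number 3 (all names are placeholders):
#   s3 = make_s3_client()
#   data = read_s3_object(s3, "my-bucket", "input_files/file-0003")
#   save_part(s3, "my-bucket", "output/parts/", 3, data)
# Then, once all instances have finished, on any one machine:
#   merge_parts(s3, "my-bucket", "output/parts/", "output/final")
```

The merge step holds all parts in memory, so this only suits outputs of modest size; if instances need to append records concurrently while they run, a database is still the better fit.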