I have a server that files get uploaded to, and I want to forward them on to S3 using boto. I also need to do some processing on the data as it gets uploaded to S3.
The problem is the way the files get uploaded: I need to provide a writable stream that incoming data gets written to, but to upload with boto I need a readable stream. So it’s like I have two ends that don’t connect. Is there a way to upload to S3 with a writable stream? If so it would be easy: I could pass the upload stream to S3 and the execution would chain along.
If there isn’t, I have two loose ends and need something in between with a sort of buffer, which can read from the upload to keep that moving and expose a read method that I can give to boto. But doing this, I’m sure I’d need to thread the S3 upload part, which I’d rather avoid since I’m using Twisted.
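The "something in between" described above can be sketched as a small in-memory pipe: the upload side calls write() (and could do the per-chunk processing there), and a blocking uploader pulls from read(). This is a minimal sketch, not boto's or Twisted's API: StreamPipe and fake_blocking_upload are hypothetical names, and fake_blocking_upload only stands in for a real boto call, since whether a given boto version accepts a non-seekable file object is an assumption you would need to check. Because read() blocks, the consumer still has to run in a worker thread, which is exactly the threading the question hoped to avoid.

```python
import threading


class StreamPipe:
    """Hypothetical bridge from a push-style writer to a pull-style reader.

    write() never blocks (data is buffered in memory), so it can be called
    from an event loop such as Twisted's reactor; read() blocks until data
    is available or close() has been called, so it belongs in a worker thread.
    """

    def __init__(self):
        self._buf = bytearray()
        self._closed = False
        self._cond = threading.Condition()

    def write(self, data):
        with self._cond:
            if self._closed:
                raise ValueError("write to closed StreamPipe")
            self._buf.extend(data)
            self._cond.notify_all()  # wake any reader waiting for data

    def close(self):
        # Marks end-of-stream: readers drain what is left, then get b"".
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def read(self, size=-1):
        if size == 0:
            return b""
        with self._cond:
            if size is None or size < 0:
                # "Read everything" means waiting for end-of-stream.
                while not self._closed:
                    self._cond.wait()
                n = len(self._buf)
            else:
                while not self._buf and not self._closed:
                    self._cond.wait()
                n = min(size, len(self._buf))
            chunk = bytes(self._buf[:n])
            del self._buf[:n]
            return chunk


def fake_blocking_upload(fileobj, chunk_size=4):
    # Hypothetical stand-in for a blocking client call: it just pulls
    # chunks until EOF and returns what it "uploaded".
    parts = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


if __name__ == "__main__":
    pipe = StreamPipe()
    result = {}
    # The blocking consumer runs in its own thread.
    worker = threading.Thread(
        target=lambda: result.setdefault("body", fake_blocking_upload(pipe)))
    worker.start()
    # The producer side, e.g. the upload handler receiving data.
    for piece in (b"hello ", b"streaming ", b"world"):
        pipe.write(piece)
    pipe.close()
    worker.join()
    print(result["body"])  # b'hello streaming world'
```

The buffer is unbounded on purpose, so write() never stalls the reactor; the trade-off is that memory grows if S3 is slower than the incoming upload. A bounded version would need to apply back-pressure by pausing the producer rather than by blocking in write().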
I have a feeling I’m way overcomplicating things, but I can’t come up with a simple solution. This has to be a common-ish problem; I’m just not sure how to put it into words well enough to search for it.
boto is a Python library with a blocking API. This means you’ll have to use threads to use it while keeping the concurrent operation that Twisted provides (just as you would need threads to get any concurrency when using boto "without" Twisted; i.e., Twisted does not make boto non-blocking or concurrent).
Instead, you could use txAWS, a Twisted-oriented library for interacting with AWS.
txaws.s3.client provides methods for interacting with S3. If you’re familiar with boto or AWS, some of these should already look familiar, for example create_bucket or put_object.
txAWS would be better if it provided a streaming API so you could upload to S3 as the file is being uploaded to you. I think this is currently in development (based on the new HTTP client in Twisted, twisted.web.client.Agent) but perhaps not yet available in a release.