We’re moving to S3 to serve some of the statically generated content for our web app. We have been looking for a way to build a metrics system around the usage of our site, and we were planning to parse the S3 access logs, passing additional information to be logged with the content GET requests. We came across the following entry in the Developer Guide:
Best Effort Server Log Delivery
The server access logging feature is designed for best effort. You can expect that most requests against a bucket that is properly configured for logging will result in a delivered log record, and that most log records will be delivered within a few hours of the time that they were recorded.

However, the server logging feature is offered on a best-effort basis. The completeness and timeliness of server logging is not guaranteed. The log record for a particular request might be delivered long after the request was actually processed, or it might not be delivered at all. The purpose of server logs is to give the bucket owner an idea of the nature of traffic against his or her bucket. It is not meant to be a complete accounting of all requests.
We are wondering what other people have experienced with the delivery of access logs. Our alternative is to build an HTTP server and record the metrics ourselves with a separate call, but we think that parsing the log files would be less work. We’d like to know whether people have seen cases where delivery didn’t happen, so we can gauge how accurate we could hope to be, because some of the metrics we gather feed into our business processes.
I was surprised how large my log files on S3 had gotten in under a month. My app didn’t need to parse the logs on Amazon, but I like your approach. From what I’ve seen, you can expect the log files to be accurate and complete. Still, given their CYA warning, the logs should not be used for anything critical.
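For what it's worth, here is a rough sketch in Python of the parsing side of that approach. It assumes the space-delimited access log layout, where the time is in square brackets and the Request-URI is in double quotes, and it assumes the extra information is passed as query-string parameters starting with "x-". The function name parse_log_line, the parameter names x-page and x-user, and the sample line are all made up for illustration, so check them against your own logs before relying on this.

```python
import re
from urllib.parse import urlsplit, parse_qs

# A token is a [bracketed] value, a "quoted" value, or a bare run of
# non-space characters. Assumption: quoted fields contain no embedded quotes.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_log_line(line):
    """Return selected fields of one access log record, or None if too short."""
    fields = _TOKEN.findall(line)
    if len(fields) < 10:
        return None

    # Assumed field order: owner, bucket, time, remote IP, requester,
    # request ID, operation, key, Request-URI, HTTP status, ...
    request_uri = fields[8].strip('"')  # e.g. GET /key?x-user=42 HTTP/1.1
    parts = request_uri.split(" ")
    target = parts[1] if len(parts) >= 2 else ""

    # Keep only the custom parameters (hypothetical "x-" naming convention).
    query = parse_qs(urlsplit(target).query)
    custom = {k: v[0] for k, v in query.items() if k.startswith("x-")}

    return {
        "bucket": fields[1],
        "time": fields[2].strip("[]"),
        "operation": fields[6],
        "key": fields[7],
        "status": fields[9],
        "custom": custom,
    }


# Made-up sample record, for illustration only.
SAMPLE = (
    '79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be '
    'mybucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 - 3E57427F3EXAMPLE '
    'REST.GET.OBJECT index.html '
    '"GET /index.html?x-page=home&x-user=42 HTTP/1.1" 200 - 1024 1024 '
    '12 11 "-" "Mozilla/5.0" -'
)

if __name__ == "__main__":
    print(parse_log_line(SAMPLE))
```

Since delivery is best effort, numbers built this way are probably better treated as estimates than as exact counts.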