I believe that http://www.s3stat.com/ is a very good service to handle this. There are circumstances, however, where doing things yourself is the only option.
I was surprised to see that there is not much information about this on the web at the moment, so I decided to write up my experience.
Reasons for this approach:
- s3cmd rarely comes up in write-ups on this, and IMO it is a very good command-line tool for talking to AWS.
Download the log files:
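#
# you will need to set up your credentials in an s3.config file
# (e.g. with "s3cmd -c s3.config --configure"); this example downloads
# all logs for Dec/2012 (the URL is quoted so your local shell does not expand the *)
#
$ s3cmd -c s3.config get "s3://{bucket}/log/access_log-2012-12*"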
Note: for 1GB (14k files) of data, this took roughly 10 hours.
Handle the log files
At the time of writing, the log files are named as follows:
access_log-YYYY-MM-DD-HH-MM-SS-XXXXXXXXXXXXXX
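Since the date is part of every file name, a quick per-day count is an easy way to check that a long download did not leave gaps. A minimal sketch, assuming the naming layout above; count_logs_per_day is a hypothetical helper, not part of s3cmd or AWStats:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count S3 log files per day in a directory.
# Assumes names like access_log-YYYY-MM-DD-HH-MM-SS-XXXXXXXXXXXXXX.
count_logs_per_day() {
  local dir=$1 pattern=$2 f name
  local re='([0-9]{4}-[0-9]{2}-[0-9]{2})-[0-9]{2}-[0-9]{2}-[0-9]{2}-'
  for f in "$dir"/$pattern; do
    [ -e "$f" ] || continue            # the glob matched nothing
    name=${f##*/}                      # drop the directory part
    # keep the YYYY-MM-DD that precedes HH-MM-SS in the name
    if [[ $name =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done | sort | uniq -c | awk '{ print $2, $1 }'   # "day count" per line
}
```

For example, `count_logs_per_day /tmp/s3-logs/raw-logs 'access_log-2012-12*'`. A day that is missing from the output, or has far fewer files than its neighbours, is probably worth re-downloading.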
The first step is to merge them into a single file:
$ logresolvemerge.pl access_log-* > access_log.log
or into daily files, using a small script like this one:
#!/bin/bash
# zero-padded {01..31} needs bash 4 or later
prefix="access_log-2012-12"
path=/tmp/s3-logs/raw-logs
dest=/tmp/s3-logs/fixed-logs
mkdir -p "$dest"
for d in {01..31}; do
  check_prefix="$path/$prefix-$d"
  # only merge days that actually have log files
  if compgen -G "$check_prefix*" > /dev/null; then
    logresolvemerge.pl "$check_prefix"* > "$dest/$prefix-$d.log"
  fi
done
This should produce one file per day, named access_log-2012-12-01.log and so on.
Note: check where logresolvemerge.pl is installed on your system. It is usually in /usr/share/awstats/tools/logresolvemerge.pl
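If logresolvemerge.pl is not on your $PATH, a small lookup saves hunting for it every time. A minimal sketch; find_logresolvemerge is a hypothetical name, and the fallback path is only the usual location mentioned in the note, which may differ on your system:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of logresolvemerge.pl, preferring
# one found on $PATH and falling back to the usual AWStats tools directory.
find_logresolvemerge() {
  local candidate
  if candidate=$(command -v logresolvemerge.pl); then
    printf '%s\n' "$candidate"
    return 0
  fi
  candidate=/usr/share/awstats/tools/logresolvemerge.pl   # assumed location
  if [ -f "$candidate" ]; then
    printf '%s\n' "$candidate"
    return 0
  fi
  echo "logresolvemerge.pl not found" >&2
  return 1
}
```

Usage: `merge=$(find_logresolvemerge) && perl "$merge" access_log-* > access_log.log`. Calling it through perl means it works even if the script is not marked executable.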
Configure AWStats
You will need to create a config file based on the examples shipped with AWStats (here it is called my.awstats.config) and change the following settings:
# use this to process the single merged file
LogFile="/tmp/s3-logs/fixed-logs/access_log.log"
# or to process a daily file:
LogFile="/tmp/s3-logs/fixed-logs/access_log-%YYYY-0-%MM-0-%DD-0.log"
# matches the current format documented at http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
LogFormat = "%other %extra1 %time1 %host %logname %other %extra2 %url %methodurl %code %other %bytesd %other %other %other %refererquot %uaquot %other"
#
# and any other settings you need
#
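Before pointing AWStats at the merged file, it can help to confirm that its lines have the layout this LogFormat expects. A rough sketch (status_summary is a hypothetical name) that counts requests per HTTP status, the field tagged %code above. It assumes plain whitespace splitting and that the quoted request line has exactly three tokens, which should hold for a well-formed HTTP request:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: count requests per HTTP status in a merged S3 log.
# Split on whitespace, an S3 access log line starts like this:
#   $1 owner  $2 bucket  $3-$4 [time zone]  $5 remote IP  $6 requester
#   $7 request ID  $8 operation  $9 key  $10-$12 "METHOD URI PROTOCOL"
#   $13 HTTP status  ...
status_summary() {
  awk '{ n[$13]++ } END { for (code in n) print code, n[code] }' "$1" | sort
}
```

For example, `status_summary /tmp/s3-logs/fixed-logs/access_log.log`. If most of the output is not three-digit codes, the log layout has probably changed and the LogFormat line needs revisiting.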
I think it is unlikely that the Amazon S3 format will be added to AWStats by default, but there is an official feature request for it here: http://sourceforge.net/p/awstats/feature-requests/864/
Generate the reports
In my case I just needed a single monthly report to print as a PDF, so I ran:
$ sudo /usr/lib/cgi-bin/awstats.pl -config="my.awstats.config" -staticlinks -output -year=2012 -month=12 > ./2012-12.html
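As far as I know, -output only renders what is already in the AWStats database, so if the database has not been built from LogFile yet, an -update run is needed first. A minimal sketch of both steps; build_month_report and the AWSTATS variable are hypothetical names, and the flags are the ones used above plus -update:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: update the AWStats database from LogFile, then
# render a static report for one month. Override AWSTATS if your path differs.
AWSTATS=${AWSTATS:-/usr/lib/cgi-bin/awstats.pl}

build_month_report() {
  local config=$1 year=$2 month=$3
  # 1) parse the configured LogFile into the database; its progress
  #    messages go to stderr so they do not end up in the HTML
  "$AWSTATS" -config="$config" -update >&2 || return 1
  # 2) write the static HTML report for the requested month to stdout
  "$AWSTATS" -config="$config" -staticlinks -output -year="$year" -month="$month"
}
```

Usage: `build_month_report my.awstats.config 2012 12 > ./2012-12.html`, run as a user that can write to the AWStats data directory (sudo cannot call a shell function directly, so either use a root shell or put the function in a script).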
Since I didn't want to serve AWStats live through a web server, I used a hack to make the images available, just so I could print the PDF from the browser:
$ sed -i "s|/awstats-icon/|http://people.ofset.org/awstats-icon/|g" ./2012-12.html
(pointing at a public location that serves the awstats-icon directory; thanks, ofset.org).
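If you would rather not depend on a third-party mirror, the same sed trick can point the icons at a local copy instead. A sketch, assuming GNU sed (for -i without a suffix) and that the icons live in /usr/share/awstats/icon, which is where Debian/Ubuntu packages usually put them; localize_icons is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical alternative: rewrite icon references in a static AWStats report
# to a local directory, so the page renders when opened from disk.
# The default icon directory is an assumption; pass your own as $2.
localize_icons() {
  local html=$1 icon_dir=${2:-/usr/share/awstats/icon}
  # '|' as the sed delimiter avoids escaping every '/' in the paths
  sed -i "s|/awstats-icon/|file://$icon_dir/|g" "$html"
}
```

Usage: `localize_icons ./2012-12.html`, then open the file in the browser and print it to PDF.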
And voilà!