We have a product using a PostgreSQL database server that is deployed at a couple of hundred clients. Some of them have gathered tens of gigabytes of data over the years. So in the next version we will introduce automated cleanup procedures which will gradually archive and DELETE old records during nightly batch jobs.
If I understand correctly, autovacuum will kick in and analyze and reorganize the tuples, so the performance will be like it was when fewer records existed.
The actual disk space will not be released, if I understand correctly, since that only happens with a VACUUM FULL, and that is not triggered by autovacuum.
So I was thinking about an automated process that would do this.
I found the bloat view that is used by nagios check_postgres at http://wiki.postgresql.org/wiki/Show_database_bloat.
Is this view any good? Am I correct in understanding that if the tbloat is > 2, it could use a VACUUM FULL? And if the ibloat is too high, it could use a REINDEX?
Any comments on the following job to run as daily batch job?
vacuumdb -Z mydatabase   # vacuum with analyze-only
select tablename from bloatview order by tbloat desc limit 1
vacuumdb -f -t tablename mydatabase
select tablename, iname from bloatview order by ibloat desc limit 1
reindexdb -t tablename -i iname mydatabase
Of course, I still need to wrap it in a nice Perl script in crontab (we’re using Ubuntu 12), or does PostgreSQL have some sort of scheduler I could do this with?
Or is this total overkill and is there a much simpler procedure?
You probably don’t need it. It is worth doing once, after the first archiving job, so that you get your disk space back. After that, your daily archiving job and autovacuum will prevent dead-tuple bloat, because the space freed by deleted rows is reused for new rows.
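If you want to confirm that autovacuum is keeping up after the archiving job goes live, a rough check is to look at the dead-tuple counters in the pg_stat_user_tables statistics view. The following is only a sketch: the database name mydatabase is taken from the question, while the DRY_RUN switch and the function name dead_tuple_sql are made up for illustration. By default it just prints the query; set DRY_RUN=0 to send it to psql.

```shell
#!/bin/sh
# Sketch: list the tables with the most dead tuples.
# DB and DRY_RUN are illustrative settings, not part of any PostgreSQL tool.
DB="${DB:-mydatabase}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the SQL, 0 = run it through psql

# Emit the query text. pg_stat_user_tables is a standard statistics view.
dead_tuple_sql() {
    cat <<'SQL'
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
SQL
}

if [ "$DRY_RUN" = "1" ]; then
    dead_tuple_sql
else
    # ON_ERROR_STOP makes psql exit with a non-zero status if the query fails.
    dead_tuple_sql | psql -d "$DB" -v ON_ERROR_STOP=1
fi
```

If n_dead_tup stays small relative to n_live_tup and last_autovacuum is recent, there is little reason to schedule anything heavier.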
Also, instead of vacuum full it is often better to run

cluster table_name using index_name; analyze table_name;

This will reorder rows according to an index. This way related table rows can be stored physically close together on disk, which can limit disk seeking (important on classic disk drives, largely irrelevant on SSD) and the number of reads for your typical queries.

And remember that both vacuum full and cluster take an exclusive lock, so they will make your tables unusable while they run.
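For the one-off space reclaim, the cluster-then-analyze step could be wrapped roughly like this. Treat it as a sketch under assumptions: table_name and index_name are placeholders, and the DRY_RUN switch and the function names reclaim_sql and run_reclaim are invented for this example. It defaults to printing the SQL so that nothing takes a lock by accident.

```shell
#!/bin/sh
# Sketch: one-off space reclaim with CLUSTER followed by ANALYZE.
# All names below are placeholders; adjust them to your schema.
DB="${DB:-mydatabase}"
TABLE="${TABLE:-table_name}"
INDEX="${INDEX:-index_name}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the SQL, 0 = run it through psql

# Build the SQL: CLUSTER rewrites the table in index order,
# ANALYZE refreshes the planner statistics afterwards.
reclaim_sql() {
    printf 'CLUSTER %s USING %s; ANALYZE %s;' "$1" "$2" "$1"
}

run_reclaim() {
    sql=$(reclaim_sql "$TABLE" "$INDEX")
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$sql"
    else
        # The table is locked exclusively while CLUSTER runs,
        # so schedule this in a maintenance window.
        printf '%s\n' "$sql" | psql -d "$DB" -v ON_ERROR_STOP=1
    fi
}

run_reclaim
```

Run it once with the defaults to review the statements, then again with DRY_RUN=0 during a quiet period.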