I am writing a Log Unifier program. That is, I have a system that produces logs:
my.log, my.log.1, my.log.2, my.log.3…
On each iteration I want to store the number of lines I have read from each file, so that on the next iteration I can continue reading from that point.
The problem is that when the files are full, they roll:
The last log is deleted
…
my.log.2 becomes my.log.3
my.log.1 becomes my.log.2
my.log becomes my.log.1
and a new my.log is created
I can of course keep track of them using inodes, which are almost in one-to-one correspondence with files.
I say “almost” because I am worried about the following scenario:
Between two of my iterations, some files are deleted (let’s say the logging is very fast), then new files are created, and some of them get the inodes of the files that were just deleted. The problem is that I will then mistake these new files for old ones and start reading from line 500 (for example) instead of line 0.
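To make the bookkeeping concrete, here is a minimal sketch of the inode-based approach I have in mind. It assumes Python, and the function name `read_new_lines_by_inode` and the `progress` dict are just names made up for this illustration. It follows a file correctly across a rename, but if an inode is reused by a brand-new file, the stored count is stale and the first lines of the new file are skipped, which is exactly the scenario above.

```python
import os

def read_new_lines_by_inode(path, progress):
    """Return the lines of `path` not yet seen on earlier iterations.

    `progress` maps (device, inode) -> number of lines already read.
    Illustrative sketch only: it trusts the inode to identify the file,
    so a reused inode makes it skip lines of a new file.
    """
    with open(path, "r", errors="replace") as f:
        # fstat on the open handle, so the identity matches what we read
        st = os.fstat(f.fileno())
        lines = f.readlines()
    key = (st.st_dev, st.st_ino)
    skip = progress.get(key, 0)   # 0 for a file we have never seen
    progress[key] = len(lines)
    return lines[skip:]
```

Calling it on my.log, then again on my.log.1 after the roll, returns nothing new for the renamed file, because a rename keeps the inode.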
So I am hoping to find a way to solve this. Here are a few directions I thought about that may help you help me:
- Another one-to-one correspondence, other than inodes.
- An ability to mark a file. I thought about using chmod +x to mark a file as an existing file, so that files without this permission would be known to be new. But if somebody changed the permissions manually, that would confuse my program, so any other way to mark a file would be welcome.
- I thought about creating soft links to a file that get deleted when the file is deleted. That would allow me to know which files got deleted.
- Any way to get the “creation date”.
- Any other idea that comes to mind, maybe using the timestamps (atime, ctime, mtime) in some clever way. Anything is good, as long as it lets me know which files are new, or gives a one-to-one correspondence to files.
Thank you
I can think of a few alternatives:
- Use POSIX extended attributes to store metadata about each log file that your program can use for its operation.
- It should be a safe assumption that the contents of old log files are not modified after being archived, i.e. after my.log becomes my.log.1. You could generate a hash for each file (e.g. SHA-256) to uniquely identify it.
- All decent log formats embed a timestamp in each entry. You could use the timestamp of the first entry – or even the whole entry itself – in the file for identification purposes. Log files are usually rolled on a periodic basis, which would ensure a different starting timestamp for each file.
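As a rough illustration of the last idea, the sketch below keys the saved line count on a SHA-256 hash of the file's first entry instead of its inode. It is written in Python under two assumptions: the first line of every log file is unique (for example because it carries a timestamp), and files are only ever appended to. The function name `read_new_lines_by_first_entry` and the `progress` dict are hypothetical names chosen for this example.

```python
import hashlib

def read_new_lines_by_first_entry(path, progress):
    """Return complete lines of `path` not yet seen on earlier iterations.

    `progress` maps sha256(first line) -> number of lines already read.
    Assumes the first line is unique per log file and never changes.
    """
    with open(path, "rb") as f:
        lines = f.readlines()
    # Ignore a trailing partial line that is still being written.
    if lines and not lines[-1].endswith(b"\n"):
        lines.pop()
    if not lines:
        return []          # no complete first entry yet, nothing to key on
    key = hashlib.sha256(lines[0]).hexdigest()
    skip = progress.get(key, 0)   # unknown first entry -> start at line 0
    progress[key] = len(lines)
    return [l.decode("utf-8", errors="replace") for l in lines[skip:]]
```

Because the key travels with the content, it survives the rename from my.log to my.log.1, and a new my.log is read from line 0 even if it happens to reuse an old inode. Hashing only the first entry is a cheaper variant of hashing the whole file, which would change on every append to the active log.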