I have an Apache server that writes to a custom log file (through a CGI script). I also have a Python script that periodically fetches the tail of that log file. Here is my tail function:
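    def tail(f, window=1):
        """Return the last `window` lines of the file at path `f` as one string."""
        BUFSIZ = 1024
        # Binary mode: Python 3 rejects end-relative seeks on text-mode files.
        with open(f, 'rb') as fh:
            fh.seek(0, 2)
            remaining = fh.tell()  # bytes not yet read, counted from the start
            size = window
            block = -1
            data = []
            while size > 0 and remaining > 0:
                if remaining - BUFSIZ > 0:
                    # Seek back one whole BUFSIZ
                    fh.seek(block * BUFSIZ, 2)
                    # Read the block; prepend it so the blocks stay in file order
                    data.insert(0, fh.read(BUFSIZ))
                else:
                    # File too small, start from the beginning
                    fh.seek(0, 0)
                    # Only read what was not read
                    data.insert(0, fh.read(remaining))
                lines_found = data[0].count(b'\n')
                size -= lines_found
                remaining -= BUFSIZ
                block -= 1
        return '\n'.join(b''.join(data).decode().splitlines()[-window:])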
Taken individually, the Python script and the Apache logging both work fine. However, when both work on the same log file concurrently, the log file stops being updated.
How can I implement a tail read function in Python that doesn’t interfere with Apache writes?
You can take a look at the pytailer project or this recipe on ActiveState for some inspiration on implementation details.
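Since the script polls the log periodically, one option is to remember the byte offset between polls and read only what was appended since then. Below is a minimal sketch of that idea, assuming the log is only ever appended to (or rotated/truncated) and that the reader opens it read-only and closes it right away. `read_new_lines` is a hypothetical helper name, not something taken from pytailer or the ActiveState recipe, and whether it cures the stalled updates depends on what the CGI script does when it writes, which isn't shown in the question.

```python
import os


def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for complete lines appended after `offset`.

    Hypothetical helper for illustration only.
    """
    # Read-only binary open; the handle is closed as soon as the block exits.
    with open(path, 'rb') as fh:
        size = os.fstat(fh.fileno()).st_size
        if size < offset:
            # The file shrank (e.g. it was rotated or truncated): start over.
            offset = 0
        fh.seek(offset)
        chunk = fh.read(size - offset)
    # Keep only complete lines; a partial last line is left for the next poll.
    end = chunk.rfind(b'\n') + 1
    lines = chunk[:end].decode('utf-8', errors='replace').splitlines()
    return lines, offset + end
```

The caller keeps the returned offset and passes it back in on the next poll, e.g. `lines, pos = read_new_lines(log_path, pos)`, so each poll touches only the new bytes and never holds the file open between polls.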