I am working on a Python script to do the following:
I would like to read a log file every ten minutes and, on each read, extract only the data that has been added to the file since the last read (preferably without having to read the entire log file each time). Example:
At 09:00 I read the log file and the content is:
1. 2011-07-04 11:15:04,507 Processing request 17897931 from status 7 to 13
2. 2011-07-04 11:15:04,508 Processing request 17897931 from status 13 to 17
3. 2011-07-04 11:15:04,508 Processing request d0fcb681 from status 7 to 13
4. 2011-07-04 11:15:04,509 Processing request d0fcb681 from status 13 to 17
5. 2011-07-04 11:15:04,509 Processing request 178819a1 from status 7 to 13
At 09:10 I read the log file again and now the content is:
1. 2011-07-04 11:15:04,507 Processing request 17897931 from status 7 to 13
2. 2011-07-04 11:15:04,508 Processing request 17897931 from status 13 to 17
3. 2011-07-04 11:15:04,508 Processing request d0fcb681 from status 7 to 13
4. 2011-07-04 11:15:04,509 Processing request d0fcb681 from status 13 to 17
5. 2011-07-04 11:15:04,509 Processing request 178819a1 from status 7 to 13
6. 2011-07-04 11:15:04,510 Processing request 178819a1 from status 13 to 17
7. 2011-07-04 11:15:04,510 Processing request 17161df1 from status 7 to 13
8. 2011-07-04 11:15:04,511 Processing request 17161df1 from status 13 to 17
9. 2011-07-04 11:15:04,511 Processing request 182013e1 from status 7 to 9
How can my script extract only the new lines (lines 6 to 9)?
I already have a shell script that does this by using the file’s inode, but I am looking for a solution written in Python.
My plan is to run the script from crontab.
Do you guys have any idea how I can get this done?
Example:
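A minimal sketch of the idea, assuming the script remembers the byte offset where the previous read ended in a small state file. The function name `read_new_data` and both file paths are hypothetical, chosen only for illustration.

```python
import os


def read_new_data(log_path, state_path):
    """Return the bytes appended to log_path since the previous call.

    state_path is a hypothetical side file that stores the last offset.
    """
    # Load the offset saved by the previous run; start at 0 on the first run.
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # If the log is now shorter than the saved offset, it was probably
    # truncated or rotated, so start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    # Seek straight to the saved position instead of re-reading the file.
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        new_offset = log.tell()

    # Remember where this read stopped for the next run.
    with open(state_path, "w") as f:
        f.write(str(new_offset))

    return data
```

Called from a cron job every ten minutes, each run would return only what was appended since the previous run. The sketch does not check the inode, so a rotated file that has already grown past the saved offset would not be detected.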
This example will occasionally read a partial line if another process is writing to the log at the same time. I’ll leave the fix for that as an exercise 🙂
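One possible way to handle the partial-line case is to consume only the data up to the last newline and leave any unfinished line for the next run. This is a sketch under assumptions: the function name `read_new_lines`, the state file, and the UTF-8 encoding are all hypothetical choices, not part of the original question.

```python
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines appended to log_path since the last call.

    state_path is a hypothetical side file that stores the last offset.
    """
    # Load the offset saved by the previous run; start at 0 on the first run.
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # A shorter file suggests truncation or rotation: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Keep only the bytes up to and including the last newline.
    # rfind returns -1 when there is no newline, which makes end == 0.
    end = data.rfind(b"\n") + 1
    complete = data[:end]

    # Advance the offset past the complete lines only, so a half-written
    # line is read again (in full) on the next run.
    with open(state_path, "w") as f:
        f.write(str(offset + end))

    # Assumes the log is UTF-8; undecodable bytes are replaced.
    return complete.decode("utf-8", errors="replace").splitlines()
```

The trade-off is that a line still being written when the script runs is reported one run later rather than being split across two runs.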