I have a Python script that runs a subprocess to get some data and then processes it. What I’m trying to achieve is to have the data written to a file, and then use the data from the file for the processing (the reason is that the subprocess is slow, but its output can change based on the date, time, and parameters I use, and I need to run the script frequently).
I’ve tried various methods, including opening the file as w+ and seeking to the beginning after the write is done, but nothing seems to work: the file is written, but when I try to read it back (using file.readline()) I get EOF.
This is what I’m essentially trying to accomplish:
```python
import os
import subprocess

myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
myFile.flush()  # force the file to disk
os.fsync(myFile.fileno())  # ...and ask the OS to sync it
myFile.close()
myFile = open(fileName, "r")
while myFile.readline():
    pass  # do stuff
myFile.close()
```
But even though the file is correctly written (after the script runs, I can see the contents of the file), readline() never returns a line. As I said, I also tried using the same file object and calling seek(0) on it, with no luck. The only thing that worked was opening the file as r+, which fails when the file doesn’t already exist.
Any help would be appreciated. Also, if there’s a cleaner way to do this, I’m open to it 🙂
PS: I realize I could Popen with stdout set to a pipe, read from the pipe, and write the data to the file line by line as I go, but I’m trying to separate the creation of the data file from the reading.
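For comparison, the pipe-based approach mentioned in the PS might look roughly like this. It is only a sketch: fileName and args are hypothetical placeholders (args runs a small Python child that prints two lines), and it relies on Popen being usable as a context manager, which waits for the child on exit, and on text=True (Python 3.7+).

```python
import subprocess
import sys

fileName = "data.txt"  # hypothetical placeholder
# Hypothetical stand-in for the slow command: prints two lines.
args = [sys.executable, "-c", "print('line 1'); print('line 2')"]

lines = []
with open(fileName, "w") as out, \
        subprocess.Popen(args, stdout=subprocess.PIPE, text=True) as p:
    for line in p.stdout:
        out.write(line)      # keep a copy on disk
        lines.append(line)   # ...and process it straight away
# Leaving the Popen block closes the pipe and waits for the child.
```

Writing and processing happen in the same loop here, which is the coupling the question is trying to avoid.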
@James Aylett pointed me in the right direction: it appears that my problem was that the process started by subprocess.Popen hadn’t finished running when I called .flush().
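The underlying behaviour can be observed directly: Popen starts the child and returns immediately, so anything that runs right after it can see an empty file. A small sketch, assuming a hypothetical stand-in command that sleeps for half a second and then prints one line (fileName is likewise a placeholder):

```python
import os
import subprocess
import sys

fileName = "data.txt"  # hypothetical placeholder
# Hypothetical slow command: sleeps, then prints one line.
args = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]

with open(fileName, "w") as myFile:
    p = subprocess.Popen(args, stdout=myFile)
    # Popen has already returned, but the child is still running.
    running_before = p.poll() is None
    size_before = os.path.getsize(fileName)  # nothing written yet
    returncode = p.wait()  # block until the child exits
    size_after = os.path.getsize(fileName)   # the child's output is there now

print(running_before, size_before, size_after, returncode)
```

p.poll() returns None while the child is still running; p.wait() blocks until it exits and returns its exit status.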
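The solution is to call p.wait() right after the subprocess.Popen call, so that the underlying command can finish. After doing that, all the data is in the file and I can proceed to read from it. (Strictly speaking, .flush() isn’t what makes the difference: the child writes to the file through its own copy of the file descriptor, so once it has exited, the data is already there.)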
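The same wait should also fix the single-file-object attempt from the question: open the file as w+ (which, unlike r+, creates it if it doesn’t exist), wait for the child, then seek(0) before reading. A sketch with hypothetical fileName and args:

```python
import subprocess
import sys

fileName = "data.txt"  # hypothetical placeholder
# Hypothetical stand-in for the slow command: prints two lines.
args = [sys.executable, "-c", "print('line 1'); print('line 2')"]

lines = []
# "w+" creates the file if needed (and truncates it if it exists).
with open(fileName, "w+") as myFile:
    p = subprocess.Popen(args, stdout=myFile)
    p.wait()        # let the child finish writing
    myFile.seek(0)  # rewind to the start before reading
    for line in myFile:
        lines.append(line)  # do stuff
```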
So the above code becomes:
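That is, the question’s snippet with p.wait() added right after the Popen call. In this runnable sketch, fileName and args are hypothetical placeholders (a small Python child that prints two lines), and the read loop collects the lines into a list so there is something to check:

```python
import os
import subprocess
import sys

# Hypothetical placeholders; the real file name and command aren't shown.
fileName = "data.txt"
args = [sys.executable, "-c", "print('line 1'); print('line 2')"]

myFile = open(fileName, "w")
p = subprocess.Popen(args, stdout=myFile)
p.wait()  # the fix: block until the command has finished writing
myFile.flush()  # kept from the original; harmless
os.fsync(myFile.fileno())  # kept from the original; harmless
myFile.close()

lines = []
myFile = open(fileName, "r")
for line in myFile:
    lines.append(line)  # do stuff
myFile.close()
```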
And then it all works!
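On the request for a cleaner way: on Python 3.5+, subprocess.run() starts the command and waits for it in a single call, and with statements take care of closing the files. The sketch below also keeps creating the data file separate from reading it, as the PS asks; write_data, read_data, fileName and args are hypothetical names chosen for illustration.

```python
import subprocess
import sys

fileName = "data.txt"  # hypothetical placeholder
# Hypothetical stand-in for the slow command: prints two lines.
args = [sys.executable, "-c", "print('line 1'); print('line 2')"]


def write_data(fileName, args):
    """Run the (slow) command once and save its stdout to fileName."""
    with open(fileName, "w") as out:
        # run() waits for the command to finish; check=True raises
        # CalledProcessError if it exits with a non-zero status.
        subprocess.run(args, stdout=out, check=True)


def read_data(fileName):
    """Process the saved output, independently of how it was produced."""
    with open(fileName) as f:
        return [line.rstrip("\n") for line in f]


write_data(fileName, args)
lines = read_data(fileName)
```

Because write_data only runs when called, the script can skip it and reuse the existing file when the data doesn’t need refreshing.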