I’ve got a function in Jython, this function uses Popen to run another program which writes an xml file to it’s stdout, which is directed to a file. When the process is done I close the file and call another function to parse it. I’ve been getting a bunch of error messages referring to access to closed files and/or improperly formatted xml files(which appear fine when I look at them) during the parsing. I thought that output.close() may return before closing the file and so I added a loop that waited for output.closed to be true. That seemed to work at first but then my program printed the following
blasting
blasted
parsing
parsed
Extending genes found via genemark, 10.00% done
blasting
blasted
parsing
Exception in thread "_CouplerThread-7 (stdout)" Traceback (most recent call last):
File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run
self.write_func(buf)
IOError: java.nio.channels.AsynchronousCloseException
[Fatal Error] 17_2_corr.blastp.xml:15902:63: XML document structures must start and end within the same entity.
Retry
blasting
blasted
parsing
Exception in thread "_CouplerThread-9 (stdout)" Traceback (most recent call last):
File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run
self.write_func(buf)
IOError: java.nio.channels.ClosedChannelException
[Fatal Error] 17_2_corr.blastp.xml:15890:30: XML document structures must start and end within the same entity.
Retry
blasting
I’m not sure what my options are from here. Was I right to think that the xml is not written before I parsed it? If so who can I do make sure it is.
def parseBlast(fileName):
"""
A function for parsing XML blast output.
"""
print "parsing"
reader = XMLReaderFactory.createXMLReader()
reader.entityResolver = reader.contentHandler = BlastHandler()
reader.parse(fileName)
print "parsed"
return dict(map(lambda iteration: (iteration.query, iteration), reader.getContentHandler().iterations))
def cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote = False, force = False):
"""
Performs a blast search using the blastp executable and database in blastLocation on
the query with the eValue. The result is an XML file saved to fileName. If fileName
already exists the search is skipped. If remote is true then the search is done remotely.
"""
if not os.path.isfile(fileName) or force:
output = open(fileName, "w")
command = [blastLocation + "/bin/blastp",
"-evalue", str(eValue),
"-outfmt", "5",
"-query", query]
if remote:
command += ["-remote",
"-db", database]
else:
command += ["-num_threads", str(Runtime.getRuntime().availableProcessors()),
"-db", database]
print "blasting"
blastProcess = subprocess.Popen(command,
stdout = output)
while blastProcess.poll() == None:
if pipeline.exception:
print "Stopping in blast"
blastProcess.kill()
output.close()
raise pipeline.exception
output.close()
while not output.closed:
pass
print "blasted"
try:
return parseBlast(fileName)
except SAXParseException:
print 'Retry'
return cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote, True)
I think this problem started when I switched from calling wait on the subprocess to using the poll method so I can stop the process while it’s running. Since I already had the results for many of the datasets I worked with it was a while before I had to start the subprocess again so it was hard to tell. Anyway, my guess is that output was still being written into when I closed it and my solution was to switch to pipes and write the file myself.