How can I read individual files’ contents from a command’s stdout without hitting the disk?
I’ve come up with something like this:
def get_files_from(sha, files):
    from subprocess import Popen, PIPE
    import tarfile
    p = Popen(["git", "archive", sha], bufsize=10240, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    tar = tarfile.open(fileobj=p.stdout, mode='r|')
    p.communicate()
    members = tar.getmembers()
    names = tar.getnames()
    contents = {}
    for fname in files:
        if fname not in names:
            contents[fname] = None
            continue
        else:
            idx = names.index(fname)
            contents[fname] = members[idx].tobuf()
            contents[fname] = tar.extractfile(members[idx]) #<--- HERE
    tar.close()
    return contents
The problem is that adding a .read() call on the line marked
contents[fname] = tar.extractfile(members[idx]) #<--- HERE
will give the error:
tarfile.StreamError: seeking backwards is not allowed
So how can I get the contents of the file?
You misspelled your mode= parameter; you wrote more= instead. With the mode specified correctly, .tell() won’t be called. 🙂
You’ll then have to loop over the tarfile object to extract the members; you cannot read arbitrary files from the tarfile.
You cannot use any of the .getnames(), .getmember() or .getmembers() methods, as these require a full scan of the file, putting the file pointer at the end and leaving you without a means to read the entry data itself.
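Something along these lines should work as a one-pass version. It is only a sketch: read_members is a helper name I made up to keep the stream handling separate from the git call, and it assumes the wanted files are regular files and small enough to hold in memory. Note that it does not call p.communicate() before reading, since that would drain and close the pipe that tarfile needs.

```python
import subprocess
import tarfile


def read_members(stream, wanted):
    """Collect the wanted member names from a forward-only tar stream.

    read_members is a hypothetical helper, not part of tarfile.
    Returns {name: bytes}, with None for names not found in the archive.
    """
    contents = {name: None for name in wanted}
    # mode='r|' tells tarfile to treat fileobj as a non-seekable stream.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:  # members are yielded in archive order
            if member.name in contents and member.isfile():
                # Read immediately: once the loop advances past this
                # member, its data can no longer be reached.
                contents[member.name] = tar.extractfile(member).read()
    return contents


def get_files_from(sha, files):
    """Return the contents of the given files at commit sha."""
    proc = subprocess.Popen(["git", "archive", sha], stdout=subprocess.PIPE)
    try:
        return read_members(proc.stdout, files)
    finally:
        proc.stdout.close()
        proc.wait()
```

Calling get_files_from("HEAD", ["README"]) inside a repository would then return a dict mapping each requested name to its bytes, or to None if the archive has no such entry.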