I have a large tar.gz file to analyze with a Python script. The tar.gz file contains a number of zip files, which might in turn contain other .gz files. Before extracting anything, I would like to walk through the directory structure inside the compressed files to see whether certain files or directories are present. Looking at the tarfile and zipfile modules, I don’t see any existing function that would let me get a table of contents for a zip file inside a tar.gz file.
Appreciate your help,
You can’t get at it without extracting the file. However, you don’t need to extract it to disk if you don’t want to. You can use the tarfile.TarFile.extractfile method to get a file-like object that you can then pass to tarfile.open as the fileobj argument. For example, given nested tarfiles, you can open the inner one through that file-like object and read its members, and they’re only ever extracted to memory.
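As a rough sketch of that approach, adapted to the layout in the question: the function below walks a tar.gz, opens each inner archive from the object returned by extractfile, and lists its entries without writing anything to disk. The function name list_nested and the choice of file extensions are assumptions made for illustration. Inner zip files are copied into an io.BytesIO buffer first, which keeps each one entirely in memory, so very large inner zips may need a different strategy.

```python
import io
import tarfile
import zipfile


def list_nested(tar_path):
    """Yield (inner_archive_name, entry_name) pairs for archives inside a tar.gz.

    Hypothetical helper: nothing is written to disk.
    """
    with tarfile.open(tar_path, "r:gz") as outer:
        for member in outer:
            if not member.isfile():
                continue
            # File-like object backed by the outer archive; no disk extraction.
            fileobj = outer.extractfile(member)
            if member.name.endswith(".zip"):
                # Buffer the zip in memory so zipfile can seek freely.
                with zipfile.ZipFile(io.BytesIO(fileobj.read())) as inner:
                    for name in inner.namelist():
                        yield member.name, name
            elif member.name.endswith((".tar", ".tar.gz", ".tgz")):
                # Pass the file-like object straight to tarfile.open.
                with tarfile.open(fileobj=fileobj, mode="r:*") as inner:
                    for name in inner.getnames():
                        yield member.name, name
```

Calling it looks like `for archive, entry in list_nested("data.tar.gz"): print(archive, entry)`, where data.tar.gz is a placeholder path. From there you can test whether a given file or directory name appears before deciding what to extract.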