The following code allows me to extract .tgz files. However, it stops extracting after about two levels down; there are other subfolders that have .tgz files that need extracting. Additionally, when I extract a file, I have to manually move it to another path or it will get overwritten by other .tgz files that I extract to that location (all .tgz that I’m using have the same file structure/folder names once extracted). Any help is appreciated. Thanks!
import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print tar_url
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])
try:
    extract(sys.argv[1] + '.tgz')
    print 'Done.'
except:
    name = os.path.basename(sys.argv[0])
    print name[:name.rfind('.')], '<filename>'
If I have understood your question correctly, here is what you want to do –
Extract a .tgz file that may have more .tgz files within it that need further extraction (and so on).
If I have correctly interpreted your problem, then…
Here is what my code does –
So if this is the directory structure of the .tgz file –
After extraction, the directory structure will be –
Although I have provided plenty of documentation in my code below, I will briefly outline the structure of my program. Here are the functions I have defined –
I cannot suggest changes to what you have done, as my approach is a bit different. I have used the extractall method of the tarfile module instead of the somewhat more complicated extract method that you used. (Have a glance at http://docs.python.org/library/tarfile.html#tarfile.TarFile.extractall and read the warning associated with using the extractall method. I don't think we will run into any such problem in general, but keep it in mind.)
So here is the code that worked for me –
(I tried it for .tar files nested 5 levels deep (i.e. .tar within .tar within .tar … 5 times), but it should work for any depth* and also for .tgz files.)
I have not added a command line interface to it. I guess you can add one if you find it useful.
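To illustrate the approach described above (extractall plus recursion into nested archives, with each archive unpacked into its own folder so nothing gets overwritten), here is a minimal sketch. It is not the program from this answer: the names extract_nested, unique_dir, strip_suffix and is_within, the "_1", "_2" folder-naming scheme, and the list of suffixes are all assumptions made for illustration, and it is written for Python 3 rather than the Python 2 used in the question. The member check is one simple way to respect the extractall warning; it skips links and anything that would land outside the destination folder.

```python
import os
import tarfile

# Assumed set of archive suffixes to recurse into (illustrative, not exhaustive).
ARCHIVE_SUFFIXES = (".tar", ".tgz", ".tar.gz")


def strip_suffix(path):
    """Return path without its archive suffix, e.g. 'a/b.tgz' -> 'a/b'."""
    for suffix in sorted(ARCHIVE_SUFFIXES, key=len, reverse=True):
        if path.endswith(suffix):
            return path[: -len(suffix)]
    return path


def unique_dir(base):
    """Return base, or base_1, base_2, ... so an existing folder is never reused."""
    candidate, counter = base, 0
    while os.path.exists(candidate):
        counter += 1
        candidate = "%s_%d" % (base, counter)
    return candidate


def is_within(directory, target):
    """True if target resolves to a location inside directory."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def extract_nested(archive_path):
    """Extract archive_path into its own folder, then recurse into nested archives."""
    dest = unique_dir(strip_suffix(archive_path))
    os.makedirs(dest)
    with tarfile.open(archive_path, "r:*") as tar:
        # Skip links and members that would be written outside dest.
        members = [m for m in tar.getmembers()
                   if not (m.issym() or m.islnk())
                   and is_within(dest, os.path.join(dest, m.name))]
        tar.extractall(dest, members=members)
    # Walk the freshly extracted tree and unpack every archive found in it.
    # Folders created by the recursive calls are handled by those calls,
    # not by this walk, because os.walk lists subdirectories before yielding.
    for root, _dirs, files in os.walk(dest):
        for name in files:
            if name.endswith(ARCHIVE_SUFFIXES):
                extract_nested(os.path.join(root, name))
    return dest
```

With a hypothetical file outer.tgz in the current directory, extract_nested("outer.tgz") would unpack it into ./outer, and any archive found inside, say ./outer/sub/inner.tgz, into ./outer/sub/inner. Running it a second time would use ./outer_1 instead of overwriting ./outer.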
Here is a slightly better version of the above program –
http://guanidene.blogspot.com/2011/06/nested-tar-archives-extractor.html