The Archive Base

Editorial Team
Asked: May 23, 2026

The following code allows me to extract .tgz files. However, it stops extracting after about two levels down; there are other subfolders that have .tgz files that need extracting. Additionally, when I extract a file, I have to manually move it to another path or it will get overwritten by other .tgz files that I extract to that location (all .tgz that I’m using have the same file structure/folder names once extracted). Any help is appreciated. Thanks!

import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print tar_url
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])
try:

    extract(sys.argv[1] + '.tgz')
    print 'Done.'
except:
    name = os.path.basename(sys.argv[0])
    print name[:name.rfind('.')], '<filename>'
1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 8:40 am

    If I have understood your question correctly, this is what you want to do:

    • Extract a .tgz file which may have more .tgz files within it that need further extraction (and so on).
    • While extracting, you need to be careful that you are not replacing an already existing directory in the folder.

    If so, here is what my code does:

    • Extracts every .tgz file (recursively) into a separate folder with the same name as the .tgz file (without its extension) in the same directory.
    • While extracting, it makes sure that it is not overwriting/replacing any already existing file or folder.

    So if this is the directory structure of the .tgz file –

    parent/
        xyz.tgz/
            a
            b
            c
            d.tgz/
                x
                y
                z
            a.tgz/                  # note if I extract this directly, it will replace/overwrite contents of the folder 'a'
                m
                n
                o
                p
    

    After extraction, the directory structure will be –

    parent/
        xyz.tgz
        xyz/
            a
            b
            c
            d/
                x
                y
                z
            a 1/                  # it extracts 'a.tgz' to the folder 'a 1' as folder 'a' already exists in the same folder.
                m
                n
                o
                p
    

    Although the code below is documented in detail, here is a brief outline of the program's structure. These are the functions I have defined:

    FileExtension --> returns the extension of a file
    AppropriateFolderName --> helps prevent overwriting/replacing already existing folders (how? you will see in the program)
    Extract --> extracts a .tgz file (safely)
    WalkTreeAndExtract --> walks down a directory (passed as a parameter) and extracts all .tgz files (recursively) on the way down
    

    I cannot suggest changes to your code, as my approach is a bit different. I have used the extractall method of the tarfile module instead of the somewhat more complicated extract method you used. (Have a glance at http://docs.python.org/library/tarfile.html#tarfile.TarFile.extractall and read the warning about using extractall. I don't think it will be a problem in general, but keep it in mind.)
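
The warning mentioned above concerns archives whose members have absolute paths or `..` components, which can end up written outside the target folder. A minimal sketch of a guard, assuming Python 3; `safe_extractall` and `_is_inside` are hypothetical helper names introduced here, not part of the tarfile module or of the program below:

```python
import os
import tarfile


def _is_inside(directory, target):
    """Return True if target resolves to a path inside directory."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def safe_extractall(tar_path, dest):
    """Extract tar_path into dest, refusing members that would escape dest."""
    with tarfile.open(tar_path) as tar:
        # Check every member name before anything is written to disk.
        for member in tar.getmembers():
            member_target = os.path.join(dest, member.name)
            if not _is_inside(dest, member_target):
                raise ValueError("unsafe member path: %r" % member.name)
        tar.extractall(dest)
```

This only checks member names; it does not inspect symbolic links stored inside the archive, so it is not a complete defence. Recent Python versions (3.12 and later) also accept a `filter` argument to extractall, for example `filter="data"`, which is probably the better choice where available.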

    So here is the code that worked for me –
    (I tried it for .tar files nested 5 levels deep (i.e. .tar within .tar within .tar … 5 times), but it should work for any depth and also for .tgz files.)
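
To reproduce a test like the one described (archives nested several levels deep), a small script can build the input. This is a sketch assuming Python 3; `make_nested_tar` and the file names it writes are made up for illustration:

```python
import os
import tarfile
import tempfile


def make_nested_tar(work_dir, depth):
    """Create level_<depth>.tar in work_dir and return its path.

    Each archive holds one text file plus the archive of the level below,
    so level_1.tar is the innermost archive.
    """
    inner_path = None
    for level in range(1, depth + 1):
        note_path = os.path.join(work_dir, "note_%d.txt" % level)
        with open(note_path, "w") as note:
            note.write("payload at level %d\n" % level)

        tar_path = os.path.join(work_dir, "level_%d.tar" % level)
        with tarfile.open(tar_path, "w") as tar:
            tar.add(note_path, arcname="note_%d.txt" % level)
            if inner_path is not None:
                tar.add(inner_path, arcname=os.path.basename(inner_path))

        # The pieces now live inside tar_path; remove the loose copies.
        os.remove(note_path)
        if inner_path is not None:
            os.remove(inner_path)
        inner_path = tar_path
    return inner_path


# Build a 5-level archive in a fresh temporary folder.
outer = make_nested_tar(tempfile.mkdtemp(), 5)
```

The returned path can then be used as the archive path for the program below.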

    # extracting_nested_tars.py
    
    import os
    import re
    import tarfile
    
    file_extensions = ('tar', 'tgz')
    # Edit this according to the archive types you want to extract. Keep in
    # mind that these should be extractable by the tarfile module.
    
    def FileExtension(file_name):
        """Return the file extension of file_name.
    
        file_name should be a string. It can be either the full path of
        the file or just its name (or any string, as long as it contains
        the file extension).
    
        Example:
        input (file_name) -->  'abc.tar'
        return value      -->  'tar'
    
        """
        match = re.compile(r"^.*[.](?P<ext>\w+)$",
          re.VERBOSE|re.IGNORECASE).match(file_name)
    
        if match:           # if match != None:
            ext = match.group('ext')
            return ext
        else:
            return ''       # there is no file extension to file_name
    
    def AppropriateFolderName(folder_name, parent_fullpath):
        """Return a folder name such that it can be safely created in
        parent_fullpath without replacing any existing folder in it.
    
        Check if a folder named folder_name exists in parent_fullpath. If no,
        return folder_name (without changing, because it can be safely created 
        without replacing any already existing folder). If yes, append an
        appropriate number to the folder_name such that this new folder_name
        can be safely created in the folder parent_fullpath.
    
        Examples:
        folder_name = 'untitled folder'
        return value = 'untitled folder' (if no such folder already exists
                                          in parent_fullpath.)
    
        folder_name = 'untitled folder'
        return value = 'untitled folder 1' (if a folder named 'untitled folder'
                                            already exists but no folder named
                                            'untitled folder 1' exists in
                                            parent_fullpath.)
    
        folder_name = 'untitled folder'
        return value = 'untitled folder 2' (if folders named 'untitled folder'
                                            and 'untitled folder 1' both
                                            already exist but no folder named
                                            'untitled folder 2' exists in
                                            parent_fullpath.)
    
        """
        if os.path.exists(os.path.join(parent_fullpath,folder_name)):
            match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
            if match:                           # if match != None:
                name = match.group('name')
                number = match.group('num')
                new_folder_name = '%s %d' %(name, int(number)+1)
                # Recurse to check whether a folder named new_folder_name
                # already exists in parent_fullpath.
                return AppropriateFolderName(new_folder_name,
                                             parent_fullpath)
            else:
                new_folder_name = '%s 1' %folder_name
                # Recurse to check whether a folder named new_folder_name
                # already exists in parent_fullpath.
                return AppropriateFolderName(new_folder_name, parent_fullpath)
        else:
            return folder_name
    
    def Extract(tarfile_fullpath, delete_tar_file=True):
        """Extract the tarfile_fullpath to an appropriate* folder of the same
        name as the tar file (without an extension) and return the path
        of this folder.
    
        If delete_tar_file is True, it will delete the tar file after
        its extraction; if False, it won't. Default value is True as you
        would normally want to delete the (nested) tar files after
        extraction. Pass False if you don't want to delete the
        tar file (after its extraction) you are passing.
    
        """
        tarfile_name = os.path.basename(tarfile_fullpath)
        parent_dir = os.path.dirname(tarfile_fullpath)
    
        extension = FileExtension(tarfile_name)
        extract_folder_name = AppropriateFolderName(
            tarfile_name[:-len(extension) - 1], parent_dir)
        # (the slicing is to remove the extension (.tar) from the file name.)
        # Get a folder name (from the function AppropriateFolderName)
        # in which the contents of the tar file can be extracted,
        # so that it doesn't replace an already existing folder.
        extract_folder_fullpath = os.path.join(parent_dir,
        extract_folder_name)
        # The full path to this new folder.
    
        try:
            # The with block closes the archive even if extraction fails.
            with tarfile.open(tarfile_fullpath) as tar:
                tar.extractall(extract_folder_fullpath)
            if delete_tar_file:
                os.remove(tarfile_fullpath)
            return extract_folder_name
        except Exception as e:
            # Exceptions can occur while opening a damaged tar file.
            print('Error occurred while extracting %s\n'
                  'Reason: %s' % (tarfile_fullpath, e))
            return
    
    def WalkTreeAndExtract(parent_dir):
        """Recursively descend the directory tree rooted at parent_dir
        and extract each tar file on the way down (recursively).
        """
        try:
            dir_contents = os.listdir(parent_dir)
        except OSError as e:
            # Exception can occur if this program lacks permission to
            # open the folder.
            print('Error occurred. Could not open folder %s\n'
                  'Reason: %s' % (parent_dir, e))
            return
    
        for content in dir_contents:
            content_fullpath = os.path.join(parent_dir, content)
            if os.path.isdir(content_fullpath):
                # If content is a folder, walk it down completely.
                WalkTreeAndExtract(content_fullpath)
            elif os.path.isfile(content_fullpath):
                # If content is a file, check if it is a tar file.
                # If so, extract its contents to a new folder.
                if FileExtension(content_fullpath) in file_extensions:
                    extract_folder_name = Extract(content_fullpath)
                    if extract_folder_name:     # if extract_folder_name != None:
                        dir_contents.append(extract_folder_name)
                        # Append the newly extracted folder to dir_contents
                        # so that it can be later searched for more tar files
                        # to extract.
            else:
                # Unknown file type.
                print('Skipping %s. <Neither file nor folder>' % content_fullpath)
    
    if __name__ == '__main__':
        tarfile_fullpath = 'fullpath_path_of_your_tarfile'    # pass the path of your tar file here.
        extract_folder_name = Extract(tarfile_fullpath, False)
    
        # tarfile_fullpath is extracted to extract_folder_name. Now descend
        # down its directory structure and extract all other tar files
        # (recursively). Extract returns None on failure, so check first.
        if extract_folder_name:
            extract_folder_fullpath = os.path.join(
                os.path.dirname(tarfile_fullpath), extract_folder_name)
            WalkTreeAndExtract(extract_folder_fullpath)
        # If you want to extract all tar files in a dir, just call
        # WalkTreeAndExtract on that dir and nothing else.
    

    I have not added a command line interface to it. I guess you can add it if you find it useful.
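
For anyone who wants that command line interface, argparse from the standard library is probably enough. A minimal sketch; `parse_args` and the `--delete-outer` flag are names chosen here for illustration and are not part of the program above:

```python
import argparse


def parse_args(argv=None):
    """Parse command line options for the extractor script."""
    parser = argparse.ArgumentParser(
        description="Extract a tar archive and any tar archives nested in it.")
    parser.add_argument(
        "archive", help="path to the outermost .tar/.tgz file")
    parser.add_argument(
        "--delete-outer", action="store_true",
        help="also delete the outermost archive after extracting it")
    # argv=None makes argparse read sys.argv[1:], the usual CLI behaviour.
    return parser.parse_args(argv)
```

In the `__main__` block, the hard-coded path could then be replaced with `args = parse_args()` followed by `Extract(args.archive, args.delete_outer)`.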

    Here is a slightly better version of the above program –
    http://guanidene.blogspot.com/2011/06/nested-tar-archives-extractor.html
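
The program above is written for Python 2 (it uses print statements). For readers on Python 3, the same idea can be condensed as follows. This is a sketch and not a port of the linked version: `extract_nested` and `free_folder` are names invented here, and the numbering rule is simpler than AppropriateFolderName (a clash on 'foo 3' yields 'foo 3 1' rather than 'foo 4').

```python
import os
import tarfile

ARCHIVE_SUFFIXES = (".tar", ".tgz")


def free_folder(path):
    """Return path, or path + ' 1', ' 2', ... if it already exists."""
    candidate, counter = path, 0
    while os.path.exists(candidate):
        counter += 1
        candidate = "%s %d" % (path, counter)
    return candidate


def extract_nested(archive_path, delete_archive=False):
    """Extract archive_path next to itself, then every archive found inside.

    Returns the folder the outer archive was extracted to.
    """
    target = free_folder(os.path.splitext(archive_path)[0])
    with tarfile.open(archive_path) as tar:
        tar.extractall(target)  # trusted archives only, see the docs warning
    if delete_archive:
        os.remove(archive_path)

    # os.walk lists each folder before we add to it, so folders created by
    # the recursive calls are handled by those calls, not visited twice.
    for root, _dirs, files in os.walk(target):
        for name in files:
            if name.lower().endswith(ARCHIVE_SUFFIXES):
                extract_nested(os.path.join(root, name), delete_archive=True)
    return target
```

Calling `extract_nested('xyz.tgz')` on the example layout shown earlier should leave xyz.tgz in place and produce the folders xyz/d and xyz/a 1, with the nested archives removed after extraction.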
