The Archive Base

Editorial Team
Asked: June 2, 2026 (2026-06-02T10:03:38+00:00)

This question sounds similar to many posed here, but it’s obnoxiously different.

I have a git repository that was once an svn repository (that was once a cvs repository). It contains data going back to about 1999.

The time has come to split this one repository into several different repositories, preserving all of this rich history. However, the structure of the repository has changed frequently. All current projects came from a base project, which grew to a few projects, which shrank to two projects, and then grew again. Code has been moved around but was never duplicated; it has now all found a final resting place in one of several mature projects.

This makes splitting the repositories very hard if I want to preserve the history. Using git-filter-branch seems like the right approach, but every recipe I have found seems to hack off parts of the repository and truncate the history along with them.

EDIT ADDED: To clarify, here’s a small example, pretending I’m in the root of the repository. Let’s say the repository looks like this:

foo/
    bar/
        file.txt
    baz/

Now let’s say I edit the contents of file.txt. Then I rename it to newfile.txt. Then I edit the contents again. Then I move this file out of bar/ and into baz/. My repository now looks like this:

foo/
    bar/
    baz/
        newfile.txt

Ok, now let’s say I want to split baz/ out into its own repository. Using git filter-branch or using git subtree split will lose all commit messages and history for newfile.txt back when it was inside bar/ and when it was named file.txt.

I understand that checking out a historical revision might be crazy; it might reference something called ../bar/ or it might reference an invalid directory that doesn’t exist and fail spectacularly. I don’t care as long as I can look at the file contents at any particular revision.

END EDIT

It seems like there are two paths for what I want to do:

  1. Clone the repository N times, preserve the folders that I want in that repository (via git rm-ing other folders), and somehow hack off any revisions that do not eventually reference files that are in the HEAD. I realize this will have a few negative side effects, in that checking out old revisions will not provide a meaningful code base – I don’t care. In order to do this I’d need to find a way to get all paths that descend from all files that exist in HEAD, which I could do with an ugly script.

  2. Build some sort of history index of what the repository looked like at each revision. Use a tree filter and chop off files that aren’t matched in their respective revision. Then, delete the files that don’t appear in or descend from files in HEAD.

Is it possible to find all files that don’t appear in HEAD and remove any history pertaining to them? I don’t care about resurrecting files that have been long deleted, and this seems to be at the crux of my issue.

Alternative solutions would also be appreciated. I’m relatively new to git, so I’m probably missing something obvious.

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 10:03 am

    I ended up having to do this in a multi-stage process.

    First, I got a list of all the file paths that were ever found in the repository:

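    # --no-renames reports a rename as a delete plus an add, so renamed paths still pass --diff-filter=A
    git log --pretty=format: --name-only --diff-filter=A --no-renames | sort -u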
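
Since the next step only needs to know which directories the files lived in, the path list can be collapsed to directory names. A minimal sketch, assuming the list arrives on standard input; `paths_to_dirs` is a hypothetical helper name and the sample paths are invented for illustration:

```shell
# Hypothetical helper: reduce file paths read from stdin to their unique parent directories.
paths_to_dirs() {
    # Strip the final path component; -n with the p flag drops lines
    # without a slash (top-level files and blank lines).
    sed -n 's|/[^/]*$||p' | LC_ALL=C sort -u
}

# Made-up input; in practice, pipe in the output of the git log command.
printf '%s\n' foo/bar/file.txt foo/bar/newfile.txt foo/baz/newfile.txt README | paths_to_dirs
```

For the sample input this prints `foo/bar` and `foo/baz`, one per line.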
    

    Using that, I was able to determine where the files I wanted to keep had resided at one point or another. In my case, they had resided in four separate directories in the repository throughout their lifetimes. I used this information to manually create a regex, such as (?:^foo|^bar/baz|^qux/(?:moo|woof)). This matches the directories I wanted to keep.
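
A candidate regex can be checked against a handful of paths before rewriting anything. The sketch below uses `grep -E`, so the Perl-style `(?:...)` groups are written as plain groups. The sample paths are invented, `kept_paths` is a hypothetical helper name, and the trailing `(/|$)` is an assumption added here so that `foo` does not also match a sibling such as `foobar`:

```shell
# ERE version of the example regex, with a boundary so "foo" does not match "foobar".
keep_re='^(foo|bar/baz|qux/(moo|woof))(/|$)'

# Hypothetical helper: print only the paths from stdin that the keep regex matches.
kept_paths() {
    grep -E "$keep_re"
}

printf '%s\n' foo foo/src foobar bar bar/baz bar/other qux/moo/x qux/quack | kept_paths
```

This prints `foo`, `foo/src`, `bar/baz` and `qux/moo/x`. The parents `bar` and `qux` do not match the regex themselves, which is why a later step has to protect parent directories separately.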

    I then created a Perl script that reads a sorted list of directories and prints the ones to remove, preserving the matching pathnames AND any parent pathnames that contain them.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Path::Class;

    if (scalar(@ARGV) < 1) { die "no regex"; }

    my $regex = qr/$ARGV[0]/;
    my (@want, @remove);
    my ($last, $lastrm);

    # Read sorted directory paths, one per line, from STDIN.
    while (my $d = <STDIN>) {
        chomp $d;
        # Ignore anything inside the most recently kept directory.
        next if defined($last) && dir($last)->subsumes(dir($d));
        if ($d =~ $regex) {
            $last = $d;
            push @want, $d;
        } else {
            push @remove, $d;
        }
    }

    foreach my $rm (@remove) {
        # Already covered by the previous removal, since git rm -r is recursive.
        next if defined($lastrm) && dir($lastrm)->subsumes(dir($rm));
        # Never remove a parent of a directory we want to keep.
        next if grep { dir($rm)->subsumes(dir($_)) } @want;
        print "$rm\n";
        $lastrm = $rm;
    }
    

    Finally, I ran git filter-branch with an index filter that pipes each commit's directory list through this script and removes whatever it prints, keeping only the paths matched by my regex.

    git filter-branch --prune-empty --index-filter '
        git ls-tree -d -r -t --name-only --full-tree $GIT_COMMIT |
        sort | /path/to/filter.pl "(?:regex|of|paths)" |
        xargs -n 50 git rm -rf --cached --ignore-unmatch' -- --all
    

    The sort is necessary because it ensures the Perl script sees each parent directory before its children.
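
The ordering matters because the script only remembers the most recently kept directory and the most recently removed one. A small sketch with invented directory names; `sort_dirs` is a hypothetical helper, and `LC_ALL=C` is an assumption added here to force byte-wise ordering, which places a parent ahead of its children regardless of locale:

```shell
# Hypothetical helper: order directory paths so each parent comes before its children.
sort_dirs() {
    LC_ALL=C sort
}

printf '%s\n' qux/moo foo/bar qux foo | sort_dirs
```

This prints `foo`, `foo/bar`, `qux`, `qux/moo`, so each top-level directory is seen before anything beneath it.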

    I hope this helps someone, as it took me many, many hours to come up with this. 🙂

© 2021 The Archive Base. All Rights Reserved