I started working on a little Python script for FTP recently. To start off with, I had server, login and password details for an FTP site hardwired in the script, but this didn’t matter because I was only working on it locally.
I then had the genius idea of putting the project on GitHub. I realised my mistake soon after, and replaced the hardwired details with a solution involving .netrc. I’ve now removed the project from GitHub, as anyone could look at the history and see the login details in plain text.
The question is, is there any way to go through the git history and remove user name and password throughout, but otherwise leave the history intact? Or do I need to start a new repo with no history?
First of all, you should change the password on the FTP site. The password has already been made public; you can’t guarantee that no one has cloned the repo, or that it isn’t sitting in plain text in a backup somewhere, or something of the sort. If the password is at all valuable, I would consider it compromised by now.
Now, for your question about how to edit history. The git filter-branch command is intended for this purpose; it will walk through each commit in your repository’s history, apply a command to modify it, and then create a new commit.
In particular, you want git filter-branch --tree-filter. This allows you to edit the contents of the tree (the actual files and directories) for each commit. It will run a command in a directory containing the entire tree; your command may edit files, add new files, delete files, move them, and so on. Git will then create a new commit object with all of the same metadata (commit message, date, and so on) as the previous one, but with the tree as modified by your command, treating new files as adds, missing files as deletes, etc. (so your command does not need to do git add or git rm; it just needs to modify the tree).
For your purposes, a tree filter that runs sed over the affected file to replace the password should work, with the appropriate regular expression and file name depending on your exact situation.
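As a rough sketch of that approach, the script below builds a throwaway repository, commits a file containing the password, and then rewrites the history. The only part you would actually run on your own repository is the git filter-branch line. SekrtPassWrd and myscript.py are placeholders, the sed invocation assumes GNU sed, and the FILTER_BRANCH_SQUELCH_WARNING variable is only there to skip the warning that newer Git versions print.

```shell
#!/bin/sh
# Sketch: scrub a hard-coded password from every commit on every branch.
# SekrtPassWrd and myscript.py are placeholders; the sed call assumes GNU sed.
set -e

# Build a throwaway repository so the example is safe to run as-is.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'password = "SekrtPassWrd"\n' > myscript.py
git add myscript.py
git commit -q -m "Add FTP script with hard-coded password"
printf 'print("connecting")\n' >> myscript.py
git commit -q -am "Print a status message"

# Newer Git versions warn and pause before filter-branch runs; skip that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Rewrite each commit's tree, replacing the password in place.
git filter-branch --tree-filter \
    "sed -i -e 's/SekrtPassWrd/REDACTED/' myscript.py" -- --all

# Show the rewritten history of the file for inspection.
git log -p --format='%s' -- myscript.py
```

If the file does not exist in some commits, sed will fail there and stop the rewrite, so the filter may need a guard for that case.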
Remember to do this to a copy of your repository, so if something goes wrong, you will still have the original and can start over again.
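One simple way to get such a copy is to duplicate the whole directory, .git included, before rewriting anything. This is a minimal sketch; myproject is a hypothetical directory name, and the freshly initialised repository only stands in for your real one.

```shell
#!/bin/sh
# Sketch: keep an untouched copy of a repository before rewriting history.
# "myproject" is a hypothetical name standing in for your real repository.
set -e

work=$(mktemp -d)
cd "$work"
git init -q myproject

# Copy the whole directory, including .git, preserving file attributes.
cp -a myproject myproject-backup

# Do the history rewrite inside myproject; myproject-backup stays as it was.
ls -d myproject-backup/.git
```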
filter-branch will also save references to your original branches under refs/original/ (refs/original/refs/heads/master and so on), so you should be able to recover even if you forget to do this; when doing some global modification to my source code history, I like to make sure I have multiple fallbacks in case something goes wrong.
To explain how this works in more detail:
A sed substitution such as s/SekrtPassWrd/REDACTED/ will replace SekrtPassWrd in your myscript.py file with REDACTED; the -i option to sed tells it to edit the file in place, with no backup file (as that backup would be picked up by Git as a new file).
If you need to do something more complicated than a single substitution, you can write a script and invoke that as your command; just be sure to call it with an absolute pathname, as git filter-branch calls your command from within a temporary directory.
Passing that command to git filter-branch --tree-filter tells git to run a tree filter, as described above, over every branch in your repository. The -- --all part tells Git to apply this to all branches; without it, it would only edit the history of the current branch, leaving all of the other branches unchanged (which probably isn’t what you want).
See the documentation on GitHub on Removing Sensitive Data (as originally pointed out by MBO) for some more information about dealing with the copies of the information that have been pushed to GitHub. Note that they reiterate my advice to change your password, and provide some tips for dealing with cached copies that GitHub may still have.
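For the more complicated case mentioned above, where the edit lives in its own script, a sketch might look like the following. The helper name redact.sh, the user name someuser, and the file contents are all made up for illustration, and the sed call again assumes GNU sed. The helper is stored outside the repository and passed by absolute path, since the filter runs from a temporary directory.

```shell
#!/bin/sh
# Sketch: use a helper script as the tree filter, called by absolute path.
# redact.sh, someuser, and the file contents are hypothetical; assumes GNU sed.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'host = "ftp.example.com"\nuser = "someuser"\npassword = "SekrtPassWrd"\n' > myscript.py
git add myscript.py
git commit -q -m "Add FTP script with hard-coded login"

# Keep the helper outside the repository; mktemp -d returns an absolute path.
scripts=$(mktemp -d)
cat > "$scripts/redact.sh" <<'EOF'
#!/bin/sh
# Runs once per commit, inside that commit's checked-out tree.
[ -f myscript.py ] || exit 0   # skip commits that do not have the file
sed -i -e 's/SekrtPassWrd/REDACTED/' -e 's/someuser/REDACTED_USER/' myscript.py
EOF
chmod +x "$scripts/redact.sh"

# Newer Git versions warn and pause before filter-branch runs; skip that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# The inner quotes protect the path if it ever contains spaces.
git filter-branch --tree-filter "'$scripts/redact.sh'" -- --all

git show HEAD:myscript.py
```

The existence check at the top of the helper lets the rewrite continue through commits made before the file was added.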