My organisation is preparing to release an open-source version of our software using GitHub; however, I’m not sure of the best way to approach this:
We have two branches, master and release. master contains some proprietary components that we have decided not to release, and release contains the cleaned-up version that we want to distribute. The problem is that if we just push the release branch to GitHub, the proprietary components can be retrieved by looking through the revision history.
I was considering creating a separate repository, copying the HEAD of release into it, doing a git init, and pushing that repository to GitHub. However, we want to retain the ability to cherry-pick certain patches from master into release in the future, and push those changes up to GitHub.
Is there a way to do this without maintaining two separate repositories?
Thanks!
Update:
To be a little more specific, this is sort-of what our commit history looks like at the moment:
--- o - o - o - o - f - o - o - f - master
             \
              c - c - c - c - c - c - c - REL - f - f
Where ‘o’ are commits in the master, proprietary branch, ‘c’ are commits that remove things that should not be published (often not removing entire files, but reworking existing ones not to rely on proprietary components), and ‘f’ are fixes in master that apply to release as well, and so have been cherry-picked. REL is a tagged version of the code we deem safe to publish, with no history whatsoever (even previous versions of the release branch, since not all the proprietary material had been removed before the REL tag).
Ben Jackson’s answer already covers the general idea, but I’d like to add a few notes (more than a comment’s worth) about the ultimate goal here.
You can quite easily have two branches, one with an entirely clean (no private files) history, and one complete (with the private files), and share content appropriately. The key is to be careful about how you merge. In an oversimplified history, picture the public branch as a line of `o` commits and the private branch as a line of `x` commits, with merges running only from public into private.
The `o` commits are the “clean” ones, and the `x` are the ones containing some private information. As long as you merge from public to private, they can both have all the desired shared content, without ever leaking anything. As Ben said, you do need to be careful about this: you can’t ever merge the other way. Still, it’s quite possible to avoid, and you don’t have to limit yourself to cherry-picking. You can use your normal desired merge workflow.
In reality, your workflow could end up a little more complex, of course. You could develop a topic (feature/bugfix) on its own branch, then merge it into both the public and the private versions. You could even cherry-pick now and then. Really, anything goes, with the key exception of merging private into public.
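To make the one-way rule concrete, here is a minimal sketch of that workflow in a throwaway repository. The file names (`shared.txt`, `private.txt`), the commit messages and the identity settings are made up for illustration; only the direction of the merge matters.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity for the demo
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/public   # name the first branch "public"

# An "o" commit: clean content on the public branch
echo "shared code" > shared.txt
git add shared.txt
git commit -q -m "o: shared code"

# An "x" commit: private content only ever lands on the private branch
git checkout -q -b private
echo "secret" > private.txt
git add private.txt
git commit -q -m "x: private component"

# New public work...
git checkout -q public
echo "bugfix" >> shared.txt
git commit -q -am "o: bugfix"

# ...reaches private by merging public INTO private, never the reverse
git checkout -q private
git merge -q --no-edit public

# No commit reachable from public touches the private file
leaked=$(git log --oneline public -- private.txt)
# The private branch still received the public bugfix
private_has_fix=$(grep -c bugfix shared.txt)
```

Pushing only `public` then publishes nothing that was ever committed on `private`.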
filter-branch
So, your problem right now is simply getting your repository into this state. Unfortunately, this can be pretty tricky. Assuming that some commits exist which touch both private and public files, I believe that the simplest method is to use `filter-branch` to create the public (clean) version, then create a temporary private-only branch, containing only the private content.
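As a rough sketch of those two filtering passes, assume (purely for illustration) that all private material lives in a `private/` directory and everything else in `src/`; a real repository would need its own filter commands. The second pass uses `-f` because `filter-branch` refuses to run while the backup refs from the first pass still exist under `refs/original/`.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning delay newer Git adds to filter-branch
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"  # placeholder identity for the demo
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/master

# Two commits that each touch both public and private files (hypothetical layout)
mkdir src private
echo "v1" > src/app.txt
echo "key1" > private/secret.txt
git add .
git commit -q -m "first mixed commit"
echo "v2" >> src/app.txt
echo "key2" >> private/secret.txt
git commit -q -am "second mixed commit"

# Public branch: rewrite every commit so the private directory never existed
git branch public master
git filter-branch --tree-filter 'rm -rf private' -- public

# Temporary private-only branch: rewrite every commit to drop the public files
git branch private-temp master
git filter-branch -f --tree-filter 'rm -rf src' -- private-temp

public_files=$(git ls-tree -r --name-only public)
private_files=$(git ls-tree -r --name-only private-temp)
leaked=$(git log --oneline public -- private)
```

Commits that touched only private files would survive on `public` as empty commits whose messages are still visible; `filter-branch` has a `--prune-empty` option that drops them.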
And finally, create the private branch. If you’re okay with only having one complete version, you can simply merge the public branch into it once.
That’ll get you a history with only one merge, at the tip of the private branch.
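A sketch of that single merge follows. To stay self-contained it builds stand-ins for the filtered `public` and `private-temp` branches by hand (the file names are hypothetical). Because the two branches have unrelated roots, Git 2.9 and later only accept the merge with `--allow-unrelated-histories`.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"  # placeholder identity for the demo
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/public

# Stand-in for the filtered public branch
echo "app" > app.txt
git add app.txt
git commit -q -m "public: app"

# Stand-in for private-temp: its own root commit, private content only
git checkout -q --orphan private-temp
git rm -q -rf .                          # clear the files carried over from public
echo "key" > secret.txt
git add secret.txt
git commit -q -m "private: secret"

# Create the private branch and merge public into it exactly once
git branch private private-temp
git checkout -q private
git merge -q --no-edit --allow-unrelated-histories public

second_parent=$(git rev-parse "private^2")
public_tip=$(git rev-parse public)
leaked=$(git ls-tree -r --name-only public -- secret.txt)
```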
Note: there are two separate root commits in that history. That’s a little weird; if you want to avoid it, you can use `git rebase --root --onto <SHA1>` to transplant the entire private-temp branch onto some ancestor of the public branch.
If you’d like to have some intermediate complete versions, you can do the exact same thing, just stopping here and there to merge and rebase.
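One way the "merge, rebase, repeat" idea could play out is sketched below with two public commits (P1, P2) and two private-only commits (T1, T2). The commit names, file names and the choice of where to stop are all assumptions for the demo, not a prescribed sequence.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"  # placeholder identity for the demo
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/public

# public: two clean commits, P1 and P2
echo "app v1" > app.txt
git add app.txt
git commit -q -m "P1: app v1"
p1=$(git rev-parse HEAD)
echo "app v2" >> app.txt
git commit -q -am "P2: app v2"
p2=$(git rev-parse HEAD)

# private-temp: two private-only commits, T1 and T2, on their own root
git checkout -q --orphan private-temp
git rm -q -rf .
echo "key v1" > secret.txt
git add secret.txt
git commit -q -m "T1: key v1"
t1=$(git rev-parse HEAD)
echo "key v2" >> secret.txt
git commit -q -am "T2: key v2"

# First complete version: start private at T1 and merge the matching public commit
git checkout -q -b private "$t1"
git merge -q --no-edit --allow-unrelated-histories "$p1"

# Replay the rest of private-temp on top of that merge
git rebase -q private private-temp

# Second complete version: fast-forward to the replayed T2, then merge P2
git checkout -q private
git merge -q --no-edit private-temp
git merge -q --no-edit "$p2"

merge_count=$(git rev-list --merges --count private)
leaked=$(git log --oneline public -- secret.txt)
```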
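This will get you a history in which the private branch picks up merges from public at several points along the way, rather than only once at the tip.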
Again, if you want them to have a common ancestor, you can do an initial `git rebase --root --onto ...` to get started.
Note: if you have merges in your history already, you’ll want to use the `-p` option on any rebases to preserve the merges. On recent Git versions, where `-p` has been removed, `--rebase-merges` is its replacement.
fake it
Edit: If reworking the history really turns out to be intractable, you can always totally fudge it: squash the entire history down to one commit, on top of the same root commit you already have.
So you’ll end up with a new commit `A'` sitting directly on top of the root commit, next to the old history ending in `A`, where `A` and `A'` contain exactly the same content, and `X` is the commit in which you removed all private content from the public branch.
At this point, you can do a single merge of public into private, and from then on, follow the workflow that I described at the top of the answer.
Do that merge with `-s ours`, which tells git to use the “ours” merge strategy. This means it keeps all content exactly as it is in the private branch, and simply records a merge commit showing that you merged the public branch into it. This prevents git from ever applying those “remove private” changes from commit `X` to the private branch.
If the root commit has private information in it, then you’ll probably want to create a new root commit, instead of committing once on top of the current one.
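The effect of the `-s ours` merge can be checked in a small sketch. The branch layout and file names below are stand-ins: `private` carries a secret file, `public` is a single squashed commit on the shared root with the same shared content, and a later public fix is merged normally.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "dev@example.com"  # placeholder identity for the demo
git config user.name "Example Dev"
git symbolic-ref HEAD refs/heads/private

# Shared root, then private work on top of it
echo "app v1" > app.txt
git add app.txt
git commit -q -m "root: app v1"
root=$(git rev-parse HEAD)
echo "app v2" >> app.txt
echo "key" > secret.txt
git add app.txt secret.txt
git commit -q -m "x: app v2 plus private component"

# public: one squashed commit on the same root, same shared content, no secret
git checkout -q -b public "$root"
echo "app v2" >> app.txt
git commit -q -am "squashed clean history"

# Record the merge without changing anything on private
git checkout -q private
tree_before=$(git rev-parse "HEAD^{tree}")
git merge -q -s ours --no-edit public
tree_after=$(git rev-parse "HEAD^{tree}")

# From here on, ordinary merges from public into private work as usual
git checkout -q public
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "o: later public fix"
git checkout -q private
git merge -q --no-edit public
```

Because the first merge makes the squashed commit an ancestor of `private`, the later merge only brings over what changed on `public` after it.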