Merging Git Repositories - A Short Guide to Rewriting History

March 10, 2009

Git comes with amazing powers that few other version control systems can match. One of the most powerful (and dangerous) features of git is the ability to rewrite the history of a repository in various ways. As tools go, this is a really sharp knife: you can cut yourself badly on it, but it also enables some very clean cutting.

Recently at work, we reworked our repository structure. Two of our apps shared the same database and quite a few libraries. Before git we had managed the libraries as SVN externals, and also shared the db directory with migrations and schema between both apps in the same way. So when we migrated to git, we basically brought with us the same structure of two repositories, each with a series of submodules.

This seemed like a good idea at the time, but working with submodules when a lot of updates are going on, and the submodules are not really separate, self-contained entities, can quickly get very ugly. And for us it did.

Because of the constant trouble keeping branches, new commits and dependencies between new versions of submodules and new versions of the superproject in check, we decided to rethink our repository structure. Instead of the two different repositories, we wanted to have just one repository, with no submodules, and some relative symlinks to let both applications use the same libraries and the same db schema.

The straightforward way to get to this new structure would be to just do some copying and be done with it, but that way we would lose all history of the submodules, and of at least one of the main repositories. So I started looking into how to merge two (or more) git repositories and retain the history of all of them. This is the solution I found:

Imagine we have a dir with two git repositories:

a/
b/

And we want to end up with one git repository like this:

new_repository/a
new_repository/b

We start by creating our new repository:

mkdir new_repository
cd new_repository
git init

Now we use git's fetch command to bring in each of the old repositories as a separate branch in our new repository.

git fetch ../a master:a
git fetch ../b master:b

And just to get something into HEAD we add an empty tmp.txt to master:

touch tmp.txt
git add tmp.txt
git commit -m "We need to have something in HEAD for the following commands to work"

Now it is time to rewrite history. The key to this is the command git filter-branch, which allows filtering all revisions within a branch through a shell command. What we want to achieve with this filtering is to move all files in branch a to a subdir a/ and pretend they were always there, and likewise to move all files in branch b to a subdir b/ and pretend they were always there.

The two full commands look like this:

git filter-branch -f --index-filter 'git ls-files -s | sed "s-\t-&a/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && if [ -f $GIT_INDEX_FILE.new ] ; then mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE ; fi' a

git filter-branch -f --index-filter 'git ls-files -s | sed "s-\t-&b/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && if [ -f $GIT_INDEX_FILE.new ] ; then mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE ; fi' b

Note that on some systems (like on my Mac) sed doesn’t understand the shortcut for matching a tab: \t. In this case you need to replace \t with an actual tab character. Command line autocompletion might make that tricky, but “ctrl+v tab” should do the job.
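If typing a literal tab is awkward, one possible workaround is to let printf produce the tab and hand it to sed through a shell variable. The sketch below only demonstrates the substitution on a single sample line in the format that git ls-files -s prints; the variable names tab, line and rewritten are made up for this illustration.

```shell
# Build a literal tab with printf instead of relying on sed to interpret \t.
tab=$(printf '\t')

# A sample line in the format printed by `git ls-files -s`:
# mode, object id, stage number, a tab, then the path.
line="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${tab}a.txt"

# Same substitution as in the filter: match the tab and append "a/" after it.
rewritten=$(printf '%s\n' "$line" | sed "s-${tab}-&a/-")

printf '%s\n' "$rewritten"
```

Inside the single-quoted filter, the same idea could be written as sed "s-$(printf "\t")-&a/-", which should behave the same whether or not the local sed knows about \t.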

The two filter-branch commands might take a while to run, since they need to churn through all old revisions of the branches and rewrite the index. I’ll describe in more detail how they work at the end of this post.

Once they have run, we just need to merge our modified branches into master:

git merge a
git merge b

And voila, we now have a new repository that holds the two old repositories, each in its own subdir, along with the complete history of both projects.
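To try the whole procedure safely before touching real repositories, here is a rough end-to-end sketch that runs in a throwaway directory. It makes a few assumptions that are not part of the steps above: the two source repositories are tiny stand-ins with one file each, the commit identity is a placeholder, the tab is produced with printf, and the merges pass --allow-unrelated-histories, which Git 2.9 and later require when the branches share no common ancestor.

```shell
set -e

# Placeholder identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
# Newer Git versions print a long warning before filter-branch; this silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
cd "$work"

# Two tiny stand-in repositories, each with one file and one commit.
for name in a b; do
  mkdir "$name"
  (
    cd "$name"
    git init -q
    echo "hello from $name" > "$name.txt"
    git add .
    git commit -q -m "Initial commit in $name"
  )
done

mkdir new_repository
cd new_repository
git init -q
# Fetch HEAD instead of master so the default branch name does not matter.
git fetch -q ../a HEAD:a
git fetch -q ../b HEAD:b

touch tmp.txt
git add tmp.txt
git commit -q -m "Placeholder commit so HEAD exists"

# Rewrite each fetched branch so its files live under a subdir of the same name.
for name in a b; do
  git filter-branch -f --index-filter '
    git ls-files -s | sed "s-$(printf "\t")-&'"$name"'/-" |
      GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
      if [ -f "$GIT_INDEX_FILE.new" ]; then mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"; fi
  ' "$name" >/dev/null
done

# Assumption: Git 2.9+ refuses to merge unrelated histories without this flag.
git merge -q --allow-unrelated-histories -m "Merge history of a" a
git merge -q --allow-unrelated-histories -m "Merge history of b" b

# List the merged files, then the history that touches one of the moved files.
git ls-files
git log --oneline -- a/a.txt
```

If the log for a/a.txt shows the original commit from repository a, the history has come along with the files. The placeholder tmp.txt can be removed with an ordinary commit afterwards.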

Obviously all the hard work is done by the filter-branch command, and it is worth understanding how it works. The easiest way to get an idea of what is going on is to see the filter output from a real repository. Let's set up a quick test repository to see how this would look:

mkdir repo
cd repo
git init
touch a.txt
touch b.txt
git add .
git commit -m "First commit"

Now let's run the first part of our filter within this repository (keep the note on \t from before in mind):

git ls-files -s | sed "s-\t-&c/-"

This will output something like:

100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   c/a.txt
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   c/b.txt

We pipe this into the git update-index command, but first we use the variable $GIT_INDEX_FILE, which git filter-branch provides, to make git update-index write the updated index to a temporary index file. Once all the moved versions of the files in the revision have been written to the new index file, we move the temporary index file to overwrite the real index file for the revision.
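The temporary index step can be tried by hand in a scratch copy of the test repository. Outside of filter-branch the variable $GIT_INDEX_FILE is not set, so this sketch assumes an explicit path of its own, .git/index.new, held in a made-up variable new_index; the commit identity is a placeholder too.

```shell
set -e

# Placeholder identity so the commit works without any git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

scratch=$(mktemp -d)
cd "$scratch"
git init -q
touch a.txt b.txt
git add .
git commit -q -m "First commit"

tab=$(printf '\t')
# Hypothetical location for the temporary index; filter-branch would use
# "$GIT_INDEX_FILE.new" instead.
new_index="$PWD/.git/index.new"

# Write the rewritten entries into the temporary index only.
git ls-files -s | sed "s-${tab}-&c/-" |
  GIT_INDEX_FILE="$new_index" git update-index --index-info

echo "real index:"
git ls-files
echo "temporary index:"
GIT_INDEX_FILE="$new_index" git ls-files
```

The real index still lists the files at the top level, while the temporary one lists them under c/. Moving the temporary file over the real index is what makes the rewritten paths take effect for that revision.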

The if statement around the mv is there to handle a special edge case that took me a while to figure out. Our original repository was imported into git from subversion with git-svn, and the very first import in subversion appears as an empty commit in git. git update-index doesn’t write any index file if there are no changes, so the mv command had no file to move, and the filter would break. The if statement fixes that simply by making sure that the new index file exists before trying to move it.
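The guard can be exercised on its own with an empty index, which is roughly what the filter sees for an empty commit. This is only a sketch: new_index and result are names invented for the illustration, and whether the temporary file gets created may depend on the Git version, which is exactly why the check is there.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Hypothetical stand-in for "$GIT_INDEX_FILE.new" inside filter-branch.
new_index="$PWD/.git/index.new"

# Nothing is staged, so git ls-files -s prints nothing, as for an empty commit.
git ls-files -s | GIT_INDEX_FILE="$new_index" git update-index --index-info

# Only move the temporary index into place if it was actually written.
if [ -f "$new_index" ]; then
  mv "$new_index" "$PWD/.git/index"
  result="new index moved into place"
else
  result="no new index was written, nothing to move"
fi
echo "$result"
```

Either way the script finishes without an error, whereas an unguarded mv would fail whenever the temporary file is missing.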

That’s it - now go rewrite history. But take care, and remember to coordinate something like this really well with anybody pulling from or pushing to your current repositories.


Posted by Mathias Biilmann. Category: Git. Tags: Git.

Comments

  1. Boris Bokowski said 10 months ago:

    Hi Mathias,

    Great instructions, thanks a lot for documenting this. Do you have any idea how I could also rewrite what old tags point to, in particular when there were tags with the same name in both original repositories?

  2. David said 7 months ago:

    Hi Mathias,

    Thanks for sharing this. Couldn't come around the bugs that are included in some other examples that are published on the web.

    Your way worked great for me.

    Thanks again! BR, David
