How to get rid of big binary files buried in my repository's history

Marijn Vriens marijn at metronomo.cl
Fri Feb 27 14:18:02 CST 2009


On Wed, Feb 25, 2009 at 6:11 AM, Tim Molendijk <tim at timmolendijk.nl> wrote:

> Hello,
>
> I've recently moved most of my projects into Mercurial and so far my
> experience has mainly been a positive one. But now I'm dealing with this
> annoying issue and I cannot get my head around possible solutions.
>
> A couple of commits ago I introduced a big video file into the repository.
> At first I didn't see the harm, until I pushed to my code host which took
> quite a while (obviously). Then I needed to change the path of the file
> within the repository and even though I used 'hg rename', upon pushing I
> noticed the complete video file (which hadn't changed apart from its
> location) was uploaded again. Then I cloned my repository from my code host
> to my web server. This took forever, which made me realize that the video
> file has been stored in 'history' (.hg) twice, and it is there to stay.
>
> I very much don't like this situation, but how can I truly get rid of the
> file? If I simply remove it, it remains part of a past changeset, so every
> time I make a new clone of the repository it will need to be downloaded
> again (twice in my case). If I update back to a revision before I added the
> file I lose all my changes that followed, and I cannot see how I could merge
> these changes back in without merging back in the video file as well.
>
> Any help would be greatly appreciated!
>

Sorry that I didn't respond earlier. The following script is an example of how
I got myself out of this kind of situation. It's not perfect and has some
undesirable side effects, but it gets the job done without (too much) black
magic.

For the following to work, you have to enable the transplant extension.
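
In case it is not obvious how, here is a minimal sketch, assuming your per-user
configuration lives in ~/.hgrc. The helper name enable_transplant is made up
for this example. transplant ships with Mercurial, so an empty value is enough
to switch it on.

```shell
# enable_transplant FILE: make sure FILE enables the transplant extension.
# (Hypothetical helper for illustration, not part of hg.)
enable_transplant() {
    # Do nothing if the file already has a "transplant =" line.
    grep -q '^transplant *=' "$1" 2>/dev/null && return 0
    # Repeated [extensions] sections are merged by Mercurial, so appending is safe.
    printf '\n[extensions]\ntransplant =\n' >> "$1"
}

# Usage: enable_transplant "$HOME/.hgrc"
```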

# The typical setup.
hg init test
cd test

echo "Foo" > file1.txt
hg ci -A -m"First commit"  # Commit 0
echo "Bar" >> file1.txt
hg ci -m"Second good commit"  # commit 1
# An incompressible big file (bs=1024k is accepted by both GNU and BSD dd).
dd if=/dev/urandom of=bad_content.avi bs=1024k count=10
echo "good content" > file2.txt
hg ci -A -m"Commit with bad content"  # Commit 2
echo "More content" >> file2.txt
echo "Quux" > file3.txt
hg ci -A -m"Another good commit"  # Commit 3
echo "Boo" >> file3.txt
hg ci -A -m"Final good commit"  # Commit 4

# It is discovered that a bad element was added in Commit 2.

hg diff --git -r1:2 > bad_content.patch
hg up -C -r 1  # Go to the last commit before adding the content.
hg import --no-commit bad_content.patch  # Load the patch back
hg revert bad_content.avi  # Do not include the unwanted content in the next commit
hg ci -m"Make a new commit without the unwanted content."  # Commit 5
# now there is a new head.

hg transplant 3:4  # Re-commit the post-bad-content commits (3 and 4) on top of Commit 5.

cd ..
hg clone --pull -rtip test test-clean

# Effects:
The "test-clean" repository no longer contains the unwanted content.

 du -hs test/.hg/ test-clean/.hg/
 10M    test/.hg/
 56K    test-clean/.hg/

# Side effects:
The ids of all the commits from the bad content onwards have changed.
If you pull from a previously "infected" repository, the current one will
get re-infected. So this can only be used when you can replace (the majority
of) the clones. Otherwise you always have to make sure that you are not
pulling the old changesets in (hg incoming is your friend).

I hope this helps.

Marijn.


> Thanks,
> Tim Molendijk
>
>