Bug 2491 - uncompressed bundle files are leaked in many commands
Summary: uncompressed bundle files are leaked in many commands
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: unspecified
Hardware: All All
: normal bug
Assignee: Thomas Arendsen Hein
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-11 11:04 UTC by Patrick Mézard
Modified: 2013-07-26 16:59 UTC (History)
7 users (show)

See Also:
Python Version: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Patrick Mézard 2010-11-11 11:04 UTC
When handling a compressed bundle, bundlerepo first uncompress it in a
temporary file then overlay it on top of existing revlogs. The temporary
file is eventually removed when calling .close() or if the __del__() is run.
The problem is it never happens in most cases.

bundlerepos are usually created through hg.repository(). The problem is
repository class was never meant to be closed once used, so bundlerepo
specific close() is almost never called. And __del__ is not guaranteed to be
run upon program termination.

My suggestion is we change that and add an explicit .close() method to the
repository class and use it. Besides fixing the bundlerepo case, I believe
having a reliable finalizer on localrepo may be helpful. Consider adding
something like "incoming()" to revsets. This finalizer would help us getting
rid of the remote bundle once done with it.

Here is a test reproducing the issue in the -R case:

  $ hg init repo
  $ cd repo
  $ echo a > a
  $ hg ci -Am t
  adding a
  $ echo b > b
  $ hg ci -Am t
  adding b
  $ hg bundle --base 0 ../test.hg
  1 changesets found
  $ cd ..
  $ hg clone -r 0 repo repo2
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  updating to branch default
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ cd repo2

Before importing the bundle

  $ ls .hg | grep bundle
  [1]
  $ hg -R ../test.hg heads
  changeset:   1:2c197ebeca99
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     t
  

After importing the bundle

  $ ls .hg | grep bundle
  hg-bundle-gn6Twm.hg10un
Comment 1 IsaacJurado 2010-11-12 02:04 UTC
This is an indicator of a memory leak due to cyclic references, so the 
actual problem is deeper than forgetting to call .close().  And, what is 
worse, it is starting to arise quite often now:

http://selenic.com/repo/hg/rev/dda4ad7c9ea9

I think it is good to keep relying on __del__() to remove dangling bundle 
files as it is always a fast indicator of something wrong.  There is a small 
tool called "objgraph" that can help dececting reference cycles:

http://mg.pov.lt/objgraph/
Comment 2 Patrick Mézard 2010-11-12 04:03 UTC
Perhaps this is a symptom of something but I disagree that relying on
__del__ to do anything is reliable. It is not guaranteed to be run upon
interpreter termination, *even* in CPython. Look for __del__ in:

    http://docs.python.org/reference/datamodel.html
Comment 3 IsaacJurado 2010-11-12 13:51 UTC
I wasn't aware of the fact that the cycle collector does not free objects
with a __del__ method defined.  In fact, I wasn't even aware of the
existance of a cycle collector.

This actually explains the memory leak produced by bookmarks in the first
place.  As bundlerepo has __del__ defined.

So, yes, it's either using explicit .close() (and delete the __del__ method
to let the cycle collector do its job) or finding and fixing the cycle/leak.
Comment 4 Patrick Mézard 2010-11-16 15:46 UTC
An explicit close may help with issue1944 too.

In this case, sshrepo.__del__ seems to be called in all my tests but who knows.
Comment 5 Adrian Buehlmann 2011-01-06 20:46 UTC
I just stumbled over this today as well (and entered duplicate issue2578 , which I've 
now redirected to here).

I can confirm that __del__ of the bundlerepo class is not called (as seen on Windows).

I agree with pmezard that adding an explicit .close() method to the
repository class and use it is probably best.
Comment 6 Thomas Arendsen Hein 2012-03-01 09:39 UTC
When running test-bundle.t with python2.6 and 8ae7626d8bf1 (current stable
tip) I get the following in my temp dir:

/tmp/thomas/tmp7SFC8m
/tmp/thomas/tmp7SFC8m/.hg
/tmp/thomas/tmp7SFC8m/.hg/store
/tmp/thomas/tmp7SFC8m/.hg/00changelog.i
/tmp/thomas/tmp7SFC8m/.hg/requires
/tmp/thomas/tmp7SFC8m/.hg/hg-bundle-i1qYXM.hg10un
/tmp/thomas/tmpPjgBYq
/tmp/thomas/tmpPjgBYq/.hg
/tmp/thomas/tmpPjgBYq/.hg/store
/tmp/thomas/tmpPjgBYq/.hg/00changelog.i
/tmp/thomas/tmpPjgBYq/.hg/requires
/tmp/thomas/tmpPjgBYq/.hg/hg-bundle-4xEKHi.hg10un
Comment 7 HG Bot 2012-04-03 20:00 UTC
Fixed by http://selenic.com/repo/hg/rev/4d875bb546dc
Thomas Arendsen Hein <thomas@intevation.de>
clone: always close source repository (issue2491)

(please test the fix)
Comment 8 Thomas Arendsen Hein 2012-04-04 03:28 UTC
works for me, nothing remaining in $TMP after running tests
Comment 9 Thomas Arendsen Hein 2012-04-04 03:33 UTC
But not solved for pmezard's case in the initial post ...
Comment 10 Thomas Arendsen Hein 2012-04-04 04:05 UTC
affected:
 hg pull
 hg outgoing
 hg log -r 'outgoing()'
 hg summary --remote
 hg bundle foo ../test.hg
 hg bundle -r 'outgoing()' foo  (in this case two copies!)
 hg export -r 'outgoing() (additionally a traceback, see issue3353)
 hg id ../test.hg
 hg push (additionally waiting for lock on self)

not affected:
 hg unbundle
 hg incoming
 other commands not interacting with other repos, e.g. hg add

not testes:
 extensions
 subrepos (can a bundle be a subrepo?)
Comment 11 Thomas Arendsen Hein 2012-04-04 05:22 UTC
4d875bb546dc closes srcrepo, but hg.clone() returns it.
Currently mq's qclone is the only place where the returned srcrepo is used
and it looks like it is used without a good reason:

    sr = hg.repository()
    ...
    if opts.get('patches'):
        patchespath = ui.expandpath(opts.get('patches'))
    else:
        patchespath = patchdir(sr)
    ...
    sr, dr = hg.clone()
    ... opts.get('patches') or patchdir(sr)

I'm tempted to only return destrepo and adjust the few callers which read the
return value, but it is probably safer to not close the repo in hg.clone, but
in all callers.
Comment 12 Bugzilla 2012-05-12 09:14 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 09:14 EDT  ---

This bug was previously known as _bug_ 2491 at http://mercurial.selenic.com/bts/issue2491
Comment 13 Matt Mackall 2013-07-26 16:59 UTC
Appears fixed.