Seeking information about lightweight copying/renaming

Mon Apr 5 22:51:35 CDT 2010

Hi everyone,

I'm digging into the lightweight copies/renames issue a little more, and I'm
looking for further direction. Here are some questions:

Concerning the status of previous work done by Murkt
(http://markmail.org/message/bz46xb62hid57ewx):
- Quick summary: This approach entails adding a 'lwcopy' flag to the
metadata contained within copy revlogs, which indicates that the copy should
be interpreted as a diff against an indicated source file revision instead
of containing the full text of a file.
- Is this a reasonable approach to solving the problem? Were there any major
concerns with this approach which should be considered when planning future
work?
- Has any further work been done beyond that of the patches written by Murkt
(linked above)?
- Could a revlog with no data or diff (only copy/rename metadata) cause
problems somewhere by violating an assumption that revlogs are never empty?
- At first glance, a file that has been renamed several times would need one
lookup per rename, applying each 'lwcopy' diff to its referenced source
revision in turn. Perhaps a rename should be stored in full rather than as a
lightweight rename after n consecutive lightweight renames, to prevent
significant performance degradation?
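To make the repeated-lookup concern concrete, here is a minimal Python sketch
under stated assumptions: revisions live in a plain dict, an 'lwcopy' entry is
a (source key, hunks) pair, hunks are a simplified (start, end, replacement)
list rather than Mercurial's real binary delta format, and MAX_CHAIN, resolve
and store_copy are hypothetical names, not part of Mercurial. Resolving a copy
costs one lookup per hop, and store_copy falls back to storing full text once
a chain would exceed MAX_CHAIN, which is one possible form of the cutoff
suggested above.

```python
# Hypothetical model of 'lwcopy' resolution; NOT Mercurial's revlog API.
from typing import Dict, List, Tuple, Union

Hunk = Tuple[int, int, bytes]     # replace source[start:end] with data
LwCopy = Tuple[str, List[Hunk]]   # (source key, hunks against source text)
Entry = Union[bytes, LwCopy]      # full text, or a lightweight copy

MAX_CHAIN = 3  # hypothetical cap on consecutive lightweight copies/renames


def apply_hunks(text: bytes, hunks: List[Hunk]) -> bytes:
    """Apply non-overlapping hunks, sorted by start offset, to text."""
    out, pos = [], 0
    for start, end, data in hunks:
        out.append(text[pos:start])
        out.append(data)
        pos = end
    out.append(text[pos:])
    return b"".join(out)


def resolve(store: Dict[str, Entry], key: str) -> Tuple[bytes, int]:
    """Return (full text of key, number of lwcopy hops followed)."""
    chain: List[List[Hunk]] = []
    entry = store[key]
    while not isinstance(entry, bytes):  # one extra lookup per lwcopy hop
        source, hunks = entry
        chain.append(hunks)
        entry = store[source]
    text = entry
    for hunks in reversed(chain):  # apply the diff nearest the full text first
        text = apply_hunks(text, hunks)
    return text, len(chain)


def store_copy(store: Dict[str, Entry], key: str, source: str,
               hunks: List[Hunk]) -> None:
    """Record key as an lwcopy of source, or as full text if the chain is too long."""
    source_text, depth = resolve(store, source)
    if depth >= MAX_CHAIN:
        store[key] = apply_hunks(source_text, hunks)  # break the chain
    else:
        store[key] = (source, hunks)
```

For example, starting from {"a.txt@0": b"hello world\n"} and renaming
a -> b (with an edit) -> c -> d -> e, resolving d takes three hops, while e is
stored as full text because its chain would have reached four hops.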

Automated testing:
- Were any tests against lightweight copying written by previous
contributors?
- I took a look at the current copy tests (test-copy and test-copy2). For
new tests that check revlog data, my plan is to follow their pattern of
reading dumprevlog's output and md5summing the revlogs. Is this reasonable?
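As a rough illustration of the md5summing half of that pattern, a helper
along these lines could produce stable, sorted digests for every revlog file
in a test repository's store. This is only a sketch: md5sum_revlogs is a
hypothetical name, it assumes revlogs are the .i (index) and .d (data) files
under the store directory, and it is not the helper the existing tests use.

```python
import hashlib
import os
from typing import Dict


def md5sum_revlogs(store_dir: str) -> Dict[str, str]:
    """Return {relative path: md5 hex digest} for every .i/.d file under store_dir."""
    sums: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(store_dir):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            if not name.endswith((".i", ".d")):
                continue  # assumption: only .i/.d files are revlogs
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            # Use forward slashes so expected test output is platform-neutral.
            rel = os.path.relpath(path, store_dir).replace(os.sep, "/")
            sums[rel] = digest
    return sums
```

Run against .hg/store/data after a copy or rename, the sorted digests could be
printed one per line and compared against the test's expected output.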

How pushing/pulling works:
- How do they work for local vs. remote repositories? For local
repositories, are the source/dest data files (for pull/push respectively)
accessed directly?
- For remote repositories, how does the wire protocol fit in? I dug around
in the source code a bit for ssh and http syncing; at first glance, it seems
like ssh syncing just runs an http sync over the tunnel, and the http code
does all of the protocol work. Is that where most of the wire protocol is
implemented?

References:
- Murkt's previous GSoC work: http://markmail.org/message/bz46xb62hid57ewx
- Issue 883: http://mercurial.selenic.com/bts/issue883
- Copy/rename metadata:
http://selenic.com/pipermail/mercurial/2008-February/017139.html
- Rename space saving plan:
http://mercurial.selenic.com/wiki/RenameSpaceSavingPlan
- Design: http://mercurial.selenic.com/wiki/Design
- What goes where: http://mercurial.selenic.com/wiki/WhatGoesWhere
- Some glances at hg's source code
- Various other developer docs
- Are there any other things I should look at?

Sorry for the massive email; there's a lot to wrap my head around. Thanks in
advance for your time and assistance.

~Paul Malmsten