[PATCH] commit: add config option to limit copy search distance

Pierre-Yves David pierre-yves.david at ens-lyon.org
Fri Jan 9 05:19:05 UTC 2015



On 12/18/2014 11:21 AM, Durham Goode wrote:
>
> On 12/18/14 9:29 AM, Durham Goode wrote:
>>
>> On 12/17/14 5:29 PM, Matt Mackall wrote:
>>> On Wed, 2014-12-17 at 17:11 -0800, Ryan McElroy wrote:
>>>> # HG changeset patch
>>>> # User Ryan McElroy <rmcelroy at fb.com>
>>>> # Date 1418862573 28800
>>>> #      Wed Dec 17 16:29:33 2014 -0800
>>>> # Node ID 769728b7e5f243b0527d6f374098c537f9ddd2d6
>>>> # Parent  39cead85fd58ae6693592074656b284ed736d9bc
>>>> commit: add config option to limit copy search distance
>>>>
>>>> Editing history, among other things, can result in a copy source
>>>> 'going missing'
>>>> which causes Mercurial to search back through history for the
>>>> original of the
>>>> file. This search may be in vain, though, because the source of the
>>>> file is no
>>>> longer in the history. In large repositories, this can take hours or
>>>> days.
>>>>
>>>> This patch adds an optional configuration option
>>>> (limit.copysearchrevs) to limit
>>>> how far back Mercurial will search for the file. A value of None
>>>> (default) will
>>>> maintain the current behavior of searching all the way back while a
>>>> value of 0
>>>> will disable search through any revs. Moderate values should enable
>>>> the benefits
>>>> of this behavior in most cases while avoiding long delays searching
>>>> through the
>>>> history.
>>> Interesting that at the same time we have a patch on the list that
>>> propose removing the limit entirely.
>>>
>> I think this is a different issue.  The _tracelimit discussion in the
>> other thread is about traversing the filelog, while the stuff Ryan is
>> changing is actually traversing the entire changelog.
>>
>> I don't understand the copy tracking code even before Ryan's change.
>> It's basically saying "if the copy source isn't in either of the
>> dirstate parent manifests, go looking for it".  But if the file being
>> copied isn't in either of the parents, why are we treating it as a copy?
>>
>> Further more, why are we even searching for the copy node? Doesn't the
>> copy data returned by fctx.renamed() give us the exact copy source
>> path and copy source file node?  It looks like memfilectx doesn't obey
>> this, but that's probably fixable.
>>
>> I'll play around with this a bit to see if my theory holds out.
> Ug, it gets even worse.  The look up is 'self[None].ancestors()' but
> localrepo._filecommit is called from commitctx which is totally
> independent of the working copy.  So arbitrary memctx commits are doing
> copy tracing relative to the working copy position.
>
> I tried deleting this copy tracing from localrepo._filecommit, and the
> only thing it seems to break is a special case in graft where you graft
> a copy and it results in an empty commit.  But the existing copy tracing
> is breaks that case in other ways already!  If you have two branches
> that copy foo-to-bar and edit bar in different ways, it completely
> breaks hg log -f because it grafts in the bar copy as a copy from the
> original foo source, instead of an edit to the existing bar copy.
>
> I guess I'll log a task to redo this code.  In the meantime Ryan's
> change will at least let us avoid the massive perf hit.

Ryan, Durham and I walked through the issue again today.

The -commit- code have some magic to handle renamed data where the 
source of the renamed does not exists in the commit parent (‽).

The code to handle that have been added in 2008 to handle a bug (issue 
1175) that is summarized by the following situation:

1) `hg mv src dst`: record `dest` as a copy from `src`
2) `hg commit src`: make a changeset with only the removal of `src`
2.5) At that point the working copy data still record `dst` as copied 
from `src`, but src is missing from `src` working copy and its parent.
3) `hg commit dst`: make a changeset with the addition of `dst`. The 
commit code try to use the rename information.

(note: there is a couple of other way to end up in this situation)

Before issue1175 fixes, this raised a KeyError and crashed Mercurial. To 
solve this crash, it was decided to detect this situation and walk the 
ancestors the working directory parent in order to find the copy sources.

Much more recent code doing grafting (like the graft command or 
histedit) have buggy rename duplication code. This buggy rename 
duplication results in file being marked as copied when they should not. 
Including copied from source that does not exist in the parent at all. 
Such buggy rename data triggers (somewhat by chance) the issue1175 fix 
into pathological case.


The way issue1175 is fixed is debatable, Ryan, Durham and I see the 
recording of rename from file non-existent in parent as a buggish. But 
the main issue affecting our user today (issue4476) comes from buggy 
rename duplication when grafting. We should focus on fixing this buggy 
rename duplication first that would relief our users issue.

We expect to discuss these topic with Matt in person next week.

-- 
Pierre-Yves David


More information about the Mercurial-devel mailing list