dirstate and improving the performance of hg-add

Pierre-Yves David pierre-yves.david at logilab.fr
Tue Jun 12 03:40:26 CDT 2012


On Tue, Jun 12, 2012 at 12:08:27AM +0200, Adrian Buehlmann wrote:
> (reordering the answer, as we usually don't top post)
> 
> On 2012-06-11 23:06, Joshua Redstone wrote:
> > On 6/11/12 3:58 PM, "Adrian Buehlmann" <adrian at cadifra.com> wrote:
> >> On 2012-06-11 21:08, Joshua Redstone wrote:
> >>> Hi mercurial-devel,
> >>> I've been looking into how to improve the performance of hg-add and
> >>> wanted to get people's thoughts on removing dirstate._dirs and adding a
> >>> sorted list of entries to dirstate to mirror the stuff in dirstate._map.
> >>>
> >>> Background:
> >>>
> >>> hg-add is fast for small repos, but for larger repos, we've been seeing
> >>> the time to add a file grow to over 1.5 seconds ...
> 
> [..]
> 
> >>> A few observations:
> >>>
> >>> - the number of case-insensitive comparisons could be dramatically
> >>> reduced by indexing into the dirstate rather than exhaustively iterating
> >>> through the whole thing.  One way to do this is to keep a sorted list of
> >>> entries and do a binary search for the path being added.  It suffices
> >>> to do a case-insensitive comparison with only the directory components
> >>> of the entries in the list surrounding where the path would be added.
> >>
> >> Do you actually need the default value of the config setting
> >> ui.portablefilenames?
> >>
> >> The default is "warn", which does that case-insensitive checking. But if
> >> you don't need that warning, then setting it to "ignore" or "false"
> >> should not do this case-insensitive comparison.
> >>
> > If we can afford it, it seems like it's nice to have that warning to
> > maintain portability.  And, at least in principle, it seems like we should
> > be able to afford it with ease.
> 
> You are sidestepping my question :-). The question is about whether the
> default value of "warn" is actually relevant to your very demanding use
> patterns.
> 
> So I assume:
> (a) You want to keep that warning
> (b) The 1.5 seconds was measured with ui.portablefilenames=True
> 
> Pure Linux shops hardly ever need that warning. And if you happen to
> later change your mind, you can always rename files, making a revision
> and its children able to be checked out on Windows. The names in the store
> are encoded, so the store survives on all supported platforms, no matter
> how silly your file names or your platforms used are.
> 
> That default value was the result of a relatively short discussion (IIRC)
> and I think I argued then for having it on "ignore", but I was defeated. I
> still think the default should be the fast setting. Even though I am a
> Windows hacker.

I think the current default value is a very good one. Case collision errors are
a pain, and we are better safe than sorry.

As Joshua pointed out, there is no reason for the current default to be slow
other than the poor quality of the current implementation[1]. Moving the check
from an O(N) scan to an O(log N) lookup should provide a performance boost
large enough for the current default to be a fast default.
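
To make the O(log N) idea concrete, here is a rough Python sketch of an
indexed case-collision check. It is not Mercurial's dirstate code: the
FoldedIndex class and its method names are hypothetical, and it assumes the
list is sorted by lower-cased name (files plus their parent directories), so
that names differing only by case end up next to each other.

```python
import bisect


class FoldedIndex:
    """Sorted index of tracked names keyed by lower-cased path.

    Hypothetical helper for illustration; not part of Mercurial.
    """

    def __init__(self):
        # Sorted list of (folded, original) pairs covering every tracked
        # file and each of its parent directories.
        self._entries = []

    @staticmethod
    def _prefixes(path):
        # "a/b/c" -> ["a", "a/b", "a/b/c"]
        parts = path.split("/")
        return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

    def collisions(self, path):
        """Return tracked names that differ from `path` only by case."""
        found = []
        for prefix in self._prefixes(path):
            folded = prefix.lower()
            # Binary search for the first entry with this folded key;
            # the empty string sorts before any original name.
            i = bisect.bisect_left(self._entries, (folded, ""))
            while i < len(self._entries) and self._entries[i][0] == folded:
                if self._entries[i][1] != prefix:
                    found.append(self._entries[i][1])
                i += 1
        return found

    def add(self, path):
        """Record `path` and its parent directories in the index."""
        for prefix in self._prefixes(path):
            entry = (prefix.lower(), prefix)
            i = bisect.bisect_left(self._entries, entry)
            if i == len(self._entries) or self._entries[i] != entry:
                # list.insert still shifts elements; the saving here is
                # in the number of string comparisons, not in memory moves.
                self._entries.insert(i, entry)


index = FoldedIndex()
index.add("src/Main.py")
index.add("README")
print(index.collisions("src/main.py"))   # file name differs only by case
print(index.collisions("SRC/other.py"))  # directory differs only by case
```

Each lookup costs one binary search per path component instead of a
case-folded comparison against every tracked entry.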

[1] I'm not blaming anyone for that; it is a non-critical part of most repos.

-- 
Pierre-Yves David

http://www.logilab.fr/

