dirstate and improving the performance of hg-add

Joshua Redstone joshua.redstone at fb.com
Tue Jun 19 16:27:33 CDT 2012


I've worked up an approach based on a sorted list of the files in the repo plus a lazily constructed cache of files that have been case-converted.  When adding a file, searching the sorted list narrows down the set of files to consider, and the cache is checked first to see if the file being added is in a directory that has already been case-converted.
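To make the idea concrete, here is a minimal sketch of a sorted list combined with a lazily filled per-directory case-folding cache. It is an illustration only, not the actual patch: the class name `CaseIndex` and its methods are hypothetical, `str.lower()` stands in for whatever case normalization the real code uses, and it only detects collisions among paths whose directory part is spelled identically.

```python
import bisect
import posixpath


class CaseIndex:
    """Hypothetical helper; not Mercurial's dirstate code."""

    def __init__(self, tracked):
        # Sorted once, so all paths under "dir/" form one contiguous run.
        self._sorted = sorted(tracked)
        # directory -> {lower-cased path: path as tracked}, filled on demand.
        self._folded = {}

    def _foldeddir(self, dirname):
        entries = self._folded.get(dirname)
        if entries is None:
            prefix = dirname + "/" if dirname else ""
            entries = {}
            # Binary search for the first path that could carry the prefix,
            # then walk forward until the run of matching paths ends.
            start = bisect.bisect_left(self._sorted, prefix)
            for i in range(start, len(self._sorted)):
                path = self._sorted[i]
                if not path.startswith(prefix):
                    break
                entries[path.lower()] = path
            self._folded[dirname] = entries
        return entries

    def collision(self, path):
        """Return a tracked path differing from `path` only by case, or None."""
        other = self._foldeddir(posixpath.dirname(path)).get(path.lower())
        if other is not None and other != path:
            return other
        return None

    def add(self, path):
        # Keep the list sorted without re-sorting it.
        i = bisect.bisect_left(self._sorted, path)
        if i == len(self._sorted) or self._sorted[i] != path:
            self._sorted.insert(i, path)
        # Update only the directories that have already been case-converted.
        for dirname, entries in self._folded.items():
            prefix = dirname + "/" if dirname else ""
            if path.startswith(prefix):
                entries[path.lower()] = path
```

Only directories that are actually touched by an add get case-converted, which is where the saving over converting every tracked file would come from in this sketch.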

The perf cost of constructing the sorted list and maintaining it seems to be negligible.
This optimization drops the time spent in case conversions and constructing _dirs to a small number and speeds up hg-add on a large repo from 2.3 seconds down to 0.9 seconds.

Fwiw, I tried a dir-tree-based approach and the construction cost of the tree was relatively expensive.

Josh

From: Joshua Redstone <joshua.redstone at fb.com>
Date: Wednesday, June 13, 2012 7:14 PM
To: Bryan O'Sullivan <bos at serpentine.com>
Cc: mercurial-devel at selenic.com
Subject: Re: dirstate and improving the performance of hg-add

I'm still working on fleshing this out.  After looking at how _dirs is used, I think it may not make sense to try to kill the whole thing.
I'm focusing on the hg-add code specifically and playing around with the sorted list because it's simple to construct.
Oh, to answer your other question, I would not be modifying _map, and a _sortedlist thingy would have @propertycache.
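As a rough sketch of what that could look like: the sorted list is derived from `_map` the first time it is read, and `_map` itself is left alone. This uses the standard library's `functools.cached_property` as a stand-in for Mercurial's `propertycache` decorator, and the class `DirstateSketch` and attribute name `_sortedlist` are hypothetical.

```python
from functools import cached_property


class DirstateSketch:
    """Hypothetical stand-in for dirstate; only the caching pattern matters."""

    def __init__(self, entries):
        # The existing mapping stays as it is.
        self._map = dict(entries)

    @cached_property
    def _sortedlist(self):
        # Built on first access, then stored on the instance and reused.
        return sorted(self._map)
```

Commands that never read the sorted list would not pay for building it under this pattern.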
I'm still exploring.
Josh

From: Bryan O'Sullivan <bos at serpentine.com>
Date: Monday, June 11, 2012 6:20 PM
To: Joshua Redstone <joshua.redstone at fb.com>
Cc: mercurial-devel at selenic.com
Subject: Re: dirstate and improving the performance of hg-add

On Mon, Jun 11, 2012 at 12:08 PM, Joshua Redstone <joshua.redstone at fb.com> wrote:
> I've been looking into how to improve the performance of hg-add and wanted to get people's thoughts on removing dirstate._dirs and adding a sorted list of entries to dirstate to mirror the stuff in dirstate._map.

I don't follow how this is supposed to work. Can you flesh the idea out some more? I take it you're referring to replacing dirstate._map with a different data type while getting rid of dirstate._dirs. Off the top of my head, a sorted list seems like a not-great choice compared to a tree due to its poor insertion performance.