[RFC] Improving Speed of "hg status" with Largefiles

Na'Tosha Bard natosha at unity3d.com
Wed Oct 5 05:49:25 CDT 2011


Hello,

One problem that we have with largefiles is that largefiles-enabled
repositories are really slow to run hg status.  Example case (our repository,
which is 1.6 GB with a clean working copy and has 1.1 GB of largefiles, broken
down into 32 separate largefiles):

With largefiles enabled:

$ time hg status

real    0m10.197s
user    0m5.041s
sys    0m2.343s

Without largefiles enabled:

$ time hg status

real    0m0.363s
user    0m0.310s
sys    0m0.050s

I believe that when "status" is run, we first update all of the standins to
match the largefiles currently in the working copy -- which means we
re-calculate the SHA1 sum of each largefile -- then use the output of "hg
status" on the standins to determine if a largefile has actually been
modified.  You can see why this takes some time -- calculating the SHA1 sum
of several large binaries takes time, which adds up on a repository with a
lot of largefiles.
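
As a rough illustration of the cost described above (a sketch, not the
actual largefiles code), the check amounts to reading every byte of each
largefile, hashing it, and comparing the result with the hash recorded in
its standin.  The function names and the assumption that a standin holds
just the hex SHA1 on one line are hypothetical, chosen for illustration:

```python
import hashlib


def sha1_of_file(path, chunk_size=128 * 1024):
    """Return the hex SHA-1 of a file, reading it in fixed-size chunks.

    The whole file is read from disk, so the cost grows with file size.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def largefile_is_modified(largefile_path, standin_path):
    """Hypothetical check: compare a fresh hash with the standin's hash.

    Assumes the standin file contains only the hex SHA-1 of the largefile.
    """
    with open(standin_path, "r") as f:
        recorded = f.read().strip()
    return sha1_of_file(largefile_path) != recorded
```

Doing this for every largefile on every "hg status" means reading the full
1.1 GB each time in the example repository above.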

Not only is this annoying in a general sense, it also badly hurts the
ability to use GUI tools, because some (for example, SourceTree on OS X) rely
heavily on "hg status" to determine what to display to the user.  For
usability purposes, we need to speed this up significantly.  Could we
perhaps rely on the last-modified timestamp or similar for determining
whether we should actually re-calculate the SHA1 of a given largefile?
What's the best way to speed this up?
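
One possible shape for the timestamp idea, as a minimal sketch under stated
assumptions: remember each largefile's size and modification time alongside
its last computed hash, and re-hash only when either has changed.  The
cache layout and the name cached_sha1 are hypothetical, not part of
largefiles, and persisting the cache between runs is left out:

```python
import hashlib
import os


def _sha1_of_file(path, chunk_size=128 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_sha1(path, cache):
    """Return the file's SHA-1, re-hashing only if size or mtime changed.

    `cache` is a plain dict mapping path -> ((size, mtime_ns), hex_sha1).
    """
    st = os.stat(path)
    key = (st.st_size, st.st_mtime_ns)
    entry = cache.get(path)
    if entry is not None and entry[0] == key:
        # Stat data unchanged: trust the stored hash, skip reading the file.
        return entry[1]
    digest = _sha1_of_file(path)
    cache[path] = (key, digest)
    return digest
```

The trade-off is that a file rewritten with the same size within the
timestamp resolution of the filesystem would be missed, so a real
implementation would need some policy for recently modified files.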

Cheers,
Na'Tosha

-- 
*Na'Tosha Bard*
Build & Infrastructure Developer | Unity Technologies

*E-Mail:* natosha at unity3d.com
*Skype:* natosha.bard