Parallel dirstate.walk?

Mathieu Lacage mathieu.lacage at sophia.inria.fr
Wed Aug 18 10:50:53 CDT 2010


On Wed, 2010-08-18 at 17:05 +0200, Martin Geisler wrote:
> Hi again,
> 
> Also at my meeting, the dear customer complained about the time it takes
> to do 'hg status' on a repository with tens of thousands of files. They
> mentioned 'hg status' takes 8 seconds on 100.000 files.
> 
> This is over NFS(!) and I told them that of course it would be slow... They
> replied that they cannot move things away from NFS, so now the question
> is if we can speed things up anyway.
> 
> 
> The idea of parallel listdir calls comes from their IT department. They
> do a weekly indexing of all machines over NFS. They say this is only
> feasible because they do it by issuing listdir calls in parallel. The
> NFS server is a heavy-duty server with RAID disks and it can apparently
> serve the results in parallel back to the clients.
> 
> So this is the idea: stat the directories in parallel. I guess this
> would best be done by replacing dirstate.walk with an osutils.walk that
> uses C threads internally before it returns a list to its caller. That
> way the threads will be hidden from sight.
> 
> Does this sound crazy? Is anybody crazy enough to attempt it? :)

I once wrote something similar in Python, then rewrote it in C++
because the Python version was so badly written that it used 100% of the
CPU. I don't have the Python version around anymore, and the attached C++
version might be buggy as hell. Help yourself :)
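
For illustration only, here is a minimal Python sketch of the parallel
listing idea quoted above. It is not the attached pdu.cc and not
Mercurial code: the names list_one and parallel_walk, and the default of
16 workers, are hypothetical choices made for this example. It assumes
that directory listing and stat calls release the interpreter lock while
they wait on the filesystem, so several requests can be in flight at
once.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def list_one(path):
    """List a single directory (hypothetical helper).

    Returns (subdirs, files), where files is a list of
    (path, os.stat_result) pairs. Unreadable entries are skipped.
    """
    subdirs, files = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    else:
                        # lstat-like call; this is the slow part on NFS.
                        files.append(
                            (entry.path, entry.stat(follow_symlinks=False))
                        )
                except OSError:
                    pass  # entry vanished or is not accessible
    except OSError:
        pass  # directory vanished or is not accessible
    return subdirs, files


def parallel_walk(root, workers=16):
    """Walk root, listing directories concurrently (hypothetical helper).

    The threads stay hidden: the caller just gets back a list of
    (path, os.stat_result) pairs, in no particular order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(list_one, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                subdirs, files = future.result()
                results.extend(files)
                # Each newly discovered directory becomes its own task.
                for subdir in subdirs:
                    pending.add(pool.submit(list_one, subdir))
    return results
```

Calling parallel_walk(".") returns the stat results for every file under
the current directory. A real replacement for dirstate.walk would also
have to honour ignore patterns and match the existing return format,
which this sketch does not attempt.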

Mathieu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdu.cc
Type: text/x-c++src
Size: 5106 bytes
Desc: not available
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20100818/0b330216/attachment.cc>

