Parallel dirstate.walk?

Martin Geisler mg at aragost.com
Wed Aug 18 10:05:11 CDT 2010


Hi again,

Also at my meeting, the dear customer complained about the time it takes
to do 'hg status' on a repository with tens of thousands of files. They
mentioned 'hg status' takes 8 seconds on 100,000 files.

This is over NFS(!) and I told them that of course it would be slow... They
replied that they cannot move things away from NFS, so now the question
is whether we can speed things up anyway.


The idea of parallel listdir calls comes from their IT department. They
do a weekly indexing of all machines over NFS. They say this is only
feasible because they do it by issuing listdir calls in parallel. The
NFS server is a heavy-duty server with RAID disks and it can apparently
serve the results in parallel back to the clients.

So this is the idea: stat the directories in parallel. I guess this
would best be done by replacing dirstate.walk with an osutils.walk that
uses C threads internally before it returns a list to its caller. That
way the threads will be hidden from sight.
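
To make the idea concrete, here is a rough sketch in pure Python rather
than C. It is only an illustration, not a proposed patch: the names
list_one_dir and parallel_walk and the workers parameter are made up for
this example, it uses concurrent.futures from a modern Python, and it
knows nothing about the dirstate, ignore rules or the .hg directory.
Each directory is listed and lstat'ed in a worker thread, and the caller
only ever sees the final sorted list.

```python
import os
import stat
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def list_one_dir(path):
    """List a single directory and lstat each entry.

    Returns (subdirs, files), where files is a list of (path, stat_result).
    Unreadable directories and vanished entries are silently skipped.
    """
    subdirs, files = [], []
    try:
        names = os.listdir(path)
    except OSError:
        return subdirs, files
    for name in names:
        full = os.path.join(path, name)
        try:
            st = os.lstat(full)  # lstat: symlinks are not followed
        except OSError:
            continue
        if stat.S_ISDIR(st.st_mode):
            subdirs.append(full)
        else:
            files.append((full, st))
    return subdirs, files


def parallel_walk(root, workers=16):
    """Walk root with a thread pool; return a sorted list of (path, stat)."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(list_one_dir, root)}
        while pending:
            # Collect whichever directories have finished and queue
            # their subdirectories as new jobs.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                subdirs, files = fut.result()
                results.extend(files)
                for d in subdirs:
                    pending.add(pool.submit(list_one_dir, d))
    # Threads finish in arbitrary order, so sort for a stable result.
    results.sort(key=lambda item: item[0])
    return results
```

Calling parallel_walk('.') from a working copy would return every file
below it with its lstat result. CPython releases the GIL around the
listdir and lstat system calls, so the threads can overlap their NFS
round trips; whether that actually helps depends on the server and on
the number of workers, which I have not measured.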

Does this sound crazy? Is anybody crazy enough to attempt it? :)

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://aragost.com/mercurial/

