[PATCH] issue 1286

Petr Kodl petrkodl at gmail.com
Thu Sep 4 19:07:44 CDT 2008


You are correct - but I changed the line in walk to access the _foldmap 
directly via 

_foldmap.get()
 
Hence the extra check to make sure it stays empty. If you always go 
through normalize, the check is not necessary.
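
To illustrate the direct access, here is a minimal sketch in Python. It 
is not the actual dirstate code: the class name FoldMapSketch, the lookup 
method and the use of str.lower() as the folding rule are assumptions 
made for the example. Only the lazily built _foldmap and the .get() 
access come from the discussion.

```python
class FoldMapSketch:
    """Hypothetical stand-in for the dirstate, for illustration only."""

    def __init__(self, tracked, checkcase):
        self._tracked = list(tracked)
        self._checkcase = checkcase  # True on a case-folding filesystem
        self._map = None             # built lazily on first access

    @staticmethod
    def _fold(path):
        # Assumed folding rule; a real filesystem may fold differently.
        return path.lower()

    @property
    def _foldmap(self):
        if self._map is None:
            self._map = {}
            # The extra check: on a non-folding OS the map stays empty.
            if self._checkcase:
                for name in self._tracked:
                    self._map[self._fold(name)] = name
        return self._map

    def lookup(self, path):
        # Direct access: a miss hands the path back unchanged.
        return self._foldmap.get(self._fold(path), path)
```

On a miss, .get() returns the path as given, so nothing touches the 
disk, and with checkcase set to False the map is never populated.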

There is another aspect of the whole walk call. In Hg there are two main 
operation modes for the walk:

1) The disk tree is walked via os.listdir and the values are compared to 
something hashed we already have
- this is used during hg stat

2) Something we already have is walked and the values are compared to the 
tree on disk
- this is used during e.g. hg diff, when step 2 does not iterate and 
everything is resolved in step 3

For the first case the number of disk accesses can be optimized to be 
proportional to the number of directories instead of the number of files.
On Win32, FindFirst/FindNext supplies the stat values, and on Linux 
opendir seems to do a good job of caching - not sure about OSX.

In case 2 this is not an option - we walk the tree in memory in 
alphabetical order and call lstat on every file, so the number of lstat 
calls is proportional to the number of files.
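
The difference between the two modes can be sketched as follows. This is 
an illustration only, not the Hg walk code: the function names and 
return values are made up, and os.scandir is used as a modern stand-in 
for a directory listing that also supplies the file type.

```python
import os


def walk_disk(root, known):
    """Mode 1: list each directory once, compare entries with `known`."""
    listings = 0
    seen = set()
    stack = [""]  # relative directory paths still to visit
    while stack:
        rel = stack.pop()
        listings += 1  # one listing per directory
        with os.scandir(os.path.join(root, rel)) as entries:
            for entry in entries:
                name = rel + "/" + entry.name if rel else entry.name
                if entry.is_dir(follow_symlinks=False):
                    stack.append(name)
                else:
                    seen.add(name)
    missing = sorted(set(known) - seen)   # known, but not on disk
    unknown = sorted(seen - set(known))   # on disk, but not known
    return listings, missing, unknown


def walk_known(root, known):
    """Mode 2: visit known files in sorted order, one lstat per file."""
    lstats = 0
    missing = []
    for name in sorted(known):
        lstats += 1  # one lstat per known file
        try:
            os.lstat(os.path.join(root, name))
        except OSError:
            missing.append(name)
    return lstats, missing
```

The first function does one listing per directory, the second one lstat 
per known file, and only the first can report files that are on disk 
but not known.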

I ran some basic benchmarks with large trees looking at Hg/Bzr, and based 
on the numbers coming back it looks like bzr now always uses method 1, 
i.e. assume you know how to walk the tree fast and do the lookup in 
memory.

So for bzr the status and diff commands on a clean tree have very similar 
performance characteristics, while in hg there can be substantial 
variance between the two.

One potential advantage of #2 is that you let the filesystem take care 
of the case/unicode folding - but that is about the only one. Assuming 
the walk always iterates the tree on disk, we would always know the 
correct file name, and folding can always be done in memory without 
further disk IO.
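
A rough sketch of folding in memory, assuming the names come from a 
directory listing that has already been read. The helper names and the 
lower-casing rule are hypothetical and chosen only for the example. 
Real filesystems may fold case and unicode differently.

```python
def fold(name):
    # Assumed folding rule for the example: plain lower-casing.
    return name.lower()


def resolve_names(tracked, disk_names):
    """Map each tracked name to its on-disk spelling, or None if absent.

    `disk_names` is what the directory walk already returned, so no
    further disk access is needed here.
    """
    on_disk = {fold(name): name for name in disk_names}
    return {name: on_disk.get(fold(name)) for name in tracked}
```

For example, resolve_names(["readme.txt"], ["README.TXT"]) maps 
"readme.txt" to "README.TXT" using only the listing that the walk 
produced.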

It would also mean that the code never has to call lstat or exists on 
individual files - with the exception of file names typed in as command 
line parameters - which usually means just a handful of files, for which 
the check is more expensive.

pk











Adrian Buehlmann wrote:
> On 05.09.2008 01:10, Petr Kodl wrote:
>   
>> Well, since the _foldmap is created lazily on first access and it has 
>> been accessed directly it makes sure that on non folding OS the foldmap 
>> stays empty and the
>> path is just piped through as is.
>>
>> So for non folding OS the cost is about the same as before ({}.get() vs 
>> lambda x:x).
>>
>> I agree it is cleaner to have the normpath() call in place - but in that 
>> case it needs some way of signaling that the fspath should not be called 
>> on _foldmap miss.
>>
>>     
>
> I don't get it, sorry.
>
> _foldmap isn't accessed at all on non-folding os, because normpath is
> lambda x:x because _checkcase is False on non-folding os.
>
> Where is the bug in my reasoning?
>
> Where is _foldmap accessed directly?
>
> I see only access to _foldmap in _normalize, which is never called
> on a non-folding os (because normalize = lambda x:x on a non-folding
> os)
>
> (Maybe I need to normalize myself now and have some sleep ;-)
>
>
>
>
>   


