[PATCH 2 of 5 v5] store: implement fncache basic path encoding in C
mads at kiilerich.com
Wed Sep 12 14:22:57 CDT 2012
On 09/12/2012 08:36 AM, Noel Grandin wrote:
> On 2012-09-12 00:59, Adrian Buehlmann wrote:
>> On 2012-09-10 22:34, Bryan O'Sullivan wrote:
>>> store: implement fncache basic path encoding in C
>> I have a (possibly crazy) idea:
>> What if we would do a new repo format - let's call it "fasthash"  -
>> with the following characteristics:
>> a) fixes issue3621
>> b) does a slightly simpler encoding for hashed paths
>> c) uses the same encoding as we currently have for short paths
> Why not just always hash the paths?
One reason not to use hashes is that Mercurial and many other tools
store and visit files in alphabetic order. A simple backup/restore or
recursive copy of files will place the actual file content on disk in
alphabetical order and thus give some kind of 'defragmentation' for
access in that order, making the block device access mostly sequential.
That will make a difference, especially with small files on spinning
media where we have read-ahead and relatively high seek times.
With Mercurial's current encoding of filenames the store will have almost
the same sort order as the corresponding filenames and we will often
benefit from the sequential access. If path hashes were used in the
filename encoding it would be more random access. This is allegedly one
of the reasons Mercurial in some cases outperforms git.
- but that is all anecdotal evidence and might be irrelevant.
Benchmarking of worst and best and realistic cases will tell how big the
impact really is.
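To illustrate the ordering argument, here is a minimal Python sketch.
simple_encode is a toy stand-in (it is not Mercurial's actual store
encoding), hashed_encode assumes a plain SHA-1 of the path, and the
paths are made up. It only shows in which order the encoded names
would sort; it says nothing about real disk layout or performance.

```python
import hashlib


def simple_encode(path):
    """Toy reversible encoding (assumption, not Mercurial's real one):
    '_' becomes '__' and an uppercase letter X becomes '_x'."""
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif ch.isupper():
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def hashed_encode(path):
    """Hypothetical 'always hash' scheme: the SHA-1 hex digest of the path."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def visit_order(paths, encode):
    """Return the paths in the order their encoded names would sort."""
    return sorted(paths, key=encode)


# Made-up tracked files, for illustration only.
paths = ["src/main.c", "doc/index.txt", "README", "src/util.c", "doc/api.txt"]

print("by name:   ", sorted(paths))
print("by simple: ", visit_order(paths, simple_encode))
print("by hash:   ", visit_order(paths, hashed_encode))
```

For this input the simple encoding visits the files in the same order
as the plain names, while the hashed order depends only on the digests
and has no relation to the directory structure.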
Another consideration is that directories with a lot of files perform badly
on some filesystems. It might be necessary to use multiple levels of
A simple encoding is also faster than computing a hash ... and you have
to use a 'secure' hash function unless you want to handle hash collisions.
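The collision point can be sketched in Python as well. As a stand-in
for a weak hash, short_hash below deliberately keeps only the first 4
hex digits of a SHA-1 digest; that truncation, the function names and
the generated file names are all assumptions made for the example.

```python
import hashlib


def short_hash(path, hexdigits=4):
    """Deliberately weak hash: a SHA-1 digest truncated to a few hex digits."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()[:hexdigits]


def find_collision(hexdigits=4):
    """Generate made-up names until two of them share the same short hash.

    With 4 hex digits there are only 16**4 possible values, so the loop
    must stop after at most 16**4 + 1 names.
    """
    seen = {}
    i = 0
    while True:
        name = "data/file%d.txt" % i
        h = short_hash(name, hexdigits)
        if h in seen:
            return seen[h], name, h
        seen[h] = name
        i += 1


first, second, digest = find_collision()
print("%s and %s both map to %s" % (first, second, digest))
```

Two different paths end up with the same stored name, so a store
built on such a hash would need extra logic to tell them apart.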
A final reason for keeping a scheme like the current one is that it is
quite transparent and easy to figure out what goes where. Pure hashes
make it much harder to debug storage issues.
But if we store all the mappings from 'real' name to encoded anyway then
it might be possible to come up with some other naming scheme where we
generate some 'random' names that have the same sort order as the actual