[PATCH rfc] manifest: write a more efficient version of lazymanifest, in pure python

Matt Mackall mpm at selenic.com
Mon Aug 29 15:46:26 EDT 2016


On Fri, 2016-08-26 at 16:57 -0700, Sean Farley wrote:
> Matt Mackall <mpm at selenic.com> writes:
> 
> > 
> > On Wed, 2016-08-24 at 23:55 -0400, Augie Fackler wrote:
> > > 
> > > > 
> > > > 
> > > > On Aug 20, 2016, at 5:02 PM, Maciej Fijalkowski <fijall at gmail.com>
> > > > wrote:
> > > > 
> > > > # HG changeset patch
> > > > # User Maciej Fijalkowski <fijall at gmail.com>
> > > > # Date 1471726818 -7200
> > > > #      Sat Aug 20 23:00:18 2016 +0200
> > > > # Node ID 21b2401d468d6b24c1658468e4fc5ce8744f925b
> > > > # Parent  300f14ea21432face8d7e6cdcf92ba9d2f1f92dc
> > > > manifest: write a more efficient version of lazymanifest, in pure python
> > > > 
> > > > Questions outstanding:
> > > > * who calls filtercopy? no one in tests at the very least
> > > manifest.py line 287 or thereabouts: it’s used for matches() on a
> > > manifest.
> > > test-status-rev.t will exercise that codepath if that helps.
> > > 
> > > > 
> > > > 
> > > > * are the performance tradeoffs ok here, notably __delitem__?
> > > Probably. It looks very similar to the C version. Deleting a ton of
> > > entries from a large manifest is probably still tragic, since it’ll be a
> > > lot of copies, but for a first pass this is already so much better than
> > > the naive version that was there...
> > > 
> > > > 
> > > > 
> > > > * should we use mmap instead of reading stuff from a file?
> > > This I can’t answer. Maybe?
> > > 
> > > > 
> > > > 
> > > > * current version does not support 21 or 22 long hashes, why are they
> > > > necessary?
> > > There are a handful of (weird) places that do something like poke a “+”
> > > on the end of a hash in the manifest to mark it as dirty, but that is
> > > never saved to disk.
> > FWIW, we could probably replace the "+" hack with a "C"*40 hack. The SHA1
> > space is big enough that we can punch some other small holes in it in
> > addition to the null hash.
> I thought Yuya was working on adding "F"*40 (unless I missed something
> special about "C"?)?

It's just a number with a 2^-160 probability of collision with a real hash.
FFFF... was chosen as a pseudo-hash for the working copy pseudo-changeset to
contrast with the all-zeros representation of nullid. CCCCC... is simply a
(confusing) mnemonic for "changed" since M isn't available.
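
To make the three values concrete, here is a rough sketch in hex form. The
constant names and the is_pseudo() helper are made up for illustration and are
not taken from the Mercurial source; the "C" value in particular is only the
proposal discussed above.

```python
import binascii

# Illustrative constants only; the names are hypothetical.
NULL_HEX = "0" * 40     # all-zeros representation of the null hash
WDIR_HEX = "f" * 40     # pseudo-hash for the working copy pseudo-changeset
CHANGED_HEX = "c" * 40  # proposed "changed" marker replacing the "+" suffix

PSEUDO_HASHES = frozenset([NULL_HEX, WDIR_HEX, CHANGED_HEX])


def is_pseudo(hexnode):
    """Return True if hexnode is one of the reserved pseudo-hashes."""
    return hexnode.lower() in PSEUDO_HASHES


# 40 hex digits decode to 20 bytes, i.e. 160 bits, so a uniformly random
# hash lands on any one fixed value with probability 2^-160.
CHANGED_BITS = len(binascii.unhexlify(CHANGED_HEX)) * 8
```

Unlike the "+" suffix, every value here is exactly 40 hex digits, so nothing
longer than a normal hash ever needs to be stored.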

Having the pseudo-hashes for the file-level and changeset-level be the same is
slightly risky since we have nothing in Python that prevents us from ever
comparing these different types. But it has the upside that they have similar
interpretations.
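
One way to picture the risk, and a possible guard against it, is sketched
below. FileNode and ChangesetNode are hypothetical wrapper types, not anything
that exists in Mercurial, where both kinds of node are plain byte strings that
compare equal without complaint.

```python
class _Node(bytes):
    """Hypothetical base class for typed 20-byte node ids."""

    __slots__ = ()

    def _check(self, other):
        # Refuse to compare a file-level id with a changeset-level id.
        if isinstance(other, _Node) and type(other) is not type(self):
            raise TypeError("comparing %s with %s"
                            % (type(self).__name__, type(other).__name__))

    def __eq__(self, other):
        self._check(other)
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        self._check(other)
        return bytes.__ne__(self, other)

    # Defining __eq__ would otherwise make the class unhashable.
    __hash__ = bytes.__hash__


class FileNode(_Node):
    __slots__ = ()


class ChangesetNode(_Node):
    __slots__ = ()


SENTINEL = b"\xff" * 20  # the same pseudo-hash bytes used at both levels
file_marker = FileNode(SENTINEL)
cset_marker = ChangesetNode(SENTINEL)
```

With plain byte strings the two markers would simply compare equal; with the
wrappers, `file_marker == cset_marker` raises TypeError, while comparisons
within one level (or against untyped bytes) still work as before.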

-- 
Mathematics is the supreme nostalgia of our time.


