[PATCH 3 of 6 v6 lazy-manifest] manifest: split manifestdict into high-level and low-level logic

Mon Mar 9 08:19:13 CDT 2015

On Mar 8, 2015, at 1:43 AM, Martin von Zweigbergk <martinvonz at google.com> wrote:
> On Sat, Mar 7, 2015 at 9:25 AM Augie Fackler <raf at durin42.com> wrote:
>> # HG changeset patch
>> # User Augie Fackler <augie at google.com>
>> # Date 1425747879 18000
>> #      Sat Mar 07 12:04:39 2015 -0500
>> # Node ID 74e64852b07f9cfb5a7b89d827dd9e1f01314b1b
>> # Parent  2524c117ce836e18402cac792361f0e44cac42c5
>> manifest: split manifestdict into high-level and low-level logic
>> 
>> 
>> +    def __iter__(self):
>> +        return ((f, e[0], e[1]) for f, e in sorted(self.iteritems()))
>> 
> 
> What's the reason for yielding (f, n, fl) instead of the more natural (f, (n, fl))? The infamous effect tuples have on Python's GC? Also see questions further down.

It's mostly the GC concern, but it also simplified the implementation of the C type.

>> +
>> +class manifestdict(object):
>> 
> 
> So _lazymanifest is not lazy, manifestdict is not a dict, but _lazymanifest is a dict :-) Just a note, not asking for anything to change.

Yup.

>> +
>> +    def __iter__(self):
>> +        return (x[0] for x in self._lm)
>> 
> If the answer to my question above was about GC, then wouldn't this (the creation of tuples that are immediately thrown away) be an equally big problem? If the low-level type's __iter__ was left untouched (and iteritems() would be used where __iter__ is currently used), then there wouldn't be any tuples involved here.

It's probably worth benchmarking at some point in the future, and/or adding more iterator methods to the C type. For now, I was going for the simplest thing that could work.

>  
>> +
>> +    def copy(self):
>> +        c = manifestdict('')
>> +        c._lm = self._lm.copy()
>> +        return c
>> +
>> +    def iteritems(self):
>> +        return (x[:2] for x in self._lm)
> 
> Similar comment here: this creates a 3-tuple that gets converted into a 2-tuple. That does seem better than creating 2 2-tuples (f, (n, e)) that converted into another one (f, n). Even better would be to add a new filesandnodes() generator that yields exactly the wanted (f, n) tuples, no? OTOH, is that really what we want in many places? I have a feeling that most places that user iteritems() would actually prefer all three fields. Yielding (f, (n, e)) seems cleaner to me, but if GC is a concern, then perhaps we'd have to yield (f, n, e). What do you think?

That matches my guess too, but that feels like a followup patch to me. In general, I think I want to move to iterating over the 3-tuple everywhere just so things are more consistent, but that feels too invasive to do all at once.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150309/57edff63/attachment.pgp>