D4366: treemanifest: introduce lazy loading of subdirs

spectral (Kyle Lippincott) phabricator at mercurial-scm.org
Thu Sep 6 15:08:03 EDT 2018


spectral added a comment.


  In https://phab.mercurial-scm.org/D4366#67153, @spectral wrote:
  
  > In https://phab.mercurial-scm.org/D4366#67144, @indygreg wrote:
  >
  > > I'm not sure how I feel about so many methods having the `if dir in self._lazydirs: self._loadlazy(dir)` pattern.
  > >
  > > On one hand, action at a distance when it involves caching can be dangerous. And doing the lookup inline will avoid a Python function call.
  > >
  > > On the other, it seems very redundant.
  > >
  > > Overall I'm OK with the patch. I just feel like using a `collections.defaultdict` or implementing `__missing__` to automagically resolve missing keys might work out better. Do you think this is a reasonable request?
  >
  >
  > I think it's a reasonable request, but I'm not sure it's feasible with how collections.defaultdict/__missing__ work.  They are *only* called by __getitem__.  We'd need a separate dict subclass for _dirs that handles __contains__ as well, at the very least, for all the cases of `if dir not in self._dirs:  return <something>`
  >
  > I'm playing around with such a thing, it's not difficult but I think might be more awkward than helpful.  Even if that doesn't bear fruit, while implementing it I noticed something: I'm not actually populating self._lazydirs in this patch, I appear to have lost it during split/merge/whatever. Oops.  I have to hope that the numbers I have in the commit description were before I lost this, since otherwise it's completely incomprehensible that this patch would cause a performance benefit (well, maybe not, if dead code can cause a performance loss, why not the other way around?)
  
  
  Turns out that it got quite messy and caused some significant performance loss, though it's likely that at least some of the performance difference was something I did wrong.  To handle the two dictionaries, and that things migrate from one to the other, I needed to override __missing__, get, __setitem__, __nonempty__, __iter__, items, iteritems, values, ... I basically had to reimplement all of dict, and that felt a fair amount messier than the duplication here. If you think that might still be preferable, I can try harder to figure out why there's a significant (~15-30% full runtime on some of the commands) performance difference.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D4366

To: spectral, #hg-reviewers
Cc: indygreg, mercurial-devel


More information about the Mercurial-devel mailing list