[PATCH] lfs: add a progress bar when searching for blobs to upload

Yuya Nishihara yuya at tcha.org
Thu Aug 30 07:41:02 EDT 2018


On Wed, 29 Aug 2018 23:17:45 -0400, Matt Harbison wrote:
> On Fri, 24 Aug 2018 18:18:32 -0400, Matt Harbison <mharbison72 at gmail.com> wrote:
> 
> > # HG changeset patch
> > # User Matt Harbison <matt_harbison at yahoo.com>
> > # Date 1535147146 14400
> > #      Fri Aug 24 17:45:46 2018 -0400
> > # Node ID 76eca3ae345b261c0049d16269cdf991a31af21a
> > # Parent  c9a3f7f5c0235e3ae35135818c48ec5ea006de37
> > lfs: add a progress bar when searching for blobs to upload
> >
> > The search itself can take an extreme amount of time if there are a lot of
> > revisions involved.  I've got a local repo that took 6 minutes to push 1850
> > commits, and 60% of that time was spent here (there are ~70K files):
> >
> >      \ 58.1%  wrapper.py:     extractpointers      line 297:  pointers = extractpointers(...
> >        | 57.7%  wrapper.py:     pointersfromctx    line 352:  for p in pointersfromctx(ct...
> >        | 57.4%  wrapper.py:     pointerfromctx     line 397:  p = pointerfromctx(ctx, f, ...
> >          \ 38.7%  context.py:     __contains__     line 368:  if f not in ctx:
> >            | 38.7%  util.py:        __get__        line 82:  return key in self._manifest
> >            | 38.7%  context.py:     _manifest      line 1416:  result = self.func(obj)
> >            | 38.7%  manifest.py:    read           line 472:  return self._manifestctx.re...
> >              \ 25.6%  revlog.py:      revision     line 1562:  text = rl.revision(self._node)
> >                \ 12.8%  revlog.py:      _chunks    line 2217:  bins = self._chunks(chain, ...
> >                  | 12.0%  revlog.py:      decompress line 2112:  ladd(decomp(buffer(data, ch...
> >                \  7.8%  revlog.py:      checkhash  line 2232:  self.checkhash(text, node, ...
> >                  |  7.8%  revlog.py:      hash     line 2315:  if node != self.hash(text, ...
> >                  |  7.8%  revlog.py:      hash     line 2242:  return hash(text, p1, p2)
> >              \ 12.0%  manifest.py:    __init__     line 1565:  self._data = manifestdict(t...
> >          \ 16.8%  context.py:     filenode         line 378:  if not _islfs(fctx.filelog(...
> >            | 15.7%  util.py:        __get__        line 706:  return self._filelog
> >            | 14.8%  context.py:     _filelog       line 1416:  result = self.func(obj)
> >            | 14.8%  localrepo.py:   file           line 629:  return self._repo.file(self...
> >            | 14.8%  filelog.py:     __init__       line 1134:  return filelog.filelog(self...
> >            | 14.5%  revlog.py:      __init__       line 24:  censorable=True)
> 
> Any ideas how to trim down some of this overhead?  revset._matchfiles()  
> has a comment about reading the changelog directly because of the overhead  
> of creating changectx[1].  I think that could work here too, but falls  
> apart because of the need to access the filelogs too.  It seems like  
> reading the changelog and accessing the filelogs directly here is too low  
> level, especially with @indygreg trying to add support for non-filelog  
> storage.

Is there any way to filter lfs files without reading the filelog? I suspect
the filter would spend its time scanning each filelog revision.

FWIW, I think it's okay to use the storage-level API to scan lfs pointers
linearly.
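
As a rough illustration of what "scan linearly" could mean, here is a minimal
sketch that walks each file's storage once and picks out the revisions whose
flags mark them as externally stored. Everything in it is a stand-in: the
FakeFilelog class, the EXTSTORED bit value, and scan_lfs_nodes() are
hypothetical names for this example, not Mercurial's actual storage API.

```python
# Stand-ins only: none of these names are Mercurial's real API.
EXTSTORED = 1 << 13  # hypothetical flag bit meaning "content stored externally"


class FakeFilelog:
    """Toy per-file store holding one (node, flags) entry per revision."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __len__(self):
        return len(self._entries)

    def node(self, rev):
        return self._entries[rev][0]

    def flags(self, rev):
        return self._entries[rev][1]


def scan_lfs_nodes(store, paths):
    """Return {path: [nodes]} for revisions flagged as externally stored.

    Each file's storage is looked up once and walked in revision order, so no
    per-commit manifest is read and no storage object is rebuilt for every
    (commit, file) pair.
    """
    found = {}
    for path in sorted(paths):
        flog = store[path]
        hits = [flog.node(rev) for rev in range(len(flog))
                if flog.flags(rev) & EXTSTORED]
        if hits:
            found[path] = hits
    return found


store = {
    "big.bin": FakeFilelog([("n0", 0), ("n1", EXTSTORED), ("n2", EXTSTORED)]),
    "README": FakeFilelog([("r0", 0)]),
}
print(scan_lfs_nodes(store, ["README", "big.bin"]))
# prints {'big.bin': ['n1', 'n2']}
```

The sketch returns every flagged revision of each file; a real version would
presumably still need to restrict the result to the revisions being pushed.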


More information about the Mercurial-devel mailing list