[PATCH 2 of 3 RFC] lfs: convert threshold testing to use a matcher

Matt Harbison mharbison72 at gmail.com
Wed Jan 3 04:28:22 UTC 2018


On Sat, 30 Dec 2017 17:36:01 -0500, Jun Wu <quark at fb.com> wrote:

> Excerpts from Matt Harbison's message of 2017-12-29 16:43:23 -0500:
>> lfs: convert threshold testing to use a matcher
>> [...]
>> --- a/mercurial/fileset.py
>> +++ b/mercurial/fileset.py
>> @@ -380,7 +380,14 @@
>>      else:
>>          raise error.ParseError(_("couldn't parse size: %s") % expr)
>>
>> -    return [f for f in mctx.existing() if m(mctx.ctx[f].size())]
>> +    # This explodes when converting largefiles -> lfs.  It looks like  
>> the first
>> +    # commit converts fine, and then when converting the second commit,
>> +    # 'lfs.txt' is one of the files tested here, even though it wasn't  
>> modified
>> +    # in this commit (but it was added in the parent).
>> +
>> +    # XXX: 'f in mctx.ctx and ...' doesn't short circuit None.size(),  
>> so
>> +    # something appears to be wrong with the ctx for conversions.
>> +    return [f for f in mctx.existing() if mctx.ctx[f] and  
>> m(mctx.ctx[f].size())]
>
> As you can see here, this scans all existing files to do the "size(> x)"
> test. The old code is O(1) to do the same test. In a repo with millions  
> of
> files, this will basically make things unusable.

I was hoping that by using the workingcommitctx, we could limit the walk  
to just the files that are being committed.  But I didn't dig too deeply  
into what would need to be done, because the code felt hacky.

>>  @predicate('encoding(name)', callexisting=True)
>>  def encoding(mctx, x):
>
> FWIW, I had an internal patch to make the filter more flexible while  
> still
> maintain good performance. It was accepted but didn't land because the
> feature was not urgently needed and there was concern about conflicting  
> with
> the fileset synatx.

Thanks!  Are there any other uncommitted patches (approved or not)?

Are there specific cases for conflicts that were a concern, or was it a  
concern about reinventing the wheel?  It is a little sad to have to invent  
another language for this.  But it seems like a reasonable idea to me-  
it's pretty simple and intuitive, and doesn't allow nonsensical use of  
'relpath:' and friends that matcher would allow.  Plus, the 'always' and  
'never' symbols are more clear than they would be with filesets.  The only  
other benefit I can think of that filesets have is the binary() predicate,  
but that can probably be added here.

I've got these applied locally, and will probably submit them tomorrow.   
I'm guessing most reviewers need time to catch up.  After that, I would  
like to convert this to a tracked file before the freeze.  So probably:

   1) every line is wrapped in '()' as it is read in
   2) every line is OR'd
   3) '#' lines are stripped and ignored as comments


> # HG changeset patch
> # User Jun Wu <quark at fb.com>
> # Date 1493767928 25200
> #      Tue May 02 16:32:08 2017 -0700
> # Node ID bc949f9b315e3c3967853f209b7d8f8a5e81a6ce
> # Parent  c95721badfb48141e376d428d0621784563688b4
> lfs: add a small language to filter files


More information about the Mercurial-devel mailing list