[PATCH 2 of 3] lfs: add a small language to filter files
Yuya Nishihara
yuya at tcha.org
Sun Jan 7 03:17:12 EST 2018
On Thu, 04 Jan 2018 23:58:55 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison at yahoo.com>
> # Date 1514704880 18000
> # Sun Dec 31 02:21:20 2017 -0500
> # Node ID 8c20ade835ce43441c61e56e63d9bf92deaacd55
> # Parent 2798cb4faacdae2db46e84ba0f3beaf506848915
> lfs: add a small language to filter files
>
> This patch was authored by Jun Wu for the fb-experimental repo, to avoid using
> a matcher for efficiency [1]. All I've changed here is the package (hgext3rd ->
> hgext) and the imports in the test file (use absolute_import, print_function,
> and 'from lfs import ...' -> 'from hgext.lfs import...').
>
> We want a way to specify which files should be converted to LFS at commit
> time. Per the discussion, we also want another config option to specify which
> files should skip text diff or merge. The current `lfs.threshold` config
> option cannot satisfy these more complex needs.
>
> This diff adds a small language for that. It's self-explanatory, and handles
> both simple and complex cases. For example:
>
> always # everything
> >20MB # larger than 20MB
> !.txt # except for .txt files
> .zip | .tar.gz | .7z # some types of compressed files
> /bin # files under "bin" in the project root
> (.php & >2MB) | (.js & >5MB) | .tar.gz | (/bin & !/bin/README) | >1GB
>
> [1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-December/109387.html
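The comments on the examples imply a simple semantics: `always` matches everything, `>20MB` compares the file size, a leading `.` tests the file-name suffix, a leading `/` tests a path under the repository root, and `!`, `&`, `|` and parentheses combine terms. Below is a minimal Python 3 sketch of an evaluator under those readings; it is not Jun Wu's implementation. The names `compile_filter` and `Predicate`, the binary size units (1MB = 1024**2 bytes), and the precedence `!` > `&` > `|` are all assumptions.

```python
import re
from typing import Callable, List, Optional

# A compiled filter: takes (repo-relative path, size in bytes) -> bool.
Predicate = Callable[[str, int], bool]

# Assumption: binary units, so 1MB == 1024**2 bytes.
_SIZE_UNITS = {'b': 1, 'kb': 1024, 'mb': 1024 ** 2, 'gb': 1024 ** 3}


def _or(a: Predicate, b: Predicate) -> Predicate:
    return lambda path, size: a(path, size) or b(path, size)


def _and(a: Predicate, b: Predicate) -> Predicate:
    return lambda path, size: a(path, size) and b(path, size)


def _not(a: Predicate) -> Predicate:
    return lambda path, size: not a(path, size)


def _symbol(sym: str) -> Predicate:
    """Interpret one symbol, following the comments in the examples."""
    if sym == 'always':
        return lambda path, size: True
    m = re.fullmatch(r'>(\d+)(b|kb|mb|gb)', sym, re.IGNORECASE)
    if m:
        limit = int(m.group(1)) * _SIZE_UNITS[m.group(2).lower()]
        return lambda path, size: size > limit
    if sym.startswith('/'):
        # "/bin": the entry "bin" itself or anything under "bin/" at the root.
        prefix = sym[1:].rstrip('/')
        return lambda path, size: (path == prefix
                                   or path.startswith(prefix + '/'))
    if sym.startswith('.'):
        # ".txt": match on the file-name suffix.
        return lambda path, size: path.endswith(sym)
    raise ValueError('unknown symbol: %r' % sym)


class _Parser:
    """Recursive descent with precedence ! > & > | (an assumption)."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError('unexpected end of expression')
        self.pos += 1
        return tok

    def parse_or(self) -> Predicate:
        left = self.parse_and()
        while self.peek() == '|':
            self.take()
            left = _or(left, self.parse_and())
        return left

    def parse_and(self) -> Predicate:
        left = self.parse_not()
        while self.peek() == '&':
            self.take()
            left = _and(left, self.parse_not())
        return left

    def parse_not(self) -> Predicate:
        if self.peek() == '!':
            self.take()
            return _not(self.parse_not())
        tok = self.take()
        if tok == '(':
            inner = self.parse_or()
            if self.take() != ')':
                raise ValueError("expected ')'")
            return inner
        if tok in ('&', '|', ')'):
            raise ValueError('unexpected %r' % tok)
        return _symbol(tok)


def compile_filter(text: str) -> Predicate:
    # Like the patch: any run of non-special, non-space chars is one symbol.
    tokens = re.findall(r'[()&|!]|[^\s()&|!]+', text)
    parser = _Parser(tokens)
    pred = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError('trailing input at %r' % parser.peek())
    return pred
```

Compiling to plain closures means each per-file check is only string and integer comparisons, with no matcher object involved.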
Can't we make it a subset of the fileset language, so we can eventually switch
to it if the O(n) issue is solved?
That is, _compile() the result of fileset.parse(), but abort if an unsupported
element is found.
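A rough illustration of that suggestion, assuming the parser yields nested tuples such as `('and', a, b)`, `('not', a)`, `('symbol', pattern)` and `('func', name, arg)`. This tree shape is hypothetical and not necessarily what `fileset.parse()` returns. The hypothetical `compile_tree` turns the supported subset into a predicate over `(path, size)` and raises the equally hypothetical `UnsupportedError` for anything else, such as a function that would have to read file contents.

```python
import fnmatch
import re
from typing import Callable

# A compiled filter: takes (repo-relative path, size in bytes) -> bool.
Predicate = Callable[[str, int], bool]


class UnsupportedError(Exception):
    """Hypothetical: the tree uses something the fast path cannot handle."""


# Assumption: binary units, so 1MB == 1024**2 bytes.
_FILESET_UNITS = {'': 1, 'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}


def _size_greater_than(expr: str) -> Predicate:
    # Only the ">N[k|m|g][b]" form is accepted in this sketch.
    m = re.fullmatch(r'>\s*(\d+)\s*([kmg]?)b?', expr.strip(), re.IGNORECASE)
    if m is None:
        raise UnsupportedError('size(%r)' % expr)
    limit = int(m.group(1)) * _FILESET_UNITS[m.group(2).lower()]
    return lambda path, size: size > limit


def compile_tree(tree: tuple) -> Predicate:
    """Compile a (hypothetical) fileset-style tuple tree into a predicate."""
    kind = tree[0]
    if kind in ('and', 'or'):
        a, b = compile_tree(tree[1]), compile_tree(tree[2])
        if kind == 'and':
            return lambda path, size: a(path, size) and b(path, size)
        return lambda path, size: a(path, size) or b(path, size)
    if kind == 'not':
        inner = compile_tree(tree[1])
        return lambda path, size: not inner(path, size)
    if kind == 'symbol':
        # fnmatch's "*" also crosses "/", unlike Mercurial's glob patterns.
        pattern = tree[1]
        return lambda path, size: fnmatch.fnmatchcase(path, pattern)
    if kind == 'func' and tree[1] == 'size':
        return _size_greater_than(tree[2])
    # Anything else aborts instead of silently falling back to a slow path.
    raise UnsupportedError('unsupported element: %r' % (tree,))
```

A caller could catch `UnsupportedError` and abort with a config error, as proposed, while the full fileset machinery remains the eventual target once its O(n) cost is addressed.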
> +def _tokenize(text):
> + text = memoryview(text) # make slice zero-copy
> + special = ' ()&|!'
> + pos = 0
> + l = len(text)
> + while pos < l:
> + symbol = ''.join(itertools.takewhile(lambda ch: ch not in special,
> + text[pos:]))
> + if symbol:
> + yield ('symbol', symbol, pos)
> + pos += len(symbol)
> + else: # special char
> + if text[pos] != ' ': # ignore space silently
> + yield (text[pos], None, pos)
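Treating every run of non-special characters as a symbol means we can't extend
the language later: any character we might want to give a meaning to is
already accepted as part of a symbol.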
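One way to keep the grammar open for extension is to give symbols a deliberately small alphabet and reject every other character, so characters like `=`, `,` or `~` stay available for future operators. The sketch below assumes that alphabet (letters, digits, `_`, `.`, `/`, `-`, with an optional leading `>`) and a hypothetical `ParseError`; it keeps the patch's `(type, value, pos)` token shape.

```python
import re
from typing import Iterator, Optional, Tuple


class ParseError(Exception):
    """Hypothetical error type for malformed filter expressions."""


# Assumed token grammar: single-character operators, and symbols drawn from
# a small alphabet with an optional leading ">".  Any other character is
# rejected, which keeps it free for future syntax.
_TOKEN = re.compile(
    r'(?P<space>\s+)'
    r'|(?P<op>[()&|!])'
    r'|(?P<symbol>[>]?[A-Za-z0-9_./-]+)'
)


def tokenize(text: str) -> Iterator[Tuple[str, Optional[str], int]]:
    """Yield (type, value, pos) tokens, like the patch's _tokenize()."""
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ParseError('unexpected character %r at %d' % (text[pos], pos))
        if m.lastgroup == 'op':
            yield (m.group(), None, pos)
        elif m.lastgroup == 'symbol':
            yield ('symbol', m.group(), pos)
        # Whitespace matches the "space" group and is skipped silently.
        pos = m.end()
```

With this tokenizer, `>=20MB` is rejected today instead of silently becoming a single symbol, so giving `>=` a meaning later would not change how any currently accepted expression parses.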