RFC: Transparent subrepository support by match module

Martin Geisler mg at lazybytes.net
Wed Aug 18 02:59:09 CDT 2010

Klaus Koch <kuk42 at gmx.net> writes:

Nobody has any comments on this?

I'm asking since I think this is slightly over-engineered and would
prefer a simpler solution. So if anybody likes this proposal, then
please let us know :-)

> The Mercurial match module and commands should support file paths and
> file patterns into subrepositories. For example, 'hg diff sub1/a'
> should print the diff of file 'a' in subrepo 'sub1'.

I'm working on patches for status and diff. They currently know how to
recurse but cross-repository paths are rejected by the path auditor:

  % ./run-tests.py -l test-status-subrepos.t

  ERROR: test-status-subrepos.t output changed
  --- test-status-subrepos.t 
  +++ test-status-subrepos.t.err 
  @@ -29,7 +29,7 @@
   Status call crossing repository boundaries:

     $ hg status inner/x.txt
  -  M inner/x.txt
  +  abort: path 'inner/x.txt' is inside repo 'inner'

My current thinking is that this can be fixed by informing the path
auditor of all subrepositories and making it accept such paths as
already audited.
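A minimal sketch of that idea in Python, Mercurial's own language. It assumes a simplified auditor: the class name `PathAuditor`, the `is_repo_root` callback and the `subrepos` argument are hypothetical and do not reflect the real path auditor's API. Pre-seeding the set of audited prefixes with the subrepository roots is what lets 'inner/x.txt' through. A nested repository that is not a known subrepository is still rejected.

```python
class PathAuditor:
    """Hypothetical, simplified stand-in for Mercurial's path auditor."""

    def __init__(self, is_repo_root, subrepos=()):
        # is_repo_root(prefix) -> bool; a real auditor would look for a
        # nested .hg directory on disk instead of calling a function.
        self.is_repo_root = is_repo_root
        # Known subrepository roots count as already audited, so paths
        # below them are no longer rejected.
        self.audited = set(subrepos)

    def __call__(self, path):
        parts = path.split("/")
        if path.startswith("/") or ".." in parts or ".hg" in parts:
            raise ValueError("path %r is not safe" % path)
        # Check every proper directory prefix: 'a', 'a/b', ...
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in self.audited:
                continue
            if self.is_repo_root(prefix):
                raise ValueError("path %r is inside repo %r" % (path, prefix))
            self.audited.add(prefix)
```

With `nested = {"inner"}`, `PathAuditor(lambda p: p in nested)("inner/x.txt")` raises "path 'inner/x.txt' is inside repo 'inner'", while the same call on an auditor created with `subrepos=["inner"]` accepts the path.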

The patch queue is here:


> By default, Mercurial commits any changed files and subrepositories.
> This is good, because it is easier to back out unintentionally committed
> data than to regain lost data.
> In case one wants to commit only subsets of files and/or
> subrepositories, they can be selected/included or excluded by shell
> patterns, glob patterns or even regular expressions. This works quite
> fine for files; for subrepositories, however, the support is not so good.
> Today, one can select/include or exclude subrepositories only as a
> whole. That is, one can commit all changed files and revision states in
> subrepositories, but not just the states or only the files (for several
> repositories). For example, '-X sub1' would exclude any changed files or
> dirty state of subrepo sub1, whereas '-I sub1' would include sub1's dirty
> state *and* changed files. One cannot select just the dirty state of the
> subrepo sub1.
> Proposed Solution
> =================
> The match module and the Mercurial commands should transparently support
> subrepositories, i.e., hg diff sub1/a should print the diff for file a
> in subrepository sub1.
> Introduce a subrepo boundary marker defining the border between an outer
> repository and a subrepository for Mercurial patterns (glob, relglob,
> re). NUL could be used as the marker.

As discussed elsewhere, it turns out that NUL is a bad byte to use here
since it cannot be passed as a command line argument.

I don't like the idea of asking the user to enter any control character
like '\n' -- as a user I've had nothing but trouble whenever I wanted to
embed such characters in command line arguments, and I'm never really
sure if it works.

So perhaps one could extend the '**' and '*' glob operators by adding
lazy forms, '**?' and '*?', which then won't expand across subrepository
boundaries. This is inspired by regexps, where '*?' is a non-greedy
version of the '*' operator.
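To make the lazy operators concrete, here is a sketch that translates such globs into Python regular expressions. It assumes a hypothetical internal path form in which the '/' following each subrepository root is replaced by a NUL byte, so NUL is only used internally and never typed by the user: 'x.txt' in subrepo 'inner' becomes 'inner\0x.txt'. The names `BOUNDARY`, `mark_boundaries`, `glob_to_regex` and `matches` are made up for illustration. Only '*', '**' and their lazy forms are handled; '?', character classes and matching of everything below a matching directory are left out, and '*?' is always read as one token.

```python
import re

BOUNDARY = "\0"  # hypothetical internal marker between a repo and its subrepo


def mark_boundaries(path, subrepos):
    """Replace the '/' after each subrepo root with BOUNDARY.

    'inner/x.txt' with subrepos {'inner'} becomes 'inner\\0x.txt'.
    """
    parts = path.split("/")
    out, prefix = parts[0], parts[0]
    for part in parts[1:]:
        out += (BOUNDARY if prefix in subrepos else "/") + part
        prefix += "/" + part
    return out


def glob_to_regex(pattern):
    """Translate '*', '**' and the lazy forms '*?', '**?' into a regex."""
    res, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**?", i):
            res.append(r"[^\x00]*")   # any depth, never crosses a boundary
            i += 3
        elif pattern.startswith("**", i):
            res.append(r".*")         # any depth, crosses boundaries
            i += 2
        elif pattern.startswith("*?", i):
            # '*' already stops at '/', so the lazy form is identical here.
            res.append(r"[^/\x00]*")
            i += 2
        elif pattern[i] == "*":
            res.append(r"[^/\x00]*")  # within one path component
            i += 1
        elif pattern[i] == "/":
            res.append(r"[/\x00]")    # a literal '/' may sit on a boundary
            i += 1
        else:
            res.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(res), re.DOTALL)


def matches(pattern, path, subrepos):
    marked = mark_boundaries(path, subrepos)
    return glob_to_regex(pattern).fullmatch(marked) is not None
```

With `subrepos = {"inner"}`, '**.c' matches both 'src/a.c' and 'inner/b.c', whereas '**?.c' matches only 'src/a.c'.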

> For example, all files in an outer repo could be matched with
> 'glob:**\0' or 're:.*\0'.
> Compared to options which would tell Mercurial whether it should
> work recursively with regard to subrepos, a boundary marker would make
> it easier to select files and states, and one could select states and
> files across subrepositories at the same time.
> For example, selecting all files in the outer repository and the state
> in subrepo sub1 can be done with the patterns '**\0' and '**\0sub1',
> but not with a hypothetical option --recursive or --nonrecursive.
> Some use cases
> ==============
> a) all files in outer repo ('**\0')
> b) all new states in subrepos at level x (for first level: '**\0*/' or
>    '**\0*')
> c) all files in subrepos at level x, including any nested subrepos
>    (for first level: '**\0**')
> d) all files in subrepos at level x, excluding any nested subrepos (for
>    first level: '**\0**\0')
> e) all new states in nested subrepos at level v in subrepos at level x
>    (for first level: '**\0**\0*/')
> Why the \0?
> ===========
> a) It is the only byte that cannot appear anywhere in a POSIX path name.
>    Windows does not allow it in file names either. It is the only
>    character Mercurial will never support for file names short of
>    changing its internal data formats.
> b) If you call 'hg status --no-status --print0', you get a list
>    like 'file1\0dir1/file1\0'. Currently, status does not recurse into
>    subrepositories, so, in a way, '\0' is already the character that
>    delimits the subrepo directory names.
> c) It is not used in regular expressions (so far). Instead of NUL, we
>    could use any character which is not allowed in Windows and needs
>    quoting for most shells: <, >, |, :, (, ), &.
> Some further examples
> =====================
> **\0                    # every file in outer repo

This could be done with a --no-recurse option, right?

> **\0/*                  # every subrepo state (no nested subrepos exist)

So "hg commit -I '**\0/*'" is like

  cd sub1; hg commit *; cd ..
  cd sub2; hg commit *; cd ..

for every subrepository? I'm not sure why one would do that -- what
would the commit message be for a commit that spans all subrepositories
and yet does not also cover the outer repository?

> **\0/**                 # every file in any subrepo, but not their state

So I guess my example above is not correct? Is "hg commit -I '**\0/*'" like

  hg commit .hgsubstate

and then "hg commit -I '**\0/**'" is like

  cd sub1; hg commit *; cd ..
  cd sub2; hg commit *; cd ..


> **.c\0                  # any C file in outer repo

This use case makes sense, I can easily see why you would want to limit
a commit to the top level. However, I think a --no-recurse option would
be simpler.
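For comparison, a sketch of how a hypothetical --no-recurse option could combine with an ordinary pattern: select files as usual, then drop everything that lies inside a subrepository. The function names are illustrative only, and Python's fnmatch stands in for Mercurial's own matcher. Its '*' also matches '/', so '*.c' here behaves roughly like Mercurial's '**.c'.

```python
import fnmatch


def inside_subrepo(path, subrepos):
    """True if path is a subrepo root or lies below one."""
    return any(path == root or path.startswith(root + "/") for root in subrepos)


def select(paths, pattern, subrepos, recurse=True):
    """Hypothetical file selection for a command with an optional --no-recurse."""
    chosen = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]
    if not recurse:
        # --no-recurse: keep only files that belong to the outer repository.
        chosen = [p for p in chosen if not inside_subrepo(p, subrepos)]
    return chosen
```

With `paths = ["a.c", "lib/b.c", "sub1/c.c", "sub1/README"]`, `select(paths, "*.c", {"sub1"}, recurse=False)` yields `["a.c", "lib/b.c"]`, the same selection as the '**.c\0' pattern above.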

> **\0/sub[0-9]           # state of all subrepos sub0 to sub9
> **\0/sub1/**.c          # any C file under any 1st level subrepo called sub1
> **\0/sub1/**\0sub1sub1  # state of any nested subrepo named sub1sub1 in a
>                         # subrepo named sub1 nested right below sub1

You would use these patterns in -I or -X arguments, right? Personally, I
never use -I or -X since I find them confusing compared to letting my
shell expand things for me upfront -- I realize people on Windows
probably feel the opposite :)

I just wanted to say that this may be why I prefer a solution that does
not involve the patterns but instead just uses a --no-recurse flag.

Martin Geisler

Mercurial links: http://mercurial.ch/

More information about the Mercurial-devel mailing list