RFC: Transparent subrepository support by match module

Martin Geisler mg at aragost.com
Wed Aug 11 08:53:09 CDT 2010


Hi guys,

As you know, I'm working on our subrepo support. Here is a suggestion I
got from one of the developers at my customer and I would like to hear
what you think about it.

Apart from the problem that a NUL character cannot be passed as a
command line argument, the idea of somehow being able to match subrepo
boundaries could be interesting if another character were used.
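
To make the NUL problem concrete, here is a small Python check. It is
only a sketch: the helper name can_pass_as_argument is made up for this
mail, and it starts a throwaway Python child process rather than hg.

```python
import subprocess
import sys


def can_pass_as_argument(arg):
    """Return True if arg can be handed to a child process as one argument."""
    try:
        # The child does nothing; we only care whether it can be started.
        subprocess.run([sys.executable, "-c", "pass", arg], check=True)
    except ValueError:
        # Python rejects an argument with an embedded NUL before starting
        # the child, because argv entries are NUL-terminated strings.
        return False
    return True
```

With this, a pattern such as "**\0" never reaches the program it was
meant for, which is why another marker character would be needed on the
command line.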

The proposal is below:


The Mercurial match module and commands should support file paths and
file patterns into subrepositories. For example, 'hg diff sub1/a' should
print the diff of file 'a' in subrepo 'sub1'.

By default, Mercurial commits any changed files and subrepositories.
This is good, because it is easier to back out unintentionally committed
data than to regain lost data.

In case one wants to commit only subsets of files and/or
subrepositories, they can be selected/included or excluded by shell
patterns, glob patterns or even regular expressions. This works quite
well for files; for subrepositories, however, the support is not as good.

Today, one can select/include or exclude subrepositories only as a
whole. That is, one can commit all changed files and revision states in
subrepositories, but not just the states or only the files (for several
repositories). For example, '-X sub1' would exclude any changed files or
dirty state of subrepo sub1, whereas '-I sub1' would include sub1's dirty
state *and* changed files. One cannot select just the dirty state of the
subrepo sub1.

Proposed Solution
=================

The match module and the Mercurial commands should transparently support
subrepositories, i.e., hg diff sub1/a should print the diff for file a
in subrepository sub1.

Introduce a subrepo boundary marker defining the border between an outer
repository and a subrepository for Mercurial patterns (glob, relglob,
re). NUL could be used as the marker.

For example, all files in an outer repo could be matched with
'glob:*\0' or 're:.*\0'.

Compared to an option that would tell Mercurial whether it should work
recursively with regard to subrepos, a boundary marker would make it
easier to select files and states, and one could select states and
files across subrepositories at the same time.

For example, selecting all files in the outer repository and the state
in subrepo sub1 can be done with the patterns '*\0' and '*\0sub1', but
not with a hypothetical option --recursive or --nonrecursive.


Some use cases
==============

a) all files in outer repo ('**\0')

b) all new states in subrepos of x level (for first level: '*\0/' or
   '*\0')

c) all files in subrepos of x level, including any nested subrepos (for
   first level: '*\0*')

d) all files in subrepos of x level, excluding any subrepos (for first
   level: '*\0*\0')

e) all new states in nested subrepos of level v in subrepos of x level
   (for first level: '*\0\0/')


Why the \0?
===========

a) It is the only character not allowed in POSIX file names. Windows
   does not allow it in file names. It is the only character Mercurial
   will never support for file names short of changing its internal data
   formats.

b) If you call 'hg status --no-status --print0', you get a list like
   file1\0dir1/file1\0. Currently, status does not recurse into
   subrepositories, so the character that delimits the subrepo
   directory names is, in a way, already the '\0'.

c) It is not used in regular expressions (so far). Instead of NUL, we
   could use any character which is not allowed in Windows and needs
   quoting for most shells: <, >, |, :, (, ), &.
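
For point b), this is how a script would typically split such a
NUL-terminated list. It is a minimal sketch; the function name
split_print0 is made up for this mail.

```python
def split_print0(data):
    """Split NUL-terminated records, as printed with --print0, into a list."""
    # Every name is followed by a NUL, so the final split piece is empty
    # and gets dropped here.
    return [name for name in data.split(b"\0") if name]
```

The input would be the raw bytes captured from 'hg status --no-status
--print0', for example via subprocess.check_output.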


Some further examples
=====================

**\0                    # every file in outer repo
**\0/*                  # every subrepo state (no nested subrepos exist)
**\0/**                 # every file in any subrepo, but not their state
**.c\0                  # any C file in outer repo
**\0/sub[0-9]           # state of all subrepos sub0 to sub9
**\0/sub1/**.c          # any C file under any 1st level subrepo called sub1
**\0/sub1/**\0sub1sub1  # state of any nested subrepo named sub1sub1 in a
                        # subrepo named sub1 nested right below sub1
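
To experiment with patterns like these, here is a rough Python sketch
of one possible reading. None of it is existing Mercurial code, and the
names mark_boundaries, glob_to_regex and select are made up. It assumes
that a path which stays in the outer repo gets the marker appended, that
a path which enters a subrepo gets the marker in front of the "/name" of
each subrepo root, and that '*' and '**' never match the marker. Only
'*' and '**' are translated; character classes such as [0-9] are left
out. Not every example in this mail is consistent with this reading.

```python
import re

MARKER = "\0"


def mark_boundaries(path, subrepos):
    """Encode path, inserting MARKER where it crosses into a subrepo.

    subrepos is a set of subrepo root paths relative to the outer repo.
    """
    parts = path.split("/")
    out = []
    crossed = False
    for i, part in enumerate(parts):
        if "/".join(parts[:i + 1]) in subrepos:
            # This component is a subrepo root: put the marker before it.
            out.append(MARKER + "/" + part)
            crossed = True
        else:
            out.append(part if i == 0 else "/" + part)
    marked = "".join(out)
    # Paths that never leave the outer repo end with the marker.
    return marked if crossed else marked + MARKER


def glob_to_regex(pattern):
    """Translate a glob using only '*' and '**' into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(r"[^\x00]*")    # crosses '/', but not the marker
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/\x00]*")   # stays within one path component
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select(pattern, paths, subrepos):
    """Return the paths whose marked form matches the whole pattern."""
    regex = glob_to_regex(pattern)
    return [p for p in paths if regex.fullmatch(mark_boundaries(p, subrepos))]
```

For example, with a single subrepo 'sub1', select("**\0", ...) keeps
only the outer files, and select("**\0/*", ...) keeps only the entry
'sub1' itself, which stands in for the subrepo state here.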


-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://aragost.com/mercurial/

