RFC: Transparent subrepository support via the match module

Wed Aug 18 13:11:20 CDT 2010

Mads Kiilerich <mads <at> kiilerich.com> writes:

> 
>   Martin Geisler wrote, On 08/18/2010 12:50 PM:
> > Didly Bom <didlybom <at> gmail.com> writes:
> >> On Wed, Aug 18, 2010 at 9:59 AM, Martin Geisler <mg <at> lazybytes.net> wrote:
> >>> So perhaps one could extend the '**' and '*' glob operators by adding
> >>> lazy forms: '**?' and '*?', which then won't expand across
> >>> subrepository boundaries. This is inspired by regexps where the '*?'
> >>> is a non-greedy version of the '*' character.
> >> I like this idea. It seems powerful yet easy to understand.
> > I just had a better idea:
> >
> > - "*" matches all characters except "/"
> > - "**" matches all characters including "/", but stops at subrepos
> > - "***" matches everything
> 
> I agree that the matcher should recurse when, for example, commit recurses.
> I think there should be a general flag for enabling/disabling recursion 
> everywhere, but why do we need other and special syntax for stopping 
> recursion at subrepo borders? I think it adds unneeded complexity and we 
> don't want it.
> 
> /Mads
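Martin's three wildcard levels, quoted above, could be prototyped as a glob-to-regex translation plus a separate subrepo boundary check. The following is a minimal Python sketch, not Mercurial's actual match module. The function names `glob_to_regex` and `matches` are hypothetical. Subrepo roots are assumed to be known as repository-relative paths without a trailing slash. "Stops at subrepos" is interpreted as: a wildcard never carries the match into a subrepository, so its contents are reached only when the pattern spells out the subrepo path literally.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> str:
    """Translate a glob using the three proposed wildcard levels into a regex.

    '***' and '**' both become '.*' here; the "stop at subrepos" rule for
    '**' is enforced separately in matches(). '*' never crosses a '/'.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("***", i):
            out.append(".*")          # matches everything
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")          # may cross '/'; boundary checked later
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")       # all characters except '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def matches(pattern: str, path: str, subrepos: Iterable[str]) -> bool:
    """Return True if `path` matches `pattern` under the sketched semantics.

    Assumption (hypothetical rule): a pattern containing '***' may enter any
    subrepo; otherwise a path inside subrepo `root` matches only if the
    literal text before the first wildcard already starts with root + '/'.
    """
    if re.fullmatch(glob_to_regex(pattern), path) is None:
        return False
    if "***" in pattern:
        return True
    literal = pattern.split("*", 1)[0]  # text before the first wildcard
    for root in subrepos:
        inside = path.startswith(root + "/")
        if inside and not literal.startswith(root + "/"):
            return False  # a wildcard would have carried us into the subrepo
    return True


if __name__ == "__main__":
    # Hypothetical working copy; subrepo roots are listed as plain paths.
    paths = ["A", "Sub1", "Sub1/b", "dir1/Sub2", "dir1/Sub2/c"]
    subrepos = {"Sub1", "dir1/Sub2"}
    for pat in ("*", "**", "***", "Sub1/*"):
        print(pat, [p for p in paths if matches(pat, p, subrepos)])
```

Run as a script, this lists A and Sub1 for `*`, adds dir1/Sub2 for `**`, returns all five paths for `***`, and only Sub1/b for `Sub1/*`. Under this interpretation, `**` picks up the subrepo roots but none of their contents.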

Why at all?
===========
Some people think subrepos should be handled just like directories in Mercurial, i.e., commands should traverse them recursively where appropriate. Others think subrepos are special and Mercurial commands should never traverse into them.

This leads to endless little-endian and big-endian discussions, even among otherwise intelligent folks.

Today, to tell Mercurial which files to operate on, one simply uses patterns.  These patterns can be expanded by the shell or passed to Mercurial directly.  The latter works on all platforms and in any shell.

Shells know nothing about Mercurial, let alone subrepos, so there is no need to discuss them any further.

With a --non-recursive option, you could only select files to be committed in the outer repository---you could not select specific subrepositories.

Say you have the following structure:

   M A
   M Sub1/
   M Sub1/b
   M dir1/Sub2/
   C dir1/Sub2/c

That is, file A in the outer repo is modified; subrepository Sub1 has been updated to a revision other than the one stored in .hgsubstate, and file b in Sub1 is modified.  Subrepo Sub2 under dir1 is clean, but its revision differs from the one in .hgsubstate.

Now suppose the changes in 'Sub1/b' and 'dir1/Sub2/c' are not ready to be committed, so you want to commit only A, Sub1/ and dir1/Sub2/.

There is no way to do that today.  The only option is to 'shelve' the modified files and then commit.  Unfortunately, the data to be shelved can be quite large: at our company, we have Mercurial repositories of several GiB, and some contain far more than ten subrepos.  We used the 'attic' extension in the past, and it generated patches of 50 MiB or more.

Even then, you would have to commit both Sub1 and dir1/Sub2; you could not commit just the new revision of dir1/Sub2.

With the proposed subrepo boundary, you could run
   hg ci --include '**\0/' A        # file A and all new revisions of all subrepositories
or
   hg ci --include '**\0/Sub2/' A   # file A and the new revision of subrepo dir1/Sub2

Klaus