[RFC] revision sets

Wed May 26 18:28:37 CDT 2010

On Wed, 2010-05-26 at 21:24 +0200, Henrik Stuart wrote:
> On 23-04-2010 22:16, Henrik Stuart wrote:
> > On 20-04-2010 00:32, Matt Mackall wrote:
> >> Right now we have a notion of revision ranges, ie:
> >>
> >>  hg log -r 0:1.0
> >>
> >> Internally, we iterate over this with something like:
> >>
> >>  for rev in cmdutil.revrange(repo, opts['rev']):
> >>
> >> I've been talking about expanding this into a more powerful system that
> >> would allow specifying dates, keywords, branches, etc. My current
> >> thought is to make it look like this:
> >>
> >>  hg log -r "branch(foo) and keyword(bar) and date(mar 1 - apr 1)"
> > 
> > A query language seems like a very good idea. It would probably be
> > prudent to get a list going on what primitives we would like to support
> > and what they should do. It would, for instance, be lovely if we could
> > find separate primitives for the entire branch and the branch tip, and
> > possibly also branch heads.
> > 
> > [snip]
> > 
> >>  hg log -r "descendant(parent2(1.0)) and ancestor(2.0) and
> >> author(george) and sorted(date) and reversed()"
> >>
> >> Read that as: every cset that is descended from the second parent of
> >> revision 1.0 and is also an ancestor of 2.0 and was written by george,
> >> sorted by date in reverse order.
> >>
> >> revrange would be replaced by a new revset function that would parse the
> >> query/queries and build an iterator. Some of the operations, like
> >> keyword() and author(), would obviously be fairly expensive and many
> >> would fail to work (at least for now) on remote repos.
> > 
> > We should probably consider carefully whether to support this remotely,
> > depending on what primitives are chosen so as not to put too great a
> > strain on the remote server.
> > 
> >> I've pitched an idea like this before, usually with a weird
> >> operator-intensive syntax. This time, I think the right thing is an
> >> easily-read but more verbose query language.
> >>
> >> Steps to get from here to there:
> >>
> >> - change all callers of revrange to revset
> >> - design a BNF for the revset query language
> >> - build a query parser/"compiler"
> >> - add filters for the query functions
> >> - simplify some of the existing options (like -d and -k) by turning them
> >> into queries internally
> >>
> >> Thoughts?
> > 
> > A few pedantic notes: ancestor() and descendant() seem to indicate to me
> > that they can pick any one ancestor of something and I'd much favour the
> > plural versions, ancestors() and descendants(), as that indicates the
> > full set.
> > 
> > Also, since we are working on sets, "and" is really intersection, and
> > "or" is really union. One could ponder whether it wouldn't be nice to
> > support the usual set operations in some manner (subtraction and
> > difference come to mind).
> > 
> > With regards to syntax, I rather favour supporting both the written out
> > operations and a shortcut like Dirkjan proposed elsewhere in the thread.
> > 
> > And finally a brief stab at defining relevant functions/primitives:
> > 
> > Notation:
> >   c(_n)?: changeset (a one-element set probably)
> >   s(_n)?: sets of changesets
> > 
> > hash: changeset with hash
> > rev: changeset with rev
> > rev_1:rev_2: set of changesets between rev_1 and rev_2 linearly
> > .: working context parent1
> > 
> > ancestor(c_1,c_2): common ancestor of c_1 and c_2
> > ancestors(s): all ancestors of s
> > children(s): all immediate children of s (one level)
> > descendants(s): all descendants of s
> > heads(name): branch heads of branch name
> > heads(s): branch heads of s
> > topo-heads(s): topological heads of s
> > branch(name): set of changes on branch name
> > tag(name): changeset pointed to by name tag
> > tip(s): tip-most changeset
> > tip(name): tip-most name branch changeset
> > parent1(c): first parent of changeset
> > parent2(c): second parent of changeset
> > parents(c): both parents of changeset
> > 
> > keyword(name): changesets where user/commit message/etc. contain name
> > user(name): changesets where user contains name
> > date(datespec): changesets with date within datespec
> > adds(fname): changesets that add fname
> > removes(fname): changesets that remove fname
> > modifies(fname): changesets that modify fname
> > file(fname): changesets that add/remove/modify fname
> 
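
To make the set semantics above concrete, here is a minimal sketch of a
few of the graph primitives over a toy history. The PARENTS table is
made-up data, and treating ancestors() and descendants() as including
their arguments is an assumption of this sketch, not something either
implementation is known to do. With plain Python sets, "and" and "or"
fall out as intersection and union:

```python
# Toy history: revision number -> tuple of parent revisions.
# Hypothetical data, not taken from any real repository.
PARENTS = {
    0: (),
    1: (0,),
    2: (1,),
    3: (1,),
    4: (2, 3),  # merge of 2 and 3
}


def ancestors(revs):
    """All ancestors of revs (assumption: revs themselves are included)."""
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(PARENTS[rev])
    return seen


def children(revs):
    """Immediate children of revs (one level only)."""
    revs = set(revs)
    return {r for r, ps in PARENTS.items() if revs.intersection(ps)}


def descendants(revs):
    """All descendants of revs (assumption: revs themselves are included)."""
    seen = set(revs)
    frontier = set(revs)
    while frontier:
        frontier = children(frontier) - seen
        seen |= frontier
    return seen


# "descendants(1) and ancestors(3)" is a set intersection;
# "or" would be |, and subtraction would be -.
between = descendants({1}) & ancestors({3})
```

Under these assumptions, between holds revisions 1 and 3.
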
> Replying to myself since nothing has happened since...

Actually a lot has happened: we wrote two nearly complete
implementations.

> I have written a proposal in the form of a Mercurial extension that can
> parse and present something very close to the outlined functions above
> (with a few changes, and some unimplemented filters, namely the
> file-related ones, as well as sorting) using the log command (by way of
> a new debugrevspec command).
> 
> The extension can be cloned from http://bitbucket.org/hstuart/hg-revspec
> and instructions are present in the README.
> 
> Please note that Matt has an alternative implementation of this thing as
> well. I'll leave it to Matt to present you with a link to his version if
> he so pleases.
> 
> If you are at all interested in this area, now would be the time to
> weigh in with your thoughts, preferences, missing features, etc.

The thing we're most interested in is probably feedback on the syntax.
Both of our implementations implement very similar query styles:

operators:
  and, or, not, :, .., ()

function-style filters:
  ancestors(a), ancestor(x, y), keyword("foo")

identifiers:
  tip, 1.0, default, "quoted-to-be-parser-friendly"
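
As a rough illustration of how those three token classes could be told
apart, here is a small tokenizer sketch. It is not taken from either
implementation: the regular expression, the token names and the
tokenize() helper are all hypothetical, and it assumes identifiers may
contain single dots or hyphens (as in 1.0) while ".." is always the
range operator.

```python
import re

# Hypothetical token grammar for the query style described above.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<string>"[^"]*")            # quoted identifier
      | (?P<op>\.\.|[():,])            # .. : ( ) ,
      | (?P<symbol>\w+(?:[.-]\w+)*)    # tip, 1.0, default, and, or, not
    )
""", re.VERBOSE)

KEYWORDS = {"and", "or", "not"}


def tokenize(query):
    """Split a query string into (kind, text) pairs."""
    query = query.rstrip()
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        kind = m.lastgroup
        text = m.group(kind)
        if kind == "string":
            text = text[1:-1]          # drop the surrounding quotes
        elif kind == "symbol" and text in KEYWORDS:
            kind = "op"                # and/or/not are word operators
        tokens.append((kind, text))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("0..tip") yields a symbol, the ".." operator,
and another symbol, and a quoted name such as "foo bar" comes back as a
single string token.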

See these two URLs for examples:

http://bitbucket.org/hstuart/hg-revspec/changeset/446121faa725
http://www.selenic.com/blog/?p=613

We also need to come up with a scheme for sorting. I'm currently
considering something like:

# sort ancestors by user (ascending), then date (descending)
sort(ancestors(1.1), "user -date")
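
One way a key string like that could be interpreted, sketched in Python.
The Cset tuple and sort_csets() are hypothetical names for illustration
only; the sketch assumes keys are whitespace-separated and that a
leading "-" means descending. It relies on Python's sort being stable,
so the keys are applied from least to most significant:

```python
from collections import namedtuple

# Hypothetical stand-in for a changeset; real changesets carry more fields.
Cset = namedtuple("Cset", "rev user date")


def sort_csets(csets, spec):
    """Sort csets by a spec such as "user -date"."""
    result = list(csets)
    # Sort by the last key first; later (more significant) sorts keep
    # the relative order of equal elements because sorting is stable.
    for key in reversed(spec.split()):
        descending = key.startswith("-")
        name = key.lstrip("-")
        result.sort(key=lambda c: getattr(c, name), reverse=descending)
    return result


csets = [
    Cset(0, "bob", 10),
    Cset(1, "alice", 20),
    Cset(2, "alice", 30),
    Cset(3, "bob", 5),
]
ordered = [c.rev for c in sort_csets(csets, "user -date")]
```

Here ordered comes out as [2, 1, 0, 3]: alice's changesets first, newest
first, then bob's.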

-- 
Mathematics is the supreme nostalgia of our time.