Proposed Patch to hgignore
brendan at kublai.com
Thu Mar 6 17:52:07 CST 2008
On Thursday, 06 March 2008 at 16:19, Matt Mackall wrote:
> On Thu, Mar 06, 2008 at 10:01:09PM +0100, Jakob Krainz wrote:
> > On Thu 2008-03-06 14:00:19 -0600, Matt Mackall wrote:
> > >
> > > As discussed in the distant past, this feature is very difficult to
> > > implement in a way that is simultaneously correct and easy to
> > > understand, never mind fast. Even though it looks simple on paper.
> > >
> > > Ever wonder why regexes themselves don't have a negation operator?
> > > It's the sort of deep result that computer science professors tend to
> > > skim over but could easily fill a book or two.
> > >
> > Maybe, but my patch uses only facilities that are already present
> > in Mercurial (look at the -I and -X options).
> Yes, but you've made all the semantics much more complicated.
> For starters, pattern matching now has ordering issues. Which means
> there are now precedence issues. Across all the ignore files that are
> in effect. So if I say ignore *.c in one and I say don't ignore *.c in
> another, I have to know which one gets read first.
> Also, if I say "ignore foo/" in one place, and "don't ignore bar.c" in
> another (and we happen to have a foo/bar.c), what do we do? It's not
> obvious. On the one hand, we want to traverse all files so that we can
> pick up all possible exceptions. On the other, we really want to not
> traverse all files, because that makes our tea cold. And we can't know
> ahead of time that a particular unignore is inside a particular ignore
> except for relatively trivial cases.
> Then let's suppose I say "ignore fo*/" somewhere later. Should that
> actually work? If so, then we've got to break down all the rules
> and apply them in order, and in this case we'll have to test against
> all the directory components. Now we've gone from 1 regex call per
> file to O(rules * depth) and we're looking at every file in the repo.
> Now if we simply say "there are only two sets, ignore and unignore", then
> we're still stuck with lots of unintuitive behavior and probably a
> bunch of users who won't understand the implications of the docs.
> > A valid comparison would be the hosts.allow / hosts.deny mechanism
> > of Wietse Venema's tcp_wrapper.
> > (cf. http://itso.iu.edu/TCP_Wrappers )
> First, note that that's not traversing the tree of the namespace, so
> it's a much simpler problem. And people still get those sorts of rules
> wrong all the time! I wrote just such a simple ruleset for a router
> when I worked at Cisco and every single tester a) got it wrong and b)
> had their own unique idea of how to make it more intuitive.
> The current subtraction-only rules for .hgignore are simple,
> unambiguous, and fast. It could stand to be better documented, with
> more examples, and a note that no, you can't do an inverse match with
> a regex.
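To make the ordering problem described above concrete, here is a minimal
Python sketch of an ordered, last-match-wins rule list. It is only an
illustration: the (action, glob) rule format, the RULES list, and the
function name ignored_ordered are all made up for this example and are
not how Mercurial or the proposed patch works.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered rules as (action, glob) pairs. This format is an
# assumption for illustration, not Mercurial's .hgignore syntax.
RULES = [
    ("ignore", "foo"),
    ("unignore", "bar.c"),
    ("ignore", "fo*"),
]


def ignored_ordered(path, rules=RULES):
    """Apply the rules in order; the last rule that matches decides.

    Each rule is tried against every component of the path, so the
    work per file is bounded by len(rules) * depth.
    """
    parts = path.split("/")
    ignored = False
    for action, pattern in rules:
        if any(fnmatchcase(part, pattern) for part in parts):
            ignored = (action == "ignore")
    return ignored


# Same rules, with the last two swapped.
REORDERED = [RULES[0], RULES[2], RULES[1]]

print(ignored_ordered("foo/bar.c"))             # later "fo*" rule wins
print(ignored_ordered("foo/bar.c", REORDERED))  # later "bar.c" rule wins
```

The two calls give different answers for the same file, which is the
precedence issue: the result depends on the order in which the rules
(and the files containing them) are read.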
I've seen plenty of people make errors even with the current system
(mostly, people just aren't very good at writing regexes). I lean a
bit towards an ignore set and an unignore myself, where unignore takes
precedence (so in your ignore foo/, unignore bar.c case, foo/bar.c
would not be ignored). Maybe it's less intuitive, but I think in
practice it can sometimes make a proper .hgignore much shorter, and
shorter means fewer bugs, right? :)
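Roughly what I have in mind, as a minimal Python sketch. The unignore
set is not an existing Mercurial feature, and the pattern lists and the
name ignored_two_sets are just assumptions for the sake of the example:

```python
import re

# Hypothetical pattern sets, written as regexes in the style of
# .hgignore. The unignore set is an assumed feature, not a real one.
IGNORE = [r"^foo/", r"\.o$"]
UNIGNORE = [r"(^|/)bar\.c$"]


def ignored_two_sets(path, ignore=IGNORE, unignore=UNIGNORE):
    """Return True if path is ignored; any unignore match takes precedence."""
    if any(re.search(p, path) for p in unignore):
        return False
    return any(re.search(p, path) for p in ignore)


print(ignored_two_sets("foo/bar.c"))  # unignore wins over "^foo/"
print(ignored_two_sets("foo/baz.c"))  # only the ignore set matches
```

There is no ordering to think about here, only two sets, though the
traversal cost Matt mentions still applies: foo/ can't simply be
skipped if something under it might be unignored.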
It would also be nice to be able to match according to flags like exec
or symlink :)