hg log <file> can't be trusted

Matt Mackall mpm at selenic.com
Mon Feb 1 08:36:59 CST 2010


On Mon, 2010-02-01 at 08:50 +0100, Dirkjan Ochtman wrote:
> On Sun, Jan 31, 2010 at 23:22, Greg Ward <greg at gerg.ca> wrote:
> > But it would be nice to see this fixed, or at least better documented.
> >  One idea: add options to log:
> >
> >  --sloppy     fast log of individual files (not always correct) (default)
> >  --thorough   complete log of individual files
> >
> > I'm not sure if --thorough would be an alias for --removed, since I
> > have not read the code to see what exactly --removed does.
> 
> Another: actually keep track of this information (removals and
> re-appearances) in filelogs.

I bet there's actually a relatively lightweight algorithm for
reconstructing a complete changeset to file revision mapping that would
solve all the problems in this area (log, annotate, deletions, and
merge). Consider:

cset graph:

1-2-3-4-5-6-7

file graph:

a2-b7  (number is the linkrev)

The file graph implies that csets 2-6 all contain revision a with no
deletions. And we also know it's not present in 1. We can't delete the
file and reintroduce it without adding a parentless node to the file
graph. Now consider:

cset graph:

1-2-3-4-5-6-10-11-12-13
     \
      7-8-9

file graph:

a2-b4-c9
    \
     d10

Again, not present in 1, a in 2 and 3, b in 4. But what's happening in
c9? The parent cset of 9 must contain b (this is the duplicate revision
on a branch case), which means b is in 8. That leaves us with either a
or b in 7. And we can quickly test whether the file was modified in
cset 7 (i.e. whether it appears in ctx.files()) without doing an
expensive manifest lookup.
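
A rough Python sketch of that cset 7 test, under some assumptions: the
changed-files list is passed in as a plain list standing in for
ctx.files(), and rev_in_child is a made-up helper name, not a Mercurial
function.

```python
def rev_in_child(path, parent_rev, next_rev, changed_files):
    """Guess which file revision a cset holds, using only its file list.

    parent_rev    -- file revision held by the parent cset (e.g. "a" in 3)
    next_rev      -- the only child of parent_rev that can appear here ("b")
    changed_files -- paths touched by this cset (stand-in for ctx.files())
    """
    if path in changed_files:
        # The cset touched the file, so it moved on to the child revision.
        return next_rev
    # Untouched: the cset inherits whatever its parent had.
    return parent_rev


# cset 7 sits on top of cset 3, which holds revision a.
print(rev_in_child("foo.c", "a", "b", ["foo.c"]))  # touched in 7
print(rev_in_child("foo.c", "a", "b", ["bar.c"]))  # not touched in 7
```

The point of the sketch is only that the answer comes from the cset's
own file list, with no manifest read.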

We also know that csets 5 and 6 must contain b because no other
revisions can appear between 4 and 10.
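
The linear fill-in used so far (2-6 hold a in the first example, 5 and 6
hold b here) can be sketched as follows. This is only an illustration:
fill_chain and its input shapes are invented for the example, and it
assumes the csets are handed over as one unbranched chain in order.

```python
def fill_chain(chain, file_revs):
    """Map each cset in a linear chain to the file revision it holds.

    chain     -- cset numbers in parent-to-child order
    file_revs -- (name, linkrev) pairs for the file revisions on this chain
    """
    by_linkrev = dict((linkrev, name) for name, linkrev in file_revs)
    mapping = {}
    current = None  # None means "file not present yet"
    for cset in chain:
        # A new file revision starts at its linkrev and then carries
        # forward until the next linkrev on the chain.
        current = by_linkrev.get(cset, current)
        mapping[cset] = current
    return mapping


# First example: a2-b7 on csets 1-7.
print(fill_chain([1, 2, 3, 4, 5, 6, 7], [("a", 2), ("b", 7)]))

# Second example, mainline only: a2-b4-d10 on csets 1-6 and 10.
print(fill_chain([1, 2, 3, 4, 5, 6, 10], [("a", 2), ("b", 4), ("d", 10)]))
```

Carrying the revision forward is safe here only because each later
revision has the earlier one as its parent in the file graph, which
rules out a delete-and-re-add in between.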

That leaves csets 11-13. The only thing that can happen here is for the
file to be deleted or remain revision d. It can't switch to c because c
is not a descendant of d. There are a couple of tests we can make here.
If the file is present in 13 (via slow manifest lookup), we know it's
present in 11 and 12. Alternatively, if it's not changed in 11 and 12
(via two fast cset checks), we know it's present in 13.
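
Both tests for the 11-13 tail can be sketched like this. The helper
names are hypothetical, changed_files is a plain dict standing in for
per-cset ctx.files() results, and present_in_head stands in for the
result of the slow manifest lookup. The fast variant here walks every
cset in the tail, including the head.

```python
def infer_tail_fast(tail, rev, path, changed_files):
    """Walk a linear tail using only per-cset changed-file lists.

    No file revision has a linkrev in the tail, so the only change a
    cset can make to the file here is deleting it.
    """
    mapping = {}
    current = rev
    for cset in tail:
        if path in changed_files.get(cset, ()):
            current = None  # touched with no new file revision: deleted
        mapping[cset] = current
    return mapping


def infer_tail_from_head(tail, rev, present_in_head):
    """Use one manifest lookup at the head of the tail.

    If the file is still there at the head, it was never deleted, so
    every cset in the tail holds rev. Otherwise return None and leave
    it to the per-cset checks to find where it went away.
    """
    if present_in_head:
        return dict((cset, rev) for cset in tail)
    return None


print(infer_tail_fast([11, 12, 13], "d", "foo.c", {11: ["bar.c"]}))
print(infer_tail_fast([11, 12, 13], "d", "foo.c", {12: ["foo.c"]}))
print(infer_tail_from_head([11, 12, 13], "d", True))
```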

I'm not sure what a complete algorithm looks like (and there are a bunch
of other cases to be considered!), but it seems pretty clear there's
enough implied information here that we can figure most of it out
directly from the graphs, without more expensive cset and manifest
lookups.

(Which is fortunate, because changing the schema is still out of the
question.)

-- 
http://selenic.com : development and support for Mercurial and Linux



