Question about auto-refresh and detecting manual edits to standins

Mon Feb 8 22:57:59 CST 2010

> On Mon, Feb 8, 2010 at 11:23 AM, Tessa Starkey <testarkey at gmail.com> wrote:
> Hello,
> 
> I'm a part of the group of students working on improvements to bfiles.
> Recently we were discussing how to detect manual modifications of the
> bfile stand-in files, and a question arose.
> (Issue link: https://ucosp.fogbugz.com/default.asp?36#450)
> 
> There are two different situations that would lead to the stand-in files
> not matching the working copy:
> 
>  1) the stand-in hasn't been bfrefreshed since the working copy was edited
>  2) Someone opened the stand-in and edited it manually, or another program
>     modified the stand-in

There are three main pieces of state associated with a bfile:
 1. The placeholder file in the repository (PR)
 2. The placeholder file in the working directory (PW)
 3. The bfile in the working directory (BW)

Consider these VALID scenarios:
 1. In a clean working copy, we have (PR = PW = BW). That is, the
placeholder is unmodified and its hash matches the bfile.

 2. If a user modifies the bfile, then (PR = PW != BW)

 3. After an 'hg bfrefresh', we end up with (PR != PW = BW)

 4. If the user subsequently modifies the bfile we get (PR != PW != BW)

 5. Finally, the user copies in the original version of the bfile from
elsewhere and so (PR != PW != BW = PR)

In particular, every possible combination of equality between the three
states is a valid configuration under the current behavior, so comparing
the hashes alone cannot tell us whether the placeholder was edited by hand.

There are a few more cases when we consider adding and deleting files, but
this is the main idea.
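
To make the five scenarios concrete, here is a minimal Python sketch that
labels a (PR, PW, BW) triple. The function name and the labels are made up
for illustration and are not part of bfiles; the three arguments are assumed
to be hash strings.

```python
def classify(pr, pw, bw):
    """Label which of the three hashes agree (hypothetical helper)."""
    if pr == pw == bw:
        return "clean"                                   # scenario 1
    if pr == pw:
        return "bfile modified, placeholder stale"       # scenario 2
    if pw == bw:
        return "refreshed, not yet committed"            # scenario 3
    if bw == pr:
        return "bfile matches repo, placeholder differs" # scenario 5
    return "all three differ"                            # scenario 4
```

Note that a manual edit of the placeholder in an otherwise clean working
copy gives PR != PW != BW = PR, which is the same triple as scenario 5, so
this kind of comparison cannot single it out.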

> The question is: How do we tell the difference?
> One possibility is to store the last bfrefresh time for each file,
> possibly somewhere in the .hgbfiles directory. If the bfrefresh time <
> mtime of the stand-in, then we know it has been modified.

Given the above discussion, the mtime is the only way to tell if the file
was modified by something other than bfiles. It may be sufficient to check
that mtime(placeholder) = mtime(bfile) and to set the placeholder mtime to
the mtime of the bfile when re-computing. This will not prevent someone from
maliciously faking the mtime, but it should prevent accidental edits.
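
A rough sketch of that mtime check, assuming a current Python 3 and using
only os.stat and os.utime. The two function names are hypothetical and this
is not how bfiles does it today.

```python
import os

def sync_mtime(placeholder, bfile):
    """Copy the bfile's mtime onto the placeholder.

    Intended to be called right after the placeholder is re-computed.
    """
    st = os.stat(bfile)
    os.utime(placeholder, ns=(st.st_atime_ns, st.st_mtime_ns))

def placeholder_untouched(placeholder, bfile):
    """True if both files still carry the same mtime."""
    return os.stat(placeholder).st_mtime_ns == os.stat(bfile).st_mtime_ns
```

A mismatch here also shows up when the bfile itself was edited after the
last refresh, so the check only says "something changed since bfrefresh",
not which of the two files changed.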

We should also consider whether it makes sense to look for explicit tampering in
the first place. What would a user do if we detect that the placeholder was
edited? If the editing was truly accidental, the user would just run 'hg
bfrefresh' to bring the placeholders in line with the real bfiles. Why not
just do this for the user automatically? Furthermore, if bfrefresh can
recover from this situation automatically, why not just run bfrefresh
instead of first checking for tampering? In particular, if auto-refresh is
enabled, just run bfrefresh(FILE) with every bfiles command that touches
FILE instead of just on commit.
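
Something along these lines is what I have in mind. This is only a sketch:
it assumes the placeholder holds nothing but the hex hash of the bfile, it
assumes SHA-1 as the hash, and the function names are invented rather than
taken from the bfiles code.

```python
import hashlib

def file_hash(path):
    """Hex SHA-1 of a file, read in chunks (hash choice is an assumption)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def refresh(placeholder, bfile):
    """Rewrite the placeholder if it does not match the bfile's hash.

    Returns True if the placeholder had to be rewritten.
    """
    new = file_hash(bfile).encode("ascii")
    try:
        with open(placeholder, "rb") as f:
            old = f.read().strip()
    except FileNotFoundError:
        old = None
    if old == new:
        return False
    with open(placeholder, "wb") as f:
        f.write(new + b"\n")
    return True
```

Calling refresh(placeholder, bfile) before any command that touches FILE
would silently repair an edited placeholder, and the return value could
still be used to print a warning if we want one.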

Someone also suggested making the placeholder files read-only as a warning
to the user. However, I don't see Mercurial doing this for any other private
data (i.e. stuff in .hg); is there a particular reason for this?

Thanks,

Anton