[RFC] Upgrades to the dirstate format

Matt Mackall mpm at selenic.com
Fri Apr 16 11:36:50 CDT 2010


On Fri, 2010-04-16 at 12:44 +0200, Benoit Boissinot wrote:
> Continuing a long-standing tradition of sneaking upgrades and new
> features into unused bits, here is my proposal to extend the dirstate
> format.
> 
> [sorry it's a bit long, if you already know about dirstate, skip
> directly to summary]
> 
> Current dirstate format
> =======================
> 
> .hg/dirstate:
> <p1 binhash><p2 binhash>
> <list of dirstate entries>
> 
> a dirstate entry is composed of:
> 8bit: status
> 32bit: mode
> 32bit: size
> 32bit: mtime
> 32bit: length
> variable length entry (length given by the previous length field) with:
> "<filename>" followed if it's a copy by: "\0<source if copy>"
> 
> status can be either:
> 'n': normal
> 'm': merged
> 'a': added
> 'r': removed
> 
> Details of the semantics
> ------------------------
> 
> mode stores the st.st_mode of the file as it was when clean, but only
> the user x-bit is ever checked.
> 
> size is usually the size of the file, as it was stored (after any
> potential filters).
> If size is -1 or -2, it has different semantics. First, -1 in
> conjunction with mtime can be used to force a lookup.
> Secondly, they are also used when the dirstate is in a merge
> state (p2 != nullid):
> -2 will *always* return dirty; it is used to mark a file that was
> cleanly picked from p2.
> With a status of 'r', -2 means that the previous state was -2 (always
> dirty, picked from p2), and -1 means the previous status was 'm'
> (merged). These allow revert to pick the right status back during a
> merge.
> 
> mtime is usually the mtime of the file when it was last clean. If the
> size is < 0, setting -1 as mtime will force a lookup (and allows us to
> correctly deal with changes done less than one second after we updated
> the dirstate).

Please add all this to the wiki.
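
A minimal sketch of reading the entry layout quoted above, in case it
helps the wiki page. The quoted description only gives field widths, so
the big-endian signed encoding (struct format ">cllll") and the 20-byte
parent hashes are assumptions here, and parse_dirstate is a hypothetical
name rather than an existing function.

```python
import struct

# Assumption: big-endian, signed 32-bit fields after a 1-byte status.
ENTRY_HEADER = struct.Struct(">cllll")
HASH_LEN = 20  # assumption: binary SHA-1 parent hashes


def parse_dirstate(data):
    """Return (p1, p2, entries) from the raw bytes of .hg/dirstate."""
    p1 = data[:HASH_LEN]
    p2 = data[HASH_LEN:2 * HASH_LEN]
    entries = {}
    pos = 2 * HASH_LEN
    while pos < len(data):
        status, mode, size, mtime, length = ENTRY_HEADER.unpack_from(data, pos)
        pos += ENTRY_HEADER.size
        name = data[pos:pos + length]
        pos += length
        # A copy is stored as "<filename>\0<source if copy>".
        name, _, source = name.partition(b"\0")
        entries[name] = (status, mode, size, mtime, source or None)
    return p1, p2, entries
```

Each value in entries is (status, mode, size, mtime, copy source), with
None as the copy source when the entry is not a copy.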

> Summary
> -------
> 
> In summary, we have the additional "meta" status:
> 'nl' : normallookup (status == 'n', size == -1, mtime == -1 (or sometimes 0))
> 'np2': merged from other parent (status == 'n', size == -2)
> 'rm' : removed and previous state was 'm' (status == 'r', size == -1)
> 'rp2': removed and previous state was 'np2' (status == 'r', size == -2)
> 
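The four "meta" statuses above can be recovered from the raw status and
size fields. A rough sketch follows; meta_status is a hypothetical
helper, and mtime is deliberately not tested because the summary says it
may be -1 or sometimes 0 for 'nl'.

```python
def meta_status(status, size):
    """Map raw (status, size) to the labels used in the summary."""
    if status == "n":
        if size == -2:
            return "np2"  # merged from other parent
        if size == -1:
            return "nl"   # normallookup
    elif status == "r":
        if size == -2:
            return "rp2"  # removed, previous state was 'np2'
        if size == -1:
            return "rm"   # removed, previous state was 'm'
    # Anything else keeps its plain status ('n', 'm', 'a' or 'r').
    return status
```
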
> And we can notice that no bits from mode are used, except 0x40 (user
> x-bit). Assuming the bits from stat.st_mode are portable across
> platforms and OSes, the upper bits are set in the following way (in
> binary):
> 
> S_IFIFO  0001 /* FIFO.  */
> S_IFCHR  0010 /* Character device.  */
> S_IFLNK  1010 /* Symbolic link.  */
> S_IFBLK  0110 /* Block device.  */
> S_IFDIR  0100 /* Directory.  */
> S_IFREG  1000 /* Regular file.  */
> S_IFSOCK 1100 /* Socket.  */ 
> 
> Since hg should only add regular files or symlinks to the dirstate, it means we
> can signal the presence of the extended dirstate entry by setting either
> 0100, or 0001. Then we can use the remaining bits (30 free bits!) to
> encode whatever we want.
> 
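The claim that 0100 and 0001 are free for regular files and symlinks can
be checked against Python's stat constants. This is only a sketch: the
choice of 0100 as the marker, and the names EXTENDED, type_bits and
is_extended, are assumptions for illustration, not part of the proposal.

```python
import stat

TYPE_SHIFT = 12  # the file-type nibble sits in bits 12-15 of st_mode

# Hypothetical marker: the 0100 bit of the type nibble (0001 would work
# equally well according to the proposal).
EXTENDED = 0b0100 << TYPE_SHIFT


def type_bits(mode):
    """Return the four file-type bits of a mode value."""
    return (mode >> TYPE_SHIFT) & 0b1111


def is_extended(mode):
    """True if the hypothetical extended-entry marker is set."""
    return bool(mode & EXTENDED)


for kind in (stat.S_IFREG, stat.S_IFLNK):
    # Neither tracked file type sets the 0100 or 0001 bit.
    assert type_bits(kind) & 0b0101 == 0
```
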
> Extended metadata
> =================
> 
> I thus propose to add the following new information:
> - 'l' flag (is the entry a symlink)
> - 'fallback-x': should the on-disk file be considered as having the
>   x-bit set (useful if the FS doesn't support the exec bit; the bit can
>   still be changed with a git patch)
> - 'fallback-l': should the on-disk file be considered a symlink (useful
>   if the FS doesn't support symlinks; they can still be added to the
>   repo, with hg import and a git patch for example)
> - correctly mark 'np2': for merges, we can use a bit to indicate if the
>   file is clean from p1 or from p2.
> - anything else?

Looks reasonable. We might squeeze in some subsecond timestamp data here
too. We could probably use a copy source indicator bit as well (see the
big comment in localrepo._filecommit).

-- 
http://selenic.com : development and support for Mercurial and Linux



