Differences between revisions 2 and 24 (spanning 22 versions)
Revision 2 as of 2010-10-02 12:44:49
Size: 611
Editor: abuehl
Comment:
Revision 24 as of 2010-11-10 18:36:47
Size: 4936
Editor: abuehl
Comment:
#pragma section-numbers 2
= Hardlinked Clones =
When doing a local [[Clone|clone]] with a plain {{{hg clone A B}}}, Mercurial first tries to create
[[http://en.wikipedia.org/wiki/Hard_link|hardlinks]]
for files inside the .hg directory. This speeds up cloning and saves disk space by using
the same physical file for two or more directory entries.

Both Linux and Windows NTFS file systems support hardlinks. For filesystems that don't support
hardlinks (e.g. Windows FAT), Mercurial falls back to copying all files instead of hardlinking them.

In situations where a hardlinked clone may not be ideal, users can use {{{hg clone --pull}}}, which will
use the pull protocol for cloning and create a fully independent clone.

Cloning over http/https or ssh from a remote server implies {{{--pull}}}.

When committing or pushing to a [[Repository|repository]], Mercurial checks the hardlink count of every
file X inside .hg that it needs to write to. If the count is two or more, Mercurial breaks up the hardlink
for X before writing to it. Breaking up the hardlink for a file X means (1) copying X to a temporary file,
(2) deleting X and then (3) renaming the temporary file back to X.
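
The three steps above can be sketched in Python as follows. This is only an illustration of the copy, delete, rename sequence, not Mercurial's actual code; the function name `break_hardlink` is made up for this example, and the sketch assumes the reported link count (`st_nlink`) is correct.

```python
import os
import shutil
import tempfile


def break_hardlink(path):
    """Give `path` its own private copy of the data if it is hardlinked.

    Hypothetical helper for illustration only. Returns True if a
    hardlink was broken, False if the file had a single name already.
    """
    if os.stat(path).st_nlink < 2:
        return False  # no other directory entry shares this file
    directory = os.path.dirname(os.path.abspath(path))
    # (1) copy X to a temporary file in the same directory
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    shutil.copy2(path, tmp)
    # (2) delete X; the other names still point at the old data
    os.unlink(path)
    # (3) rename the temporary file back to X
    os.rename(tmp, path)
    return True
```

After the call, the file at `path` and the other clone's file have the same contents but no longer share storage, so a subsequent write to one does not show up in the other.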

<<TableOfContents>>

== Examples ==

{{{
  $ hg clone http://selenic.com/repo/hg hgcopy
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 12613 changesets with 24932 changes to 1936 files
  updating to branch default
  848 files updated, 0 files merged, 0 files removed, 0 files unresolved
}}}

This was a clone over http from a remote server. The resulting clone (hgcopy) thus has no hardlinks.

{{{
  $ hg clone --pull hgcopy hgcopy2
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 12613 changesets with 24932 changes to 1936 files
  updating to branch default
  848 files updated, 0 files merged, 0 files removed, ..
}}}

This was a clone with explicit {{{--pull}}}. The resulting clone (hgcopy2) thus has no hardlinks and
is completely independent from hgcopy.

If Mercurial prints "adding changesets", the resulting clone has no hardlinks.

{{{
  $ hg clone --debug -U hgcopy2 hgcopy3
  linked 1956 files
}}}

This was a clone which uses hardlinks. The files in hgcopy2 and hgcopy3 (inside the .hg dir) are
hardlinked. Mercurial versions 1.6 and later print the number of files that were hardlinked if
{{{--debug}}} is specified.
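
For Mercurial versions that do not print this number, one way to check a clone after the fact is to look at the link count of the files below its .hg directory. The following is a minimal sketch, assuming the filesystem reports link counts correctly (which, as described in the section on Windows shares, is not always the case); the function name `count_hardlinked` is hypothetical.

```python
import os
import stat


def count_hardlinked(root):
    """Return (linked, total) for the regular files below `root`.

    `linked` counts files whose reported hardlink count is two or more.
    Hypothetical helper for illustration only.
    """
    linked = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and other special files
            total += 1
            if st.st_nlink > 1:
                linked += 1
    return linked, total
```

Calling `count_hardlinked("hgcopy3/.hg")` right after the clone above would be expected to report a nonzero `linked` value, while the same call on hgcopy2 before cloning from it should report zero linked files.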

{{{
  $ hg clone --debug -U hgcopy2 x:\hgcopy4
  copied 1956 files
}}}

This was a clone where Mercurial first tried to create hardlinks but did not succeed. For example,
the filesystem may not support hardlinks, or the source and the destination may not be on the same
volume. In this case Mercurial falls back to copying the files.
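
The try-to-link, then fall back to copying behaviour can be sketched for a single file as below. This is an illustrative sketch rather than Mercurial's implementation; the name `link_or_copy` and its return values are assumptions chosen to mirror the "linked" and "copied" wording of the debug output, and the destination is assumed not to exist yet.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hardlink `src` to `dst`; copy the file if linking fails.

    Hypothetical helper for illustration only. Assumes `dst` does not
    exist yet. Returns "linked" or "copied".
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # Typical causes: the filesystem has no hardlink support, or
        # src and dst are on different volumes.
        shutil.copy2(src, dst)
        return "copied"
```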


== Hardlinked clones on Windows shares ==

Mercurial versions up to 1.6.2 are affected by a [[http://mercurial.selenic.com/bts/issue761|bug]]
that is present in nearly all Windows variants (including Windows 7).
Windows computers that serve files on a share always report a count of '''one''' when asked for
the number of hardlinks a file has, even if the file actually ''does'' have hardlinks and the
correct count would be two or more. This means Mercurial running on a client gets a
wrong hardlink count for files that are part of a hardlinked clone residing on a Windows
network share.

A [[http://selenic.com/repo/hg/rev/50523b4407f6|workaround]] for Mercurial running on Windows
was first released with Mercurial 1.6.3 (see WhatsNew). The workaround unconditionally
makes a full copy of each file before writing to it ''if the file is on a Windows network share'',
thus making sure any hardlinks that may exist on that file are broken up.

Note that this workaround is only effective if Mercurial is run on Windows. There is a related
bug in the Linux CIFS driver, which is still not fixed ("Linux CIFS mounts may corrupt hardlinked
repos on Windows shares", see Bts:issue1866).
A [[http://selenic.com/repo/hg/rev/bf826c0b9537|workaround]] for that Linux CIFS driver bug will
be released with Mercurial 1.7.1.

All Mercurial versions prior to 1.6.3 fail to cope with this Windows bug. If such a Mercurial
version is used on a client computer to commit or push to a hardlinked clone on a network share,
then the '''target repository may be corrupted''' because the file modifications will erroneously
appear in all clones that share these files. No error message is reported on the respective
commit or push. The resulting repository corruption is detected by a later {{{hg verify}}}.
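
Why a wrong hardlink count leads to changes appearing in several clones can be reproduced locally with a small simulation. The sketch below does not touch a real repository or a real share: the file names, the `reported_nlink` parameter and both function names are hypothetical, and the misreported count is simply passed in by hand.

```python
import os
import shutil
import tempfile


def append_to_store_file(path, data, reported_nlink):
    """Append `data` to `path`, trusting `reported_nlink` as the link count."""
    if reported_nlink > 1:
        # Break up the hardlink first: copy, delete, rename.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        os.close(fd)
        shutil.copy2(path, tmp)
        os.unlink(path)
        os.rename(tmp, path)
    with open(path, "ab") as f:
        f.write(data)


def simulate(reported_nlink):
    """Return the contents seen by clone A and clone B after a write to B."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "cloneA.i")
        b = os.path.join(d, "cloneB.i")
        with open(a, "wb") as f:
            f.write(b"rev0;")
        os.link(a, b)  # B starts out as a hardlinked copy of A
        append_to_store_file(b, b"rev1;", reported_nlink)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read(), fb.read()
```

With the true count, `simulate(2)` leaves clone A at `b"rev0;"` and only clone B gains `b"rev1;"`. With the count a Windows share would report, `simulate(1)` skips the copy step, so the write goes through the shared file and both clones end up with `b"rev0;rev1;"`.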

== See also ==

 * Mailing list thread [[http://markmail.org/thread/cuhm7dywfbgvgelf|"Repo corrupted again, no idea why"]] (Oct 2010)
 * RepositoryCorruption
 * [[http://technet.microsoft.com/en-us/library/cc788097%28WS.10%29.aspx|'fsutil hardlink' command]]

HardlinkedClones (last edited 2019-05-16 22:15:46 by AntonGogolev)