Slow clone from Windows UNC source paths

Fri May 28 06:34:31 CDT 2010

(moving to -devel list)

(previous subject was: "What is the purpose of the "Use pull protocol to copy metadata" hg clone option?")

On 28.05.2010 12:43, Didly Bom wrote:
> On Fri, May 28, 2010 at 12:11 PM, Didly Bom <didlybom at gmail.com
> <mailto:didlybom at gmail.com>> wrote:
> 
>     Matt,
> 
>     I did a few more tests which hopefully will narrow down the problem.
>     I performed the tests using 4 machines:
> 
>     - Local machine: Microsoft Windows XP Professional (2002 version),
>     Service Pack 3, Spanish version. Dell Latitude D430, Intel Core2
>     (U7600) @ 1.20 GHz, 2 GB RAM.  Eset NOD32 Antivirus (v4.0.314.0).
>     Running WebClient service.
> 
>     - Remote machine A: Microsoft Windows Server 2003, Standard x64
>     Edition, Service Pack 2, Spanish version. Intel Xeon CPU (E5410) @
>     2.33 GHz, 13.7 GB RAM. Eset NOD32 Antivirus (v4.0.417.0). Not
>     running WebClient service. I believe that this server is virtualized
>     (as all our windows servers are).
> 
>     - Remote machine B: Microsoft Windows XP Professional (2002
>     version), Service Pack 3, Spanish version. Intel Core2 (6400) @ 2.13
>     GHz, 3.50 GB RAM.  Eset NOD32 Antivirus (v4.0.314.0). Running
>     WebClient service.
> 
>     - Remote machine C: Red Hat Linux 5. 8 GB RAM. I believe it has a
>     quad core Xeon CPU but I am not sure. The Gnome System Monitor
>     reports 8 CPUs.
> 
>     I used TortoiseHg running on the local machine to perform the
>     following operations. Note that in all cases TortoiseHg was running
>     in the machine called "Local machine". The only difference between
>     each of the tests is the source and target destination addresses
>     that I typed on the Clone tool of TortoiseHg and whether I enabled
>     or not the "Use pull protocol" checkbox:
> 
>     1.- When I make a clone of a 3 MB repository from Remote machine A
>     into the local machine with pull enabled it takes around 10 seconds.
>     2.- When I make a clone from Remote machine A into the local machine
>     but I don't use the pull option, it takes between 2 and a half and 3
>     minutes.
>     3.- When I make a clone from my local machine into the Remote
>     machine A, using the pull option it takes around 30 seconds.
>     4.- When I make a clone from my local machine into the Remote
>     machine A, without the pull option, it takes around 15 seconds.
> 
>     5.- When I make a clone from Remote machine B into the local machine
>     with pull enabled it takes around 8 seconds.
>     6.- When I make a clone from Remote machine B into the local machine
>     but I don't use the pull option, it takes around 10 seconds.
> 
>     7.- When I make a clone from Remote machine B into the local machine
>     with pull enabled it takes around 8 seconds.
>     8.- When I make a clone from Remote machine B into the local machine
>     but I don't use the pull option, it takes around 10 seconds.
>     9.- When I make a clone from my local machine into the Remote
>     machine B, using the pull option it takes around 20 seconds.
>     10.- When I make a clone from my local machine into the Remote
>     machine B, without the pull option, it takes around 10 seconds.
> 
>     11.- When I make a clone from Remote machine A into the Remote
>     machine B with pull enabled it takes around 30 seconds.
>     12.- When I make a clone from Remote machine A into the Remote
>     machine B but I don't use the pull option it takes around 3 minutes.
> 
>     From these tests it is clear that the problem only happens when I
>     clone from Remote machine A without the pull protocol. Remote
>     machine A is the one running 64-bit Windows Server 2003 SP2, and it
>     is also virtualized.
> 
>     I repeated most of the tests several times and the results were
>     quite consistent. In some cases the worst-case time (Remote machine
>     A to local, no pull) has been longer than 3 minutes (up to 5
>     minutes). The good scenarios, on the other hand, have never been
>     slow. I suspect that the worst-case timing may depend on the load
>     on the remote machine.
> 
>     I also did a test in which I stopped the Web Client service, but
>     this did not seem to make a difference (i.e. the worst case scenario
>     was still slow).

Did you stop the Web Client service on the _client_ machine? (the machine
that runs the hg.exe process).

Stopping the Web Client service on the server is supposed to be irrelevant.
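To repeat the pull/no-pull comparison outside TortoiseHg, the clones can be timed from a script running on the client machine (the one that runs hg.exe). The following is a minimal sketch, assuming `hg` is on the PATH; `build_clone_cmd` and `time_clone` are hypothetical helper names, not part of Mercurial, and the `run` parameter exists only so the helper can be exercised without a real repository. The UNC paths in the comment are placeholders.

```python
import subprocess
import time


def build_clone_cmd(src, dst, pull):
    """Build the argv for 'hg clone', adding --pull when requested.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    cmd = ["hg", "clone"]
    if pull:
        # Use the pull protocol instead of copying/hardlinking store files.
        cmd.append("--pull")
    return cmd + [src, dst]


def time_clone(src, dst, pull, run=subprocess.run):
    """Run one clone and return its wall-clock duration in seconds."""
    cmd = build_clone_cmd(src, dst, pull)
    start = time.perf_counter()
    # With the default runner, check=True raises CalledProcessError if hg fails.
    run(cmd, check=True)
    return time.perf_counter() - start


# Example (placeholder paths; run on the client machine):
#   time_clone(r"\\remote-a\share\repo", r"C:\temp\clone-pull", pull=True)
#   time_clone(r"\\remote-a\share\repo", r"C:\temp\clone-nopull", pull=False)
```

Use a fresh destination path for each run so that earlier clones do not interfere with the measurement.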

>     Regarding CIFS or SMB, I am not sure which one is being used. I
>     don't think you can choose one or the other when you create a
>     network share on Windows.
> 
>     Finally, I performed a folder diff between two clones made from
>     Remote machine A (the one showing the problem). I had used the pull
>     protocol to create one of the clones but not the other. I was
>     surprised to find some differences in the .hg folder:
>     - The "dirstate" files were different.
>     - The clone for which I used "pull" had some extra files:
>     undo.branch, undo.dirstate and store/undo.
> 
>     When I compared two clones that had both used the pull protocol, the
>     only difference was in the "dirstate" file. This is probably normal,
>     but I wanted to point it out in case it helps to narrow down the
>     problem even further.
> 
>     I'll try to install WireShark soon. When I do so I will try to
>     capture the problem. Is it ok to send the trace as an attachment to
>     the list? It may be a big attachment file...
> 
>     Cheers,
> 
>     Didly
> 
> 
> I have performed the WireShark captures. They are two files of 600 KB
> and 2.5 MB. How do you want me to send them to you guys?

Can you upload them somewhere instead? Bitbucket allows uploading files on free
accounts/repos. Not sure if I will look at them though.

BTW, I've started looking at logs from the Microsoft Sysinternals tool
"Process Monitor", which logs all file accesses. I might comment on that
later.

Antivirus on-access scanners are known to severely slow down Mercurial's
file access in some setups. So unless you fully understand the consequences,
I strongly recommend excluding all repository directories from on-access
scanning and also from indexing by the Indexing Service.

BTW2, I've started playing with the following patch (not ready for pushing
yet):

# HG changeset patch
# User Adrian Buehlmann <adrian at cadifra.com>
# Date 1275044856 -7200
# Node ID e92dbef5466765475a86cbd6f0c3671827ad3827
# Parent  b9e89fc5c7f11ab8e4a9f74b9c266d8f1d11b91f
util: don't call os_link() again if it already failed in copyfiles()

Achieved by returning the 'hardlink' flag on copyfiles() recursion.

Thanks to this, if the os_link() call on the first file in the top level
directory already fails [1], the copying process switches mode to using
shutil.copy() for the rest of the tree, assuming that calling os_link()
again would be pointless because it would fail anyway.

[1] happens on Windows for every file when cloning from a UNC path

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -458,7 +458,7 @@ def copyfiles(src, dst, hardlink=None):
         for name, kind in osutil.listdir(src):
             srcname = os.path.join(src, name)
             dstname = os.path.join(dst, name)
-            copyfiles(srcname, dstname, hardlink)
+            hardlink = copyfiles(srcname, dstname, hardlink)
     else:
         if hardlink:
             try:
@@ -469,6 +469,8 @@ def copyfiles(src, dst, hardlink=None):
         else:
             shutil.copy(src, dst)
 
+    return hardlink
+
 class path_auditor(object):
     '''ensure that a filesystem path contains no banned components.
     the following properties of a path are checked:
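For illustration, the effect of the patch can be reproduced in a small standalone function. This is a sketch, not Mercurial's util.py: it substitutes `os.listdir` and `os.link` for Mercurial's `osutil.listdir` and `os_link`, the same-device check used when `hardlink` is None is an assumption about how the default is chosen, and the `link` parameter is a hypothetical addition that lets a failing link call be injected. Because each recursive call returns the updated `hardlink` flag, only the first failed link attempt is paid for; every later file goes straight to `shutil.copy()`.

```python
import os
import shutil


def copyfiles(src, dst, hardlink=None, link=os.link):
    """Copy a directory tree, hardlinking files while that keeps working.

    Returns the final hardlink flag so recursive callers stop trying
    link() once it has failed. 'link' is a hypothetical injection point.
    """
    if hardlink is None:
        # Assumed default: only try hardlinks on the same device/volume.
        dst_parent = os.path.dirname(os.path.abspath(dst))
        hardlink = os.stat(src).st_dev == os.stat(dst_parent).st_dev

    if os.path.isdir(src):
        os.mkdir(dst)
        for name in sorted(os.listdir(src)):
            # Propagate the flag: a failure in one subtree disables
            # hardlinking for all files visited afterwards.
            hardlink = copyfiles(os.path.join(src, name),
                                 os.path.join(dst, name), hardlink, link)
    else:
        if hardlink:
            try:
                link(src, dst)
            except OSError:
                # Remember the failure and fall back to a plain copy.
                hardlink = False
                shutil.copy(src, dst)
        else:
            shutil.copy(src, dst)

    return hardlink
```

With the unpatched behaviour (the recursive result discarded), a link call that fails for every file, as described in [1] for UNC sources, is retried once per file; with the flag returned, it is attempted only once.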