Slow clone from Windows UNC source paths
Adrian Buehlmann
adrian at cadifra.com
Fri May 28 06:34:31 CDT 2010
(moving to -devel list)
(previous subject was: "What is the purpose of the "Use pull protocol to copy metadata" hg clone option?")
On 28.05.2010 12:43, Didly Bom wrote:
> On Fri, May 28, 2010 at 12:11 PM, Didly Bom <didlybom at gmail.com> wrote:
>
> Matt,
>
> I did a few more tests which hopefully will narrow down the problem.
> I performed the tests using 4 machines:
>
> - Local machine: Microsoft Windows XP Professional (2002 version),
> Service Pack 3, Spanish version. Dell Latitude D430, Intel Core2
> (U7600) @ 1.20 GHz, 2 GB RAM. Eset NOD32 Antivirus (v4.0.314.0).
> Running WebClient service.
>
> - Remote machine A: Microsoft Windows Server 2003, Standard x64
> Edition, Service Pack 2, Spanish version. Intel Xeon CPU (E5410) @
> 2.33 GHz, 13.7 GB RAM. Eset NOD32 Antivirus (v4.0.417.0). Not
> running WebClient service. I believe that this server is virtualized
> (as all our windows servers are).
>
> - Remote machine B: Microsoft Windows XP Professional (2002
> version), Service Pack 3, Spanish version. Intel Core2 (6400) @ 2.13
> GHz, 3.50 GB RAM. Eset NOD32 Antivirus (v4.0.314.0). Running
> WebClient service.
>
> - Remote machine C: Red Hat Linux 5. 8 GB RAM. I believe it has a
> quad core Xeon CPU but I am not sure. The Gnome System Monitor
> reports 8 CPUs.
>
> I used TortoiseHg running on the local machine to perform the
> following operations. Note that in all cases TortoiseHg was running
> in the machine called "Local machine". The only difference between
> each of the tests is the source and target destination addresses
> that I typed on the Clone tool of TortoiseHg and whether I enabled
> or not the "Use pull protocol" checkbox:
>
> 1.- When I make a clone of a 3 MB repository from Remote machine A
> into the local machine with pull enabled it takes around 10 seconds.
> 2.- When I make a clone from Remote machine A into the local machine
> but I don't use the pull option, it takes between 2 and a half and 3
> minutes.
> 3.- When I make a clone from my local machine into the Remote
> machine A, using the pull option it takes around 30 seconds.
> 4.- When I make a clone from my local machine into the Remote
> machine A, without the pull option, it takes around 15 seconds.
>
> 5.- When I make a clone from Remote machine B into the local machine
> with pull enabled it takes around 8 seconds.
> 6.- When I make a clone from Remote machine B into the local machine
> but I don't use the pull option, it takes around 10 seconds.
>
> 7.- When I make a clone from Remote machine B into the local machine
> with pull enabled it takes around 8 seconds.
> 8.- When I make a clone from Remote machine B into the local machine
> but I don't use the pull option, it takes around 10 seconds.
> 9.- When I make a clone from my local machine into the Remote
> machine B, using the pull option it takes around 20 seconds.
> 10.- When I make a clone from my local machine into the Remote
> machine B, without the pull option, it takes around 10 seconds.
>
> 11.- When I make a clone from Remote machine A into the Remote
> machine B with pull enabled it takes around 30 seconds.
> 12.- When I make a clone from Remote machine A into the Remote
> machine B but I don't use the pull option it takes around 3 minutes.
>
> From these tests it is clear that the problem only happens when I do
> not use the pull protocol on Remote machine A, which is the one
> that has the 64 bit Windows 2003 SP2 and which is also virtualized.
>
> I repeated most of the tests several times and the results were
> quite consistent. In some cases the worst case scenario time (remote
> machine A to local, no pull) has been worse than 3 minutes (up to 5
> minutes). On the other hand the good scenarios have never been slow.
> I suspect that the worst case scenario timing may depend on the load
> of the remote machine.
>
> I also did a test in which I stopped the Web Client service, but
> this did not seem to make a difference (i.e. the worst case scenario
> was still slow).
Did you stop the Web Client service on the _client_ machine? (the machine
that runs the hg.exe process).
Stopping the Web Client service on the server is supposed to be irrelevant.
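
To take TortoiseHg out of the picture, the timings quoted above could also be
reproduced with plain hg on the client machine. The following is only a
minimal sketch: the helper names (clone_command, timed, time_clone) are made
up for illustration, and it assumes that hg is on the PATH and that the
"Use pull protocol" checkbox corresponds to the --pull option of hg clone.

```python
import subprocess
import time


def clone_command(source, dest, use_pull):
    """Build the hg command line for one clone run (hypothetical helper)."""
    cmd = ["hg", "clone"]
    if use_pull:
        # --pull asks hg to use the pull protocol instead of copying files
        cmd.append("--pull")
    cmd += [source, dest]
    return cmd


def timed(func, *args):
    """Call func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


def time_clone(source, dest, use_pull):
    """Run one clone and return how long it took, in seconds."""
    cmd = clone_command(source, dest, use_pull)
    _, elapsed = timed(subprocess.check_call, cmd)
    return elapsed
```

Calling time_clone() once with use_pull=True and once with use_pull=False for
the same UNC source (with a fresh destination directory each time) should give
the two numbers that are being compared in the tests above.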
> Regarding CIFS or SMB I am not sure which is being used. I don't
> think that you can choose one or the other when you create a network
> share on Windows.
>
> Finally, I performed a folder diff between two clones made from the
> Remote server A (the one showing the problem). I had used the pull
> protocol to create one of the clones, while for the other I did not
> use it. I was surprised that there were some differences on the .hg
> folder:
> - The "dirstate" files were different
> - The clone for which I used "pull" had some extra files:
> undo.branch, undo.dirstate and store/undo.
>
> When I compared two clones that both had used the pull protocol the
> only difference was on the "dirstate" file. This is probably normal,
> but I wanted to point it out just in case it was useful to narrow
> down the problem even further.
>
> I'll try to install WireShark soon. When I do so I will try to
> capture the problem. Is it ok to send the trace as an attachment to
> the list? It may be a big attachment file...
>
> Cheers,
>
> Didly
>
>
> I have performed the WireShark captures. They are two files of 600 KB
> and 2.5 MB. How do you want me to send them to you guys?
Can you upload them somewhere instead? Bitbucket allows uploading files on free
accounts/repos. I'm not sure I will look at them, though.

BTW, I've started looking a bit at logs from the Microsoft Sysinternals tool
"Process Monitor" here, which logs all file accesses. I might comment on
that later.

Antivirus on-access scanners can interfere very badly with Mercurial's file
access. So unless you understand all the exact consequences, I strongly
recommend excluding all repository directories from on-access scanning and
also from indexing by the Indexing service.

BTW2, I've started playing with the following patch (not ready for pushing
yet):

# HG changeset patch
# User Adrian Buehlmann <adrian at cadifra.com>
# Date 1275044856 -7200
# Node ID e92dbef5466765475a86cbd6f0c3671827ad3827
# Parent b9e89fc5c7f11ab8e4a9f74b9c266d8f1d11b91f
util: don't call os_link() again if it already failed in copyfiles()

Achieved by returning the 'hardlink' flag on copyfiles() recursion.

Thanks to this, if the os_link() call on the first file in the top level
directory already fails [1], the copying process switches mode to using
shutil.copy() for the rest of the tree, assuming that calling os_link()
again would be pointless because it would fail anyway.

[1] happens on Windows for every file when cloning from a UNC path

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -458,7 +458,7 @@ def copyfiles(src, dst, hardlink=None):
         for name, kind in osutil.listdir(src):
             srcname = os.path.join(src, name)
             dstname = os.path.join(dst, name)
-            copyfiles(srcname, dstname, hardlink)
+            hardlink = copyfiles(srcname, dstname, hardlink)
     else:
         if hardlink:
             try:
@@ -469,6 +469,8 @@ def copyfiles(src, dst, hardlink=None):
         else:
             shutil.copy(src, dst)
 
+    return hardlink
+
 class path_auditor(object):
     '''ensure that a filesystem path contains no banned components.
     the following properties of a path are checked:
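
The effect of returning the flag can be tried outside of Mercurial with a
small stand-alone sketch. This is not the code from util.py: it uses
os.listdir() and os.link() from the standard library instead of
osutil.listdir() and os_link(), it leaves out the initial device check, and
the 'link' parameter and the count_link_attempts() helper are made up here
only so that a failing link call can be simulated.

```python
import os
import shutil
import tempfile


def copyfiles(src, dst, hardlink=True, link=os.link):
    """Copy a tree, trying hardlinks only until the first link failure.

    Returns the (possibly cleared) hardlink flag so the caller can pass
    it on to the next entry.
    """
    if os.path.isdir(src):
        os.mkdir(dst)
        for name in sorted(os.listdir(src)):
            # Carry the flag over from one entry to the next.
            hardlink = copyfiles(os.path.join(src, name),
                                 os.path.join(dst, name), hardlink, link)
    elif hardlink:
        try:
            link(src, dst)
        except (IOError, OSError):
            # Linking failed once: fall back to copying from now on.
            hardlink = False
            shutil.copy(src, dst)
    else:
        shutil.copy(src, dst)
    return hardlink


def count_link_attempts(nfiles=5):
    """Copy a small tree with a link function that always fails.

    Returns (final flag, number of link attempts, number of files copied).
    """
    attempts = []

    def failing_link(src, dst):
        # Stand-in for a link call that is not supported on the source.
        attempts.append(src)
        raise OSError("hardlinks not supported here")

    base = tempfile.mkdtemp()
    try:
        src = os.path.join(base, "src")
        os.makedirs(os.path.join(src, "sub"))
        for i in range(nfiles):
            with open(os.path.join(src, "sub", "f%d" % i), "w") as f:
                f.write("data")
        dst = os.path.join(base, "dst")
        flag = copyfiles(src, dst, True, failing_link)
        copied = len(os.listdir(os.path.join(dst, "sub")))
    finally:
        shutil.rmtree(base)
    return flag, len(attempts), copied
```

With the flag returned from the recursion, count_link_attempts() sees a single
link attempt for the whole tree. If the recursive call ignored the return
value, a failure inside one subdirectory would not be remembered when the
copy moves on to the next one.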