speed up relink script

TK Soh teekaysoh at gmail.com
Mon Mar 19 16:09:07 CDT 2007


On 3/19/07, TK Soh <teekaysoh at gmail.com> wrote:
> On 3/19/07, Alexis S. L. Carvalho <alexis at cecm.usp.br> wrote:
> > Maybe I'm missing something obvious, but doesn't this seek beyond EOF,
> > making sfp.read(CHUNKLEN) return an empty string, which means the loop
> > doesn't get executed and you unconditionally relink stuff, possibly
> > losing data?
>
> I think you are right. But somehow it didn't seem to cause any obvious
> error. Hm, I will look into it. Thanks for the input.

Looks like file.seek() allows seeking beyond EOF, though in any case
my last patch was totally broken. So, how about the patch below?
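
The seek behaviour can be checked in isolation. This is only a minimal
sketch in current Python 3, not part of the patch; the helper name
read_after_seek_past_eof is made up for illustration. It seeks past the
end of a small temporary file and shows that the following read() comes
back empty instead of raising an error, which is why a "while sin:" loop
would never run.

```python
import os
import tempfile


def read_after_seek_past_eof(data, offset):
    """Write data to a temp file, seek to offset, return what read() gives."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
        with open(path, "rb") as fp:
            # Seeking past the end of the file is allowed and raises nothing.
            fp.seek(offset)
            return fp.read(64 * 1024)
    finally:
        os.remove(path)


# A 3-byte file, positioned at offset 1000: nothing is left to read.
past_eof = read_after_seek_past_eof(b"abc", 1000)
print(len(past_eof))
```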

BTW, I wonder what the odds are for two corresponding *.[id] files to
have the same size, but contain different data, with CHUNKLEN of 64K.
A smaller value of CHUNKLEN would improve the performance, obviously.
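
To make the question concrete, here is a rough sketch of a tail-only
comparison, written for current Python 3 rather than the Python 2 of
hg-relink. The function name tails_match and the 64K default are
assumptions for illustration, not what the script does. It only compares
the last CHUNKLEN bytes, so two files of equal size that differ earlier
on would still be reported as matching; that is exactly the risk being
asked about.

```python
import os

CHUNKLEN = 65536  # 64K, the value discussed above


def tails_match(path_a, path_b, chunklen=CHUNKLEN):
    """Return True if both files have equal size and identical final chunk.

    This is a heuristic: data before the final chunk is never inspected.
    """
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    if size_a != size_b:
        return False
    # Never seek back further than the file is long.
    n = min(size_a, chunklen)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        fa.seek(-n, os.SEEK_END)
        fb.seek(-n, os.SEEK_END)
        return fa.read(n) == fb.read(n)
```

With a small chunklen, two same-sized files that differ only near the
start compare as equal, which is the false positive to worry about.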

--- a/contrib/hg-relink Mon Mar 19 09:36:06 2007 -0700
+++ b/contrib/hg-relink Mon Mar 19 16:08:39 2007 -0500
@@ -58,7 +58,7 @@ def prune(candidates, dst):
             raise Exception('Source and destination are on different devices')
         if st.st_size != ts.st_size:
             continue
-        targets.append((fn, ts.st_size))
+        targets.append((fn, st, ts))

     return targets

@@ -67,11 +67,13 @@ def relink(src, dst, files):
     relinked = 0
     savedbytes = 0

-    for f, sz in files:
+    for f, st, ts in files:
         source = os.path.join(src, f)
         tgt = os.path.join(dst, f)
         sfp = file(source)
         dfp = file(tgt)
+        sfp.seek(-min(st.st_size, CHUNKLEN), 2)
+        dfp.seek(-min(ts.st_size, CHUNKLEN), 2)
         sin = sfp.read(CHUNKLEN)
         while sin:
             din = dfp.read(CHUNKLEN)
@@ -89,7 +91,7 @@ def relink(src, dst, files):
                 raise
             print 'Relinked %s' % f
             relinked += 1
-            savedbytes += sz
+            savedbytes += ts.st_size
             os.remove(tgt + '.bak')
         except OSError, inst:
             print '%s: %s' % (tgt, str(inst))

