[Bug 4233] New: hg diff on a single file takes very long for certain file contents

mercurial-bugs at selenic.com mercurial-bugs at selenic.com
Tue Apr 29 17:15:34 CDT 2014


http://bz.selenic.com/show_bug.cgi?id=4233

          Priority: normal
            Bug ID: 4233
                CC: mercurial-devel at selenic.com
          Assignee: bugzilla at selenic.com
           Summary: hg diff on a single file takes very long for certain
                    file contents
          Severity: bug
    Classification: Unclassified
                OS: Linux
          Reporter: malte.helmert at unibas.ch
          Hardware: PC
            Status: UNCONFIRMED
           Version: 2.9.2
         Component: Mercurial
           Product: Mercurial

Created attachment 1764
  --> http://bz.selenic.com/attachment.cgi?id=1764&action=edit
Test input file #1 (must be uncompressed before use); see description.

For certain file contents, the runtime of hg diff scales very badly with file
size.

I stumbled on this when attempting to "hg convert" an existing SVN repository;
the conversion got stuck on a certain changeset. But it's easy to reproduce
this without hg convert by just adding a single file to a new repository,
making changes, and running "hg diff". The file in question is 35 MB in size,
but compresses to around 250 KB, so I hope it's OK to add it as an attachment
to this issue.

To reproduce, download before.bz2 and after.bz2 and set up the repo as follows:

hg init testrepo
cp before.bz2 after.bz2 testrepo/
cd testrepo
bunzip2 before.bz2
bunzip2 after.bz2
mv before file.txt
hg add file.txt
hg commit -m "added file.txt"
mv after file.txt

We are now in a state where "file.txt" has local modifications, and "hg diff"
or "hg commit" would take several days to complete (I did not run either to
completion). I could only stop "hg diff" with "kill -9"; it did not react to
plain "kill" or Ctrl-C. The file is large, but not massively so: GNU diff
("diff -u before after") takes 0.9 seconds on my machine.
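
Since plain "kill" had no effect, one way to try this without ending up with a
stuck process is to run the command under a hard time limit. This is only a
minimal sketch: it assumes GNU coreutils "timeout" is installed, and the helper
name run_limited is made up for illustration.

```shell
# Run a command, sending SIGKILL if it is still running after LIMIT.
# Assumption: GNU coreutils `timeout` is available on the system.
# SIGKILL is used because it cannot be caught or ignored by the process.
run_limited() {
    limit=$1
    shift
    timeout -s KILL "$limit" "$@"
}

# Possible use from inside testrepo, after the steps above:
#   run_limited 60 hg diff > /dev/null
#   echo "exit status: $?"    # non-zero if the limit was hit
```

The limit accepts whatever duration syntax "timeout" accepts, so a value such
as 60 means 60 seconds.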

To reproduce the effect in cases where hg diff and hg commit still run to
completion, I looked at truncated versions of the same files, obtained e.g. with

head -c 10M before > before.10M
head -c 10M after > after.10M

and then using these in place of "before" and "after" in the steps above. Here
are the runtimes for different sizes on my machine, together with the size of
the resulting diff in lines:

SIZE     hg diff  diff lines   hg commit
1M         0.29s       22895       0.20s
2M         0.81s       47439       0.61s
3M         1.76s       72830       1.47s
4M         7.43s       97965       6.99s
5M         4.43s      122899       3.91s
6M        88.36s      147787      90.56s
7M        10.51s      172759       9.74s
8M       202.15s      198395     202.04s
9M        25.33s      223536      24.55s
10M      162.89s      248567     159.23s
12M      132.42s      299271     130.41s
14M       53.40s      350046      51.73s
16M     2822.02s      400085    2802.49s
18M    16687.10s      450685   16856.23s
20M    20673.23s      501958   20639.25s
22M    25507.32s      552989   25758.22s
24M    30875.81s      603709   31276.82s
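
For anyone who wants to repeat these measurements, the per-size setup could be
scripted roughly as follows. This is a sketch, not necessarily the exact
procedure used for the numbers above: it assumes GNU head (for size suffixes
such as 10M) and that the uncompressed "before" and "after" files are in the
current directory. The function names and the repo-SIZE directory naming are
made up for illustration.

```shell
# Create truncated copies before.SIZE and after.SIZE of the two input files.
make_prefixes() {
    size=$1
    head -c "$size" before > "before.$size" || return 1
    head -c "$size" after > "after.$size" || return 1
}

# Set up a fresh repository repo-SIZE in which file.txt is committed with the
# "before" contents and then locally modified to the "after" contents.
prepare_repo() {
    size=$1
    repo="repo-$size"
    make_prefixes "$size" || return 1
    hg init "$repo" || return 1
    cp "before.$size" "$repo/file.txt" || return 1
    (cd "$repo" && hg add file.txt && hg commit -m "added file.txt") || return 1
    cp "after.$size" "$repo/file.txt"
}

# Possible use for one size:
#   prepare_repo 4M
#   cd repo-4M
#   time hg diff > /dev/null      # runtime of hg diff
#   hg diff | wc -l               # size of the diff in lines
#   time hg commit -m "modified"  # runtime of hg commit
```

Using one repository per size keeps the measurements independent, since
"hg commit" changes the state that "hg diff" was measured against.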
