Bug 127 - hg diff -w does not ignore white space in hunks containing other changes
Summary: hg diff -w does not ignore white space in hunks containing other changes
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial (show other bugs)
Version: unspecified
Hardware: All All
: normal bug
Assignee: Bugzilla
URL: There is a known issue where hg diff ...
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-15 12:36 UTC by Vadim Gelfer
Modified: 2015-08-13 04:04 UTC (History)
14 users (show)

See Also:
Python Version: ---


Attachments
(32 bytes, text/x-patch)
2006-02-15 12:36 UTC, Vadim Gelfer
Details
(33 bytes, text/x-diff)
2009-10-12 14:30 UTC, Patrick Mézard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Vadim Gelfer 2006-02-15 12:36 UTC
i am trying to fix issue 126. when my fix does not work, i think it is my fault,
but diff code is generating wrong output instead. i have attached broken diff.
look for hunk at end. line with "-I" is only different with white space. but
shows up in diff output as changed, even with -w option to diff.
Comment 1 Chris Mason 2006-02-15 13:39 UTC
On Wednesday 15 February 2006 13:36, Vadim Gelfer wrote:
> New submission from Vadim Gelfer <vadim.gelfer@gmail.com>:
>
> i am trying to fix issue 126. when my fix does not work, i think it is my
> fault, but diff code is generating wrong output instead. i have attached
> broken diff. look for hunk at end. line with "-I" is only different with
> white space. but shows up in diff output as changed, even with -w option to
> diff.

It came as part of a larger hunk.  The current ws handling only ignores hunks 
where nothing but ws have changed.

-chris
Comment 2 Thomas Arendsen Hein 2006-02-21 07:49 UTC
changed title to match problem.
Comment 3 Matt Mackall 2007-12-07 15:48 UTC
Did we fix this one?
Comment 4 Thomas Arendsen Hein 2007-12-08 03:58 UTC
Not fixed yet (tested with 1db1c7c1eb5e)
Comment 5 Dirkjan Ochtman 2007-12-13 06:48 UTC
I'll look into fixing this soon. Transcript from chat with pmezard:

13:35 < pmezard> djc: i think the lines should be normalized before sending
                 them to bdiff.blocks()
13:37 < djc> pmezard: indeed
13:37 < pmezard> "if wsclean(opts, "".join(old)) == wsclean(opts,
                 "".join(new)):"  looks a bit simplistic
13:39 < pmezard> i don't know how expensive it would be to duplicate both line
                 lines. probably not too expensive, especially if done only for
                 ws cases.
13:39 < djc> pmezard: my concern was with dedenting a block of code
13:39 < djc> the diff becomes a bit unreadable if you do that
13:39 < djc> especially if the block is large
13:39 < djc> does this approach fix that problem?
13:40 < pmezard> only dedenting or dedenting + changes
13:41 < djc> well, I'd like to get a diff that ignores all the dedenting and
             shows the other changes
13:41 < djc> and I figure diff -b should do that
13:41 < djc> but it doesn't
13:41 < pmezard> yes i think this case is currently ignore by the line i quote
                 above
13:41 < pmezard> it only checks for pure whitespaces changes in non-matching
                 blocks
13:42 < djc> right
13:42 < pmezard> problem is that any other changes make the whole block to be
                 emitted
13:43 < djc> exactly
13:43 < pmezard> so, you can create two copies of l1/l2 with every lines
                 normalized by wsclean, pass that to bdiff.blocks, then match
                 again with the original input
Comment 6 Jesse Glick 2008-05-20 12:32 UTC
A simple test case:

$ hg init
$ (echo 'doStuff'; echo 'doMoreStuff') > f
$ hg ci -A -m 1
adding f
$ (echo 'block {'; echo '  doStuff'; echo '  doMoreStuff'; echo '} end') > f
$ hg ci -m 2
$ hg cat -r 0 f > f_0
$ diff -uw f_0 f
--- f_0	2008-05-20 14:29:16.000000000 -0400
+++ f	2008-05-20 14:28:16.000000000 -0400
@@ -1,2 +1,4 @@
+block {
 doStuff
 doMoreStuff
+} end
$ hg --config diff.ignorews=1 di -r0:1
diff --git a/f b/f
--- a/f
+++ b/f
@@ -1,2 +1,4 @@
-doStuff
-doMoreStuff
+block {
+  doStuff
+  doMoreStuff
+} end
$
Comment 7 Andy 2008-07-31 09:44 UTC
Hi;
I think that diff -w should be the same as diff -bB.  When I use diff -bB it
acts like diff -w (no exhaustive test cases).  The problem with the diff code
now I believe is that this old version
mercurial/mdiff.py, beginning at line 52 (version 0.9.5), old:

def wsclean(opts, text):
    if opts.ignorews:
        text = re.sub('[ \t]+', '', text)
    elif opts.ignorewsamount:
        text = re.sub('[ \t]+', ' ', text)
        text = re.sub('[ \t]+\n', '\n', text)
    if opts.ignoreblanklines:
        text = re.sub('\n+', '', text)
    return text

could be changed to something like:

def wsclean(opts, text):
    if opts.ignorews:
        text = re.sub('[ \t\n]+', '', text)
    else:
        if opts.ignorewsamount:
            text = re.sub('[ \t]+', ' ', text)
            text = re.sub('[ \t]+\n', '\n', text)
        if opts.ignoreblanklines:
            text = re.sub('\n+', '', text)
    return text
Comment 8 Tom Lynn 2008-08-14 07:33 UTC
The definition of whitespace should include at least \r (' \t\r\n' is XML's idea
of whitespace). Alternatively you could use the '\s' regexp ('\t\n\v\f\r ',
usually the same as string.whitespace) or whatever the locale says. But at least \r.
Comment 9 Andy 2008-08-14 08:38 UTC
I agree, \n is sort of limiting to the non-windows world (who uses that anyway?:).
So, what about \s?  I think though the most important thing is that it behaves
similarly to other diffs.  At the moment it does not.  Most definately, if -w is
changed to ignore \s+ then -b should be suitably updated (and if \r is included
in -w then -B should also be updated).

My key point is that -bB should be identical to -w.  I hope it's clear and is
everyone in agreement with this?  Can we move forward to fixing this issue?
Comment 10 Thomas Arendsen Hein 2008-08-14 09:02 UTC
Shouldn't the behaviour match that of the "normal" diff command?

diff (GNU diffutils) 2.8.1 considers with -w:

"foo bar\n" and "foo  bar\n" to be equal, but "foo bar\n" and "foo\nbar\n" not,
so XML's idea of whitespace doesn't help here.

"foo bar\n" and "foo\rbar\r" to be equal, so \r seems to be viewed as whitespace.
Comment 11 Dirkjan Ochtman 2008-08-14 09:04 UTC
There are of course also the -b and -B options for newline stuff.
Comment 12 Andy 2008-08-14 09:49 UTC
Well, you're absolutely correct on that one.  I was positive that diff -w
ignored newline differences but some simple tests show me that I was way off on
that one.  Maybe I was thinking of -wB, but even that will say that 'abc\ndef'
differs from 'ab\ncd\nef'.  So please ignore my thought that -bB should act like -w.
Comment 13 Richard Davies 2008-09-03 11:33 UTC
Here's another demonstration of a poor diff caused by this bug, which standalone
"diff -ub" handles much better, giving a diff which is much easier for the user
to understand...

$ echo -e "X\n  A\n  B\n  D\n  E" > 1
$ echo -e "X\n A\n B\n C\n D\n E" > 2
$ diff -ub 1 2
--- 1	2008-09-03 18:24:35.000000000 +0100
+++ 2	2008-09-03 18:24:37.000000000 +0100
@@ -1,5 +1,6 @@
 X
   A
   B
+ C
   D
   E
$ hg init
$ hg add 1 2
$ hg commit -m 'Check in'
$ mv 2 1
$ hg diff 1
diff -r 34bb3cd02d65 1
--- a/1	Wed Sep 03 18:24:47 2008 +0100
+++ b/1	Wed Sep 03 18:25:01 2008 +0100
@@ -1,5 +1,6 @@
 X
-  A
-  B
-  D
-  E
+ A
+ B
+ C
+ D
+ E
Comment 14 Richard Davies 2008-09-05 03:13 UTC
(I meant "hg diff -b 1" at the end there, of course, which gives the same poor diff)
Comment 15 Eric Haszlakiewicz 2009-10-05 14:40 UTC
Over 3 years and this still isn't fixed?  Doesn't anyone use the diff command?

If whatever code hg is using to actually perform the diff is too difficult
to fix, would it at least be possible to add a --diff-cmd option like
subversion has, so I could at least specify a working diff program?
Comment 16 Matt Mackall 2009-10-05 16:25 UTC
See the extdiff extension.
Comment 17 Eric Haszlakiewicz 2009-10-05 16:30 UTC
Oh, that looks useful.  Thanks!
http://mercurial.selenic.com/wiki/ExtdiffExtension
Comment 18 Patrick Mézard 2009-10-12 14:30 UTC
Preliminary patch attached. I have not checked performance yet, at least it
seems to work. Relevant part is:

diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
--- a/mercurial/mdiff.py
+++ b/mercurial/mdiff.py
@@ -57,13 +57,13 @@
 
 defaultopts = diffopts()
 
-def wsclean(opts, text):
+def wsclean(opts, text, blank=True):
     if opts.ignorews:
         text = re.sub('[ \t]+', '', text)
     elif opts.ignorewsamount:
         text = re.sub('[ \t]+', ' ', text)
         text = re.sub('[ \t]+\n', '\n', text)
-    if opts.ignoreblanklines:
+    if blank and opts.ignoreblanklines:
         text = re.sub('\n+', '', text)
     return text
 
@@ -183,6 +183,10 @@
     # below finds the spaces between those matching sequences and translates
     # them into diff output.
     #
+    if opts.ignorews or opts.ignorewsamount:
+        t1 = wsclean(opts, t1, False)
+        t2 = wsclean(opts, t2, False)
+
     diff = bdiff.blocks(t1, t2)
     hunk = None
     for i, s1 in enumerate(diff):
@@ -208,7 +212,7 @@
         if not old and not new:
             continue
 
-        if opts.ignorews or opts.ignorewsamount or opts.ignoreblanklines:
+        if opts.ignoreblanklines:
             if wsclean(opts, "".join(old)) == wsclean(opts, "".join(new)):
                 continue
Comment 19 Jesse Glick 2009-10-13 15:12 UTC
extdiff seems to work well for replacing the diff command. Does not help for
'hg out -p', though this currently seems to lack any diff options as well.
Would really like to see 'hg out -pw' work.
Comment 20 Patrick Mézard 2009-11-11 10:11 UTC
Here is some profiling with:

$ cd hg-crew-stable
$ hg revert -r 4000 -a
$ hg perfdiffwd

* Without the patch:

! diffopts: none
! wall 4.933914 comb 6.950000 user 5.083333 sys 1.866667 (best of 3)
! diffopts: -w
! wall 5.509815 comb 7.900000 user 6.016667 sys 1.883333 (best of 3)
! diffopts: -b
! wall 5.807460 comb 8.400000 user 6.500000 sys 1.900000 (best of 3)
! diffopts: -B
! wall 5.360419 comb 7.650000 user 5.766667 sys 1.883333 (best of 3)
! diffopts: -wB
! wall 5.803107 comb 8.400000 user 6.500000 sys 1.900000 (best of 3)


* With the patch

! diffopts: none
! wall 4.934670 comb 6.950000 user 5.083333 sys 1.866667 (best of 3)
! diffopts: -w
! wall 5.792024 comb 8.366667 user 6.433333 sys 1.933333 (best of 3)
! diffopts: -b
! wall 6.264275 comb 9.133333 user 7.183333 sys 1.950000 (best of 3)
! diffopts: -B
! wall 5.364071 comb 7.666667 user 5.783333 sys 1.883333 (best of 3)
! diffopts: -wB
! wall 6.648478 comb 9.783333 user 7.833333 sys 1.950000 (best of 3)


* Summary

none: +0%
-w  : +5% (slower)
-b  : +8%
-B  : +0%
-wB : +14%

I think the new behaviour is worth the slowdown. If nobody objects and the
tests pass, I will push this.
Comment 21 Matt Mackall 2009-11-11 10:16 UTC
Numbers look good to me.
Comment 22 Patrick Mézard 2009-11-11 11:48 UTC
In crew as 4fe9ca519637

@jglick: I have split your outgoing remark into
http://mercurial.selenic.com/bts/issue1906
Comment 23 Bugzilla 2012-05-12 08:29 UTC

--- Bug imported by bugzilla@serpentine.com 2012-05-12 08:29 EDT  ---

This bug was previously known as _bug_ 127 at http://mercurial.selenic.com/bts/issue127
Imported an attachment (id=513)
Imported an attachment (id=514)
Comment 24 Jared Null 2015-08-13 04:00 UTC
Can the following document section be removed?  

https://developer.mozilla.org/en-US/docs/Mercurial/Using_Mercurial#How_do_I_get_a_diff_-w
Comment 25 Jared Null 2015-08-13 04:01 UTC
(In reply to comment #24)
specifically: "There is a known issue where hg diff -w doesn't work."
Comment 26 Jared Null 2015-08-13 04:04 UTC
(In reply to comment #25)
ignore this, i just realized the document is outdated.. and the new document is here:  http://mozilla-version-control-tools.readthedocs.org/en/latest/hgmozilla/index.html