i am trying to fix issue 126. when my fix does not work, i think it is my fault, but diff code is generating wrong output instead. i have attached broken diff. look for hunk at end. line with "-I" is only different with white space. but shows up in diff output as changed, even with -w option to diff.
On Wednesday 15 February 2006 13:36, Vadim Gelfer wrote: > New submission from Vadim Gelfer <vadim.gelfer@gmail.com>: > > i am trying to fix issue 126. when my fix does not work, i think it is my > fault, but diff code is generating wrong output instead. i have attached > broken diff. look for hunk at end. line with "-I" is only different with > white space. but shows up in diff output as changed, even with -w option to > diff. It came as part of a larger hunk. The current ws handling only ignores hunks where nothing but ws have changed. -chris
changed title to match problem.
Did we fix this one?
Not fixed yet (tested with 1db1c7c1eb5e)
I'll look into fixing this soon. Transcript from chat with pmezard: 13:35 < pmezard> djc: i think the lines should be normalized before sending them to bdiff.blocks() 13:37 < djc> pmezard: indeed 13:37 < pmezard> "if wsclean(opts, "".join(old)) == wsclean(opts, "".join(new)):" looks a bit simplistic 13:39 < pmezard> i don't know how expensive it would be to duplicate both line lines. probably not too expensive, especially if done only for ws cases. 13:39 < djc> pmezard: my concern was with dedenting a block of code 13:39 < djc> the diff becomes a bit unreadable if you do that 13:39 < djc> especially if the block is large 13:39 < djc> does this approach fix that problem? 13:40 < pmezard> only dedenting or dedenting + changes 13:41 < djc> well, I'd like to get a diff that ignores all the dedenting and shows the other changes 13:41 < djc> and I figure diff -b should do that 13:41 < djc> but it doesn't 13:41 < pmezard> yes i think this case is currently ignore by the line i quote above 13:41 < pmezard> it only checks for pure whitespaces changes in non-matching blocks 13:42 < djc> right 13:42 < pmezard> problem is that any other changes make the whole block to be emitted 13:43 < djc> exactly 13:43 < pmezard> so, you can create two copies of l1/l2 with every lines normalized by wsclean, pass that to bdiff.blocks, then match again with the original input
A simple test case: $ hg init $ (echo 'doStuff'; echo 'doMoreStuff') > f $ hg ci -A -m 1 adding f $ (echo 'block {'; echo ' doStuff'; echo ' doMoreStuff'; echo '} end') > f $ hg ci -m 2 $ hg cat -r 0 f > f_0 $ diff -uw f_0 f --- f_0 2008-05-20 14:29:16.000000000 -0400 +++ f 2008-05-20 14:28:16.000000000 -0400 @@ -1,2 +1,4 @@ +block { doStuff doMoreStuff +} end $ hg --config diff.ignorews=1 di -r0:1 diff --git a/f b/f --- a/f +++ b/f @@ -1,2 +1,4 @@ -doStuff -doMoreStuff +block { + doStuff + doMoreStuff +} end $
Hi; I think that diff -w should be the same as diff -bB. When I use diff -bB it acts like diff -w (no exhaustive test cases). The problem with the diff code now I believe is that this old version mercurial/mdiff.py, beginning at line 52 (version 0.9.5), old: def wsclean(opts, text): if opts.ignorews: text = re.sub('[ \t]+', '', text) elif opts.ignorewsamount: text = re.sub('[ \t]+', ' ', text) text = re.sub('[ \t]+\n', '\n', text) if opts.ignoreblanklines: text = re.sub('\n+', '', text) return text could be changed to something like: def wsclean(opts, text): if opts.ignorews: text = re.sub('[ \t\n]+', '', text) else: if opts.ignorewsamount: text = re.sub('[ \t]+', ' ', text) text = re.sub('[ \t]+\n', '\n', text) if opts.ignoreblanklines: text = re.sub('\n+', '', text) return text
The definition of whitespace should include at least \r (' \t\r\n' is XML's idea of whitespace). Alternatively you could use the '\s' regexp ('\t\n\v\f\r ', usually the same as string.whitespace) or whatever the locale says. But at least \r.
I agree, \n is sort of limiting to the non-windows world (who uses that anyway?:). So, what about \s? I think though the most important thing is that it behaves similarly to other diffs. At the moment it does not. Most definately, if -w is changed to ignore \s+ then -b should be suitably updated (and if \r is included in -w then -B should also be updated). My key point is that -bB should be identical to -w. I hope it's clear and is everyone in agreement with this? Can we move forward to fixing this issue?
Shouldn't the behaviour match that of the "normal" diff command? diff (GNU diffutils) 2.8.1 considers with -w: "foo bar\n" and "foo bar\n" to be equal, but "foo bar\n" and "foo\nbar\n" not, so XML's idea of whitespace doesn't help here. "foo bar\n" and "foo\rbar\r" to be equal, so \r seems to be viewed as whitespace.
There are of course also the -b and -B options for newline stuff.
Well, you're absolutely correct on that one. I was positive that diff -w ignored newline differences but some simple tests show me that I was way off on that one. Maybe I was thinking of -wB, but even that will say that 'abc\ndef' differs from 'ab\ncd\nef'. So please ignore my thought that -bB should act like -w.
Here's another demonstration of a poor diff caused by this bug, which standalone "diff -ub" handles much better, giving a diff which is much easier for the user to understand... $ echo -e "X\n A\n B\n D\n E" > 1 $ echo -e "X\n A\n B\n C\n D\n E" > 2 $ diff -ub 1 2 --- 1 2008-09-03 18:24:35.000000000 +0100 +++ 2 2008-09-03 18:24:37.000000000 +0100 @@ -1,5 +1,6 @@ X A B + C D E $ hg init $ hg add 1 2 $ hg commit -m 'Check in' $ mv 2 1 $ hg diff 1 diff -r 34bb3cd02d65 1 --- a/1 Wed Sep 03 18:24:47 2008 +0100 +++ b/1 Wed Sep 03 18:25:01 2008 +0100 @@ -1,5 +1,6 @@ X - A - B - D - E + A + B + C + D + E
(I meant "hg diff -b 1" at the end there, of course, which gives the same poor diff)
Over 3 years and this still isn't fixed? Doesn't anyone use the diff command? If whatever code hg is using to actually perform the diff is too difficult to fix, would it at least be possible to add a --diff-cmd option like subversion has, so I could at least specify a working diff program?
See the extdiff extension.
Oh, that looks useful. Thanks! http://mercurial.selenic.com/wiki/ExtdiffExtension
Preliminary patch attached. I have not checked performance yet, at least it seems to work. Relevant part is: diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py --- a/mercurial/mdiff.py +++ b/mercurial/mdiff.py @@ -57,13 +57,13 @@ defaultopts = diffopts() -def wsclean(opts, text): +def wsclean(opts, text, blank=True): if opts.ignorews: text = re.sub('[ \t]+', '', text) elif opts.ignorewsamount: text = re.sub('[ \t]+', ' ', text) text = re.sub('[ \t]+\n', '\n', text) - if opts.ignoreblanklines: + if blank and opts.ignoreblanklines: text = re.sub('\n+', '', text) return text @@ -183,6 +183,10 @@ # below finds the spaces between those matching sequences and translates # them into diff output. # + if opts.ignorews or opts.ignorewsamount: + t1 = wsclean(opts, t1, False) + t2 = wsclean(opts, t2, False) + diff = bdiff.blocks(t1, t2) hunk = None for i, s1 in enumerate(diff): @@ -208,7 +212,7 @@ if not old and not new: continue - if opts.ignorews or opts.ignorewsamount or opts.ignoreblanklines: + if opts.ignoreblanklines: if wsclean(opts, "".join(old)) == wsclean(opts, "".join(new)): continue
extdiff seems to work well for replacing the diff command. Does not help for 'hg out -p', though this currently seems to lack any diff options as well. Would really like to see 'hg out -pw' work.
Here is some profiling with: $ cd hg-crew-stable $ hg revert -r 4000 -a $ hg perfdiffwd * Without the patch: ! diffopts: none ! wall 4.933914 comb 6.950000 user 5.083333 sys 1.866667 (best of 3) ! diffopts: -w ! wall 5.509815 comb 7.900000 user 6.016667 sys 1.883333 (best of 3) ! diffopts: -b ! wall 5.807460 comb 8.400000 user 6.500000 sys 1.900000 (best of 3) ! diffopts: -B ! wall 5.360419 comb 7.650000 user 5.766667 sys 1.883333 (best of 3) ! diffopts: -wB ! wall 5.803107 comb 8.400000 user 6.500000 sys 1.900000 (best of 3) * With the patch ! diffopts: none ! wall 4.934670 comb 6.950000 user 5.083333 sys 1.866667 (best of 3) ! diffopts: -w ! wall 5.792024 comb 8.366667 user 6.433333 sys 1.933333 (best of 3) ! diffopts: -b ! wall 6.264275 comb 9.133333 user 7.183333 sys 1.950000 (best of 3) ! diffopts: -B ! wall 5.364071 comb 7.666667 user 5.783333 sys 1.883333 (best of 3) ! diffopts: -wB ! wall 6.648478 comb 9.783333 user 7.833333 sys 1.950000 (best of 3) * Summary none: +0% -w : +5% (slower) -b : +8% -B : +0% -wB : +14% I think the new behaviour is worth the slowdown. If nobody objects and the tests pass, I will push this.
Numbers look good to me.
In crew as 4fe9ca519637 @jglick: I have split your outgoing remark into http://mercurial.selenic.com/bts/issue1906
--- Bug imported by bugzilla@serpentine.com 2012-05-12 08:29 EDT --- This bug was previously known as _bug_ 127 at http://mercurial.selenic.com/bts/issue127 Imported an attachment (id=513) Imported an attachment (id=514)
Can the following document section be removed? https://developer.mozilla.org/en-US/docs/Mercurial/Using_Mercurial#How_do_I_get_a_diff_-w
(In reply to comment #24) specifically: "There is a known issue where hg diff -w doesn't work."
(In reply to comment #25) ignore this, i just realized the document is outdated.. and the new document is here: http://mozilla-version-control-tools.readthedocs.org/en/latest/hgmozilla/index.html