D3212: patch: implement a new worddiff algorithm
quark (Jun Wu)
phabricator at mercurial-scm.org
Mon Apr 16 19:12:22 EDT 2018
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG35632d392279: patch: implement a new worddiff algorithm (authored by quark, committed by ).
REPOSITORY
rHG Mercurial
CHANGES SINCE LAST UPDATE
https://phab.mercurial-scm.org/D3212?vs=7924&id=8335
REVISION DETAIL
https://phab.mercurial-scm.org/D3212
AFFECTED FILES
mercurial/color.py
mercurial/patch.py
tests/test-diff-color.t
CHANGE DETAILS
diff --git a/tests/test-diff-color.t b/tests/test-diff-color.t
--- a/tests/test-diff-color.t
+++ b/tests/test-diff-color.t
@@ -337,41 +337,39 @@
[diff.deleted|-(to see if it works)]
[diff.inserted|+three of those lines have]
[diff.inserted|+collapsed onto one]
-#if false
$ hg diff --config experimental.worddiff=True --color=debug
[diff.diffline|diff --git a/file1 b/file1]
[diff.file_a|--- a/file1]
[diff.file_b|+++ b/file1]
[diff.hunk|@@ -1,16 +1,17 @@]
- [diff.deleted|-this is the ][diff.deleted.highlight|first][diff.deleted| line]
- [diff.deleted|-this is the second line]
- [diff.deleted|-][diff.deleted.highlight| ][diff.deleted|third line starts with space]
- [diff.deleted|-][diff.deleted.highlight|+][diff.deleted| starts with a ][diff.deleted.highlight|plus][diff.deleted| sign]
- [diff.deleted|-][diff.tab| ][diff.deleted|this one with ][diff.deleted.highlight|one][diff.deleted| tab]
- [diff.deleted|-][diff.tab| ][diff.deleted|now with full ][diff.deleted.highlight|two][diff.deleted| tabs]
- [diff.deleted|-][diff.tab| ][diff.deleted|now tabs][diff.tab| ][diff.deleted|everywhere, much fun]
- [diff.inserted|+that is the first paragraph]
- [diff.inserted|+][diff.inserted.highlight| ][diff.inserted|this is the ][diff.inserted.highlight|second][diff.inserted| line]
- [diff.inserted|+third line starts with space]
- [diff.inserted|+][diff.inserted.highlight|-][diff.inserted| starts with a ][diff.inserted.highlight|minus][diff.inserted| sign]
- [diff.inserted|+][diff.tab| ][diff.inserted|this one with ][diff.inserted.highlight|two][diff.inserted| tab]
- [diff.inserted|+][diff.tab| ][diff.inserted|now with full ][diff.inserted.highlight|three][diff.inserted| tabs]
- [diff.inserted|+][diff.tab| ][diff.inserted|now][diff.inserted.highlight| there are][diff.inserted| tabs][diff.tab| ][diff.inserted|everywhere, much fun]
+ [diff.deleted|-][diff.deleted.changed|this][diff.deleted.unchanged| is the first ][diff.deleted.changed|line]
+ [diff.deleted|-][diff.deleted.unchanged|this is the second line]
+ [diff.deleted|-][diff.deleted.changed| ][diff.deleted.unchanged|third line starts with space]
+ [diff.deleted|-][diff.deleted.changed|+][diff.deleted.unchanged| starts with a ][diff.deleted.changed|plus][diff.deleted.unchanged| sign]
+ [diff.deleted|-][diff.tab| ][diff.deleted.unchanged|this one with ][diff.deleted.changed|one][diff.deleted.unchanged| tab]
+ [diff.deleted|-][diff.tab| ][diff.deleted.unchanged|now with full ][diff.deleted.changed|two][diff.deleted.unchanged| tabs]
+ [diff.deleted|-][diff.tab| ][diff.deleted.unchanged|now ][diff.deleted.unchanged|tabs][diff.tab| ][diff.deleted.unchanged|everywhere, much fun]
+ [diff.inserted|+][diff.inserted.changed|that][diff.inserted.unchanged| is the first ][diff.inserted.changed|paragraph]
+ [diff.inserted|+][diff.inserted.changed| ][diff.inserted.unchanged|this is the second line]
+ [diff.inserted|+][diff.inserted.unchanged|third line starts with space]
+ [diff.inserted|+][diff.inserted.changed|-][diff.inserted.unchanged| starts with a ][diff.inserted.changed|minus][diff.inserted.unchanged| sign]
+ [diff.inserted|+][diff.tab| ][diff.inserted.unchanged|this one with ][diff.inserted.changed|two][diff.inserted.unchanged| tab]
+ [diff.inserted|+][diff.tab| ][diff.inserted.unchanged|now with full ][diff.inserted.changed|three][diff.inserted.unchanged| tabs]
+ [diff.inserted|+][diff.tab| ][diff.inserted.unchanged|now ][diff.inserted.changed|there are ][diff.inserted.unchanged|tabs][diff.tab| ][diff.inserted.unchanged|everywhere, much fun]
this line won't change
two lines are going to
- [diff.deleted|-be changed into ][diff.deleted.highlight|three][diff.deleted|!]
- [diff.inserted|+(entirely magically,]
- [diff.inserted|+ assuming this works)]
- [diff.inserted|+be changed into ][diff.inserted.highlight|four][diff.inserted|!]
+ [diff.deleted|-][diff.deleted.unchanged|be changed into ][diff.deleted.changed|three][diff.deleted.unchanged|!]
+ [diff.inserted|+][diff.inserted.changed|(entirely magically,]
+ [diff.inserted|+][diff.inserted.changed| assuming this works)]
+ [diff.inserted|+][diff.inserted.unchanged|be changed into ][diff.inserted.changed|four][diff.inserted.unchanged|!]
- [diff.deleted|-three of those lines ][diff.deleted.highlight|will]
- [diff.deleted|-][diff.deleted.highlight|collapse][diff.deleted| onto one]
- [diff.deleted|-(to see if it works)]
- [diff.inserted|+three of those lines ][diff.inserted.highlight|have]
- [diff.inserted|+][diff.inserted.highlight|collapsed][diff.inserted| onto one]
-#endif
+ [diff.deleted|-][diff.deleted.unchanged|three of those lines ][diff.deleted.changed|will]
+ [diff.deleted|-][diff.deleted.changed|collapse][diff.deleted.unchanged| onto one]
+ [diff.deleted|-][diff.deleted.changed|(to see if it works)]
+ [diff.inserted|+][diff.inserted.unchanged|three of those lines ][diff.inserted.changed|have]
+ [diff.inserted|+][diff.inserted.changed|collapsed][diff.inserted.unchanged| onto one]
multibyte character shouldn't be broken up in word diff:
@@ -386,12 +384,10 @@
> EOF
$ hg ci -m 'slightly change utf8 char' utf8
-#if false
$ hg diff --config experimental.worddiff=True --color=debug -c.
[diff.diffline|diff --git a/utf8 b/utf8]
[diff.file_a|--- a/utf8]
[diff.file_b|+++ b/utf8]
[diff.hunk|@@ -1,1 +1,1 @@]
- [diff.deleted|-blah ][diff.deleted.highlight|\xe3\x82\xa2][diff.deleted| blah] (esc)
- [diff.inserted|+blah ][diff.inserted.highlight|\xe3\x82\xa4][diff.inserted| blah] (esc)
-#endif
+ [diff.deleted|-][diff.deleted.unchanged|blah ][diff.deleted.changed|\xe3\x82\xa2][diff.deleted.unchanged| blah] (esc)
+ [diff.inserted|+][diff.inserted.unchanged|blah ][diff.inserted.changed|\xe3\x82\xa4][diff.inserted.unchanged| blah] (esc)
diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -50,7 +50,8 @@
gitre = re.compile(br'diff --git a/(.*) b/(.*)')
tabsplitter = re.compile(br'(\t+|[^\t]+)')
-_nonwordre = re.compile(br'([^a-zA-Z0-9_\x80-\xff])')
+wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
+ '[^ \ta-zA-Z0-9_\x80-\xff])')
PatchError = error.PatchError
@@ -2504,8 +2505,78 @@
if chompline != line:
yield (line[len(chompline):], '')
+def diffsinglehunkinline(hunklines):
+ """yield tokens for a list of lines in a single hunk, with inline colors"""
+ # prepare deleted, and inserted content
+ a = ''
+ b = ''
+ for line in hunklines:
+ if line[0] == '-':
+ a += line[1:]
+ elif line[0] == '+':
+ b += line[1:]
+ else:
+ raise error.ProgrammingError('unexpected hunk line: %s' % line)
+ # fast path: if either side is empty, use diffsinglehunk
+ if not a or not b:
+ for t in diffsinglehunk(hunklines):
+ yield t
+ return
+ # re-split the content into words
+ al = wordsplitter.findall(a)
+ bl = wordsplitter.findall(b)
+ # re-arrange the words to lines since the diff algorithm is line-based
+ aln = [s if s == '\n' else s + '\n' for s in al]
+ bln = [s if s == '\n' else s + '\n' for s in bl]
+ an = ''.join(aln)
+ bn = ''.join(bln)
+ # run the diff algorithm, prepare atokens and btokens
+ atokens = []
+ btokens = []
+ blocks = mdiff.allblocks(an, bn, lines1=aln, lines2=bln)
+ for (a1, a2, b1, b2), btype in blocks:
+ changed = btype == '!'
+ for token in mdiff.splitnewlines(''.join(al[a1:a2])):
+ atokens.append((changed, token))
+ for token in mdiff.splitnewlines(''.join(bl[b1:b2])):
+ btokens.append((changed, token))
+
+ # yield deleted tokens, then inserted ones
+ for prefix, label, tokens in [('-', 'diff.deleted', atokens),
+ ('+', 'diff.inserted', btokens)]:
+ nextisnewline = True
+ for changed, token in tokens:
+ if nextisnewline:
+ yield (prefix, label)
+ nextisnewline = False
+ # special handling line end
+ isendofline = token.endswith('\n')
+ if isendofline:
+ chomp = token[:-1] # chomp
+ token = chomp.rstrip() # detect spaces at the end
+ endspaces = chomp[len(token):]
+ # scan tabs
+ for maybetab in tabsplitter.findall(token):
+ if '\t' == maybetab[0]:
+ currentlabel = 'diff.tab'
+ else:
+ if changed:
+ currentlabel = label + '.changed'
+ else:
+ currentlabel = label + '.unchanged'
+ yield (maybetab, currentlabel)
+ if isendofline:
+ if endspaces:
+ yield (endspaces, 'diff.trailingwhitespace')
+ yield ('\n', '')
+ nextisnewline = True
+
def difflabel(func, *args, **kw):
'''yields 2-tuples of (output, label) based on the output of func()'''
+ if kw.get(r'opts') and kw[r'opts'].worddiff:
+ dodiffhunk = diffsinglehunkinline
+ else:
+ dodiffhunk = diffsinglehunk
headprefixes = [('diff', 'diff.diffline'),
('copy', 'diff.extended'),
('rename', 'diff.extended'),
@@ -2525,7 +2596,7 @@
hunkbuffer = []
def consumehunkbuffer():
if hunkbuffer:
- for token in diffsinglehunk(hunkbuffer):
+ for token in dodiffhunk(hunkbuffer):
yield token
hunkbuffer[:] = []
diff --git a/mercurial/color.py b/mercurial/color.py
--- a/mercurial/color.py
+++ b/mercurial/color.py
@@ -90,14 +90,16 @@
'branches.inactive': 'none',
'diff.changed': 'white',
'diff.deleted': 'red',
- 'diff.deleted.highlight': 'red bold underline',
+ 'diff.deleted.changed': 'red',
+ 'diff.deleted.unchanged': 'red dim',
'diff.diffline': 'bold',
'diff.extended': 'cyan bold',
'diff.file_a': 'red bold',
'diff.file_b': 'green bold',
'diff.hunk': 'magenta',
'diff.inserted': 'green',
- 'diff.inserted.highlight': 'green bold underline',
+ 'diff.inserted.changed': 'green',
+ 'diff.inserted.unchanged': 'green dim',
'diff.tab': '',
'diff.trailingwhitespace': 'bold red_background',
'changeset.public': '',
To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel
More information about the Mercurial-devel
mailing list