D3212: patch: implement a new worddiff algorithm

quark (Jun Wu) phabricator at mercurial-scm.org
Mon Apr 16 19:12:22 EDT 2018


This revision was automatically updated to reflect the committed changes.
Closed by commit rHG35632d392279: patch: implement a new worddiff algorithm (authored by quark, committed by ).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D3212?vs=7924&id=8335

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

AFFECTED FILES
  mercurial/color.py
  mercurial/patch.py
  tests/test-diff-color.t

CHANGE DETAILS

diff --git a/tests/test-diff-color.t b/tests/test-diff-color.t
--- a/tests/test-diff-color.t
+++ b/tests/test-diff-color.t
@@ -337,41 +337,39 @@
   [diff.deleted|-(to see if it works)]
   [diff.inserted|+three of those lines have]
   [diff.inserted|+collapsed onto one]
-#if false
   $ hg diff --config experimental.worddiff=True --color=debug
   [diff.diffline|diff --git a/file1 b/file1]
   [diff.file_a|--- a/file1]
   [diff.file_b|+++ b/file1]
   [diff.hunk|@@ -1,16 +1,17 @@]
-  [diff.deleted|-this is the ][diff.deleted.highlight|first][diff.deleted| line]
-  [diff.deleted|-this is the second line]
-  [diff.deleted|-][diff.deleted.highlight|    ][diff.deleted|third line starts with space]
-  [diff.deleted|-][diff.deleted.highlight|+][diff.deleted| starts with a ][diff.deleted.highlight|plus][diff.deleted| sign]
-  [diff.deleted|-][diff.tab|	][diff.deleted|this one with ][diff.deleted.highlight|one][diff.deleted| tab]
-  [diff.deleted|-][diff.tab|		][diff.deleted|now with full ][diff.deleted.highlight|two][diff.deleted| tabs]
-  [diff.deleted|-][diff.tab|	][diff.deleted|now tabs][diff.tab|		][diff.deleted|everywhere, much fun]
-  [diff.inserted|+that is the first paragraph]
-  [diff.inserted|+][diff.inserted.highlight|    ][diff.inserted|this is the ][diff.inserted.highlight|second][diff.inserted| line]
-  [diff.inserted|+third line starts with space]
-  [diff.inserted|+][diff.inserted.highlight|-][diff.inserted| starts with a ][diff.inserted.highlight|minus][diff.inserted| sign]
-  [diff.inserted|+][diff.tab|	][diff.inserted|this one with ][diff.inserted.highlight|two][diff.inserted| tab]
-  [diff.inserted|+][diff.tab|			][diff.inserted|now with full ][diff.inserted.highlight|three][diff.inserted| tabs]
-  [diff.inserted|+][diff.tab|	][diff.inserted|now][diff.inserted.highlight| there are][diff.inserted| tabs][diff.tab|		][diff.inserted|everywhere, much fun]
+  [diff.deleted|-][diff.deleted.changed|this][diff.deleted.unchanged| is the first ][diff.deleted.changed|line]
+  [diff.deleted|-][diff.deleted.unchanged|this is the second line]
+  [diff.deleted|-][diff.deleted.changed|    ][diff.deleted.unchanged|third line starts with space]
+  [diff.deleted|-][diff.deleted.changed|+][diff.deleted.unchanged| starts with a ][diff.deleted.changed|plus][diff.deleted.unchanged| sign]
+  [diff.deleted|-][diff.tab|	][diff.deleted.unchanged|this one with ][diff.deleted.changed|one][diff.deleted.unchanged| tab]
+  [diff.deleted|-][diff.tab|		][diff.deleted.unchanged|now with full ][diff.deleted.changed|two][diff.deleted.unchanged| tabs]
+  [diff.deleted|-][diff.tab|	][diff.deleted.unchanged|now ][diff.deleted.unchanged|tabs][diff.tab|		][diff.deleted.unchanged|everywhere, much fun]
+  [diff.inserted|+][diff.inserted.changed|that][diff.inserted.unchanged| is the first ][diff.inserted.changed|paragraph]
+  [diff.inserted|+][diff.inserted.changed|    ][diff.inserted.unchanged|this is the second line]
+  [diff.inserted|+][diff.inserted.unchanged|third line starts with space]
+  [diff.inserted|+][diff.inserted.changed|-][diff.inserted.unchanged| starts with a ][diff.inserted.changed|minus][diff.inserted.unchanged| sign]
+  [diff.inserted|+][diff.tab|	][diff.inserted.unchanged|this one with ][diff.inserted.changed|two][diff.inserted.unchanged| tab]
+  [diff.inserted|+][diff.tab|			][diff.inserted.unchanged|now with full ][diff.inserted.changed|three][diff.inserted.unchanged| tabs]
+  [diff.inserted|+][diff.tab|	][diff.inserted.unchanged|now ][diff.inserted.changed|there are ][diff.inserted.unchanged|tabs][diff.tab|		][diff.inserted.unchanged|everywhere, much fun]
    
    this line won't change
    
    two lines are going to
-  [diff.deleted|-be changed into ][diff.deleted.highlight|three][diff.deleted|!]
-  [diff.inserted|+(entirely magically,]
-  [diff.inserted|+ assuming this works)]
-  [diff.inserted|+be changed into ][diff.inserted.highlight|four][diff.inserted|!]
+  [diff.deleted|-][diff.deleted.unchanged|be changed into ][diff.deleted.changed|three][diff.deleted.unchanged|!]
+  [diff.inserted|+][diff.inserted.changed|(entirely magically,]
+  [diff.inserted|+][diff.inserted.changed| assuming this works)]
+  [diff.inserted|+][diff.inserted.unchanged|be changed into ][diff.inserted.changed|four][diff.inserted.unchanged|!]
    
-  [diff.deleted|-three of those lines ][diff.deleted.highlight|will]
-  [diff.deleted|-][diff.deleted.highlight|collapse][diff.deleted| onto one]
-  [diff.deleted|-(to see if it works)]
-  [diff.inserted|+three of those lines ][diff.inserted.highlight|have]
-  [diff.inserted|+][diff.inserted.highlight|collapsed][diff.inserted| onto one]
-#endif
+  [diff.deleted|-][diff.deleted.unchanged|three of those lines ][diff.deleted.changed|will]
+  [diff.deleted|-][diff.deleted.changed|collapse][diff.deleted.unchanged| onto one]
+  [diff.deleted|-][diff.deleted.changed|(to see if it works)]
+  [diff.inserted|+][diff.inserted.unchanged|three of those lines ][diff.inserted.changed|have]
+  [diff.inserted|+][diff.inserted.changed|collapsed][diff.inserted.unchanged| onto one]
 
 multibyte character shouldn't be broken up in word diff:
 
@@ -386,12 +384,10 @@
   > EOF
   $ hg ci -m 'slightly change utf8 char' utf8
 
-#if false
   $ hg diff --config experimental.worddiff=True --color=debug -c.
   [diff.diffline|diff --git a/utf8 b/utf8]
   [diff.file_a|--- a/utf8]
   [diff.file_b|+++ b/utf8]
   [diff.hunk|@@ -1,1 +1,1 @@]
-  [diff.deleted|-blah ][diff.deleted.highlight|\xe3\x82\xa2][diff.deleted| blah] (esc)
-  [diff.inserted|+blah ][diff.inserted.highlight|\xe3\x82\xa4][diff.inserted| blah] (esc)
-#endif
+  [diff.deleted|-][diff.deleted.unchanged|blah ][diff.deleted.changed|\xe3\x82\xa2][diff.deleted.unchanged| blah] (esc)
+  [diff.inserted|+][diff.inserted.unchanged|blah ][diff.inserted.changed|\xe3\x82\xa4][diff.inserted.unchanged| blah] (esc)
diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -50,7 +50,8 @@
 
 gitre = re.compile(br'diff --git a/(.*) b/(.*)')
 tabsplitter = re.compile(br'(\t+|[^\t]+)')
-_nonwordre = re.compile(br'([^a-zA-Z0-9_\x80-\xff])')
+wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
+                          '[^ \ta-zA-Z0-9_\x80-\xff])')
 
 PatchError = error.PatchError
 
@@ -2504,8 +2505,78 @@
         if chompline != line:
             yield (line[len(chompline):], '')
 
+def diffsinglehunkinline(hunklines):
+    """yield tokens for a list of lines in a single hunk, with inline colors"""
+    # prepare deleted, and inserted content
+    a = ''
+    b = ''
+    for line in hunklines:
+        if line[0] == '-':
+            a += line[1:]
+        elif line[0] == '+':
+            b += line[1:]
+        else:
+            raise error.ProgrammingError('unexpected hunk line: %s' % line)
+    # fast path: if either side is empty, use diffsinglehunk
+    if not a or not b:
+        for t in diffsinglehunk(hunklines):
+            yield t
+        return
+    # re-split the content into words
+    al = wordsplitter.findall(a)
+    bl = wordsplitter.findall(b)
+    # re-arrange the words to lines since the diff algorithm is line-based
+    aln = [s if s == '\n' else s + '\n' for s in al]
+    bln = [s if s == '\n' else s + '\n' for s in bl]
+    an = ''.join(aln)
+    bn = ''.join(bln)
+    # run the diff algorithm, prepare atokens and btokens
+    atokens = []
+    btokens = []
+    blocks = mdiff.allblocks(an, bn, lines1=aln, lines2=bln)
+    for (a1, a2, b1, b2), btype in blocks:
+        changed = btype == '!'
+        for token in mdiff.splitnewlines(''.join(al[a1:a2])):
+            atokens.append((changed, token))
+        for token in mdiff.splitnewlines(''.join(bl[b1:b2])):
+            btokens.append((changed, token))
+
+    # yield deleted tokens, then inserted ones
+    for prefix, label, tokens in [('-', 'diff.deleted', atokens),
+                                  ('+', 'diff.inserted', btokens)]:
+        nextisnewline = True
+        for changed, token in tokens:
+            if nextisnewline:
+                yield (prefix, label)
+                nextisnewline = False
+            # special handling line end
+            isendofline = token.endswith('\n')
+            if isendofline:
+                chomp = token[:-1] # chomp
+                token = chomp.rstrip() # detect spaces at the end
+                endspaces = chomp[len(token):]
+            # scan tabs
+            for maybetab in tabsplitter.findall(token):
+                if '\t' == maybetab[0]:
+                    currentlabel = 'diff.tab'
+                else:
+                    if changed:
+                        currentlabel = label + '.changed'
+                    else:
+                        currentlabel = label + '.unchanged'
+                yield (maybetab, currentlabel)
+            if isendofline:
+                if endspaces:
+                    yield (endspaces, 'diff.trailingwhitespace')
+                yield ('\n', '')
+                nextisnewline = True
+
 def difflabel(func, *args, **kw):
     '''yields 2-tuples of (output, label) based on the output of func()'''
+    if kw.get(r'opts') and kw[r'opts'].worddiff:
+        dodiffhunk = diffsinglehunkinline
+    else:
+        dodiffhunk = diffsinglehunk
     headprefixes = [('diff', 'diff.diffline'),
                     ('copy', 'diff.extended'),
                     ('rename', 'diff.extended'),
@@ -2525,7 +2596,7 @@
     hunkbuffer = []
     def consumehunkbuffer():
         if hunkbuffer:
-            for token in diffsinglehunk(hunkbuffer):
+            for token in dodiffhunk(hunkbuffer):
                 yield token
             hunkbuffer[:] = []
 
diff --git a/mercurial/color.py b/mercurial/color.py
--- a/mercurial/color.py
+++ b/mercurial/color.py
@@ -90,14 +90,16 @@
     'branches.inactive': 'none',
     'diff.changed': 'white',
     'diff.deleted': 'red',
-    'diff.deleted.highlight': 'red bold underline',
+    'diff.deleted.changed': 'red',
+    'diff.deleted.unchanged': 'red dim',
     'diff.diffline': 'bold',
     'diff.extended': 'cyan bold',
     'diff.file_a': 'red bold',
     'diff.file_b': 'green bold',
     'diff.hunk': 'magenta',
     'diff.inserted': 'green',
-    'diff.inserted.highlight': 'green bold underline',
+    'diff.inserted.changed': 'green',
+    'diff.inserted.unchanged': 'green dim',
     'diff.tab': '',
     'diff.trailingwhitespace': 'bold red_background',
     'changeset.public': '',



To: quark, #hg-reviewers, durin42, yuja
Cc: indygreg, dhduvall, yuja, spectral, mercurial-devel


More information about the Mercurial-devel mailing list