[PATCH 1 of 1 default] hgweb: column / side-by-side diff functionality

Sun Jun 10 11:53:03 CDT 2012

Nice work! Live version here for the curious:

  http://mezard.eu/hg/hg-does-it-look-good-for-you/coldiff/e1aa1ed30030/mercurial/revset.py

Le 03/06/12 23:43, wujek srujek a écrit :
> # HG changeset patch
> # User wujek srujek <wujek.srujek at googlemail.com>
> # Date 1338755144 -7200
> # Node ID 2975d6fb8ad6cc24c020ed8507d9bc1c39652703
> # Parent  0a0cf3f26938ff7a084f2dcc9e59152ac6060e1e
> hgweb: column / side-by-side diff functionality
> 
> Adds new web command to the core, ``filecoldiff`` (and an alias, ``coldiff``),

I would prefer to use "compare" or "comparison" rather than "diff". Other opinions about this?

filecoldiff -> filecomp
fullcoldiff -> fullcomparison

> which enables colorful side-by-side diff, which for some might be much easier to
> work with than the standard line diff output. The idea how to implement comes
> from the SonicHq extension.
> 
> The web interface gets a new link to call the column diff functionality. It
> comes in two flavors: short output (the default), where only subsets (blocks) of
> the files' contents that were actually changed are returned, with 3 lines of
> context; and long output, where the whole files are output. The latter will
> result in much more output most of the time, especially where very few lines
> were changed in big files. The output modes are configurable with the new
> ``web.fullcoldiff`` setting in hgrc (taking values according to the
> ``mercurial.util.parsebool`` function). This hgrc setting can be overridden by
> adding ``fullcoldiff=<bool>`` to the request query string. The diff creates
> addressable lines, so as to enable sending links to appropriate lines, just as
> the standard diff does.
> 
> Known limitations:
> * the column diff is done against the first parent, just as the standard diff
> * this change allows examining diffs for single files only (as I am not sure if
>   examining the whole changeset in this way would be helpful)
> * syntax highlighting of the output changes is not performed (enabling the
>   highlight extension has no influence on it)
> * only the default (paper) style is updated, the rest will come next
> * the number of context lines in short output is not configurable (I'm not sure
>   if this is necessary)
>
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -1377,6 +1377,11 @@
>  ``errorlog``
>      Where to output the error log. Default is stderr.
>  
> +``fullcoldiff``
> +    Whether column / side-by-side diff should output the files' full contents,
> +    or just the changed blocks with context around them. Setting it to False
> +    results in more concise output. Default is False.
> +

I am not really convinced by the partial mode. Does it bring any information compared to a regular diff, besides a different formatting? For me, this file comparison view is really useful when diff does not give enough context to understand the change, in which case I prefer to have the full file.

Compare:

  http://mezard.eu/hg/hg-does-it-look-good-for-you/diff/e1aa1ed30030/mercurial/revset.py

  http://mezard.eu/hg/hg-does-it-look-good-for-you/coldiff/e1aa1ed30030/mercurial/revset.py

  http://mezard.eu/hg/hg-does-it-look-good-for-you/coldiff/e1aa1ed30030/mercurial/revset.py?fullcoldiff=1

I agree the last one would be better with a mean to navigate between groups.

>  ``hidden``
>      Whether to hide the repository in the hgwebdir index.
>      Default is False.
> diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py
> +++ b/mercurial/hgweb/webcommands.py
> @@ -8,6 +8,7 @@
>  import os, mimetypes, re, cgi, copy
>  import webutil
>  from mercurial import error, encoding, archival, templater, templatefilters
> +from mercurial import util
>  from mercurial.node import short, hex
>  from mercurial.util import binary
>  from common import paritygen, staticfile, get_contact, ErrorResponse
> @@ -23,6 +24,7 @@
>     'log', 'rawfile', 'file', 'changelog', 'shortlog', 'changeset', 'rev',
>     'manifest', 'tags', 'bookmarks', 'branches', 'summary', 'filediff', 'diff',
>     'annotate', 'filelog', 'archive', 'static', 'graph', 'help',
> +   'filecoldiff', 'coldiff'
>  ]
>  
>  def log(web, req, tmpl):
> @@ -586,6 +588,28 @@
>  
>  diff = filediff
>  
> +def filecoldiff(web, req, tmpl):
> +    fctx = webutil.filectx(web.repo, req)
> +    rename = fctx and webutil.renamelink(fctx) or []
> +    full = web.configbool('web', 'fullcoldiff', False)
> +    if 'fullcoldiff' in req.form:
> +        full = util.parsebool(req.form['fullcoldiff'][0])
> +    coldiff = webutil.coldiff(tmpl, fctx, full)
> +    return tmpl('filecoldiff',
> +                file=fctx.path(),
> +                node=hex(fctx.node()),
> +                rev=fctx.rev(),
> +                date=fctx.date(),
> +                desc=fctx.description(),
> +                author=fctx.user(),
> +                rename=rename,
> +                branch=webutil.nodebranchnodefault(fctx),
> +                parent=webutil.parents(fctx),
> +                child=webutil.children(fctx),
> +                coldiff=coldiff)
> +
> +coldiff = filecoldiff

Do we need this alias?

> +
>  def annotate(web, req, tmpl):
>      fctx = webutil.filectx(web.repo, req)
>      f = fctx.path()
> diff --git a/mercurial/hgweb/webutil.py b/mercurial/hgweb/webutil.py
> --- a/mercurial/hgweb/webutil.py
> +++ b/mercurial/hgweb/webutil.py
> @@ -6,10 +6,12 @@
>  # This software may be used and distributed according to the terms of the
>  # GNU General Public License version 2 or any later version.
>  
> -import os, copy
> +import os, mimetypes, copy
>  from mercurial import match, patch, scmutil, error, ui, util
>  from mercurial.i18n import _
>  from mercurial.node import hex, nullid
> +from mercurial.util import binary

Please do not import symbols from modules. It is unfortunate the two previous lines give a bad example but these are the only ones tolerated in the code base.

> +from difflib import SequenceMatcher

Ditto, just import difflib.

Note we may replace this one with our internal differ, but this could be done after this patch is accepted.

>  def up(p):
>      if p[0] != "/":
> @@ -220,6 +222,106 @@
>      yield tmpl('diffblock', parity=parity.next(), blockno=blockno,
>                 lines=prettyprintlines(''.join(block), blockno))
>  
> +def coldiff(tmpl, fctx, full):
> +    '''Generator function that provides column / side-by-side diff data.'''
> +
> +    def filelines(f):
> +        if binary(f.data()):

if util.binary(f.data()):

> +            mt = mimetypes.guess_type(f.path())[0]
> +            if not mt:
> +                mt = 'application/octet-stream'
> +            return [_('(binary file %s, hash: %s)') % (mt, hex(f.filenode()))]
> +        return f.data().splitlines()
> +
> +    def diffline(type, leftlineno, leftline, rightlineno, rightline):
> +        lineid = leftlineno and "l%s" % leftlineno or ''

Maybe wrap:

  "l%s" % leftlineno

in parenthesis to avoid playing with operators priority.

> +        lineid += rightlineno and "r%s" % rightlineno or ''
> +        return tmpl('coldiffline',
> +                    type=type,
> +                    lineid=lineid,
> +                    leftlinenumber="% 6s" % (leftlineno or ''),
> +                    leftline=leftline or '',
> +                    rightlinenumber="% 6s" % (rightlineno or ''),
> +                    rightline=rightline or '')
> +
> +    def getblock(opcodes):
> +        for type, llo, lhi, rlo, rhi in opcodes:
> +            if type in ('equal'):

This is the same than:

  if type in 'equal':

which asks if type is a substring of 'equal'. You wanted:

  if type in ('equal',):

or just:

  if type == 'equal':

> +                for i in xrange(lhi - llo):
> +                    yield diffline(type=type,
> +                                   leftlineno=llo + i + 1,
> +                                   leftline=leftlines[llo + i],
> +                                   rightlineno=rlo + i + 1,
> +                                   rightline=rightlines[rlo + i])
> +            elif type == 'delete':
> +                for i in xrange(llo, lhi):
> +                    yield diffline(type=type,
> +                                   leftlineno=i + 1,
> +                                   leftline=leftlines[i],
> +                                   rightlineno=None,
> +                                   rightline=None)
> +            elif type == 'insert':
> +                for i in xrange(rlo, rhi):
> +                    yield diffline(type=type,
> +                                   leftlineno=None,
> +                                   leftline=None,
> +                                   rightlineno=i + 1,
> +                                   rightline=rightlines[i])
> +            else: # replace

Note this case handles all the others, you can just keep this one.

> +                len1 = lhi - llo
> +                len2 = rhi - rlo
> +                if len1 < len2:
> +                    count = len1
> +                else:
> +                    count = len2

count = min(len1, len2)

> +                for i in xrange(count):
> +                    yield diffline(type=type,
> +                                   leftlineno=llo + i + 1,
> +                                   leftline=leftlines[llo + i],
> +                                   rightlineno=rlo + i + 1,
> +                                   rightline=rightlines[rlo + i])
> +                if len1 > len2:
> +                    for i in xrange(llo + count, lhi):
> +                        yield diffline(type=type,
> +                                       leftlineno=i + 1,
> +                                       leftline=leftlines[i],
> +                                       rightlineno=None,
> +                                       rightline=None)
> +                elif len2 > len1:
> +                    for i in xrange(rlo + count, rhi):
> +                        yield diffline(type=type,
> +                                       leftlineno=None,
> +                                       leftline=None,
> +                                       rightlineno=i + 1,
> +                                       rightline=rightlines[i])
> +
> +    parents = fctx.parents()
> +    if not parents:
> +        leftrev = -1
> +        leftnode = nullid
> +        leftlines = ()
> +    else:
> +        pfctx = parents[0]
> +        leftrev = pfctx.filerev()
> +        leftnode = pfctx.filenode()
> +        leftlines = filelines(pfctx)
> +
> +    rightlines = filelines(fctx)
> +
> +    s = SequenceMatcher(None, leftlines, rightlines, False)

The last "autojunk" argument was added by python 2.7.1. There are ways to test function arguments count, but it would mean a different behaviour between python < 2.7.1 and >= 2.7.1. Also, this could be replaced later by bdiff.blocks(). Removing the last argument completely is probably the way to go.

       s = SequenceMatcher(None, leftlines, rightlines)

> +    if full:
> +        blocks = [tmpl('coldiffblock', lines=getblock(s.get_opcodes()))]
> +    else:
> +        blocks = (tmpl('coldiffblock', lines=getblock(oc))
> +                     for oc in s.get_grouped_opcodes())

Thank you for the patch!

--
Patrick Mézard