D3212: patch: implement a new worddiff algorithm

Tue Apr 10 11:00:59 EDT 2018

yuja requested changes to this revision.
yuja added a comment.
This revision now requires changes to proceed.

  I have no opinion about the "dim" thingy, but the series generally looks
  good to me.

  Thanks for tackling the painfully slow `SequenceMatcher.ratio()` issue.

INLINE COMMENTS

> patch.py:53
>  tabsplitter = re.compile(br'(\t+|[^\t]+)')
> -_nonwordre = re.compile(br'([^a-zA-Z0-9_\x80-\xff])')
> +wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
> +                          '[^ \ta-zA-Z0-9_\x80-\xff])')

Nit: name it `_wordsplitter`, since it is a private constant.

> patch.py:54
> +wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
> +                          '[^ \ta-zA-Z0-9_\x80-\xff])')
>  

The `br''` prefix is missing on this line, though the "\t" and "\x" string escapes happen to be compatible with the regexp's.
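
For illustration only, here is a minimal standalone sketch of the pattern with `br''` on both fragments. It is not the patch itself: the `splitwords` helper and the sample input are hypothetical. On Python 3, a bytes literal adjacent to an unprefixed literal is a syntax error, so the prefix matters beyond style.

```python
import re

# Both fragments carry the br'' prefix, so "\t" and "\x80" reach the regexp
# engine as escapes instead of being expanded by the string literal.
_wordsplitter = re.compile(br'(\t+| +|[a-zA-Z0-9_\x80-\xff]+|'
                           br'[^ \ta-zA-Z0-9_\x80-\xff])')


def splitwords(line):
    """Split a bytes line into tokens (hypothetical helper, not in the patch).

    re.split() with a capturing group also returns the empty strings
    between adjacent matches, so those are filtered out.
    """
    return [t for t in _wordsplitter.split(line) if t]


tokens = splitwords(b'foo(bar,\tbaz)  qux\n')
# Runs of tabs, runs of spaces, and runs of word bytes each form one token;
# every other byte (punctuation, newline) is a token of its own.
```

Joining the tokens gives back the original line, so no bytes are lost by the split.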

> patch.py:2536
> +        for token in mdiff.splitnewlines(''.join(bl[b1:b2])):
> +            btokens.append((changed, token))
> +

Nit: maybe we can sort out the tokens here instead of re-parsing tabs, newlines, and trailing whitespace later.

But I'm not sure if that will make things simpler.
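
A rough sketch of what that could look like, as an assumption rather than a proposal for the actual code: the `tagtokens` function, the `kind` labels, and the `_linere` stand-in for `mdiff.splitnewlines` are all hypothetical. Only `_tabsplitter` mirrors the `tabsplitter` pattern from the quoted hunk.

```python
import re

# Stand-in for mdiff.splitnewlines: split on "\n" only, keeping the "\n".
_linere = re.compile(br'[^\n]*\n|[^\n]+')
# body, trailing spaces/tabs, optional newline
_partsre = re.compile(br'^(.*?)([ \t]*)(\n?)\Z', re.DOTALL)
# Same pattern as tabsplitter in the quoted hunk.
_tabsplitter = re.compile(br'(\t+|[^\t]+)')


def tagtokens(chunk, changed):
    """Return (changed, kind, token) triples for one chunk (hypothetical)."""
    out = []
    for line in _linere.findall(chunk):
        body, trailing, newline = _partsre.match(line).groups()
        for piece in _tabsplitter.findall(body):
            kind = 'tab' if piece.startswith(b'\t') else 'text'
            out.append((changed, kind, piece))
        if trailing:
            # Whitespace right before the end of the line is tagged once,
            # even when it contains tabs.
            out.append((changed, 'trailing', trailing))
        if newline:
            out.append((changed, 'newline', newline))
    return out
```

With tokens tagged up front, the rendering code could pick a label per `kind` without running the tab and whitespace regexps again. Whether that is simpler overall is the open question above.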

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3212

To: quark, #hg-reviewers, durin42, yuja
Cc: yuja, spectral, mercurial-devel