[PATCH RFC] releasenotes: add similarity check function to compare incoming notes
raf at durin42.com
Mon Jun 26 10:23:50 EDT 2017
On Mon, Jun 26, 2017 at 10:53:27PM +0900, Yuya Nishihara wrote:
> On Fri, 23 Jun 2017 18:37:22 +0200, Rishabh Madan wrote:
> > # HG changeset patch
> > # User Rishabh Madan <rishabhmadan96 at gmail.com>
> > # Date 1498235213 -7200
> > # Fri Jun 23 18:26:53 2017 +0200
> > # Node ID a90693382178ca82b2918ee4b159dfb490d1bfc8
> > # Parent b6e6d8df88beb042f5a37123a0ea6a9b437f7755
> > releasenotes: add similarity check function to compare incoming notes
> > It is possible that the incoming note fragments might have some similar content
> > as the existing release notes. In case of a bug fix, we match for issueNNNN in the
> > existing notes. For other general cases, it makes use of fuzzywuzzy library to
> > get a similarity score. If the score is above a certain threshold, we ignore
> > the fragment otherwise add it. But the score might be misleading for small commit
> > messages. So, it uses similarity function if only the length of string (in words)
> > is above a certain number. The patch also adds tests related to its usage.
> > But it needs improvement in the sense of combining the incoming notes. We can
> > use interactive mode for adding the notes. Maybe we can do this if similarity
> > score is under a certain range.
> > diff -r b6e6d8df88be -r a90693382178 hgext/releasenotes.py
> > --- a/hgext/releasenotes.py Fri Jun 23 17:15:53 2017 +0200
> > +++ b/hgext/releasenotes.py Fri Jun 23 18:26:53 2017 +0200
> > @@ -12,6 +12,7 @@
> > """
> > from __future__ import absolute_import
> > +from fuzzywuzzy import fuzz
> I have no idea if the releasenotes extension may depend on third-party modules.
> Given this extension wouldn't be widely used, it might be okay. But I'm not
> sure, literally.
I'm certainly not opposed to starting with fuzzywuzzy and moving to
difflib later if the results are good enough. I'll let y'all figure
that out though.
(There's some precedent already, like how we depend on pygments for highlight.)
> FWIW, have you tried difflib.SequenceMatcher()? Its algorithm would be more
> generic, but might be good enough.
> And I think this kind of functions will need a unit/doc test.
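For comparison, here is a rough sketch of what the check described in the commit message could look like on top of difflib.SequenceMatcher instead of fuzzywuzzy. It is not the code from the patch: the names (similarity, isduplicate), the 0.75 threshold, and the 10-word minimum are all made up for illustration, since the patch description does not give its actual values. The doctest is there only to show the kind of test asked for above.

```python
import difflib
import re

# Hypothetical values; the patch description does not state the real ones.
THRESHOLD = 0.75   # ratio at or above which two notes count as similar
MIN_WORDS = 10     # below this many words the ratio is not trusted

ISSUE_RE = re.compile(r'\bissue\d+\b', re.IGNORECASE)


def _normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return ' '.join(text.lower().split())


def _issues(text):
    """Return the set of issueNNNN references found in the text."""
    return set(m.lower() for m in ISSUE_RE.findall(text))


def similarity(a, b):
    """Return a ratio between 0 and 1 for two note strings.

    >>> similarity('Fix  crash', 'fix crash')
    1.0
    """
    matcher = difflib.SequenceMatcher(None, _normalize(a), _normalize(b),
                                      autojunk=False)
    return matcher.ratio()


def isduplicate(incoming, existingnotes):
    """Guess whether an incoming note repeats one of the existing notes."""
    # Bug fixes: a shared issueNNNN reference is treated as a match.
    incomingissues = _issues(incoming)
    if incomingissues:
        for note in existingnotes:
            if incomingissues & _issues(note):
                return True
    # Short notes: skip the ratio, since it can be misleading for them.
    if len(incoming.split()) < MIN_WORDS:
        return False
    # General case: compare against every existing note.
    return any(similarity(incoming, note) >= THRESHOLD
               for note in existingnotes)
```

Something like isduplicate(fragment, notes) would then decide whether a fragment gets skipped. Whether the ratios are good enough compared to fuzzywuzzy would have to be checked against real release notes.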