[PATCH RFC] releasenotes: add similarity check function to compare incoming notes

Yuya Nishihara yuya at tcha.org
Tue Jun 27 11:11:36 EDT 2017


On Mon, 26 Jun 2017 10:23:50 -0400, Augie Fackler wrote:
> On Mon, Jun 26, 2017 at 10:53:27PM +0900, Yuya Nishihara wrote:
> > On Fri, 23 Jun 2017 18:37:22 +0200, Rishabh Madan wrote:
> > > # HG changeset patch
> > > # User Rishabh Madan <rishabhmadan96 at gmail.com>
> > > # Date 1498235213 -7200
> > > #      Fri Jun 23 18:26:53 2017 +0200
> > > # Node ID a90693382178ca82b2918ee4b159dfb490d1bfc8
> > > # Parent  b6e6d8df88beb042f5a37123a0ea6a9b437f7755
> > > releasenotes: add similarity check function to compare incoming notes
> > >
> > > It is possible that the incoming note fragments might have some similar content
> > > as the existing release notes. In case of a bug fix, we match for issueNNNN in $
> > > existing notes. For other general cases, it makes use of fuzzywuzzy library to
> > > get a similarity score. If the score is above a certain threshold, we ignore
> > > the fragment otherwise add it. But the score might be misleading for small comm$
> > > messages. So, it uses similarity function if only the length of string (in word$
> > > is above a certain number. The patch also adds tests related to its usage.
> > > But it needs improvement in the sense of combining the incoming notes. We can
> > > use interactive mode for adding the notes. Maybe we can do this if similarity
> > > score is under a certain range.
> > >
> > > diff -r b6e6d8df88be -r a90693382178 hgext/releasenotes.py
> > > --- a/hgext/releasenotes.py	Fri Jun 23 17:15:53 2017 +0200
> > > +++ b/hgext/releasenotes.py	Fri Jun 23 18:26:53 2017 +0200
> > > @@ -12,6 +12,7 @@
> > >  """
> > >
> > >  from __future__ import absolute_import
> > > +from fuzzywuzzy import fuzz
> >
> > I have no idea if the releasenotes extension may depend on third-party modules.
> > Given this extension wouldn't be widely used, it might be okay. But I'm not
> > sure, literally.
> 
> I'm certainly not opposed to starting with fuzzywuzzy and moving to
> difflib later if the results are good enough. I'll let y'all figure
> that out though.
> 
> (There's some precedent already, like how we depend on pygments for highlight.)

Okay.

Rishabh, can you add a '#require fuzzywuzzy' rule to the releasenotes tests?
Perhaps you'll just need to copy and adapt has_pygments() in tests/hghave.py.
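
For reference, such a check could look roughly like the following. This is
only a sketch: the check() decorator below is a simplified stand-in for the
one in tests/hghave.py so the snippet runs on its own, and has_fuzzywuzzy()
is a hypothetical name modeled on has_pygments().

```python
# Simplified stand-in for the feature registry in tests/hghave.py.
# Assumption: the real decorator does more bookkeeping than this.
checks = {}

def check(name, desc):
    """Register a feature-detection function under the given name."""
    def decorator(func):
        checks[name] = (func, desc)
        return func
    return decorator

# Hypothetical check, modeled on has_pygments().
@check("fuzzywuzzy", "Fuzzywuzzy string matching library")
def has_fuzzywuzzy():
    try:
        import fuzzywuzzy
        fuzzywuzzy.__name__  # silence unused import warning
        return True
    except ImportError:
        return False
```

With something like this registered, a .t test that starts with
'#require fuzzywuzzy' would be skipped when the library is not installed.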

Anyway, a good next step is probably to think about how to test this new
feature, and where to split the functions so they can be easily tested.
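
As a rough illustration of that kind of split, the comparison could live in a
pure function that takes plain strings and touches no repository state. The
sketch below uses difflib (the stdlib alternative mentioned above) rather than
fuzzywuzzy so that it runs without extra dependencies. The function names, the
0.75 threshold, the 5-word minimum, and the choice to decide issue-tagged notes
by issue number alone are all placeholders of mine, not values from the patch.

```python
import difflib
import re

# Matches bug references such as "issue1234".
ISSUE_RE = re.compile(r'\bissue\d+\b')

def similarityscore(existing, incoming):
    """Return a ratio in [0, 1]; higher means the two strings are closer."""
    matcher = difflib.SequenceMatcher(None, existing.lower(),
                                      incoming.lower())
    return matcher.ratio()

def issimilar(existingnotes, incoming, threshold=0.75, minwords=5):
    """Decide whether an incoming note duplicates any existing note.

    threshold and minwords are illustrative defaults only.
    """
    # Bug fixes: compare by issue number instead of by text.
    issues = set(ISSUE_RE.findall(incoming))
    if issues:
        return any(issues & set(ISSUE_RE.findall(note))
                   for note in existingnotes)
    # Scores for very short notes are misleading, so skip the fuzzy check.
    if len(incoming.split()) < minwords:
        return False
    return any(similarityscore(note, incoming) >= threshold
               for note in existingnotes)
```

Because issimilar() only sees strings, it can be exercised directly with
hand-written notes, independent of how the extension collects fragments.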

