Design related issues with similarity function in releasenotes extension.

Mon Jun 19 09:29:59 EDT 2017

I have added a regex check for issueNNNN in the incoming notes. When an
incoming note mentions an issue number, it now simply looks for the same
issue number in the existing notes.
I have also added a minimum length check, in words, for the incoming notes.
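
Roughly, the two checks look like the sketch below. This is only an
illustration, not the actual patch: the names ISSUE_RE, MIN_WORDS,
issue_numbers, is_duplicate_by_issue and long_enough_for_fuzzy are
hypothetical, and the pattern and the word cutoff are assumed values.

```python
import re

# Hypothetical pattern: matches "issue1234" and "issue 1234".
ISSUE_RE = re.compile(r'\bissue\s*(\d+)\b', re.IGNORECASE)

# Assumed cutoff; the real value would need tuning.
MIN_WORDS = 5


def issue_numbers(text):
    """Return the set of issue numbers (as strings) mentioned in text."""
    return set(ISSUE_RE.findall(text))


def is_duplicate_by_issue(incoming, existing_notes):
    """True if an existing note mentions an issue number also found in incoming."""
    wanted = issue_numbers(incoming)
    if not wanted:
        return False
    return any(wanted & issue_numbers(note) for note in existing_notes)


def long_enough_for_fuzzy(incoming):
    """True if the note has at least MIN_WORDS whitespace-separated words."""
    return len(incoming.split()) >= MIN_WORDS
```

The idea is that a shared issue number counts as a strong duplicate signal,
while notes shorter than the cutoff skip fuzzy deduplication.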

Should I send an RFC patch for further discussion?

On Sat, Jun 17, 2017 at 1:29 AM, Augie Fackler <raf at durin42.com> wrote:

>
> > On Jun 15, 2017, at 16:46, Rishabh Madan <rishabhmadan96 at gmail.com> wrote:
> >
> > An important part of the release notes extension is dealing with the
> > notes from incoming commit messages and combining them with, or
> > leaving them out of, an existing releasenotes file. To begin with, we
> > look for an exact match of each incoming notes fragment in the
> > existing file. If a match is found, we simply ignore the fragment
> > (don't add it to the release notes). Then we compare the remaining
> > incoming notes fragments of a particular section (sections like fix,
> > features, perf etc.) with the notes items under the same section in
> > the existing release notes.
> > As of now, I'm using fuzzywuzzy's token set ratio method (
> > http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/)
> > for comparison, but there might be licensing issues that we need to
> > talk about if we plan to use it. (Any other solutions can also be
> > given a thought.) In the basic implementation (link to the image) that
> > I made, I simply threshold the match score and ignore the fragment if
> > the score is above a certain threshold.
> >
> > But the problem is that we simply can't afford to ignore some of them.
> > For example, if the message is really short, say in the case of bug
> > fixes, then there is a chance that it crosses the threshold even
> > though it's different from what exists in the release notes. There can
> > be other such cases too. One solution, as Greg suggested, is that we
> > can just "union merge" both the old file and the incoming data when
> > they can't be "merged" automatically. Then we could invoke a merge
> > tool and ask the user to resolve conflicts. We could potentially
> > record conflict resolutions based on the final result and store that
> > somewhere to help guide future "merges".
>
> I know it's one-off for bug fixes, but maybe we could look for issueNNNN
> and use that as a stronger signal than fuzzywuzzy can provide?
>
> Maybe have a minimum length in words before we'll deduplicate? I like that
> idea less well, but maybe it's enough...
>
> > I would like to discuss these problems and it would be great if someone
> > can suggest a better solution.
>
>
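
For reference, the matching flow described in the quoted text (exact match
first, then a thresholded similarity score within one section) can be
sketched as below. This is a rough illustration under assumptions, not the
extension's code: it uses difflib from the standard library as a stand-in
for fuzzywuzzy's token set ratio, so the scores will differ, and the names
similarity and filter_incoming as well as the 0.75 threshold are
hypothetical.

```python
from difflib import SequenceMatcher

# Assumed threshold on a 0..1 scale; the real value would need tuning.
THRESHOLD = 0.75


def similarity(a, b):
    """Score two notes in 0..1 after sorting their lowercased words.

    Sorting the words makes the score less sensitive to word order.
    This is a stand-in, not fuzzywuzzy's token set ratio.
    """
    ta = " ".join(sorted(a.lower().split()))
    tb = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, ta, tb).ratio()


def filter_incoming(incoming, existing, threshold=THRESHOLD):
    """Return the incoming fragments of one section that should be added."""
    existing_set = set(existing)
    kept = []
    for fragment in incoming:
        # Step 1: drop fragments that already appear verbatim.
        if fragment in existing_set:
            continue
        # Step 2: drop fragments that score above the threshold
        # against any existing item in the same section.
        if any(similarity(fragment, old) >= threshold for old in existing):
            continue
        kept.append(fragment)
    return kept
```

Short bug-fix notes are exactly where a plain threshold like this is least
reliable, which is why the issue number and minimum length checks come in.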

-- 
Rishabh Madan
Second Year Undergraduate student
Indian Institute of Technology, Kharagpur