Fwd: Discussing about release notes extension

Wed Jun 7 15:22:32 UTC 2017

---------- Forwarded message ----------
From: Gregory Szorc <gregory.szorc at gmail.com>
Date: Tue, Jun 6, 2017 at 5:24 AM
Subject: Re: Discussing about release notes extension
To: Rishabh Madan <rishabhmadan96 at gmail.com>

For similarity matching, I would start with something like this:

1. Parse the existing relnotes file into a data structure
2. For each incoming notes fragment, map to a section
3. Look for an exact match. If found, match it. This should hopefully be
the common case.
4. Repeat #3 for all incoming notes fragments. This will filter out all
exact matches, leaving you with N unmatched incoming items and M unmatched
existing items.
5. Compare the similarity of each of the N incoming items to the M unmatched
items using a function. I'm not sure which. Maybe longest common substring
length. Maybe common word count. Maybe a hybrid. Maybe something weighted by
the relative position in the list compared to other matches. I'm not sure.
This will require some experimentation.
6. Sort matches by similarity and match according to best match.
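
As a rough sketch of steps 3-6 (not code from Mercurial): it treats the notes
of a single section as a flat list of strings, and the names similarity() and
match_notes() and the 0.6 threshold are placeholders I made up. The scoring
function is one possible hybrid of character overlap (difflib) and common
word count, and would need the experimentation mentioned above.

```python
import difflib


def similarity(a, b):
    """Score two notes in [0, 1]: average of character and word overlap."""
    chars = difflib.SequenceMatcher(None, a, b).ratio()
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    words = len(words_a & words_b) / len(union) if union else 1.0
    return (chars + words) / 2


def match_notes(existing, incoming, threshold=0.6):
    """Pair incoming notes with existing notes of one section.

    Returns (matches, new): matches maps an incoming note to the existing
    note it was paired with; new lists incoming notes with no good match.
    The threshold is an arbitrary starting point, not a tuned value.
    """
    unmatched = list(existing)
    matches = {}
    pending = []
    # Steps 3-4: take out the exact matches first.
    for note in incoming:
        if note in unmatched:
            unmatched.remove(note)
            matches[note] = note
        else:
            pending.append(note)
    # Step 5: score every remaining incoming/existing pair.
    scored = [(similarity(n, m), n, m) for n in pending for m in unmatched]
    # Step 6: best score first, each note used at most once.
    scored.sort(key=lambda item: item[0], reverse=True)
    used = set()
    for score, n, m in scored:
        if score < threshold:
            break
        if n in matches or m in used:
            continue
        matches[n] = m
        used.add(m)
    new = [n for n in pending if n not in matches]
    return matches, new
```

Calling match_notes(existing, incoming) with two lists of note strings gives
back the pairings plus the notes that look genuinely new. The greedy
best-first pairing is the simplest reading of step 6; a position-weighted
score could be swapped into similarity() later.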

If this is good enough, great. If not, we may have to explore other
options, such as committing a special file that logs how rewrites
occurred. But it feels complicated to design a UI for this, as you would
need to tell it which modification actions you performed. That's not very
user-friendly!

Another idea is to just "union merge" both the old file and the incoming
data when it can't be "merged" automatically. Then we could invoke a merge
tool and ask the user to resolve conflicts. We could potentially record
conflict resolutions based on the final result and store that somewhere to
help guide future "merges." Just a thought.
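
A minimal sketch of the "union" part, assuming notes are plain strings within
one section. The name union_merge() is hypothetical, and invoking a merge
tool or recording resolutions is left out entirely:

```python
def union_merge(existing, incoming):
    """Keep every existing note, then append incoming notes not yet present.

    Nothing is dropped or rewritten, so near-duplicates survive and are
    left for the user to resolve by hand afterwards.
    """
    merged = list(existing)
    seen = set(existing)
    for note in incoming:
        if note not in seen:
            merged.append(note)
            seen.add(note)
    return merged
```

The result deliberately contains both the old and the edited wording of a
note when they differ, which is what the user would then clean up.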

This is a complicated problem. It will likely require a bit of trial and
error.

On Mon, Jun 5, 2017 at 7:16 AM, Rishabh Madan <rishabhmadan96 at gmail.com>
wrote:

> Hey!
>
> I am working on the release notes extension as part of my GSoC project
> this year. I had a conversation with one of my mentors, Kevin, who
> suggested that I clear up some of my doubts about this extension with you.
>
> I was planning to begin by writing a similarity comparison function that
> uses fuzzy string comparison to compare the relnotes from revisions to
> the existing relnotes file. I wrote some code
> <https://bitbucket.org/madan96/hg/commits/6ac6c325c07015958863e2e22ef685f1fe7f5400>.
> I first mapped all the content under one section to the section name and
> then compared it with the incoming commit, but the results were really bad.
>
> Can you please help me design how the similarity function should
> actually work? The data structure of the incoming commit gets really
> complicated once we get blocks from the minirst parser, so it becomes
> really difficult to extract commit message strings from it.
>
> In the meantime, I am working to improve the existing parser as there
> seems to be a bug with merging of bullet points.
>
> Thanks
> Rishabh Madan
>

-- 
Rishabh Madan
Second Year Undergraduate student
Indian Institute of Technology, Kharagpur