[PATCH 2 of 2 v6] releasenotes: add similarity check function to compare incoming notes

Sun Jul 30 10:38:33 EDT 2017

On Sun, Jul 30, 2017 at 11:05 AM, Yuya Nishihara <yuya at tcha.org> wrote:

> On Sat, 29 Jul 2017 15:30:46 +0530, Rishabh Madan wrote:
> > # HG changeset patch
> > # User Rishabh Madan <rishabhmadan96 at gmail.com>
> > # Date 1501320794 -19800
> > #      Sat Jul 29 15:03:14 2017 +0530
> > # Node ID d75147e1966090d53f773f327886f0103152b1ca
> > # Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
> > releasenotes: add similarity check function to compare incoming notes
>
> > +            titlediter = iter(self.titledforsection(section))
> > +            nontitlediter = iter(self.nontitledforsection(section))
> > +            existingnotes = converttitled(titlediter) + \
> > +                convertnontitled(nontitlediter)
>
> You don't need these explicit casts from list to iterator.
>
> >              for title, paragraphs in other.titledforsection(section):
> >                  if self.hastitledinsection(section, title):
> >                      # TODO prompt for resolution if different and
> running in
> > @@ -100,16 +106,32 @@
> >                               (title, section))
> >                      continue
> >
> > -                # TODO perform similarity comparison and try to match
> against
> > -                # existing.
> > +                incoming_str = converttitled(paragraphs)[0]
>
> So converttitled() takes
>
>  a) a list of (title, paragraphs) tuples
>  b) a list of paragraphs
>
> Which is the valid use?
>

The first one is what it takes as input.

>
> > +def converttitled(iterable):
> > +    """
> > +    Convert titled paragraphs to strings
> > +    """
> > +    string_list = []
> > +    for titledparagraphs in iterable:
> > +        str = ""
> > +        for paragraphs in titledparagraphs:
> > +            if isinstance(paragraphs, basestring):
> > +                continue
>
> Maybe this isinstace() is necessary because the type of the given iterable
> is
> unstable?
>
I need to iterate to iterate through a data like `('Title fix ', [['adds
fix to notes']])`. So for this case, title is a string but notes is a list,
hence I'm using isinstance to check if the value is a string or not. In
case it is, that means it's a list, so I just continue iterating.

>
> > +            else:
> > +                for para in paragraphs:
> > +                    str += ' '.join(para) + ' '
>
> Nit: str.join([str]) is generally cheaper than doing str + str repeatedly.
>
> > +def getissuenum(incoming_str):
> > +    """
> > +    Returns issue number from the incoming string if it exists
> > +    """
> > +    issue = re.search(RE_ISSUE, incoming_str, re.IGNORECASE)
> > +    if issue:
> > +        issue = issue.group()
> > +        issue = "".join(issue.split())
>
> I couldn't figure out why it strips whitespace. Perhaps this could be
> stated
> explicitly as follows.
>
>   r'\(bissue) ?([0-9]{4,6}(?![0-9]))\b'
>   group(1) + group(2)
>

 Initially I used to strip whitespace so that we can process issue number
if there's a space between 'issue' and the number. But with the current
pattern, I don't think we would need any such measures.

>
> > +def findissue(existing, issue, ui):
> > +    """
> > +    Returns true if issue number already exists in notes.
> > +    """
> > +    if any(issue in s for s in existing):
> > +        ui.write(_("\"%s\" already exists in notes; "
> > +                 "ignoring\n") % issue)
> > +        return True
> > +    else:
> > +        return False
> > +
> > +def similar(existing, incoming_str, ui):
> > +    """
> > +    Returns true if similar note found in existing notes.
> > +    """
> > +    if len(incoming_str.split()) > 10:
> > +        merge = similaritycheck(incoming_str, existing)
> > +        if not merge:
> > +            ui.write(_("\"%s\" already exists in notes file; "
> > +                     "ignoring\n") % incoming_str)
> > +            return True
> > +        else:
> > +            return False
> > +    else:
> > +        return False
>
> Nit: I slightly prefer (ui, ...) over(..., ui) because it seems more
> common. Or
> even ui.write() could be moved to the caller.

 I'll make the necessary changes for this. Thanks for pointing out.
ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170730/e37e0a54/attachment.html>