[PATCH 2 of 2 v6] releasenotes: add similarity check function to compare incoming notes

Sun Jul 30 01:35:04 EDT 2017

On Sat, 29 Jul 2017 15:30:46 +0530, Rishabh Madan wrote:
> # HG changeset patch
> # User Rishabh Madan <rishabhmadan96 at gmail.com>
> # Date 1501320794 -19800
> #      Sat Jul 29 15:03:14 2017 +0530
> # Node ID d75147e1966090d53f773f327886f0103152b1ca
> # Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
> releasenotes: add similarity check function to compare incoming notes

> +            titlediter = iter(self.titledforsection(section))
> +            nontitlediter = iter(self.nontitledforsection(section))
> +            existingnotes = converttitled(titlediter) + \
> +                convertnontitled(nontitlediter)

You don't need these explicit casts from list to iterator.

>              for title, paragraphs in other.titledforsection(section):
>                  if self.hastitledinsection(section, title):
>                      # TODO prompt for resolution if different and running in
> @@ -100,16 +106,32 @@
>                               (title, section))
>                      continue
>  
> -                # TODO perform similarity comparison and try to match against
> -                # existing.
> +                incoming_str = converttitled(paragraphs)[0]

So converttitled() takes

 a) a list of (title, paragraphs) tuples
 b) a list of paragraphs

Which is the valid use?

> +def converttitled(iterable):
> +    """
> +    Convert titled paragraphs to strings
> +    """
> +    string_list = []
> +    for titledparagraphs in iterable:
> +        str = ""
> +        for paragraphs in titledparagraphs:
> +            if isinstance(paragraphs, basestring):
> +                continue

Maybe this isinstace() is necessary because the type of the given iterable is
unstable?

> +            else:
> +                for para in paragraphs:
> +                    str += ' '.join(para) + ' '

Nit: str.join([str]) is generally cheaper than doing str + str repeatedly.

> +def getissuenum(incoming_str):
> +    """
> +    Returns issue number from the incoming string if it exists
> +    """
> +    issue = re.search(RE_ISSUE, incoming_str, re.IGNORECASE)
> +    if issue:
> +        issue = issue.group()
> +        issue = "".join(issue.split())

I couldn't figure out why it strips whitespace. Perhaps this could be stated
explicitly as follows.

  r'\(bissue) ?([0-9]{4,6}(?![0-9]))\b'
  group(1) + group(2)

> +def findissue(existing, issue, ui):
> +    """
> +    Returns true if issue number already exists in notes.
> +    """
> +    if any(issue in s for s in existing):
> +        ui.write(_("\"%s\" already exists in notes; "
> +                 "ignoring\n") % issue)
> +        return True
> +    else:
> +        return False
> +
> +def similar(existing, incoming_str, ui):
> +    """
> +    Returns true if similar note found in existing notes.
> +    """
> +    if len(incoming_str.split()) > 10:
> +        merge = similaritycheck(incoming_str, existing)
> +        if not merge:
> +            ui.write(_("\"%s\" already exists in notes file; "
> +                     "ignoring\n") % incoming_str)
> +            return True
> +        else:
> +            return False
> +    else:
> +        return False

Nit: I slightly prefer (ui, ...) over(..., ui) because it seems more common. Or
even ui.write() could be moved to the caller.