[PATCH 2 of 2 v6] releasenotes: add similarity check function to compare incoming notes
Yuya Nishihara
yuya at tcha.org
Sun Jul 30 01:35:04 EDT 2017
On Sat, 29 Jul 2017 15:30:46 +0530, Rishabh Madan wrote:
> # HG changeset patch
> # User Rishabh Madan <rishabhmadan96 at gmail.com>
> # Date 1501320794 -19800
> # Sat Jul 29 15:03:14 2017 +0530
> # Node ID d75147e1966090d53f773f327886f0103152b1ca
> # Parent 1d79b04c402f3f431ca052b677b1021ddd93a10e
> releasenotes: add similarity check function to compare incoming notes
> + titlediter = iter(self.titledforsection(section))
> + nontitlediter = iter(self.nontitledforsection(section))
> + existingnotes = converttitled(titlediter) + \
> + convertnontitled(nontitlediter)
You don't need these explicit casts from list to iterator.
> for title, paragraphs in other.titledforsection(section):
> if self.hastitledinsection(section, title):
> # TODO prompt for resolution if different and running in
> @@ -100,16 +106,32 @@
> (title, section))
> continue
>
> - # TODO perform similarity comparison and try to match against
> - # existing.
> + incoming_str = converttitled(paragraphs)[0]
So converttitled() takes
a) a list of (title, paragraphs) tuples
b) a list of paragraphs
Which is the valid use?
> +def converttitled(iterable):
> + """
> + Convert titled paragraphs to strings
> + """
> + string_list = []
> + for titledparagraphs in iterable:
> + str = ""
> + for paragraphs in titledparagraphs:
> + if isinstance(paragraphs, basestring):
> + continue
Maybe this isinstace() is necessary because the type of the given iterable is
unstable?
> + else:
> + for para in paragraphs:
> + str += ' '.join(para) + ' '
Nit: str.join([str]) is generally cheaper than doing str + str repeatedly.
> +def getissuenum(incoming_str):
> + """
> + Returns issue number from the incoming string if it exists
> + """
> + issue = re.search(RE_ISSUE, incoming_str, re.IGNORECASE)
> + if issue:
> + issue = issue.group()
> + issue = "".join(issue.split())
I couldn't figure out why it strips whitespace. Perhaps this could be stated
explicitly as follows.
r'\(bissue) ?([0-9]{4,6}(?![0-9]))\b'
group(1) + group(2)
> +def findissue(existing, issue, ui):
> + """
> + Returns true if issue number already exists in notes.
> + """
> + if any(issue in s for s in existing):
> + ui.write(_("\"%s\" already exists in notes; "
> + "ignoring\n") % issue)
> + return True
> + else:
> + return False
> +
> +def similar(existing, incoming_str, ui):
> + """
> + Returns true if similar note found in existing notes.
> + """
> + if len(incoming_str.split()) > 10:
> + merge = similaritycheck(incoming_str, existing)
> + if not merge:
> + ui.write(_("\"%s\" already exists in notes file; "
> + "ignoring\n") % incoming_str)
> + return True
> + else:
> + return False
> + else:
> + return False
Nit: I slightly prefer (ui, ...) over(..., ui) because it seems more common. Or
even ui.write() could be moved to the caller.
More information about the Mercurial-devel
mailing list