[PATCH 2 of 2 v6] releasenotes: add similarity check function to compare incoming notes
Rishabh Madan
rishabhmadan96 at gmail.com
Sun Jul 30 10:38:33 EDT 2017
On Sun, Jul 30, 2017 at 11:05 AM, Yuya Nishihara <yuya at tcha.org> wrote:
> On Sat, 29 Jul 2017 15:30:46 +0530, Rishabh Madan wrote:
> > # HG changeset patch
> > # User Rishabh Madan <rishabhmadan96 at gmail.com>
> > # Date 1501320794 -19800
> > # Sat Jul 29 15:03:14 2017 +0530
> > # Node ID d75147e1966090d53f773f327886f0103152b1ca
> > # Parent 1d79b04c402f3f431ca052b677b1021ddd93a10e
> > releasenotes: add similarity check function to compare incoming notes
>
> > + titlediter = iter(self.titledforsection(section))
> > + nontitlediter = iter(self.nontitledforsection(section))
> > + existingnotes = converttitled(titlediter) + \
> > + convertnontitled(nontitlediter)
>
> You don't need these explicit casts from list to iterator.
>
> > for title, paragraphs in other.titledforsection(section):
> > if self.hastitledinsection(section, title):
> > # TODO prompt for resolution if different and
> running in
> > @@ -100,16 +106,32 @@
> > (title, section))
> > continue
> >
> > - # TODO perform similarity comparison and try to match
> against
> > - # existing.
> > + incoming_str = converttitled(paragraphs)[0]
>
> So converttitled() takes
>
> a) a list of (title, paragraphs) tuples
> b) a list of paragraphs
>
> Which is the valid use?
>
The first one is what it takes as input.
>
> > +def converttitled(iterable):
> > + """
> > + Convert titled paragraphs to strings
> > + """
> > + string_list = []
> > + for titledparagraphs in iterable:
> > + str = ""
> > + for paragraphs in titledparagraphs:
> > + if isinstance(paragraphs, basestring):
> > + continue
>
> Maybe this isinstace() is necessary because the type of the given iterable
> is
> unstable?
>
I need to iterate to iterate through a data like `('Title fix ', [['adds
fix to notes']])`. So for this case, title is a string but notes is a list,
hence I'm using isinstance to check if the value is a string or not. In
case it is, that means it's a list, so I just continue iterating.
>
> > + else:
> > + for para in paragraphs:
> > + str += ' '.join(para) + ' '
>
> Nit: str.join([str]) is generally cheaper than doing str + str repeatedly.
>
> > +def getissuenum(incoming_str):
> > + """
> > + Returns issue number from the incoming string if it exists
> > + """
> > + issue = re.search(RE_ISSUE, incoming_str, re.IGNORECASE)
> > + if issue:
> > + issue = issue.group()
> > + issue = "".join(issue.split())
>
> I couldn't figure out why it strips whitespace. Perhaps this could be
> stated
> explicitly as follows.
>
> r'\(bissue) ?([0-9]{4,6}(?![0-9]))\b'
> group(1) + group(2)
>
Initially I used to strip whitespace so that we can process issue number
if there's a space between 'issue' and the number. But with the current
pattern, I don't think we would need any such measures.
>
> > +def findissue(existing, issue, ui):
> > + """
> > + Returns true if issue number already exists in notes.
> > + """
> > + if any(issue in s for s in existing):
> > + ui.write(_("\"%s\" already exists in notes; "
> > + "ignoring\n") % issue)
> > + return True
> > + else:
> > + return False
> > +
> > +def similar(existing, incoming_str, ui):
> > + """
> > + Returns true if similar note found in existing notes.
> > + """
> > + if len(incoming_str.split()) > 10:
> > + merge = similaritycheck(incoming_str, existing)
> > + if not merge:
> > + ui.write(_("\"%s\" already exists in notes file; "
> > + "ignoring\n") % incoming_str)
> > + return True
> > + else:
> > + return False
> > + else:
> > + return False
>
> Nit: I slightly prefer (ui, ...) over(..., ui) because it seems more
> common. Or
> even ui.write() could be moved to the caller.
I'll make the necessary changes for this. Thanks for pointing out.
ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20170730/e37e0a54/attachment.html>
More information about the Mercurial-devel
mailing list