[PATCH] notify extension: truncate lines to 998 characters to conform to RFC 5322

William Gallafent william at gallaf.net
Thu Oct 27 11:45:39 CDT 2011


On 27 October 2011 17:09, Matt Mackall <mpm at selenic.com> wrote:
> On Thu, 2011-10-27 at 14:51 +0100, William Gallafent wrote:
>> According to RFC 5322 ( http://www.rfc-editor.org/rfc/rfc5322.txt )
>> section 2.1.1, lines in internet messages must not exceed 998
>> characters in length. The Notify extension does not enforce this when
>> sending emails containing the result of a diff. The following patch
>> fixes this by truncating any such lines found in the diff output,
>> adding the conventional "..." to indicate that truncation has taken
>> place on that line, and preceding the diff output with a warning that
>> the truncation has taken place.
>
> Looks like we have a case of battling, evolving standards from the early
> 70s.

RFC 5322 was published in 2008, but I'll take “evolving” :) …
regardless of the history of the RFC, it says “MUST”! So we should aim
to ensure that no mail originating in mercurial has any lines longer
than 998 characters, because if that is not the case, then mercurial
is sending non-compliant messages.

> Such a truncated diff is no longer a correct diff.

Agreed, but see my next paragraph.

> Applying it may in
> fact result in '...' ending up in the source erroneously. So at a
> minimum, the warning needs to end up -in the email-, preferably right in
> the middle of the diff where the truncation occurred to prevent it from
> applying.

As I understand it, the purpose of the notify extension is _not_ to
provide complete and unadulterated diffs, but to _notify_ watchers
that a change has been made, and _indicate_ the nature of that change
by providing first a diffstat, and then the first three hundred lines
(by default) of the diff.

Not to labour the point, but: the diff representing the change is
_already_ truncated, if it is longer than 300 lines; any change with a
diff longer than 300 lines will already no longer be the correct full
diff for this change.

> But we've also got the patchbomb extension, which will have the same
> issue.

Quite possibly. I expect that once a good fix for notify is agreed, a
similar fix for patchbomb will be obvious. I think we should
concentrate on the notify extension here though, and keep the two
discussions separate. My reason for that is that my understanding of
the purpose of the the patchbomb extension is that is designed to send
patches to people in order that those patches be applied; on the other
hand, the purpose of the notify extension is simply to tell
subscribers that a change has occurred, and to give them an outline
desription of the change (the fact that this description includes [a
part of] patch along with the other information is simply because that
gives a good overview of the nature of the change …)

> And there, the whole point is for the remote end to apply the
> patch. Intentionally crippling a patch there is a non-starter.

Definitely. It feels that attaching the diff as an attachment to the
email might be the only solution there; is there any reason why that
should not be done? I won't comment further here since I'd like to
stick to notify and leave patchbomb for later.

> I'm not sure whether it's better to send a correct patch and hope the
> client doesn't choke on it, or simply abort and force the user to use an
> attachment.

It feels to me that the answer to that is probably “correct patch” for
patchbomb, but “human-readable” for notify, given the different
purposes of these two extensions. In either case, conformance is my
goal here :)

> Notably, RFC821 (1982) has this to say about line lengths:
>
>          ****************************************************
>          *                                                  *
>          *  TO THE MAXIMUM EXTENT POSSIBLE, IMPLEMENTATION  *
>          *  TECHNIQUES WHICH IMPOSE NO LIMITS ON THE LENGTH *
>          *  OF THESE OBJECTS SHOULD BE USED.                *
>          *                                                  *
>          ****************************************************

Similar but less shouty text remains in section 4.5.3.1 of the current
equivalent, rfc5321. My reading, though, indicates that the following
section of the same document is more relevant to this discussion:

---B<---
4.5.3.1.6.  Text Line

   The maximum total length of a text line including the <CRLF> is 1000
   octets (not counting the leading dot duplicated for transparency).
   This number may be increased by the use of SMTP Service Extensions.
---B<---

Note also that the introduction to this section, “4.5.3.1.  Size
Limits and Minimums” states “Objects larger than these sizes SHOULD be
avoided when possible.”, while also saying that “some Internet mail
constructs such as encoded X.400 addresses (RFC 2156 [35]) will often
require larger objects. Clients MAY attempt to transmit these, but
MUST be prepared for a server to reject them if they cannot be handled
by it.”. I think it is possible to avoid larger objects here without
damaging the purpose of the notify extension. If I'm wrong about that
purpose, please correct me (and explain the 300-line truncation that
already occurs!) …

Since we would rather mail sent by mercurial is not rejected by
mailservers, I suggest that being compliant here is important. The way
to be compliant probably differs between notify and patchbomb. I think
my approach is the right one for notify, and agree that something
should be done about patchbomb (outside this thread!) to make it more
compliant too, probably by wrapping the patches as attachments to
ensure their integrity is maintained.

(The rest of this message discusses only technicalities of the patch,
not its purpose or nature!)

[removed my introduction to the patch]
>> --- /usr/share/pyshared/hgext/notify.py~      2011-08-01 23:08:59.000000000 +0000
>> +++ /usr/share/pyshared/hgext/notify.py       2011-10-27 13:06:12.000000000 +0000
>> @@ -257,6 +257,11 @@
>>          chunks = patch.diff(self.repo, prev, ref, opts=patch.diffopts(self.ui))
>>          difflines = ''.join(chunks).splitlines()
>>
>> +        if any(len(l)>998 for l in difflines):
>
> The any function is post-py2.4-ism. Simpler to just use a for loop over
> the lines. Or util.any.

OK.

>
>> +            msg = _('\nwarning: long diff lines truncated to 998 characters to conform to RFC5322.\n')
>> +            self.ui.write(msg)
>
> ui.warning would be more appropriate here, but who's going to read it?
> It'll generally end up in an web server log file.

The warning is for the human recipient of the email. In the same way
as the truncation of the patch at 300 lines generates a warning for
the human recipient, so should this line-length truncation. For me it
ends up in the email sent to the recipient of the notification, as per
the other self.ui.write calls in this function (from which I copied
that line!) - confirmed in testing.

>> +            difflines = [(l[:995]+"...",l)[len(l)<999] for l in difflines]
>> +
>
> You'll notice we use PEP8 spacing around operators like '>', '+', and
> ','.

OK, will correct. You can tell I'm new to Python as well as to Mercurial :)

-- 
Bill Gallafent.


More information about the Mercurial-devel mailing list