[issue2589] Mercurial doesn't treat unicode line separator as newline

Wed Jan 12 10:52:30 UTC 2011

New submission from Steve Streeting <steve at torusknot.com>:

I recently had a problem parsing Mercurial's output. When asked to display 
the first line of a commit description (for example, in hg summary, or in hg 
log with a template of {desc|firstline}), Mercurial ignores any Unicode line 
separator (U+2028, UTF-8 byte sequence E2 80 A8) in the commit text and 
treats the text after it as part of the first line. 

The problem is that any tool parsing this output which correctly handles 
U+2028 becomes confused: the output for one record was supposed to be on a 
single line, but that record actually contains an embedded line break. 
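
To illustrate the mismatch, here is a minimal Python 3 sketch. It is not 
Mercurial's actual code, and the sample description is made up. It assumes 
the "first line" is taken at the byte level, where only ASCII line endings 
count, and is then read by a consumer that decodes the text and splits it 
with Unicode-aware rules:

```python
# Hypothetical commit description: U+2028 after "Fix parser", a real newline later.
desc = "Fix parser\u2028handle edge cases\nLonger explanation follows."
raw = desc.encode("utf-8")  # U+2028 is encoded as the bytes E2 80 A8

# Byte-level splitting only recognises \n, \r and \r\n, so U+2028 is kept.
first_line_bytes = raw.splitlines()[0]

# A Unicode-aware consumer sees a line boundary at U+2028.
consumer_lines = first_line_bytes.decode("utf-8").splitlines()

print(first_line_bytes)
print(consumer_lines)
```

The single "first line" comes back to the consumer as two lines, which is 
what breaks record-per-line parsing.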

When asked for the first line of the commit description, Mercurial should be 
splitting on the Unicode line separator too, especially if HGENCODING=UTF-8.
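
A rough sketch of the behaviour I am asking for, in Python 3. The function 
name first_line and its encoding parameter are hypothetical and only stand in 
for whatever Mercurial does internally; the sketch assumes the description 
can be decoded with the configured encoding:

```python
import re

# Line breaks to honour: the ASCII line endings plus U+2028 (LINE SEPARATOR).
_LINE_BREAK = re.compile("\r\n|\r|\n|\u2028")

def first_line(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Return the first line of raw, treating U+2028 as a line break too.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    text = raw.decode(encoding)
    # Split at most once: everything before the first line break is the first line.
    head = _LINE_BREAK.split(text, maxsplit=1)[0]
    return head.encode(encoding)

print(first_line("Fix parser\u2028handle edge cases".encode("utf-8")))
```

Decoding first matters here: searching the raw bytes for E2 80 A8 would only 
be correct for UTF-8, whereas the decoded text can be checked for U+2028 
regardless of the encoding in use.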

----------
messages: 14880
nosy: sjstreeting
priority: bug
status: unread
title: Mercurial doesn't treat unicode line separator as newline

____________________________________________________
Mercurial issue tracker <bugs at mercurial.selenic.com>
<http://mercurial.selenic.com/bts/issue2589>
____________________________________________________