5798 – annotate output is broken if the file contains multiple CR in a line

Bug 5798 - annotate output is broken if the file contains multiple CR in a line

Summary: annotate output is broken if the file contains multiple CR in a line

Status:	RESOLVED FIXED

Alias:	None

Product:	Mercurial
Classification:	Unclassified
Component:	Mercurial (show other bugs)
Version:	default branch
Hardware:	PC Linux

Importance:	normal bug
Assignee:	Yuya Nishihara

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-02-20 09:59 UTC by Hideki EIRAKU
Modified:	2018-03-09 08:18 UTC (History)
CC List:	3 users (show)

See Also:
Python Version:	---

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Hideki EIRAKU 2018-02-20 09:59 UTC

I found an incorrect output from hg annotate when I wrote a shell
script with multiple CR in a line, for example:

------------------------------------------------------------
echo -n Wait
sleep 1
echo -n '^M    ^M'
# Note: ^M is CR ('\r')
------------------------------------------------------------

The following bash script reproduces the bug:

------------------------------------------------------------
hg init
echo -e 'line0 \rtest\rtest' > a
echo line2 >> a
echo line3 >> a
hg add a
hg commit -m a -u a
hg annotate a | sed -n l
------------------------------------------------------------

The sed command is for looking at the CR in the output.  The output
is:

------------------------------------------------------------
0: line0 \r0: test\r0: test$
------------------------------------------------------------

The file a contains 3 lines, but the 'hg annotate a' outputs only the
first line.  The line2 and line3 are gone.  It seems that the annotate
command shows revision number 0 at every CR.

After the above command, the following command creates a next commit:

------------------------------------------------------------
echo -e 'line0 \rtest\rtest' > a
echo line1 >> a
echo line2 >> a
echo line3 >> a
hg commit -m b -u a
hg annotate a | sed -n l
------------------------------------------------------------

The command inserts the line1 between line0 and line2.  The output is:

------------------------------------------------------------
0: line0 \r1: test\r0: test$
0: line1$
------------------------------------------------------------

Interestingly, the 'hg annotate a' outputs revision 1 at the first CR,
and the new line1 as revision 0.  The line2 and line3 are still gone.

I guess that a revision number for each line is searched based on the
LF newline, but the annotate counts lines based on not only LF but CR
as a newline.  Both should use the same newline code to avoid such
broken output.  If both uses LF only, the expected output is:

------------------------------------------------------------
0: line0 \rtest\rtest$
1: line1$
0: line2$
0: line3$
------------------------------------------------------------

If both uses CR and LF for newline, the expected output is:

------------------------------------------------------------
0: line0 \r0: test\r0: test$
1: line1$
0: line2$
0: line3$
------------------------------------------------------------

Comment 1 Yuya Nishihara 2018-02-20 10:21 UTC

Wild guess, this would be caused by str.splitlines() vs mdiff.splitlines()
or str.split('\n').

Comment 2 Sangeet Kumar Mishra 2018-02-24 04:55 UTC

I was going through this issue and found that the problem lies in the definition of function `lines()` in `annotate.py` under the `annotate` function.
The problem is, it's not considering Carriage Returns as new lines.

For Example: `'line0 \rtest\rtest'` actually is three lines, but the function counts it as 1 line!

I have corrected this behaviour, by modifying the function to consider CR also.

Shall I send a patch?

Comment 3 Sangeet Kumar Mishra 2018-02-24 05:14 UTC

I am writing the tests, and will send a patch asap!

Comment 4 Yuya Nishihara 2018-02-24 05:21 UTC

> I have corrected this behaviour, by modifying the function to consider CR also.

That might fix the missing output lines, but the output would be still incorrect
because the diff is computed against lines split by LF.

> Shall I send a patch?

I've already sent a patch on Wednesday, so no need to hurry.

Comment 5 Sangeet Kumar Mishra 2018-02-24 08:57 UTC

(In reply to Yuya Nishihara from comment #4)
Oh! Thanks for letting me know.

Can you tell how you solved this issue? 
A link to the patch would be fine.

Comment 6 Yuya Nishihara 2018-02-24 20:46 UTC

Just s/x.splitlines()/mdiff.splitnewlines(x)/.

http://mercurial.markmail.org/message/zrznoj76wgsnhn6x

Comment 7 HG Bot 2018-03-02 19:10 UTC

Fixed by https://mercurial-scm.org/repo/hg/rev/0a7c59a4c835
Yuya Nishihara <yuya@tcha.org>
annotate: do not poorly split lines at CR (issue5798)

mdiff and lines(text) take only LF as a line separator, but str.splitlines()
breaks our assumption. Use mdiff.splitnewlines() consistently.

It's hard to read \r in tests, so \r is replaced with [CR]. I had to wrap
sed by a shell function to silence check-code warning.

(please test the fix)

Comment 8 Hideki EIRAKU 2018-03-09 07:43 UTC

Thanks Yuya, it works now.