I found an incorrect output from hg annotate when I wrote a shell script with multiple CR in a line, for example: ------------------------------------------------------------ echo -n Wait sleep 1 echo -n '^M ^M' # Note: ^M is CR ('\r') ------------------------------------------------------------ The following bash script reproduces the bug: ------------------------------------------------------------ hg init echo -e 'line0 \rtest\rtest' > a echo line2 >> a echo line3 >> a hg add a hg commit -m a -u a hg annotate a | sed -n l ------------------------------------------------------------ The sed command is for looking at the CR in the output. The output is: ------------------------------------------------------------ 0: line0 \r0: test\r0: test$ ------------------------------------------------------------ The file a contains 3 lines, but the 'hg annotate a' outputs only the first line. The line2 and line3 are gone. It seems that the annotate command shows revision number 0 at every CR. After the above command, the following command creates a next commit: ------------------------------------------------------------ echo -e 'line0 \rtest\rtest' > a echo line1 >> a echo line2 >> a echo line3 >> a hg commit -m b -u a hg annotate a | sed -n l ------------------------------------------------------------ The command inserts the line1 between line0 and line2. The output is: ------------------------------------------------------------ 0: line0 \r1: test\r0: test$ 0: line1$ ------------------------------------------------------------ Interestingly, the 'hg annotate a' outputs revision 1 at the first CR, and the new line1 as revision 0. The line2 and line3 are still gone. I guess that a revision number for each line is searched based on the LF newline, but the annotate counts lines based on not only LF but CR as a newline. Both should use the same newline code to avoid such broken output. If both uses LF only, the expected output is: ------------------------------------------------------------ 0: line0 \rtest\rtest$ 1: line1$ 0: line2$ 0: line3$ ------------------------------------------------------------ If both uses CR and LF for newline, the expected output is: ------------------------------------------------------------ 0: line0 \r0: test\r0: test$ 1: line1$ 0: line2$ 0: line3$ ------------------------------------------------------------
Wild guess, this would be caused by str.splitlines() vs mdiff.splitlines() or str.split('\n').
I was going through this issue and found that the problem lies in the definition of function `lines()` in `annotate.py` under the `annotate` function. The problem is, it's not considering Carriage Returns as new lines. For Example: `'line0 \rtest\rtest'` actually is three lines, but the function counts it as 1 line! I have corrected this behaviour, by modifying the function to consider CR also. Shall I send a patch?
I am writing the tests, and will send a patch asap!
> I have corrected this behaviour, by modifying the function to consider CR also. That might fix the missing output lines, but the output would be still incorrect because the diff is computed against lines split by LF. > Shall I send a patch? I've already sent a patch on Wednesday, so no need to hurry.
(In reply to Yuya Nishihara from comment #4) Oh! Thanks for letting me know. Can you tell how you solved this issue? A link to the patch would be fine.
Just s/x.splitlines()/mdiff.splitnewlines(x)/. http://mercurial.markmail.org/message/zrznoj76wgsnhn6x
Fixed by https://mercurial-scm.org/repo/hg/rev/0a7c59a4c835 Yuya Nishihara <yuya@tcha.org> annotate: do not poorly split lines at CR (issue5798) mdiff and lines(text) take only LF as a line separator, but str.splitlines() breaks our assumption. Use mdiff.splitnewlines() consistently. It's hard to read \r in tests, so \r is replaced with [CR]. I had to wrap sed by a shell function to silence check-code warning. (please test the fix)
Thanks Yuya, it works now.