Second version of my diff.tab color extension patches

Martin Geisler martin at geisler.net
Mon Aug 25 14:11:28 CDT 2014


Jordi Gutiérrez Hermoso <jordigh at octave.org> writes:

> On Mon, 2014-08-25 at 11:16 +0200, Martin Geisler wrote:
>> Jordi Gutiérrez Hermoso <jordigh at octave.org> writes:
>
>> > I don't know what alternatives there are, nor do I think we can do
>> > much better than a regexp. Fundamentally, I think we have to split
>> > up a line into tab and non-tab blocks. I don't think this can be
>> > avoided.
>> 
>> I think you can just split on a single '\t'. You'll get a list of
>> strings -- empty strings when there are adjacent tabs. So you iterate
>> over that list and output a tab before every string, except for the
>> first. Like this:
>> 
>>   for i, token in enumerate(stripline.split('\t')):
>>       if i > 0:
>>           yield ('\t', 'diff.tab')
>>       yield (token, label)
>> 
>> Unlike your solution, you'll end up with multiple yields when there are
>> adjacent tabs.
>
> I'm not sure what the "i > 0" block is checking for. This seems like a
> bug. It's not like at this point I'm already sure that the line starts
> with tabs.

The conditional is there to make sure that we only yield tabs *between*
the non-tab tokens.
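
To make that concrete, here is a self-contained sketch of the loop. The
function name tabsplit and the 'diff.inserted' label are placeholders I
made up for this example, not names taken from the patch:

```python
def tabsplit(stripline, label):
    """Yield (text, label) pairs, marking each tab with 'diff.tab'."""
    for i, token in enumerate(stripline.split('\t')):
        if i > 0:
            # Every token after the first was preceded by exactly one tab.
            yield ('\t', 'diff.tab')
        yield (token, label)

# A line that starts with a tab: the first token is the empty string,
# so the leading tab is still emitted by the i == 1 iteration.
pairs = list(tabsplit('\tfoo\t\tbar', 'diff.inserted'))
```

A line that begins with a tab just gets an empty first token, so no
special case is needed for it. The empty tokens are yielded as well; a
caller could skip them if that turns out to matter.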

> At any rate, there's a deeper problem with this. You will get a single
> empty string no matter how many adjacent tabs there are, i.e. you lose
> some information in the split.

No, str.split only collapses adjacent separators when you call it
without an argument, in which case it splits on runs of whitespace:

   >>> 'foo \t bar'.split()
   ['foo', 'bar']
   >>> 'foo\t\t\tbar'.split('\t')
   ['foo', '', '', 'bar']
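
Another way to see that nothing is lost: joining the pieces with '\t'
gives back the original string, which is not true of the no-argument
form. A quick check, with a made-up sample string:

```python
s = 'foo\t\t\tbar'

# Splitting on an explicit separator keeps one empty string between
# each pair of adjacent tabs, so the split is reversible.
explicit = s.split('\t')
roundtrip = '\t'.join(explicit)

# Splitting on whitespace collapses runs of tabs and cannot be reversed.
collapsed = s.split()
```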

> I also wonder what could str.split possibly be doing that could be
> faster than a compiled regexp.

Well, they both have to scan the whole string. The regexp can never
fail to match, so it shouldn't introduce any backtracking, and I would
expect both approaches to have the same overall time complexity:
linear in the length of the line.
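
If we want to measure instead of guess, something like the following
would do. The regexp here is my assumption of what a tab/non-tab
tokenizer could look like, not necessarily the one in your patch, and
the sample line and helper names are made up:

```python
import re
import timeit

# Hypothetical tokenizer: a run of tabs or a run of non-tabs.
tabre = re.compile(r'\t+|[^\t]+')

# Made-up sample line with single and adjacent tabs.
line = 'foo\t\tbar\tbaz' * 20

def with_split(s):
    return s.split('\t')

def with_regexp(s):
    return tabre.findall(s)

# Time both approaches on the same input.
split_time = timeit.timeit(lambda: with_split(line), number=10000)
regexp_time = timeit.timeit(lambda: with_regexp(line), number=10000)
print('split: %.4fs  regexp: %.4fs' % (split_time, regexp_time))
```

The absolute numbers will depend on the machine and the Python
version, so only the ratio between the two is interesting.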

-- 
Martin Geisler

http://google.com/+MartinGeisler