Submitting code (was Re: [PATCH] auto rename...)

Sun Oct 5 18:06:18 CDT 2008

Bill Barry wrote:
> Just to be clear on that, I don't believe Matt was trying to sound rude
> or condescending. These are suggestions to make the patches easier to be
> reviewed, more likely to be reviewed and more likely to eventually be
> included.
Thanks for writing this. I am currently unsure if this feature
is worth working on at all from the feedback I got. I am also
doing this in my spare time, developing the algorithm was real fun,
but integration sounds like a lot of work:

- There are currently 5 bugs in the addremove feature (4 minor, 1 major,
  not counting the speed problem), this makes 5 patches.
- For the speed and name matching, -- it's complex --, let's say it's again
  5 larger and not-so-easy-to-understand patches.

If this "obscure feature" does not need this and no one is willing to go
through this, I will stop right now and scrap it all, but I would have
wanted to know that a little earlier before annoying anybody. So there
is a decision to be made here.

So far to project management. Thanks for looking at the code and
offering help, I wrote some comments to it:

> Btw, some thoughts on the current implementation as is:
> 1. haddremove.py is rather large (esp. the findrenames method)
You are right, it should be split up into smaller functions, but
it still has a different approach. I use the function findrenames
like a pseudo class, so that's why it looks rather long, but isn't.
Simple example:

def xy():

  var1 = ..

  def _locfcn1():
    do something with var1
    ..

  def _locfcn2():
    ..

  _locfcn1()

The local functions share the scope of the parent function,
so you have less clutter than with "self" when using classes, or
when passing-on all variables per arguments. If you have never
programmed in C++ you may have trouble with that style.
Changing variables in the inner scope only works for mutable
types without defining it locally. I don't know if this is
against the coding guidelines.

> 2. levenshtein_distance can be made significantly more efficient by
> skipping over matching chars at the beginning and end of the string,
This function makes only takes only very little of the overall
execution time, so I kept it as simple as possible, no optimization at all.
But I'd like to see the faster version, if you have one.

> also it is defined to be the length of one string if the other is empty
This is exactly what the function does.

> 3. test-addremove-similar contains several whitespace modifications
> which don't need to be there (there should be no reason to change a test
> unless the test has become invalid, otherwise I suppose you could try to
> submit a patch for that).
The test became invalid because it is also tested in the namematching test.
There is no need to test the same thing twice.

> 4. I would assume that by sorting the two lists you could skip the whole
> rest of the inner list once you can figure out that you cannot have a
> better match based on the histograms
This was not explained in the code comments, so I will add it.
The reason why there are 3 lists is, that there may be files that are
best match to more than one file. So you always need all the matches
in case one file looses its best match.

> I would see no problem sharing an MQ repo if you intend to be
> collaborating on a patch.
Yes, of course. Is a repo on bitbucket not better?