Constant naming convention

Tue Jan 10 10:42:29 EST 2017

On 01/02/2017 11:27 PM, Kevin Bullock wrote:
> A few overall notes on this thread, and then some specific clarifications below:
>
> First, it is not the current project policy to defer any decisions until October 2017. It is also true that none of the core team is advocating radical changes to the way we develop Mercurial, either before or after that date.
>
> Second, this topic is a matter of project policy, so we should take it up as the steering committee if we want to take any further action on it.

To me, that part, and a large share of Ryan emails, highlights the fact 
that we need to spend more time at the steering committee level to 
define our decision process on multiple aspect of the project. For now, 
I would rather see energy put in these definition first, before we start 
discussing more practical aspect of the project as the one here.
My general feeling is that we have been putting the card before the hoax 
here.

>> On Dec 28, 2016, at 10:17, Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
>>
>> On 12/15/2016 11:38 PM, Augie Fackler wrote:
>>> (I’m trying to be brief here - hopefully it doesn’t come across as upset, because I’m not - mostly I was blindsided by a policy claim that I don’t remember.)
>>>
>>>> On Dec 15, 2016, at 3:40 PM, Pierre-Yves David <pierre-yves.david at ens-lyon.org> wrote:
>>>>
>>>> So, to sum up, my stance is "Let us not revisit decision made by Matt for a while". Of course they might be case were we could make an exception if it is really worth it. Code style does not seems to be high profile enough for that (and is a lot of work to adjust).
>>>
>>> In broad strokes, I agree that we should make an effort to not cause a wave of sea change. My perception of this particular issue is that:
>>>
>>> 1) The codebase is already somewhat inconsistent (despite the efforts of mpm and others) when it comes to constant naming
>>
>> My perception (and apparently Kevin's too) is that the codebase is currently consistent¹. A very quick check of the current code seems to confirm it is consistent (but that was a quick check). So, I think you need to back this statement with stronger fact than just your perception.
>
> No, I said there is a _prevailing_ style, not that we have consistency. Augie was not the only one to draw the conclusion that we're not consistent today. There are sufficient exceptions to question the strength of the rule. Thank you for digging up the applicable section in CodingStyle, though. It seems the convention is written down after all.

There seems to be wording confusion here. So lets try to lift some of 
it. Over the course of this thread, I said the described the code base as:

>> consistent for the very vast majority.

and

>> currently consistent¹
>> […]
>> [1] sure there is a handful of exception.

This seemed compatible with your statement that we had a:

> prevailing (though by no means universal) style

You seems to disagree with that compatibility, sorry for misinterpreting 
you.

That said, the question is still open of how consistent the current 
code-base is. In a previous email, I've asked for the people suggesting 
the changes to provide data to justify it. But so far I've only seen two 
peoples expressing feeling that the code base was too inconsistent. 
Feelings are great to bootstrap questions and start leads to potential 
improvements. However I don't think it is reasonable to take actions 
without gathering proper data first. So far, I've not seen actual data. 
Only well known anecdotal exception that don't provide serious 
information in themselves.

So, I went to gather basic data about the actual situation. I used basic 
grepping of assignment without any indentation in mercurial/ and hgext/ 
pythons file. This isn't perfect but that gives some basic data. I've 
filtered out vendor-ed code, many of the re-assignment from other 
modules and code directly tied to a specific OS API. Assignment of both 
type can contains '_', (especially leading '_'). Here are my finding:

upper case assignment:
----------------------

Only 4 modules or packages use "many" upper case constants. 3 is core, 1 
in hgext (various parts of 'convert'). Adding the one with 1 or two 
instances, we got up to 10 modules/package with upper case constants, 
(about 40 assignments total). Most of it is very old code (probably 
prior Matt started enforcing current style) and a few are recent slip in 
style-review.

Detailed data below:

   mercurial/revlog.py (10 cases)
   mercurial/hgweb/ (9 cases)
   mercurial/httpconnection.py 1 case
   mercurial/util.py 2 case
   mercurial/graphmod.py (5 spots)
   hgext/factotum.py 1 cases
   hgext/largefiles/ 1 cases
   hgext/highlight/ 1 cases
   hgext/mq.py 1 case
   hgext/convert/ (8 case)

Lower case assignment:
----------------------

Full check on lower case assignments is harder because there is a large 
quantity of them. I've filtered out thing that standed out as vendored 
code and the vast majority of local assignments from other modules (eg 
'pickle = util.pickle'). The result seems mostly composed of actual 
constant assignment for scanning through it.

That filtering leaves about 500 instances of lower case assignments, 
spread over 120 files.

Conclusion
----------

Lets drop about 20% more of the lowercase assignments and files to take 
in account the imperfect filtering. This still lives use with a 1 vs 10 
ration in favor of lower case for both "assignment count" and "module". 
This is a bit higher than what I expected but still show "upper case" as 
rare occurrence (<10%, handful of modules).

So from my point of view, the occurrence is rare-enough and limited to 
few-enough module to consider the currently documented style (lower for 
all) as vastly prevailing and the code base as overall consistent.
(I'm of course okay with hearing other conclusion based on these data 
(or better data)).

It is also interesting to note that use of upper case is concentrated in 
some key module (eg: revlog, hgweb). That might play a key part in the 
strong contrast in consistency perception between the feeling of the two 
people who said the code to be too inconsistent and my feeling that is 
was overall consistent.

Of course, The extraction I did is fairly basic. For example it does not 
takes in accounts indented, but still module level constants. In 
addition, the data about lower case assignment can be refined further. 
But I already spent more time digging this that I was willing to, so I 
did not went further. If someone else wants to keeps digging, feels free 
to do it.

> […]

Cheers,

-- 
Pierre-Yves David