[PATCH 1 of 3 V2] util: add method to hash nested combination of python data structures

Fri Jul 17 19:04:06 CDT 2015

On 07/14/2015 04:15 PM, Yuya Nishihara wrote:
> On Tue, 14 Jul 2015 12:50:59 +0200, Pierre-Yves David wrote:
>> On 07/12/2015 03:37 PM, Yuya Nishihara wrote:
>>> On Sun, 12 Jul 2015 14:27:41 +0100, Pierre-Yves David wrote:
>>>> On 07/09/2015 03:23 PM, Yuya Nishihara wrote:
>>>>> On Wed, 8 Jul 2015 16:55:09 +0000, Laurent Charignon wrote:
>>>>>> On Jul 8, 2015, at 6:32 AM, Yuya Nishihara <yuya at tcha.org<mailto:yuya at tcha.org>> wrote:
>>>>>> On Tue, 7 Jul 2015 15:21:08 +0000, Laurent Charignon wrote:
>>>>>> On Jul 7, 2015, at 5:54 AM, Yuya Nishihara <yuya at tcha.org<mailto:yuya at tcha.org>> wrote:
>>>>>> On Mon, 6 Jul 2015 11:42:40 -0700, Laurent Charignon wrote:
>>>>>> - We can, but if the config ends up being the same we don't want to restart
>>>>>> the command server preemptively.
>>>>>>
>>>>>> Hmm, because config files are edited by user, I think we can assume that the
>>>>>> mtime change denotes the content change in most cases.
>>>>>>
>>>>>> How about automated deployment through configuration tools?
>>>>>
>>>>> Does the deployment tool run with chg?
>>>>
>>>> Goal will be to have everything running with chg.
>>>>
>>>> I think we should stick to the config hashing for now (or any "always"
>>>> right alternative). Using mtime have issues (cf foozy series about
>>>> dirstate and mtime). I would rather go with the full and solution
>>>> solution first and then see if performance are an issue looking for a
>>>> better solution.
>>>>
>>>> One of the goal here is to use chg with generic test, script and
>>>> automatic deployment tool. They will modify config and run hg close to
>>>> each other.
>>>
>>> Oh, I see. If we want to run the tests using chg, mtime isn't enough,
>>> especially on ext3 that provides low time resolution.
>>>
>>>> What is the slow down involved by the config hashing here? My
>>>> expectation is that it will nefligible. In all cases, I would be
>>>> surprise if chg+config-hashing is slower that no-chg.
>>>
>>> My point is that the proposed hashing function has little benefit. It's slower
>>> than __eq__() and we have objects to be compared.
>>>
>>> Also, I'm not sure if a hash() value is strong enough for collision because it
>>> is designed for a hash table where speed is more important than cryptographic
>>> hashes.
>>
>> I do not have a strong opinion on how we compare config, as long as we
>> do not use loosy heristic like mtime that are know to have issue.
>>
>> That said, if hashing is a simple, valid and done way to do __eq__ we
>> should maybe move forward with it. If this create performance issue, we
>> can be smarter later.
>
> Well, IMHO, x == y is simpler than computehash(x) == computehash(y). That is
> the reason why I don't think we need to make config hashable.

but x == y implies that we parse the config to be able to compare it. 
That will be slower than hashing full file.
(But we do not really care about any of that for a first step)

>
>>>> This server lifecycle being controled by its many client seems strange
>>>> to me. In my opinion, the only responsibility the client should have
>>>> regarding the server is to get one rolling if it does not already
>>>> exists. All life cycles decisions (shutdown//reload if config changed,
>>>> shutdown after timeout) should be done by the server, probably controled
>>>> by config option.
>>>
>>> I won't argue if the server can restart itself. I just don't like the
>>> following:
>>>
>>>    1. server shutdown by itself if config changed
>>>    2. but client have to start new server
>>
>> I'm not sure what is your issue with it. I see this as expressable as:
>>
>>    Client start server if none is available.
>
> The patches for chg do "client starts new server if it said dirty and
> would be shut down by itself."
>
> https://bitbucket.org/lc2817/chg/commits/483b35203d92d450f7e969b0491e0e991bef6094
>
>   1. client start server if none is available
>   2. talk to server
>   3. server says it is dirty (and shut down by itself)
>   4. client start new server assuming that dirty server would go away
>
> So the client have to know if the server is dirty or not anyway. We can
> eliminate (2) and (3) if the client can check if config files are dirty.

Not that the server says "I'm shutting down" which is broader than 
"dirty" (so the assumption it is going away it is going away is 
sensible). As we have to handle the case where the server is shutting 
down anyway (cf below). Having all the config checking in the server 
seems simpler as a start.

> Ideally, the server should monitor hgrc files and shut down before client
> connects, but it won't be easily implemented:
>
>    inotify_add_watch(hgrc-files)
>    while select([socket, hgrc-files...]):
>        if socket:
>            fork_to_process_new_connection()
>        if hgrc-files are dirty:
>            exit()

1) You still have a risk of the server being waken up by the two events.
2) We cannot rely on the inotify_add_watch on all OS and in all cases 
(if too many change happens).

So we need to reverse the check anyway

         if hgrc-files are dirty:
             exit()
         if socket:
             fork_to_process_new_connection()

So we have to handle the case where client connected to the socket but 
the server is going down anyway.

-- 
Pierre-Yves David