[PATCH 1 of 3 V2] util: add method to hash nested combination of python data structures

Fri Jul 17 23:52:25 CDT 2015

On Sat, 18 Jul 2015 02:04:06 +0200, Pierre-Yves David wrote:
> On 07/14/2015 04:15 PM, Yuya Nishihara wrote:
> > On Tue, 14 Jul 2015 12:50:59 +0200, Pierre-Yves David wrote:
> >> On 07/12/2015 03:37 PM, Yuya Nishihara wrote:
> >>> On Sun, 12 Jul 2015 14:27:41 +0100, Pierre-Yves David wrote:
> >>>> What is the slow down involved by the config hashing here? My
> >>>> expectation is that it will nefligible. In all cases, I would be
> >>>> surprise if chg+config-hashing is slower that no-chg.
> >>>
> >>> My point is that the proposed hashing function has little benefit. It's slower
> >>> than __eq__() and we have objects to be compared.
> >>>
> >>> Also, I'm not sure if a hash() value is strong enough for collision because it
> >>> is designed for a hash table where speed is more important than cryptographic
> >>> hashes.
> >>
> >> I do not have a strong opinion on how we compare config, as long as we
> >> do not use loosy heristic like mtime that are know to have issue.
> >>
> >> That said, if hashing is a simple, valid and done way to do __eq__ we
> >> should maybe move forward with it. If this create performance issue, we
> >> can be smarter later.
> >
> > Well, IMHO, x == y is simpler than computehash(x) == computehash(y). That is
> > the reason why I don't think we need to make config hashable.
> 
> but x == y implies that we parse the config to be able to compare it.
> That will be slower than hashing full file.
> (But we do not really care about any of that for a first step)

computehash(x) implies x is parsed. If x is an unparsed str, we can simply
do hash(x) or sha1(x). We don't need a hashable config object.

> >>>> This server lifecycle being controled by its many client seems strange
> >>>> to me. In my opinion, the only responsibility the client should have
> >>>> regarding the server is to get one rolling if it does not already
> >>>> exists. All life cycles decisions (shutdown//reload if config changed,
> >>>> shutdown after timeout) should be done by the server, probably controled
> >>>> by config option.
> >>>
> >>> I won't argue if the server can restart itself. I just don't like the
> >>> following:
> >>>
> >>>    1. server shutdown by itself if config changed
> >>>    2. but client have to start new server
> >>
> >> I'm not sure what is your issue with it. I see this as expressable as:
> >>
> >>    Client start server if none is available.
> >
> > The patches for chg do "client starts new server if it said dirty and
> > would be shut down by itself."
> >
> > https://bitbucket.org/lc2817/chg/commits/483b35203d92d450f7e969b0491e0e991bef6094
> >
> >   1. client start server if none is available
> >   2. talk to server
> >   3. server says it is dirty (and shut down by itself)
> >   4. client start new server assuming that dirty server would go away
> >
> > So the client have to know if the server is dirty or not anyway. We can
> > eliminate (2) and (3) if the client can check if config files are dirty.
> 
> Not that the server says "I'm shutting down" which is broader than 
> "dirty" (so the assumption it is going away it is going away is 
> sensible). As we have to handle the case where the server is shutting 
> down anyway (cf below). Having all the config checking in the server 
> seems simpler as a start.

Even if the server has the responsibility of checking dirty configs, it does
not need hacks [1] to shut down by itself. Because the client have to know
the server is dirty anyway, it can do kill-then-restart the server.

 [1]: these monkey patching are necessary to set the dirty flag before fork()
      https://bitbucket.org/lc2817/chg/commits/fcb582d7bde9
      https://bitbucket.org/lc2817/chg/commits/97e28d236303

Also, it won't work if we want to restart the server when HGPLAIN is changed,
for example.

 1. client: connect
 2. master-server: check dirty
 3. master-server: fork() child-server
 4. master-server: shut down if dirty
 5. child-server: tell client that master-server is shutting down if dirty
 ...
 6. client: send environment variables to child-server

> > Ideally, the server should monitor hgrc files and shut down before client
> > connects, but it won't be easily implemented:
> >
> >    inotify_add_watch(hgrc-files)
> >    while select([socket, hgrc-files...]):
> >        if socket:
> >            fork_to_process_new_connection()
> >        if hgrc-files are dirty:
> >            exit()
> 
> 1) You still have a risk of the server being waken up by the two events.
> 2) We cannot rely on the inotify_add_watch on all OS and in all cases 
> (if too many change happens).
> 
> So we need to reverse the check anyway
> 
>          if hgrc-files are dirty:
>              exit()
>          if socket:
>              fork_to_process_new_connection()
> 
> So we have to handle the case where client connected to the socket but
> the server is going down anyway.

True. In order to run tests with chg, we'll have to handle the race when
connect() and config change occur almost at the same time.