[PATCH 1 of 3 V2] util: add method to hash nested combination of python data structures

Tue Jul 14 09:15:05 CDT 2015

On Tue, 14 Jul 2015 12:50:59 +0200, Pierre-Yves David wrote:
> On 07/12/2015 03:37 PM, Yuya Nishihara wrote:
> > On Sun, 12 Jul 2015 14:27:41 +0100, Pierre-Yves David wrote:
> >> On 07/09/2015 03:23 PM, Yuya Nishihara wrote:
> >>> On Wed, 8 Jul 2015 16:55:09 +0000, Laurent Charignon wrote:
> >>>> On Jul 8, 2015, at 6:32 AM, Yuya Nishihara <yuya at tcha.org<mailto:yuya at tcha.org>> wrote:
> >>>> On Tue, 7 Jul 2015 15:21:08 +0000, Laurent Charignon wrote:
> >>>> On Jul 7, 2015, at 5:54 AM, Yuya Nishihara <yuya at tcha.org<mailto:yuya at tcha.org>> wrote:
> >>>> On Mon, 6 Jul 2015 11:42:40 -0700, Laurent Charignon wrote:
> >>>> - We can, but if the config ends up being the same we don't want to restart
> >>>> the command server preemptively.
> >>>>
> >>>> Hmm, because config files are edited by user, I think we can assume that the
> >>>> mtime change denotes the content change in most cases.
> >>>>
> >>>> How about automated deployment through configuration tools?
> >>>
> >>> Does the deployment tool run with chg?
> >>
> >> Goal will be to have everything running with chg.
> >>
> >> I think we should stick to the config hashing for now (or any "always"
> >> right alternative). Using mtime have issues (cf foozy series about
> >> dirstate and mtime). I would rather go with the full and solution
> >> solution first and then see if performance are an issue looking for a
> >> better solution.
> >>
> >> One of the goal here is to use chg with generic test, script and
> >> automatic deployment tool. They will modify config and run hg close to
> >> each other.
> >
> > Oh, I see. If we want to run the tests using chg, mtime isn't enough,
> > especially on ext3 that provides low time resolution.
> >
> >> What is the slow down involved by the config hashing here? My
> >> expectation is that it will nefligible. In all cases, I would be
> >> surprise if chg+config-hashing is slower that no-chg.
> >
> > My point is that the proposed hashing function has little benefit. It's slower
> > than __eq__() and we have objects to be compared.
> >
> > Also, I'm not sure if a hash() value is strong enough for collision because it
> > is designed for a hash table where speed is more important than cryptographic
> > hashes.
> 
> I do not have a strong opinion on how we compare config, as long as we 
> do not use loosy heristic like mtime that are know to have issue.
> 
> That said, if hashing is a simple, valid and done way to do __eq__ we 
> should maybe move forward with it. If this create performance issue, we 
> can be smarter later.

Well, IMHO, x == y is simpler than computehash(x) == computehash(y). That is
the reason why I don't think we need to make config hashable.

> >> This server lifecycle being controled by its many client seems strange
> >> to me. In my opinion, the only responsibility the client should have
> >> regarding the server is to get one rolling if it does not already
> >> exists. All life cycles decisions (shutdown//reload if config changed,
> >> shutdown after timeout) should be done by the server, probably controled
> >> by config option.
> >
> > I won't argue if the server can restart itself. I just don't like the
> > following:
> >
> >   1. server shutdown by itself if config changed
> >   2. but client have to start new server
> 
> I'm not sure what is your issue with it. I see this as expressable as:
> 
>   Client start server if none is available.

The patches for chg do "client starts new server if it said dirty and
would be shut down by itself."

https://bitbucket.org/lc2817/chg/commits/483b35203d92d450f7e969b0491e0e991bef6094

 1. client start server if none is available
 2. talk to server
 3. server says it is dirty (and shut down by itself)
 4. client start new server assuming that dirty server would go away

So the client have to know if the server is dirty or not anyway. We can
eliminate (2) and (3) if the client can check if config files are dirty.

Ideally, the server should monitor hgrc files and shut down before client
connects, but it won't be easily implemented:

  inotify_add_watch(hgrc-files)
  while select([socket, hgrc-files...]):
      if socket:
          fork_to_process_new_connection()
      if hgrc-files are dirty:
          exit()

> >> (on other note, I think have chg available in core would also be good.
> >> Would you be okay with that?)
> >
> > Maybe in contrib? That will allow me to eliminate dirty hacks in chg.
> 
> Contrib would be the obvious start to start with. Once we have the test 
> running with it, we can decide if we wants more. But having it in 
> contrib will the whole idea of test run with chg (also a bit perf with), 
> much more easy for everybody.

Sounds good. Environment variables will be the next headache to run tests
with chg.