[PATCH 03 of 15 v2] dirstate: create class for status lists

Mon Oct 6 13:20:07 CDT 2014

On 10/05/2014 08:08 AM, Martin von Zweigbergk wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz at gmail.com>
> # Date 1412480848 25200
> #      Sat Oct 04 20:47:28 2014 -0700
> # Node ID 925b7bbb3a5444a6cf67052d80bff483d5c19a4b
> # Parent  065114309f6206ec11a48dd088c287211467620b
> dirstate: create class for status lists
>
> Callers of various status() methods (on dirstate, context, repo) get a
> tuple of 7 elements, where each element is a list of files. This
> results in lots of uses of indexes where names would be much more
> readable. For example, "status.ignored" seems clearer than "status[4]"
> [1]. So, let's introduce a simple named tuple containing the 7 status
> fields: modified, added, removed, deleted, unknown, ignored, clean.
>
> This patch introduces the class and updates the status methods to
> return instances of it. Later patches will update the callers.
>
>   [1] Did you even notice that it should have been "status[5]"?
>
> diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
> --- a/hgext/largefiles/overrides.py
> +++ b/hgext/largefiles/overrides.py
> @@ -11,6 +11,7 @@
>   import os
>   import copy
>   
> +from mercurial import dirstate
>   from mercurial import hg, commands, util, cmdutil, scmutil, match as match_, \
>           archival, pathutil, revset
>   from mercurial.i18n import _
> @@ -1149,7 +1150,8 @@
>           modified, added, removed, deleted, unknown, ignored, clean = r
>           unknown = [f for f in unknown if lfdirstate[f] == '?']
>           ignored = [f for f in ignored if lfdirstate[f] == '?']
> -        return modified, added, removed, deleted, unknown, ignored, clean
> +        return dirstate.status(modified, added, removed, deleted, unknown,
> +                               ignored, clean)
>       repo.status = overridestatus
>       orig(ui, repo, *dirs, **opts)
>       repo.status = oldstatus
> diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
> --- a/hgext/largefiles/reposetup.py
> +++ b/hgext/largefiles/reposetup.py
> @@ -10,6 +10,7 @@
>   import copy
>   import os
>   
> +from mercurial import dirstate
>   from mercurial import error, manifest, match as match_, util
>   from mercurial.i18n import _
>   from mercurial import localrepo
> @@ -242,7 +243,7 @@
>                       wlock.release()
>   
>               self.lfstatus = True
> -            return result
> +            return dirstate.status(*result)
>   
>           # As part of committing, copy all of the largefiles into the
>           # cache.
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -7,6 +7,7 @@
>   
>   from node import nullid, nullrev, short, hex, bin
>   from i18n import _
> +import dirstate
>   import mdiff, error, util, scmutil, subrepo, patch, encoding, phases
>   import match as matchmod
>   import os, errno, stat
> @@ -340,8 +341,7 @@
>           for l in r:
>               l.sort()
>   
> -        # we return a tuple to signify that this list isn't changing
> -        return tuple(r)
> +        return dirstate.status(*r)
>   
>   
>   def makememctx(repo, parents, text, user, date, branch, files, store,
> @@ -1482,7 +1482,7 @@
>           # (s[1] is 'added' and s[2] is 'removed')
>           s = list(s)
>           s[1], s[2] = s[2], s[1]
> -        return tuple(s)
> +        return dirstate.status(*s)
>   
>   class committablefilectx(basefilectx):
>       """A committablefilectx provides common functionality for a file context
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -26,6 +26,49 @@
>       def join(self, obj, fname):
>           return obj._join(fname)
>   
> +class status(tuple):
> +    '''Named tuple with a list of files per status.
> +    '''

Hmm. There are several things I would like to point out ... but they 
seem to all be inherited from the Python namedtuple. It might make sense 
to stay as similar as possible but slightly different.

Why not use a generic named tuple implementation ... especially on 
Python 2.6+ where it is built-in? There might be good reasons, but they 
should documented.

I guess one reason is that the standard implementation creates the class 
as source code and execs it. -1 for elegance, +1 for getting the work done.

Another reason is that this implementation defaults to having 
unspecified parameters as empty lists ... but is that really relevant / 
necessary?

> +
> +    __slots__ = ()
> +
> +    def __new__(cls, modified=[], added=[], removed=[], deleted=[], unknown=[],
> +                ignored=[], clean=[]):

http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments 
- this is the only significant issue I see with this series.

> +        return tuple.__new__(cls, (modified, added, removed, deleted, unknown,
> +                ignored, clean))
> +
> +    @property
> +    def modified(self):
> +        return self[0]

I think I would prefer to store the members by name (in slots?) and 
avoid doing the array index lookup. That would make it more like an 
object that behaved like a tuple than the opposite.

But these status objects should however never be created in tight loops 
anyway, so these (and other) optimizations might be premature.

If we really want to optimize, it could perhaps make sense to use our 
propertycache ... but I don't know where on this slippery lane we want 
to stop.

> +
> +    @property
> +    def added(self):
> +        return self[1]
> +
> +    @property
> +    def removed(self):
> +        return self[2]
> +
> +    @property
> +    def deleted(self):
> +        return self[3]
> +
> +    @property
> +    def unknown(self):
> +        return self[4]
> +
> +    @property
> +    def ignored(self):
> +        return self[5]
> +
> +    @property
> +    def clean(self):
> +        return self[6]
> +
> +    def __repr__(self, *args, **kwargs):
> +        return (('status(modified=%r, added=%r, removed=%r, deleted=%r, ' +
> +                 'unknown=%r, ignored=%r, clean=%r)') % self)

repr usually looks like '<status yada yada>'  - I would expect this to 
follow the same pattern. But ok, this is how Python does it ...

/Mads