D7550: chg: fix chg to work with py3.7+ "coercing" the locale

spectral (Kyle Lippincott) phabricator at mercurial-scm.org
Thu Dec 5 23:46:31 UTC 2019


spectral created this revision.
Herald added subscribers: mercurial-devel, mjpieters.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  When the environment is empty (specifically: it doesn't contain LC_ALL,
  LC_CTYPE, or LANG), Python will "coerce" the locale environment variables to be
  a UTF-8 capable one. It sets LC_CTYPE in the environment, and this breaks chg,
  since chg operates by:
  
  - start hg, using whatever environment the user has when chg starts
  - hg stores a hash of this "original" environment, but python has already set LC_CTYPE even though the user doesn't have it in their environment
  - chg calls setenv over the commandserver. This clears the environment inside of hg and sets it to be exactly what the environment in chg is (without LC_CTYPE).
  - chg calls validate to ensure that the environment hg is using (after the setenv call) is the one that the chg process has - if not, it is assumed the user changed their environment and we should use a different server. This will *never* be true in this situation because LC_CTYPE was removed.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7550

AFFECTED FILES
  mercurial/chgserver.py
  tests/test-chg.t

CHANGE DETAILS

diff --git a/tests/test-chg.t b/tests/test-chg.t
--- a/tests/test-chg.t
+++ b/tests/test-chg.t
@@ -329,3 +329,25 @@
   YYYY/MM/DD HH:MM:SS (PID)> loaded repo into cache: $TESTTMP/cached2 (in  ...s)
   YYYY/MM/DD HH:MM:SS (PID)> log -R cached
   YYYY/MM/DD HH:MM:SS (PID)> loaded repo into cache: $TESTTMP/cached (in  ...s)
+
+Test that chg works even when python "coerces" the locale (py3.7+, which is done
+by default if none of LC_ALL, LC_CTYPE, or LANG are set in the environment)
+
+  $ cat > $TESTTMP/debugenv.py <<EOF
+  > from mercurial import encoding
+  > from mercurial import registrar
+  > cmdtable = {}
+  > command = registrar.command(cmdtable)
+  > @command(b'debugenv', [], b'', norepo=True)
+  > def debugenv(ui):
+  >     for k in [b'LC_ALL', b'LC_CTYPE', b'LANG']:
+  >         v = encoding.environ.get(k)
+  >         if v is not None:
+  >             ui.write(b'%s=%s\n' % (k, encoding.environ[k]))
+  > EOF
+  $ LANG= LC_ALL= LC_CTYPE= chg \
+  >    --config extensions.debugenv=$TESTTMP/debugenv.py debugenv
+  LC_ALL=
+  LC_CTYPE=C.UTF-8 (py37 !)
+  LC_CTYPE= (no-py37 !)
+  LANG=
diff --git a/mercurial/chgserver.py b/mercurial/chgserver.py
--- a/mercurial/chgserver.py
+++ b/mercurial/chgserver.py
@@ -549,6 +549,38 @@
         except ValueError:
             raise ValueError(b'unexpected value in setenv request')
         self.ui.log(b'chgserver', b'setenv: %r\n', sorted(newenv.keys()))
+
+        # Python3 has some logic to "coerce" the C locale to a UTF-8 capable
+        # one, and it sets LC_CTYPE in the environment to C.UTF-8 if none of
+        # 'LC_CTYPE', 'LC_ALL' or 'LANG' are set (to any value). This can be
+        # disabled with PYTHONCOERCECLOCALE=0 in the environment.
+        #
+        # When fromui is called via _inithashstate, python has already set
+        # this, so that's in the environment right when we start up the hg
+        # process. Then chg will call us and tell us to set the environment to
+        # the one it has; this might NOT have LC_CTYPE, so we'll need to
+        # carry-forward the LC_CTYPE that was coerced in these situations.
+        #
+        # If this is not handled, we will fail config+env validation and fail
+        # to start chg. If this is just ignored instead of carried forward, we
+        # may have different behavior between chg and non-chg.
+        if pycompat.ispy3:
+            # Rename for wordwrapping purposes
+            oldenv = encoding.environ
+            if not any(e.get(b'PYTHONCOERCECLOCALE') == b'0' for e in [oldenv,
+                                                                       newenv]):
+                keys = [b'LC_CTYPE', b'LC_ALL', b'LANG']
+                old_keys = [k for k, v in oldenv.items() if k in keys and v]
+                new_keys = [k for k, v in newenv.items() if k in keys and v]
+                # If the user's environment (from chg) doesn't have ANY of the
+                # keys that python looks for, and the environment (from
+                # initialization) has ONLY LC_CTYPE and it's set to C.UTF-8,
+                # carry it forward.
+                if (not new_keys and
+                    old_keys == [b'LC_CTYPE'] and
+                    oldenv[b'LC_CTYPE'] == b'C.UTF-8'):
+                    newenv[b'LC_CTYPE'] = oldenv[b'LC_CTYPE']
+
         encoding.environ.clear()
         encoding.environ.update(newenv)
 



To: spectral, #hg-reviewers
Cc: mjpieters, mercurial-devel


More information about the Mercurial-devel mailing list