[PATCH 0 of 5] Patches and new win32mbcs extension

Shun-ichi Goto shunichi.goto at gmail.com
Wed Jan 9 13:44:01 UTC 2008


These patches request fixes to the Mercurial core code so that it can
cooperate with the new win32mbcs extension, which handles MBCS
filenames correctly on Windows.

A description of the MBCS problem itself is omitted here.
See the description in my previous patchbomb mail if needed:

  Subject: [PATCH 0 of 5] Fix to handle MBCS filename correctly
  Message-Id: <patchbomb.1199622371 at yomi>
  Date: Sun, 06 Jan 2008 21:26:11 +0900


There are four patches and one new extension:
# The extension code is posted for review.
# I'll put it on a wiki page later.

 1) (first 4 patches)
    Remove or rewrite code that uses os.sep directly so that it calls
    existing or new functions instead. These changes are intended to let
    the win32mbcs extension hook those operations.

    For example:
       s.replace('\\', '/') => util.normpath(s)  ... use existing function
       s.split(os.sep) => util.splitpath(s)      ... use new function
       s.endswith(os.sep) => util.endswithsep(s) ... use new function
       do not use rfindall(os.sep)               ... change code

    These are almost the same as the previous patches I sent, except:
     * the commit descriptions and function docstrings were changed as
       suggested by Matt.
     * a bug in the patch against util.path_auditor() was fixed.


 2) (last patch: the new extension, posted for review)
    Introduce a new extension called 'win32mbcs.py'.
    This extension wraps some Python built-in functions (os.path.*, etc.)
    and Mercurial functions (util.xxx) so that they handle raw
    MBCS-encoded strings. Enabling the extension installs and activates
    the wrappers. This is useful for:
      * Japanese Windows users using the shift_jis encoding.
      * (maybe) Chinese Windows users using the big5 encoding.
    It is of no use to Unix users.
    
    This extension assumes that path strings are encoded in
    util._encoding, the local file system encoding. It first checks
    that each passed argument really is valid in util._encoding; if so,
    it converts the arguments to unicode, calls the original function,
    and re-encodes the return value. If a string is in some other
    encoding, it warns the user and calls the original function without
    any conversion. If a string is already unicode, it simply calls the
    original function.

    With this extension, some important functions (os.path.*, util.*)
    are replaced by the extension's own wrapped versions, but I believe
    this is safe and that they behave as usual.
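The helpers named in item 1) matter because, once core code calls util.splitpath() or util.endswithsep() instead of inlining s.split(os.sep), an extension can swap the function out at run time. The sketch below is a Python 3 illustration, not the patch itself: the helper bodies are an assumed straightforward implementation, util is a hypothetical types.SimpleNamespace standing in for mercurial.util, and wrap() and last_component() are hypothetical names.

```python
import os
import types

# Hypothetical stand-in for the mercurial.util module.
util = types.SimpleNamespace()


def splitpath(path):
    """Split path on os.sep; a hookable replacement for path.split(os.sep)."""
    return path.split(os.sep)


def endswithsep(path):
    """Return True if path ends with os.sep or, where defined, os.altsep."""
    if path.endswith(os.sep):
        return True
    return bool(os.altsep) and path.endswith(os.altsep)


util.splitpath = splitpath
util.endswithsep = endswithsep


def last_component(path):
    """Core-style caller: looks util.splitpath up at call time."""
    return util.splitpath(path)[-1]


def wrap(module, name, wrapper):
    """Hypothetical hook helper: route module.<name> through wrapper(orig, *args)."""
    original = getattr(module, name)

    def hooked(*args):
        return wrapper(original, *args)

    setattr(module, name, hooked)
    return original
```

Because last_component() looks up util.splitpath at call time, a replacement installed with wrap() takes effect without touching the caller; an inlined path.split(os.sep) could not be intercepted this way.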
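The argument-handling policy described in item 2) can be sketched roughly as below. This is a Python 3 approximation, not the extension's code: raw MBCS strings are modelled as bytes and unicode strings as str, a hypothetical _ENCODING constant stands in for util._encoding, a strict decode serves as the "really encoded in util._encoding" check, and warnings.warn replaces whatever user-facing warning the extension actually emits. split_backslash() is a hypothetical byte-oriented helper of the s.split(os.sep) kind.

```python
import functools
import warnings

# Assumption: stands in for util._encoding (the local file system encoding).
_ENCODING = "shift_jis"


def _decode(arg):
    """Decode a raw byte string strictly; leave other arguments untouched."""
    if isinstance(arg, bytes):
        return arg.decode(_ENCODING)  # raises UnicodeDecodeError if invalid
    return arg


def _encode(value):
    """Re-encode str results (including items of lists/tuples) to bytes."""
    if isinstance(value, str):
        return value.encode(_ENCODING)
    if isinstance(value, (list, tuple)):
        return type(value)(_encode(v) for v in value)
    return value


def mbcs_wrap(func):
    """Wrap func so raw MBCS byte strings are processed as unicode."""
    @functools.wraps(func)
    def wrapper(*args):
        if not any(isinstance(a, bytes) for a in args):
            # Unicode (or non-string) arguments: call the original as-is.
            return func(*args)
        try:
            uargs = [_decode(a) for a in args]
        except UnicodeDecodeError:
            # Not valid in _ENCODING: warn and fall back to no conversion.
            warnings.warn("win32mbcs: argument is not valid %s" % _ENCODING)
            return func(*args)
        return _encode(func(*uargs))
    return wrapper


def split_backslash(path):
    """Hypothetical byte-oriented helper, like path.split(os.sep) on Windows."""
    sep = b"\\" if isinstance(path, bytes) else "\\"
    return path.split(sep)


raw = "表\\x".encode(_ENCODING)         # bytes 0x95 0x5C 0x5C 0x78
naive = split_backslash(raw)            # cuts 表 in half at its 0x5C byte
safe = mbcs_wrap(split_backslash)(raw)  # splits on characters, then re-encodes
```

In Shift_JIS the second byte of some characters, such as 表 (0x95 0x5C), equals the backslash byte, so splitting the raw bytes of the path 表\x yields three pieces with a broken character, while the wrapped call decodes first and returns the two intended components.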

Opinions are welcome.

