Bug 3600 - "hg archive" creates "zip" archives with unexpected timestamp for users in non-zero offset timezone areas
Status: RESOLVED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: Mercurial
Version: unspecified
Hardware: All All
Importance: normal feature
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-27 07:22 UTC by FUJIWARA Katsunori
Modified: 2012-10-19 14:25 UTC
CC List: 5 users

See Also:
Python Version: ---


Attachments
Adding extended timestamp extra field to solve (1.09 KB, patch)
2012-09-09 12:04 UTC, Jun Omae

Description FUJIWARA Katsunori 2012-08-27 07:22 UTC
"hg archive" creates archive files with timestamps in GMT.

"tar" archives seem to be extracted by the "tar" command with the
local timezone offset of the environment where the archive files
are EXTRACTED.

But "zip" archives seem to be extracted without the local
timezone offset: at least, "unzip" on Linux and Explorer on
Windows 7 extract them in this manner.

For example, a zip archive of a changeset committed at
"2012-08-27 19:44 +0900" extracts to files with a "2012-08-27 10:44"
timestamp.
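The arithmetic behind this example can be checked with a short Python sketch, assuming (as the report describes) that the archiver converts the commit time to GMT before writing the zip entry's date/time fields, which carry no offset:

```python
from datetime import datetime, timedelta, timezone

# Commit time from the example above: 2012-08-27 19:44 +0900.
committed = datetime(2012, 8, 27, 19, 44, tzinfo=timezone(timedelta(hours=9)))

# Converting to GMT/UTC gives the wall-clock value written into the archive.
stored = committed.astimezone(timezone.utc)

# A zip entry records only these naive fields; the +0900 offset is lost.
date_time = stored.timetuple()[:6]
print(date_time)  # (2012, 8, 27, 10, 44, 0)
```

An extractor that reads those fields as local time in a +0900 zone then shows 10:44, nine hours earlier than the commit's local time.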

In one of the most common use cases, committing changesets,
archiving files from one of them and extracting the archived
files all happen in the same timezone. In such a case, files
with GMT timestamps look strange.

Of course, each action (committing/archiving/extracting) may be
done in a different timezone. But in such cases, most users don't
seem to care about the timestamps of extracted files.

So, "zip" archives should be created not with GMT but with the
local time of the environment where the archive files are CREATED.

Or, should this be selectable by archive type?

- "zip" for localtime, and "gmtzip" for GMT, or
- "zip" for GMT, and "ltzip" for localtime
Comment 1 kiilerix 2012-08-27 09:07 UTC
A slightly different perspective on this story:

The zip format does not have any awareness of timezones in its timestamps. The timezone must thus somehow be handled 'manually' when creating or extracting zip files. The traditional wisdom has been to use the local timezone in both cases.
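To make the point concrete, here is a small Python sketch. The packing in `dos_fields` follows the MS-DOS date/time layout described in PKWARE's APPNOTE.TXT (year since 1980, month, day; hour, minute, seconds/2); `dos_fields` itself is a hypothetical helper. Nothing in those two 16-bit fields, or in `ZipInfo.date_time`, encodes a timezone:

```python
import io
import zipfile

def dos_fields(date_time):
    """Pack a (Y, M, D, h, m, s) tuple into the two 16-bit MS-DOS fields
    used by zip headers. Note that no timezone is encoded anywhere."""
    year, month, day, hour, minute, second = date_time
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    dos_time = (hour << 11) | (minute << 5) | (second // 2)  # 2-second resolution
    return dos_date, dos_time

# Round-trip one entry through Python's zipfile: only the naive tuple survives.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("foo", date_time=(2012, 8, 27, 10, 44, 0)), b"foo\n")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("foo")

print(info.date_time)              # (2012, 8, 27, 10, 44, 0)
print(dos_fields(info.date_time))  # (16667, 21888)
```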

The timezone setting on modern systems is however only a matter of how timestamps should be displayed locally. Timers and timestamps "always" use UTC. Using the local timezone setting when creating archives is not really an option, especially not in a VCS where we try hard to track data correctly and consistently and make it reproducible.

The right way to handle zip archives is thus to use the UTC timezone when creating or extracting zips. Use something like "TZ=UTC unzip foo.zip" on unix.
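For comparison, the same idea can be sketched in Python: an extractor that reads each entry's timezone-less fields as UTC, roughly what "TZ=UTC unzip" achieves. `extract_as_utc` is a hypothetical helper; it only sets the modification time and ignores any extra fields:

```python
import calendar
import io
import os
import tempfile
import zipfile

def extract_as_utc(zf, dest):
    """Extract every member of `zf` into `dest`, interpreting each entry's
    timezone-less date/time fields as UTC."""
    for info in zf.infolist():
        path = zf.extract(info, dest)
        mtime = calendar.timegm(info.date_time)  # naive fields read as UTC
        os.utime(path, (mtime, mtime))

# Usage: build a one-entry archive in memory and extract it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("x/foo", date_time=(2012, 8, 27, 10, 44, 0)), b"foo\n")

with zipfile.ZipFile(buf) as zf, tempfile.TemporaryDirectory() as dest:
    extract_as_utc(zf, dest)
    result = os.path.getmtime(os.path.join(dest, "x", "foo"))

print(result)  # same instant no matter what TZ the process runs under
```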

... BUT it is a bit unfortunate that we have to claim that Mercurial is the only tool that uses the zip format correctly :-(
Comment 2 FUJIWARA Katsunori 2012-08-29 03:19 UTC
(In reply to comment #1)

Thank you for your comments, kiilerix.

Please let me confirm the cause of this problem:

- "tar" extracting implementations ignore TIMEZONE information,
  so extracted files have timestamps in GMT: and "ls -l" shows them
  with the local timezone offset, so users see the appropriate
  datetime

- some (or many?) "zip" extracting implementations take the
  local timezone offset into account INCORRECTLY, so extracted
  files have timestamps not in GMT (and maybe not in local time
  either): this causes wrong datetimes on extracted files.

So, the "zip" extracting implementations seem to be responsible
for this timestamp problem.

But to many users extracting "zip" archives created by "hg
archive", Mercurial seems to be responsible for this timestamp
problem, even though the "zip" extracting implementations
actually are.

In addition, even if users understood that "zip" archives should
be extracted with "TZ=UTC", there is no easy way to specify
"TZ=UTC" when extracting from Explorer on Windows, is there?

So, what about adding a new "lt.zip" archive type that creates
"zip" archives with timestamps in local time rather than GMT?

This seems suitable for one of the most common use cases:
committing/archiving/extracting in the same timezone.
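The difference between the current behaviour and the proposed "lt.zip" type comes down to which conversion produces the entry's date/time fields. A minimal sketch, assuming the archiver derives those fields from a Unix timestamp via `time.gmtime` (as the GMT timestamps described in this report suggest); `make_zipinfo` and `use_localtime` are hypothetical names, not Mercurial code:

```python
import calendar
import time
import zipfile

def make_zipinfo(name, mtime, use_localtime=False):
    """Build a ZipInfo whose timezone-less date/time fields come from the
    Unix timestamp `mtime`: GMT by default, or the local time of the
    machine creating the archive (the proposed "lt.zip" behaviour)."""
    to_struct = time.localtime if use_localtime else time.gmtime
    return zipfile.ZipInfo(name, date_time=to_struct(mtime)[:6])

mtime = calendar.timegm((2012, 8, 27, 10, 44, 0))
print(make_zipinfo("foo", mtime).date_time)                      # (2012, 8, 27, 10, 44, 0)
print(make_zipinfo("foo", mtime, use_localtime=True).date_time)  # depends on the creator's TZ
```

The `use_localtime=True` result varies with the TZ of the machine running the archiver, so the same changeset would yield different archives in different places.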
Comment 3 kiilerix 2012-08-29 05:55 UTC
(In reply to comment #2)
> - "tar" extracting implementations ignore TIMEZONE information,
>   so extracted files have timestamps in GMT: and "ls -l" shows them
>   with the local timezone offset, so users see the appropriate
>   datetime

I think it is important to get this right: tar, like most filesystems and systems, stores timestamps in UTC, not "ignoring" the timezone as a kind of error but acknowledging that timestamps must be stored in a globally unambiguous way and that the local timezone doesn't matter for this purpose.
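This is visible with Python's `tarfile` module: a member's `mtime` is an integer count of seconds since the Unix epoch, i.e. an absolute instant, so no wall-clock fields or timezone are involved. A minimal in-memory sketch (the member name and data are arbitrary):

```python
import calendar
import io
import tarfile

# An absolute instant: seconds since the Unix epoch.
mtime = calendar.timegm((2012, 8, 27, 10, 44, 0))

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"foo\n"
    info = tarfile.TarInfo("x/foo")
    info.size = len(data)
    info.mtime = mtime  # stored as-is in the header
    tf.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    stored = tf.getmember("x/foo").mtime

print(stored == mtime)  # True, regardless of the TZ setting
```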

(btw: Mercurial commits will in addition to this timestamp also keep a record of which timezone the user was using.)

The timestamp situation might be more complex on Windows.

And yes, it is a fact that zip is just different and doesn't have the means to do the right thing. Mercurial might have to adapt to that somehow.
Comment 4 Matt Mackall 2012-08-29 16:17 UTC
I'm going to set this to WONTFIX.

As it happens, unzip(1) on Unix systems does the right thing today. So any change we make here to fix Windows by default will break Unix. That would be a regression, which would make this change a net step backwards.

Also note that there are almost certainly some Windows archive utilities that also get this right. Judging from web searches, Winzip seems to be one of them; 7-zip seems not to be.

Which leaves us with adding some form of command line option that will probably create more confusion than it solves ('this option will help only if you are creating zip files that are only going to be extracted on Windows only in the same time zone you're currently in only with broken extractors').
Comment 5 kiilerix 2012-08-29 19:15 UTC
My testing showed that unzip(1) didn't do the right thing.

Further testing shows that we both are right.

For hg archives the timestamp does depend on the timezone:

  $ hg archive hg-x.zip
  $ rm -rf x; TZ=UTC unzip -q hg-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 30 00:44 x/foo
  $ rm -rf x; TZ=UTC-1 unzip -q hg-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 29 23:44 x/foo

For zip(1) archives the timezone doesn't matter:
  $ zip -rq zip-x.zip x
  $ rm -rf x; TZ=UTC unzip -q zip-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 29 23:44 x/foo
  $ rm -rf x; TZ=UTC-1 unzip -q zip-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 29 23:44 x/foo

But it does depend on the timezone if we don't store extra file attributes (zip -X):
  $ zip -rqX zip-x.zip x
  $ rm -rf x; TZ=UTC unzip -q zip-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 30  2012 x/foo
  $ rm -rf x; TZ=UTC-1 unzip -q zip-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 30 00:44 x/foo
  $ rm -rf x; TZ=UTC-3 unzip -q zip-x.zip ; ls -l x/foo 
  -rw-r--r--. 1 mk mk 4 Aug 29 22:44 x/foo

It seems like the situation could be improved by somehow using extra file attributes when Mercurial creates zips.
Comment 6 FUJIWARA Katsunori 2012-08-30 06:13 UTC
I can also confirm that a zip archive file with extended attributes
(created by zip on Unix) is extracted with the expected timestamp
by unzip on Unix and by Explorer on Windows.

So, adding extended attributes seems to resolve this problem.

But from my quick look at the Python zipfile module source (of
Python 2.6), there seems to be no way to add/record extended file
attributes in a zip archive file.

Did I just overlook something?
Comment 7 kiilerix 2012-08-30 06:35 UTC
(In reply to comment #6)
> But from my quick look at the Python zipfile module source (of
> Python 2.6), there seems to be no way to add/record extended file
> attributes in a zip archive file.

I think you are right. It should probably be investigated/reported/discussed upstream. There might be a workaround or monkey patch that could make it work and make it feasible to fix this issue.
Comment 8 Matt Mackall 2012-08-30 13:49 UTC
Reopening.
Comment 9 Jun Omae 2012-09-09 12:04 UTC
Created attachment 1693
Adding extended timestamp extra field to solve
Comment 10 Jun Omae 2012-09-09 12:05 UTC
(In reply to comment #6)
Comment 11 Patrick Mézard 2012-09-09 12:16 UTC
@Jun: could you post your patch on the mercurial-devel mailing list using the patchbomb extension? Patches cannot be reviewed on the bug tracker.

  http://mercurial.selenic.com/wiki/ContributingChanges
Comment 12 Jun Omae 2012-09-09 12:19 UTC
(In reply to comment #6)
> I can also confirm that a zip archive file with extended attributes
> (created by zip on Unix) is extracted with the expected timestamp
> by unzip on Unix and by Explorer on Windows.

We can use `ZipInfo.extra` to create the attribute.
http://docs.python.org/library/zipfile.html#zipfile.ZipInfo.extra

I think the attribute is the "Extended Timestamp Extra Field".
http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
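Following this suggestion, a minimal sketch of adding such a block through `ZipInfo.extra`. The layout (header ID 0x5455, data size, a flags byte whose bit 0 means "modification time present", then a 4-byte little-endian Unix time) reflects my reading of the extra.fld document linked above; `add_file` is a hypothetical helper, not the code from the attached patch:

```python
import calendar
import io
import struct
import time
import zipfile

EXTENDED_TIMESTAMP_ID = 0x5455  # "UT": Extended Timestamp extra field

def add_file(zf, name, data, mtime):
    """Write `data` as `name`, storing `mtime` (Unix time, UTC) both in the
    timezone-less DOS date/time fields and in an Extended Timestamp block."""
    info = zipfile.ZipInfo(name, date_time=time.gmtime(mtime)[:6])
    # Block: header ID, data size (1 flags byte + 4-byte time),
    # flags (bit 0 = modification time present), signed 32-bit mtime.
    info.extra += struct.pack("<HHBl", EXTENDED_TIMESTAMP_ID, 1 + 4, 1, int(mtime))
    zf.writestr(info, data)

mtime = calendar.timegm((2012, 8, 27, 10, 44, 0))
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_file(zf, "x/foo", b"foo\n", mtime)

with zipfile.ZipFile(buf) as zf:
    stored_extra = zf.getinfo("x/foo").extra
print(stored_extra.hex())
```

The DOS fields are still filled in for extractors that ignore the extra block; those that honour it can use the UTC value instead.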

I confirmed that the attached patch, http://bz.selenic.com/attachment.cgi?id=1693, works well for me with UnZip 5.52 on CentOS 5 and 7-zip 9.20 on Windows XP.

BTW,
> Add an attachment (do not attach patches, please!)
Sorry about attaching the patch...
Comment 13 HG Bot 2012-09-24 18:11 UTC
Fixed by http://selenic.com/repo/hg/rev/133d13e44544
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
archival: add "extended-timestamp" extra block for zip archives (issue3600)

Before this patch, zip archives created by "hg archive" are extracted
with unexpected timestamps if TZ is not configured as GMT.

This patch adds an "extended-timestamp" extra block to zip archives, and
unzip will extract such archives with the timestamp specified in the
added extra block, even if TZ is not configured as GMT.
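On the extracting side, an extractor that honours the block presumably scans the extra data for header ID 0x5455 and prefers that time over the DOS fields. A rough sketch of that lookup, using the same layout assumption as the extra.fld document; `extended_mtime` is a hypothetical helper:

```python
import struct
import zipfile

def extended_mtime(info):
    """Return the Unix mtime (UTC) from an Extended Timestamp (0x5455) block
    in `info.extra`, or None so the caller can fall back to `info.date_time`."""
    extra, pos = info.extra, 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        # Flags byte bit 0 set means a modification time follows it.
        if header_id == 0x5455 and len(body) >= 5 and body[0] & 1:
            return struct.unpack_from("<l", body, 1)[0]
        pos += 4 + size
    return None

with_block = zipfile.ZipInfo("foo")
with_block.extra = struct.pack("<HHBl", 0x5455, 5, 1, 1346064240)  # arbitrary time
print(extended_mtime(with_block))              # 1346064240
print(extended_mtime(zipfile.ZipInfo("bar")))  # None
```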

Please see the documents below for details about the zip file format
specification and the "extended-timestamp" extra block:

  http://www.pkware.com/documents/casestudies/APPNOTE.TXT
  http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld

The original implementation of this patch was suggested by "Jun Omae
<jun66j5@gmail.com>".

(please test the fix)