[Bug 3512] New: Add encode/decode filter for ZIP-based binary file format normalization

Sat Jun 23 15:36:21 CDT 2012

http://bz.selenic.com/show_bug.cgi?id=3512

          Priority: wish
            Bug ID: 3512
                CC: mercurial-devel at selenic.com
          Assignee: bugzilla at selenic.com
           Summary: Add encode/decode filter for ZIP-based binary file
                    format normalization
          Severity: feature
    Classification: Unclassified
                OS: All
          Reporter: avw at gmx.ch
          Hardware: All
            Status: UNCONFIRMED
           Version: unspecified
         Component: Mercurial
           Product: Mercurial

As far as I can tell, ZIP-based file formats are quite popular. They include
not only Java JARs and WARs, but also very common document formats such as
DOCX, XLSX and PPTX. According to the documentation, Mercurial is not very
efficient with these because of how ZIP compression works: the whole archive
file can change even if only one byte in one of the contained files has
actually changed.

Having a filter that re-packs the file into an equivalent but normalized ZIP
archive (with a fixed order of the files, and with the files stored
uncompressed) would enable Mercurial to find and store the binary deltas of
these files efficiently.
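
A minimal sketch of such a normalizing re-pack, using only Python's standard
zipfile module. The function name normalize_zip and the fixed timestamp are
illustrative assumptions, not anything Mercurial provides, and the sketch
ignores archive comments, extra fields and duplicate entry names:

```python
import io
import zipfile

# Assumption: any constant timestamp works; 1980-01-01 is the earliest
# date the ZIP format can represent.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def normalize_zip(data: bytes) -> bytes:
    """Repack a ZIP archive with name-sorted, uncompressed entries."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for name in sorted(src.namelist()):
            # Build fresh metadata so timestamps and compression settings
            # of the original archive do not leak into the output.
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_STORED
            # Keep file permissions / directory flags from the source entry.
            info.external_attr = src.getinfo(name).external_attr
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives with the same contents but different entry order or compression
level should then produce identical output. Note that some ZIP-based formats
require a particular entry to come first (for example the "mimetype" entry in
OpenDocument files), so plain name sorting would need an exception for those.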

I'm aware that this is very similar to the example given on the
http://mercurial.selenic.com/wiki/EncodeDecodeFilter wiki page. Has real-world
data been collected for this approach?
