What are Mercurial's limits?

Mercurial currently handles each file, index, and manifest in memory for efficiency, so each of them has to fit in memory.

Otherwise, there are no limits on file name length, file size, file contents, number of files, or number of revisions (see also RepoSamples for the sizes of some example repositories).

The network protocol is specified to be big-endian.

File names cannot contain the null character or newlines. Committer addresses cannot contain newlines.

Mercurial is developed primarily on UNIX, so some Unix-isms may show up in ports to other platforms.

Mercurial encodes filenames (see CaseFolding, CaseFoldingPlan, fncacheRepoFormat) when storing them in the repository. Most notably, each uppercase character in a filename is encoded as two characters in the repository: an underscore followed by the lowercase letter ("FILE" becomes "_f_i_l_e").
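
The uppercase rule above can be sketched in a few lines of Python. This is a simplified illustration, not Mercurial's actual store encoding: it covers only the uppercase rule from the text plus the assumption that a literal underscore is doubled so the mapping stays reversible. The function names encode_filename and decode_filename are hypothetical.

```python
def encode_filename(path: str) -> str:
    """Toy encoding: 'A' -> '_a', '_' -> '__' (assumed), everything else unchanged."""
    out = []
    for ch in path:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def decode_filename(encoded: str) -> str:
    """Inverse of encode_filename for strings that it produced."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_":
            nxt = encoded[i + 1]
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(encode_filename("FILE"))  # _f_i_l_e
```

Because every uppercase letter becomes a lowercase pair, two names that differ only in case never collide on a case-insensitive filesystem.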

How does Mercurial store its data?

The fundamental storage type in Mercurial is a revlog. A revlog is the set of all revisions of a named object. Each revision is either stored compressed in its entirety or as a compressed binary delta against the previous version. The decision of when to store a full version is made based on how much data would be needed to reconstruct the file. This lets us ensure that we never need to read huge amounts of data to reconstruct an object, regardless of how many revisions of it we store.

In fact, we should always be able to do it with a single read, provided we know when and where to read. This is where the index comes in. Each revlog has an index containing a special hash (nodeid) of the text, hashes for its parents, and where and how much of the revlog data we need to read to reconstruct it. Thus, with one read of the index and one read of the data, we can reconstruct any version in time proportional to the object size.

Similarly, revlogs and their indices are append-only. This means that adding a new version is also O(1) seeks.

Revlogs are used to represent all revisions of files, manifests, and changesets. Compression for typical objects with lots of revisions can range from 100 to 1 for things like project makefiles to over 2000 to 1 for objects like the manifest.
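
The snapshot-or-delta decision and the index described above can be sketched as a toy model. Everything here is an assumption made for illustration: the names ToyRevlog, make_delta and apply_delta are hypothetical, difflib plus pickle stand in for Mercurial's binary delta format, and the "twice the text size" threshold is an assumed heuristic rather than a documented constant. Unlike a real revlog, the toy finds the base snapshot by walking the index backwards instead of recording it in each index entry.

```python
import pickle
import zlib
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """List of (start, end, replacement) edits that turn old into new."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_delta(old: bytes, delta) -> bytes:
    out, pos = [], 0
    for start, end, replacement in delta:
        out.append(old[pos:start])
        out.append(replacement)
        pos = end
    out.append(old[pos:])
    return b"".join(out)


class ToyRevlog:
    def __init__(self):
        self.index = []  # per revision: (is_full, offset, length, chain_bytes)
        self.data = b""  # append-only storage for compressed chunks

    def add(self, text: bytes) -> int:
        full = zlib.compress(text)
        chunk, is_full, chain = full, True, len(full)
        if self.index:
            prev_rev = len(self.index) - 1
            delta = zlib.compress(pickle.dumps(make_delta(self.read(prev_rev), text)))
            prev_chain = self.index[prev_rev][3]
            # Keep the delta only while the bytes needed to rebuild this
            # revision stay small relative to the text (assumed threshold).
            if prev_chain + len(delta) <= 2 * len(text):
                chunk, is_full, chain = delta, False, prev_chain + len(delta)
        self.index.append((is_full, len(self.data), len(chunk), chain))
        self.data += chunk
        return len(self.index) - 1

    def read(self, rev: int) -> bytes:
        base = rev
        while not self.index[base][0]:  # walk back to the nearest full snapshot
            base -= 1
        text = None
        for r in range(base, rev + 1):
            is_full, offset, length, _ = self.index[r]
            payload = zlib.decompress(self.data[offset:offset + length])
            text = payload if is_full else apply_delta(text, pickle.loads(payload))
        return text
```

Because the chunks between a snapshot and any later revision are contiguous in the data, reconstructing a revision needs only one slice of the data plus the index entries, and adding a revision only appends.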

How are binary files handled?

Core Mercurial tracks but never modifies file content, and it is thus binary safe. See BinaryFiles for more discussion of commands which interpret file content, e.g. merge, diff, export and annotate.

What about Windows line endings vs. Unix line endings?

See Win32TextExtension for techniques which automatically convert Windows line endings into Unix line endings when committing files to the repository, and convert back again when updating the workspace. This is not default Mercurial behaviour, and requires users to edit their configuration files to turn it on. Adopting this policy on line endings probably implies enabling a hook to prevent non-compliant commits from getting into your repository, which in turn forces people contributing code to enable the extension.
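
The conversion described above amounts to a pair of filters, one applied on commit and one on update. The sketch below is not the Win32TextExtension code; the function names to_repository and to_working_copy are hypothetical, and treating any data containing a NUL byte as binary is an assumed heuristic.

```python
def to_repository(data: bytes) -> bytes:
    """On commit: convert Windows CRLF to Unix LF, leaving binary-looking data alone."""
    if b"\0" in data:
        return data
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data: bytes) -> bytes:
    """On update: convert LF back to CRLF, leaving binary-looking data alone."""
    if b"\0" in data:
        return data
    # Normalize first so an existing CRLF does not become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Note that a file with mixed line endings does not survive the round trip unchanged, which is one reason a hook that rejects non-compliant commits goes hand in hand with this policy.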

What about keyword replacement (e.g. $Id$)?

See KeywordPlan and KeywordExtension.

How does Mercurial compute diffs?

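Mercurial's diffs are computed differently from those of the traditional diff algorithm (the output is, of course, fully compatible). The algorithm is an optimized C implementation based on Python's difflib, and it aims to generate diffs that are easy for humans to read rather than to minimize the size of the data. The same algorithm is also used for the internal delta compression.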

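While investigating delta compression algorithms, we found in our benchmarks that this implementation was simpler and faster than the alternatives, and that it also generated smaller deltas than the theoretically "minimal" diffs of the traditional diff algorithm. This is because the traditional algorithm assumes the same cost for insertions, deletions, and unchanged elements.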
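
Since the algorithm is based on Python's difflib, the standard library module gives a feel for the kind of output involved. The example below uses difflib itself, not Mercurial's C implementation, and the file names and contents are made up for illustration.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta (edited)\n", "gamma\n", "delta\n"]

# unified_diff yields the header lines, hunk markers, and changed lines in order.
diff_lines = list(difflib.unified_diff(
    old, new, fromfile="a/example.txt", tofile="b/example.txt"))

print("".join(diff_lines), end="")
```

The result is an ordinary unified diff that lists the changed "beta" line and the appended "delta" line, in a form that standard patch tools understand.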

How are manifests and changesets stored?

A manifest is simply a list of all files in a given revision of a project along with the nodeids of the corresponding file revisions. So grabbing a given version of the project means simply looking up its manifest and reconstructing all the file revisions pointed to by it.

A changeset is a list of all files changed in a check-in along with a change description and some metadata like user and date. It also contains a nodeid to the relevant revision of the manifest.
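
The relationship between changesets, manifests, and file revisions can be pictured with plain data structures. This is a conceptual sketch only: the Changeset class, the checkout function, the dictionaries, and the short placeholder ids are all hypothetical, and real Mercurial keeps each of these in revlogs keyed by 20-byte nodeids.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    manifest_node: str  # nodeid of the manifest revision for this check-in
    user: str
    date: str
    files: list         # names of the files changed in this check-in
    description: str


# Placeholder stores keyed by made-up nodeids.
file_revisions = {"f1a": b"print('hello')\n", "f2b": b"# README\n"}
manifests = {"m01": {"hello.py": "f1a", "README": "f2b"}}


def checkout(changeset: Changeset) -> dict:
    """Look up the changeset's manifest and fetch every file revision it lists."""
    manifest = manifests[changeset.manifest_node]
    return {name: file_revisions[node] for name, node in manifest.items()}
```

Grabbing a version of the project is then a two-step lookup: changeset to manifest, then manifest entries to file revisions.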

How are hashes calculated?

Mercurial hashes both the contents of an object and the hashes of its parents to create an identifier that uniquely identifies the object's contents and history. This greatly simplifies merging of histories because it avoids the graph cycles that can occur when an object is reverted to an earlier state.

All file revisions have an associated hash value (the nodeid). These are listed in the manifest of a given project revision, and the manifest hash is listed in the changeset. The changeset hash (the changeset ID) is again a hash of the changeset contents and its parents, so it uniquely identifies the entire history of the project to that point.
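
A minimal sketch of such a hash follows. It assumes SHA-1 over the two parent ids followed by the text, with the parents sorted so that their order does not matter and an all-zero id standing in for a missing parent; treat those details as assumptions of the sketch rather than a specification. The name node_id is hypothetical.

```python
import hashlib

NULL_ID = b"\0" * 20  # assumed stand-in for "no parent"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash the (sorted) parent ids together with the content."""
    first, second = sorted([p1, p2])
    h = hashlib.sha1(first)
    h.update(second)
    h.update(text)
    return h.digest()


root = node_id(b"version 1")
child = node_id(b"version 2", root)
reverted = node_id(b"version 1", child)  # same content as root, different history
print(reverted != root)
```

Reverting to earlier content yields a new identifier because the parent differs, which is how the cycles mentioned above are avoided.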

How is repository integrity verified?

Every time a revlog object is retrieved, it is checked against its hash for integrity. It is also incidentally double-checked by the Adler32 checksum used by the underlying zlib compression.
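
Both checks can be demonstrated with the standard library. In this sketch the hash covers only the content (parents are left out for brevity), and the names store and retrieve are hypothetical.

```python
import hashlib
import zlib


def store(text: bytes):
    """Return (hash, compressed blob) for a piece of content."""
    return hashlib.sha1(text).digest(), zlib.compress(text)


def retrieve(node: bytes, blob: bytes) -> bytes:
    # zlib raises zlib.error if the stream is damaged or its Adler-32 check fails.
    text = zlib.decompress(blob)
    # Independent of zlib, compare the content against the recorded hash.
    if hashlib.sha1(text).digest() != node:
        raise ValueError("integrity check failed: hash mismatch")
    return text
```

A flipped bit in the stored blob is normally caught by zlib first; a blob that decompresses cleanly but holds the wrong content is caught by the hash comparison.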

Running 'hg verify' decompresses and reconstitutes each revision of each object in the repository and cross-checks all of the index metadata with those contents.

But this alone is not enough to ensure that someone hasn't tampered with a repository. For that, you need cryptographic signing.

How does signing work in Mercurial?

Take a look at the hgeditor script for an example. The basic idea is to use GPG to sign the manifest ID inside the changelog entry. The manifest ID is a recursive hash of all of the files in the system and their complete history, and thus signing the manifest hash signs the entire project contents.

Can hashes collide? What about the SHA1 weaknesses?

The SHA1 hashes are large enough that the odds of accidental hash collision are negligible for projects that could be handled by the human race. The known weaknesses in SHA1 are currently still not practical to attack, and Mercurial will switch to SHA256 hashing before that becomes a realistic concern.

Collisions with the "short hashes" are not a concern as they're always checked for ambiguity and are still long enough that they're not likely to happen for reasonably-sized projects (< 1M changes).
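
The ambiguity check for short hashes can be sketched as a prefix lookup. The function name resolve_prefix and the sample identifiers are hypothetical; a real implementation would look prefixes up in the revlog index rather than scan a list.

```python
def resolve_prefix(prefix: str, known_ids):
    """Return the single full id starting with prefix, or raise if none or several match."""
    matches = [node for node in known_ids if node.startswith(prefix)]
    if not matches:
        raise KeyError("unknown identifier: " + prefix)
    if len(matches) > 1:
        raise ValueError("ambiguous identifier: " + prefix)
    return matches[0]


# Made-up 40-character hex ids for illustration.
ids = ["ab12" + "0" * 36, "ab34" + "0" * 36, "cd56" + "0" * 36]
print(resolve_prefix("ab1", ids))
```

A prefix that matches more than one id is rejected rather than resolved arbitrarily, so a short hash can never silently refer to the wrong change.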

See also: http://selenic.com/pipermail/mercurial/2009-April/025526.html by Matt Mackall.

How does "hg commit" determine which files have changed?

If hg commit is called without file arguments, it commits all files that have "changed" (see Commit). Note, however, that Mercurial doesn't detect changes that alter neither the file's modification time nor its size. This is by design; see also msg3438 in issue618 and DirState.
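
The consequence of checking only time and size can be reproduced with a small experiment. This is a simplified model, not the dirstate implementation: the names snapshot, looks_changed and demo are hypothetical, and the recorded state is reduced to the file size plus the modification time in whole seconds.

```python
import os
import tempfile


def snapshot(path):
    """Record the size and whole-second modification time of a file."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def looks_changed(path, recorded):
    return snapshot(path) != recorded


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "f.txt")
        with open(path, "wb") as f:
            f.write(b"aaaa")
        os.utime(path, (1_000_000_000, 1_000_000_000))  # pin the timestamps
        recorded = snapshot(path)

        with open(path, "wb") as f:
            f.write(b"bbbb")                            # same size, new content
        os.utime(path, (1_000_000_000, 1_000_000_000))  # restore the old mtime
        missed = not looks_changed(path, recorded)

        with open(path, "wb") as f:
            f.write(b"ccccc")                           # size now differs
        caught = looks_changed(path, recorded)
        return missed, caught
```

Calling demo() shows that the same-size edit with a restored timestamp goes unnoticed, while the edit that changes the size is picked up.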

What is the difference between rollback and strip?

They overlap a bit, but are really quite different:

JapaneseFAQ/TechnicalDetails (last edited 2009-09-14 13:07:48 by YuyaNishihara)