(for a short intro of the basic concepts of Mercurial, see UnderstandingMercurial)
The act of creating a changeset is called a commit or checkin. A changeset includes the actual changes to the files and some meta information. The meta information in a changeset includes:
- the list of changed files
- information about who made the change (the "committer"), why ("comments") and when (date/time, timezone)
- the name of the branch ("default", if omitted or not set)
Each changeset has zero, one or two parent changesets. It has two parents if the commit was a merge, and no parent if the changeset is a root in the repository. A repository normally has only one root, but there may be several, each representing the start of a branch.
If a changeset is not the head of a branch, it has one or more child changesets (it is then the parent of its child changesets).
"Updating" back to a changeset which already has a child, changing files and then committing creates a new child changeset, thus starting a new branch. Branches can be named.
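The parent and child relationships described above can be sketched with a small in-memory graph. This is only an illustration: the changeset names and the `PARENTS` table are made up, and the helper functions are not part of Mercurial.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy history: changeset name -> tuple of parent names.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "a": (),          # root: no parents
    "b": ("a",),      # ordinary commit: one parent
    "c": ("b",),
    "d": ("b",),      # committed after updating back to "b": starts a new branch
    "e": ("c", "d"),  # merge: two parents
}


def children_of(parents: Dict[str, Tuple[str, ...]]) -> Dict[str, List[str]]:
    """Invert the parent table: parent name -> list of child names."""
    children = defaultdict(list)
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)
    return dict(children)


def heads(parents: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Changesets that have no children."""
    kids = children_of(parents)
    return [node for node in parents if node not in kids]


def roots(parents: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Changesets that have no parents."""
    return [node for node, node_parents in parents.items() if not node_parents]
```

In this toy history, "b" has two children because "d" was committed on top of it after "c" already existed, and the merge "e" joins the two lines again.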
All changesets of a repository are stored in the changelog.
Here's what the internal representation of a changeset looks like:
    $ hg debugdata .hg/00changelog.d 1208
    1102691ceab8c8f278edecd80f2e3916090082dd <- the corresponding manifest nodeid
    email@example.com <- the committer
    1126146623 25200 <- the date, in seconds since the epoch, and seconds offset from UTC
    mercurial/commands.py <- the list of changed files, followed by the commit message

    Clean up local clone file list

    We now use an explicit list of files to copy during clone so that we don't copy anything we shouldn't.
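To make the layout of that entry concrete, here is a minimal Python sketch that splits such text into its fields. It assumes the layout shown in the dump (manifest nodeid, committer, date line, file list, a blank line, then the commit message) with the "<-" annotations removed. `ChangelogEntry` and `parse_changelog_entry` are hypothetical names, not Mercurial APIs, and the message body in the sample is abbreviated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangelogEntry:
    manifest_nodeid: str
    committer: str
    timestamp: int      # seconds since the epoch
    tz_offset: int      # seconds offset from UTC
    files: List[str]
    message: str


def parse_changelog_entry(raw: str) -> ChangelogEntry:
    """Split raw changelog text into fields, assuming the layout shown above."""
    # Everything before the first blank line is the header; the rest is the message.
    header, _, message = raw.partition("\n\n")
    lines = header.split("\n")
    timestamp, tz_offset = lines[2].split(" ")[:2]
    return ChangelogEntry(
        manifest_nodeid=lines[0],
        committer=lines[1],
        timestamp=int(timestamp),
        tz_offset=int(tz_offset),
        files=lines[3:],
        message=message,
    )


RAW = (
    "1102691ceab8c8f278edecd80f2e3916090082dd\n"
    "email@example.com\n"
    "1126146623 25200\n"
    "mercurial/commands.py\n"
    "\n"
    "Clean up local clone file list\n"
    "\n"
    "(message body omitted in this sample)"
)
entry = parse_changelog_entry(RAW)
```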
Committing a new changeset
Committing a changeset to the repository involves updating the Revlogs for all modified files, the Manifest, and the Changelog. The following outlines this two-stage process. The first stage walks from top to bottom: from the changelog, to the manifest, to the files. The second stage goes back up: from the files, to the manifest, to the changelog.
First stage (top to bottom)
The first step is to get the Changelog of the parent revision. The changelog is a virtual file, in that it doesn't necessarily exist directly as a file in the repository. Instead, it is versioned in a Revlog, just like all of your tracked files. From the revlog, any version of the Changelog can be constructed on the fly, as needed.
The changelog has one version (one entry in its revlog) for every revision of the repository. Each version of the changelog stores meta information about the revision, including a timestamp for the commit, the username that made the commit, and the commit log. The most important thing it stores is a Nodeid which indicates a specific version of the Manifest.
Like the changelog, the manifest is a versioned virtual file. It has its own revlog, and the nodeid specified in the changelog uniquely identifies one of the entries in the manifest's revlog (i.e., a specific version of the manifest). The second step is therefore to take the nodeid indicated in the changelog and fetch that version of the manifest. Remember, this is the version of the manifest used by the parent revision.
Each version of the manifest is like a snapshot of the files in the repository at a given moment (i.e., in a particular revision of the repository). The manifest doesn't store the contents of the files directly; instead, it stores a Nodeid for each tracked file. Just like the manifest nodeid stored in the changelog, each nodeid in the manifest indicates a particular version of that file (i.e., a particular entry in the file's revlog).
The third and final step of the first stage is to get the revision specified in the manifest for each file that has been modified in this changeset. These are the parent versions of the files.
Notice that this first stage is basically just updating the repository to the parent revision, except that nothing is actually changed on disk. The update is constructed virtually and kept in memory.
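The top-to-bottom walk of the first stage can be sketched with plain dictionaries standing in for the three kinds of revlog. All identifiers, paths and contents here are hypothetical, and `load_parent_state` is an illustrative helper rather than a Mercurial function.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-ins for the revlogs; every id below is made up for illustration.
CHANGELOG = {"cs1": {"manifest": "m1", "user": "email@example.com"}}
MANIFESTS = {"m1": {"README": "f1", "setup.py": "f2"}}
FILELOGS = {
    "README": {"f1": b"hello\n"},
    "setup.py": {"f2": b"# setup\n"},
}


def load_parent_state(
    parent_changeset: str, modified_files: List[str]
) -> Tuple[str, Dict[str, str], Dict[str, Tuple[Optional[str], Optional[bytes]]]]:
    """Walk changelog -> manifest -> files, entirely in memory."""
    # Step 1: the parent's changelog entry names a manifest version.
    manifest_id = CHANGELOG[parent_changeset]["manifest"]
    # Step 2: that manifest version maps each tracked file to a file nodeid.
    manifest = MANIFESTS[manifest_id]
    # Step 3: fetch the parent version of each modified file.
    parent_files = {}
    for path in modified_files:
        file_id = manifest.get(path)  # None if the file is newly added
        data = FILELOGS[path][file_id] if file_id is not None else None
        parent_files[path] = (file_id, data)
    return manifest_id, manifest, parent_files
```

Nothing in this walk writes anything; it only gathers the parent versions that the second stage needs.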
Second stage (bottom to top)
With all of the parent versions identified and reconstructed for the changelog, the manifest, and all the modified files, the second stage can begin to construct new entries for each of the affected revlogs. The first step in creating a revlog entry is to determine the new nodeid, which will uniquely identify that entry in the revlog. Nodeids are constructed by hashing the nodeids of the two parent versions together with the complete contents of the new version of the file. Remember that the parent nodeids are not the same as the parent ChangeSetId. The parent nodeids are the identifiers for other entries in the same revlog. The ChangeSetId is only a parent id for the changelog, not for the manifest or the files. For the manifest, the parent nodeid is the one that was specified in the parent version of the changelog, and for the files, the parent nodeid is the one specified in the parent version of the manifest.
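A minimal sketch of such a nodeid computation follows. The text above only says that the two parent nodeids and the full contents are hashed. The use of SHA-1, the sorting of the two parents, and the all-zero `NULL_ID` for a missing parent are assumptions made for this illustration, and `compute_nodeid` is a hypothetical name.

```python
import hashlib

# Assumed placeholder for "no parent": 20 zero bytes (the size of a SHA-1 digest).
NULL_ID = b"\0" * 20


def compute_nodeid(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Hash both parent nodeids followed by the complete new contents."""
    # Sorting makes the result independent of the order the parents are given in.
    first, second = sorted([p1, p2])
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


root_id = compute_nodeid(b"first version\n")
child_id = compute_nodeid(b"second version\n", p1=root_id)
```

Because the parents feed into the hash, identical contents committed on top of different parents get different nodeids.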
The fact that the nodeid requires the complete contents of the new version of the file is the reason that the second stage needs to go from bottom to top. Nodeids for the new versions of the tracked files are computed first, then the manifest is updated with these new nodeids to create the new version of the manifest.
With the new version of the manifest prepared, a new manifest nodeid can be computed. This at last allows us to generate the new changelog entry, and then the new nodeid for the changelog, which will be the ChangeSetId for the new revision of the repository.
The final step is to actually update the revlogs for the changelog, the manifest, and all the modified files. This comes last because each revlog entry includes the ChangeSetId for the repository revision it corresponds to, and we didn't have this until the very end.
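The bottom-to-top order of the second stage can be sketched end to end as follows. This is a toy model under stated assumptions, not Mercurial's implementation: the hash is SHA-1 over the sorted parent ids plus the contents, the manifest and changelog text layouts are simplified, the date is left out so the result is reproducible, and `nodeid` and `commit` are hypothetical helpers.

```python
import hashlib
from typing import Dict

NULL_ID = b"\0" * 20  # assumed placeholder for a missing parent


def nodeid(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Assumed hashing scheme: sorted parent ids followed by the full contents."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


def commit(parent_cs: bytes, parent_manifest_id: bytes,
           parent_manifest: Dict[str, bytes], new_contents: Dict[str, bytes],
           user: str, message: str):
    """Return the new changeset id and the entries to append to each revlog."""
    # 1. Files first: each new file nodeid needs only the file's own parent.
    manifest = dict(parent_manifest)
    for path, data in new_contents.items():
        manifest[path] = nodeid(data, parent_manifest.get(path, NULL_ID))

    # 2. The manifest needs the new file nodeids (simplified text layout).
    manifest_text = b"".join(
        path.encode() + b"\0" + file_id.hex().encode() + b"\n"
        for path, file_id in sorted(manifest.items())
    )
    manifest_id = nodeid(manifest_text, parent_manifest_id)

    # 3. The changelog entry needs the new manifest nodeid.
    changelog_text = "\n".join(
        [manifest_id.hex(), user, *sorted(new_contents), "", message]
    ).encode()
    changeset_id = nodeid(changelog_text, parent_cs)

    # 4. Only now can entries be written, each tagged with the changeset id.
    entries = {
        "files": {p: (manifest[p], changeset_id) for p in new_contents},
        "manifest": (manifest_id, changeset_id),
        "changelog": (changeset_id, changeset_id),
    }
    return changeset_id, entries


cs_id, written = commit(NULL_ID, NULL_ID, {}, {"README": b"hello\n"},
                        "email@example.com", "first commit")
```

Changing a single byte of a file changes its nodeid, which changes the manifest text and its nodeid, which in turn changes the changeset id.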