Note:

This page is primarily intended for developers of Mercurial.

This document is a proposal only. It is also meant to apply only to repositories that need it for performance or scaling reasons.

The current dirstate format is very simple: it is just a list that stores, for every file, the filename, the state of the file, the modification date and the mode. It also stores the source for tracked copies and moves. Handling such a simple format is simple too: we read the whole dirstate into memory and process it there, and when we want to write it we rewrite the whole file.
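The read-everything, rewrite-everything pattern described above can be sketched as follows. This is only an illustration: the record layout (`ENTRY_HEADER`) and the function names `read_dirstate` and `write_dirstate` are hypothetical and do not reproduce Mercurial's actual on-disk format, and copy sources are left out for brevity.

```python
import struct

# Hypothetical per-entry header: state (1 byte), mode, mtime, filename length.
# This layout is an assumption made for illustration, not the real format.
ENTRY_HEADER = struct.Struct(">cllI")


def read_dirstate(data):
    """Parse the whole buffer into a dict: filename -> (state, mode, mtime)."""
    entries = {}
    pos = 0
    while pos < len(data):
        state, mode, mtime, namelen = ENTRY_HEADER.unpack_from(data, pos)
        pos += ENTRY_HEADER.size
        name = data[pos:pos + namelen]
        pos += namelen
        entries[name] = (state, mode, mtime)
    return entries


def write_dirstate(entries):
    """Serialize every entry again, even if only one of them changed."""
    chunks = []
    for name, (state, mode, mtime) in sorted(entries.items()):
        chunks.append(ENTRY_HEADER.pack(state, mode, mtime, len(name)))
        chunks.append(name)
    return b"".join(chunks)
```

Note that both functions touch every entry, so their cost grows with the total number of tracked files rather than with the number of files that changed.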

Unfortunately, for big repos the dirstate can easily exceed 50 megabytes. For some operations that involve multiple dirstate reads and writes, such as rebase, the dirstate operations account for 30% of the execution time. While there are other efforts to limit the working directory size (sparse extension, narrow clones), sometimes the user may want to have the whole big working directory.

Before hgwatchman was created, there was no need for a much better format, as every status operation involved iterating over all files in the repo anyway. Now that we know which files have changed, we can check just those files in the dirstate.

To handle such a big dirstate faster, we need to store it in a more organized format.

I'm trying to implement a prototype of such a dirstate with SQLite as the backend storage. I'm currently implementing it as an extension, which will soon be available in the https://bitbucket.org/facebook/hg-experimental repo.
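To make the idea concrete, here is a minimal sketch of what a SQLite-backed dirstate could look like, using Python's standard `sqlite3` module. The table name `files`, its columns, and the helper names `open_dirstate`, `update_entries` and `lookup` are assumptions made for this example and are not taken from the prototype.

```python
import sqlite3


def open_dirstate(path=":memory:"):
    """Open (or create) a hypothetical SQLite-backed dirstate."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " filename TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " mode INTEGER NOT NULL,"
        " mtime INTEGER NOT NULL,"
        " copysource TEXT)"
    )
    return conn


def update_entries(conn, entries):
    """Insert or overwrite only the given rows, in one transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)", entries
        )


def lookup(conn, filenames):
    """Fetch entries for just the named files (e.g. those reported as changed)."""
    result = {}
    for name in filenames:
        row = conn.execute(
            "SELECT state, mode, mtime, copysource FROM files"
            " WHERE filename = ?",
            (name,),
        ).fetchone()
        if row is not None:
            result[name] = row
    return result
```

With the filename as primary key, a lookup or update for a handful of changed files goes through the index and does not require reading or rewriting the entries for every other file.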

SQLDirstatePlan (last edited 2016-05-28 11:01:21 by rcl)