/!\ This page is primarily intended for Mercurial's developers.

Dirstate Format Improvements Plan

1. Current state:

  1. The dirstate is stored in a single file and records the state of every file in the repository that hg tracks
  2. The dirstate is written in an arbitrary order (Python dict iteration order)
  3. Status spends most of its time reading, parsing and querying the dirstate, and status underlies many frequently used commands. Here is what it does:
    1. Get the list of files that changed recently from hg watchman
    2. Read the entire dirstate and store it in memory
    3. Iterate through the dirstate and check the status of each file returned by watchman. We are interested in two questions:
      1. Is the file in the dirstate?
      2. Is the file modified, added or removed?
  4. For a dirstate of about 100 MB, building and writing it takes 5 s and reading it takes 350 ms
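
The status steps above can be modeled in a few lines of Python. This is a toy sketch, not Mercurial's code: the per-entry header `>cllll` (state, mode, size, mtime, name length) follows our reading of the current on-disk layout, copy records are left out, and `write_v1`, `read_v1` and `status` are hypothetical names. It shows why step 3.2 dominates: the whole file is parsed before the first path reported by watchman is looked up.

```python
import struct

# Assumed entry layout, modeled on the current dirstate: a 40-byte header
# holding two parent nodes, then for each entry a ">cllll" header
# (state, mode, size, mtime, name length) followed by the file name.
ENTRY = struct.Struct(">cllll")

def write_v1(entries):
    """Serialize {path: (state, mode, size, mtime)} in dict iteration order."""
    out = [b"\0" * 40]  # two null parent nodes
    for path, (state, mode, size, mtime) in entries.items():
        name = path.encode()
        out.append(ENTRY.pack(state, mode, size, mtime, len(name)))
        out.append(name)
    return b"".join(out)

def read_v1(data):
    """Step 3.2: parse the whole file into an in-memory dict."""
    dmap = {}
    pos = 40  # skip the parents
    while pos < len(data):
        state, mode, size, mtime, n = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        dmap[data[pos:pos + n].decode()] = (state, mode, size, mtime)
        pos += n
    return dmap

def status(data, changed):
    """Steps 3.2 and 3.3: classify each file reported as changed by watchman."""
    dmap = read_v1(data)  # cost grows with the whole dirstate, not len(changed)
    result = {}
    for path in changed:
        entry = dmap.get(path)  # 3.3.1: is the file in the dirstate?
        if entry is None:
            result[path] = "unknown"
        elif entry[0] == b"a":  # 3.3.2: added, removed or possibly modified?
            result[path] = "added"
        elif entry[0] == b"r":
            result[path] = "removed"
        else:
            result[path] = "check"  # needs a stat or content comparison
    return result
```

Because entries are stored in whatever order the dict happened to produce, the reader cannot find one path without parsing everything before it.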

2. Improvement plan:

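Among the status steps listed under item 3 of the current state, we can improve 3.2 (reading the whole dirstate) as well as 3.3.1 and 3.3.2 (the two per-file questions).

3.1 (getting the list of changed files from watchman) cannot easily be improved.
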
3. New on-disk format

The new dirstate format will resemble the current one, with the following additions:

- A format version number (there will be more than one dirstate format)

- Directory awareness: a tree-structured layout with stem (shared path prefix) compression

- Checksums for files in the lookup state, so that status does not have to read revlogs to decide whether they changed

- Entries stored in sorted order

- The number of entries (so the reader does not have to guess how large a dictionary to allocate for the dirstate)
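
The version and entry-count fields could sit in a small fixed header. The sketch below assumes a made-up layout (magic `HGDS`, a one-byte version, a four-byte count, then length-prefixed entries); none of these names or sizes come from Mercurial. Python has no public way to presize a dict, so the sketch preallocates a list with the known count; a C parser could use the same count to size its hash table once.

```python
import struct

# Hypothetical layout, for illustration only:
#   header: 4-byte magic, 1-byte format version, 4-byte entry count
#   entry:  state (1 byte), mode, size, mtime (4 bytes each),
#           name length (2 bytes), then the name
HEADER = struct.Struct(">4sBI")
ENTRY = struct.Struct(">cIIIH")
MAGIC = b"HGDS"  # made-up magic value
VERSION = 2

def write_dirstate(entries):
    """entries: {path_bytes: (state, mode, size, mtime)}."""
    parts = [HEADER.pack(MAGIC, VERSION, len(entries))]
    for path, (state, mode, size, mtime) in entries.items():
        parts.append(ENTRY.pack(state, mode, size, mtime, len(path)))
        parts.append(path)
    return b"".join(parts)

def read_dirstate(data):
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a dirstate file")
    if version != VERSION:
        # An explicit version lets a reader refuse formats it cannot parse.
        raise ValueError("unsupported dirstate version %d" % version)
    entries = [None] * count  # size known up front: allocate once
    pos = HEADER.size
    for i in range(count):
        state, mode, size, mtime, n = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        entries[i] = (data[pos:pos + n], state, mode, size, mtime)
        pos += n
    return entries
```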
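
Sorted order is what can make steps 3.3.1 and 3.3.2 cheap without building a dictionary at all: if entries are sorted by path and preceded by a table of fixed-width offsets, a lookup can binary-search the file bytes directly and decode only O(log n) entries. The layout and the `write_sorted`, `PathView` and `lookup` names are assumptions for illustration.

```python
import bisect
import struct

# Hypothetical layout: entry count, one 4-byte offset per entry,
# then entries (state, name length, name) sorted by path.
COUNT = struct.Struct(">I")
OFFSET = struct.Struct(">I")
ENTRY = struct.Struct(">cH")

def write_sorted(entries):
    """entries: {path_bytes: state_byte}; written in sorted path order."""
    paths = sorted(entries)
    pos = COUNT.size + OFFSET.size * len(paths)  # first entry follows the table
    offsets, body = [], []
    for p in paths:
        offsets.append(OFFSET.pack(pos))
        record = ENTRY.pack(entries[p], len(p)) + p
        body.append(record)
        pos += len(record)
    return COUNT.pack(len(paths)) + b"".join(offsets) + b"".join(body)

class PathView:
    """Read-only sequence of on-disk paths; decodes one entry per access."""

    def __init__(self, data):
        self.data = data
        (self.count,) = COUNT.unpack_from(data, 0)

    def __len__(self):
        return self.count

    def entry(self, i):
        if not 0 <= i < self.count:
            raise IndexError(i)
        (off,) = OFFSET.unpack_from(self.data, COUNT.size + OFFSET.size * i)
        state, n = ENTRY.unpack_from(self.data, off)
        start = off + ENTRY.size
        return self.data[start:start + n], state

    def __getitem__(self, i):
        return self.entry(i)[0]

def lookup(data, path):
    """Return the state for path, or None if it is not in the dirstate."""
    view = PathView(data)
    i = bisect.bisect_left(view, path)  # works on any sequence with len/getitem
    if i < len(view) and view[i] == path:
        return view.entry(i)[1]
    return None
```

Checking a few hundred paths from watchman then touches a handful of entries per path instead of parsing the entire file first.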
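
The plan does not spell out what stem compression means. One common technique that fits a sorted list of paths is front coding, where each path stores only the length of the prefix it shares with the previous path plus the remaining suffix. The sketch assumes that scheme; a tree that nests entries under directory nodes would be another way to get a similar saving.

```python
def front_encode(paths):
    """Encode sorted paths as (shared_prefix_len, suffix) pairs."""
    out = []
    prev = b""
    for p in paths:
        n = 0
        limit = min(len(prev), len(p))
        while n < limit and prev[n] == p[n]:
            n += 1
        out.append((n, p[n:]))  # keep only what differs from the previous path
        prev = p
    return out

def front_decode(pairs):
    """Rebuild the full paths by reusing each previous path's prefix."""
    paths = []
    prev = b""
    for n, suffix in pairs:
        p = prev[:n] + suffix
        paths.append(p)
        prev = p
    return paths
```

Front coding makes decoding sequential within a run, so formats that use it usually store a full path every k entries (a restart point) to keep binary search possible.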
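
For files in the lookup state (whose size and mtime alone cannot prove them clean), status currently has to compare against content read from the revlog. Storing a checksum in the dirstate would let it hash the working file instead. The sketch assumes a plain SHA-1 of the file contents, which is not the same value as a Mercurial file node ID (that hash also covers the parent node IDs); `content_checksum` and `is_modified` are hypothetical names.

```python
import hashlib
import os
import tempfile

def content_checksum(path):
    """Assumed checksum: SHA-1 of the working-copy file contents."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()

def is_modified(path, stored_checksum):
    """Resolve a lookup-state file without reading its revlog."""
    if stored_checksum is None:
        return None  # no checksum recorded: fall back to the revlog comparison
    return content_checksum(path) != stored_checksum

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"hello\n")
    stored = content_checksum(path)  # recorded when the dirstate is written
    print(is_modified(path, stored))  # unchanged content
    os.remove(path)
```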

DirstateFormatImprovementsPlan (last edited 2017-08-17 17:27:41 by SaurabhSingh)