[issue3360] dirstate handling gets slow with medium-sized manifest

Bryan O'Sullivan bugs at mercurial.selenic.com
Mon Apr 9 18:44:50 CDT 2012


New submission from Bryan O'Sullivan <bos at serpentine.com>:

Running on a synthetic repo with 120,000 files in the tip rev.

  # hg --time add a.txt
  Time: real 1.750 secs (user 1.670+0.000 sys 0.070+0.000)

Run a second time, to guarantee that the dirstate will not be written:

  # hg --time add a.txt
  a.txt already tracked!
  Time: real 1.250 secs (user 1.190+0.000 sys 0.070+0.000)

I interpret this as "read takes 1.2 seconds, write takes 0.5 seconds extra".

All the time seems to be related to case folding checks. A statprof run:

  %   cumulative      self          
 time    seconds   seconds  name    
 16.79      0.19      0.19  utf_8.py:16:decode
 13.14      0.83      0.15  scmutil.py:57:__init__
 12.04      0.13      0.13  utf_8.py:15:decode
 10.22      0.11      0.11  dirstate.py:218:__getitem__
  9.49      0.11      0.11  encoding.py:176:lower
  9.49      0.11      0.11  dirstate.py:208:__getitem__
  8.03      0.09      0.09  encoding.py:171:lower
  7.66      0.09      0.09  encoding.py:179:lower
  4.74      0.37      0.05  encoding.py:174:lower
  2.55      0.03      0.03  encoding.py:168:lower
  1.82      0.02      0.02  dirstate.py:225:__iter__

That second entry is from casecollisionauditor.

If I turn casecollisionauditor into a no-op, time for the expected-collision 
case goes back down to something reasonable:

  # hg --time add a.txt
  a.txt already tracked!
  Time: real 0.110 secs (user 0.070+0.000 sys 0.040+0.000)
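
The numbers above are consistent with an auditor that case-folds every tracked
filename as soon as it is constructed, even when the file being added is already
tracked. The following is a minimal sketch of that idea and of a lazy
alternative. It is not Mercurial's code: the class names, the `collides` method,
and the use of `str.lower()` in place of `encoding.lower` are all assumptions
made for illustration.

```python
class EagerCollisionAuditor:
    """Hypothetical auditor that folds every tracked name up front."""

    def __init__(self, tracked):
        self._tracked = set(tracked)
        # O(n) work on every construction, even if no check ever needs it.
        self._folded = {name.lower() for name in self._tracked}

    def collides(self, name):
        # A collision is a new name whose folded form matches a tracked one.
        return name not in self._tracked and name.lower() in self._folded


class LazyCollisionAuditor:
    """Hypothetical auditor that defers folding until it is first needed."""

    def __init__(self, tracked):
        self._tracked = set(tracked)
        self._folded = None  # built on demand

    def collides(self, name):
        if name in self._tracked:
            # Already tracked: an exact match cannot be a case collision,
            # so the folded set is never built on this path.
            return False
        if self._folded is None:
            self._folded = {n.lower() for n in self._tracked}
        return name.lower() in self._folded
```

With the lazy variant, the "already tracked" path costs one set lookup, while
adding a genuinely new file still pays the full folding cost once.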

Writing remains expensive:

  # hg --time revert a
  Time: real 1.500 secs (user 1.360+0.000 sys 0.140+0.000)
  # hg --time add a.txt
  Time: real 0.780 secs (user 0.730+0.000 sys 0.060+0.000)

This time, the cost comes from iterating through the dirstate to write it out. 
Here's a statprof profile of "hg add" with the case-folding stuff killed:

  %   cumulative      self          
 time    seconds   seconds  name    
 17.45      0.14      0.14  dirstate.py:486:write
 17.02      0.14      0.14  dirstate.py:30:_finddirs
 15.32      0.51      0.13  dirstate.py:120:_dirs
 12.34      0.10      0.10  dirstate.py:38:_incdirs
  5.96      0.05      0.05  dirstate.py:32:_finddirs
  5.53      0.05      0.05  dirstate.py:488:write
  4.26      0.24      0.03  dirstate.py:36:_incdirs
  3.40      0.03      0.03  dirstate.py:119:_dirs
  3.40      0.03      0.03  dirstate.py:40:_incdirs
  2.55      0.02      0.02  dirstate.py:487:write
  2.55      0.02      0.02  dirstate.py:471:write
  2.13      0.02      0.02  dirstate.py:33:_finddirs
  2.13      0.02      0.02  dirstate.py:118:_dirs
  1.28      0.01      0.01  dirstate.py:35:_incdirs
  1.28      0.01      0.01  dirstate.py:470:write
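
The names `_finddirs`, `_incdirs`, and `_dirs` suggest a per-file walk over
ancestor directories that builds a directory count. The following is a rough
sketch of that pattern, assuming that is what these helpers do. The function
names and bodies are hypothetical and are not copied from dirstate.py.

```python
def find_dirs(path):
    """Yield each ancestor directory of a slash-separated path, deepest first."""
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


def count_dirs(paths):
    """Map each directory to the number of direct children seen under it."""
    dirs = {}
    for path in paths:
        for base in find_dirs(path):
            if base in dirs:
                dirs[base] += 1
                # The ancestors of base were counted when base was first added.
                break
            dirs[base] = 1
    return dirs


if __name__ == "__main__":
    print(count_dirs(["a/b/c.txt", "a/b/d.txt", "a/e.txt", "top.txt"]))
```

Even with the early exit, every file costs at least one `rfind`, one slice, and
one dictionary operation, so a 120,000-file dirstate pays for 120,000 such steps
each time the table is rebuilt.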

----------
messages: 19551
nosy: bos
priority: bug
status: unread
title: dirstate handling gets slow with medium-sized manifest

____________________________________________________
Mercurial issue tracker <bugs at mercurial.selenic.com>
<http://mercurial.selenic.com/bts/issue3360>
____________________________________________________

