D5296: store: don't read the whole fncache in memory

pulkit (Pulkit Goyal) phabricator at mercurial-scm.org
Wed Feb 27 10:46:24 EST 2019


pulkit added a comment.


  In https://phab.mercurial-scm.org/D5296#87890, @indygreg wrote:
  
  > I suspect https://phab.mercurial-scm.org/rHG9fca5b056c0a2f673aefa64f7ec7488bd9188d9d made things faster because the code before was using 1 I/O operation for every entry. I would also not be surprised if CPython from that era did something very inefficient with regards to line reading.
  >
  > The current code is pretty bad because it buffers the entire file content in memory! I agree we should change it.
  >
  > I like this patch as written. If profiling shows it to be slow, I think there is room to optimize `util.iterfile()` or even to teach the vfs layer how to efficiently open files for line-based I/O. This is something I could help optimize if needed.
  >
  > While I'm here, the fncache file being a newline delimited list of full file paths is kinda ridiculous. We could do much better by using compression and/or a more complicated data structure. It is kinda silly that we have to load this decoded data structure into memory. So if your file on disk is ~100MB, you are going to have a Python set that also consumes ~100MB. That's really not great.
  
  
  I agree. We should either use compression or a more sophisticated data structure here. Right now there is no way to check whether an entry exists without parsing the whole fncache. That does not matter much on small repositories, but on large repositories such as mozilla the cost of parsing the whole fncache is noticeable: `perffncacheload` takes ~4 seconds :/
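
  For illustration, here is a minimal sketch of the difference between buffering the whole file and iterating it line by line, which is what going through `util.iterfile()` enables. This is not the actual store.py code; the plain `open()` call and the `fncachepath` argument are stand-ins for the vfs-based access the store really uses.

      # Minimal sketch, not Mercurial's real implementation.

      def load_fncache_buffered(fncachepath):
          # Old approach: read the entire file into memory, then split.
          # Peak memory is roughly the raw file size plus the resulting set.
          with open(fncachepath, 'rb') as fp:
              data = fp.read()
          return set(data.splitlines())

      def load_fncache_streaming(fncachepath):
          # Streaming approach: iterate the file line by line, so only one
          # line's buffer is held at a time while the set is being built.
          entries = set()
          with open(fncachepath, 'rb') as fp:
              for line in fp:
                  line = line.rstrip(b'\n')
                  if line:
                      entries.add(line)
          return entries

  Even with streaming, the resulting set still occupies memory proportional to the file size, which is why a smarter on-disk structure (or compression) would still be worth pursuing.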

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5296

To: pulkit, #hg-reviewers
Cc: indygreg, yuja, mjpieters, mercurial-devel

