D5296: store: don't read the whole fncache in memory

indygreg (Gregory Szorc) phabricator at mercurial-scm.org
Mon Feb 25 21:46:14 EST 2019


indygreg added a comment.


  I suspect https://phab.mercurial-scm.org/rHG9fca5b056c0a2f673aefa64f7ec7488bd9188d9d made things faster because the previous code issued one I/O operation per entry. I would also not be surprised if CPython from that era did something very inefficient with regard to line reading.
  
  The current code is pretty bad because it buffers the entire file content in memory! I agree we should change it.
  
  I like this patch as written. If profiling shows it to be slow, I think there is room to optimize `util.iterfile()` or even to teach the vfs layer how to efficiently open files for line-based I/O. This is something I could help optimize if needed.
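
  To illustrate the difference being discussed, here is a minimal sketch in plain Python. It is not the actual Mercurial code: `load_entries_buffered` and `load_entries_streaming` are hypothetical names, and the sketch assumes a newline-delimited binary file with no path decoding step.

```python
import io


def load_entries_buffered(fp):
    # Hypothetical "read everything" approach: the whole file content is
    # held in memory as one bytes object before it is split into entries.
    return set(fp.read().splitlines())


def load_entries_streaming(fp):
    # Hypothetical line-based approach: iterate over the file object so
    # that only one line (plus the I/O buffer) is held besides the set.
    entries = set()
    for line in fp:
        entries.add(line.rstrip(b"\n"))
    return entries


# Stand-in for an fncache-like file; real content would come from disk.
sample = b"data/a.i\ndata/b.i\ndata/dir/c.i\n"
buffered = load_entries_buffered(io.BytesIO(sample))
streamed = load_entries_streaming(io.BytesIO(sample))
```

  Both functions build the same set. The streaming variant avoids the temporary full-file buffer, at the cost of more per-line work in Python, which is where profiling would matter.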
  
  While I'm here, the fncache file being a newline-delimited list of full file paths is kinda ridiculous. We could do much better by using compression and/or a more complicated data structure. It is kinda silly that we have to load this decoded data structure into memory. So if your file on disk is ~100MB, you are going to have a Python set that also consumes ~100MB. That's really not great.
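
  A rough way to check the memory claim is sketched below. It assumes CPython, where `sys.getsizeof()` reports the size of each object including its header. The helper name `estimate_set_bytes` and the generated paths are made up for illustration; they are not taken from a real repository.

```python
import sys


def estimate_set_bytes(entries):
    # Shallow size of the set's hash table plus the size of every bytes
    # object it references (each one carries its own object header).
    return sys.getsizeof(entries) + sum(sys.getsizeof(e) for e in entries)


# Made-up paths standing in for fncache entries.
paths = [b"data/dir%d/file%d.i" % (i % 100, i) for i in range(10000)]

# Size of the same entries as a newline-delimited file.
on_disk = sum(len(p) + 1 for p in paths)

in_memory = estimate_set_bytes(set(paths))
```

  Under these assumptions `in_memory` comes out no smaller than `on_disk`, since every entry is stored in full and also pays per-object overhead.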

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5296

To: pulkit, #hg-reviewers
Cc: indygreg, yuja, mjpieters, mercurial-devel

