D5296: store: don't read the whole fncache in memory
indygreg (Gregory Szorc)
phabricator at mercurial-scm.org
Mon Feb 25 21:46:14 EST 2019
indygreg added a comment.
I suspect https://phab.mercurial-scm.org/rHG9fca5b056c0a2f673aefa64f7ec7488bd9188d9d made things faster because the code before it was using one I/O operation for every entry. I would also not be surprised if CPython from that era did something very inefficient with regard to line reading.
The current code is pretty bad because it buffers the entire file content in memory! I agree we should change it.
I like this patch as written. If profiling shows it to be slow, I think there is room to optimize `util.iterfile()` or even to teach the vfs layer how to efficiently open files for line-based I/O. This is something I could help optimize if needed.
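To make the difference concrete, here is a rough sketch of the two loading strategies. It is not the actual Mercurial code: `load_buffered` and `load_streaming` are hypothetical names, and a plain file-like object stands in for whatever the vfs layer returns.

```python
import io


def load_buffered(fp):
    # Reads the whole file into one bytes object, then splits it.
    # Peak memory holds the raw content plus the resulting set.
    return set(fp.read().splitlines())


def load_streaming(fp):
    # Iterates line by line, so only the current line is held
    # in addition to the growing set.
    entries = set()
    for line in fp:
        entries.add(line.rstrip(b"\n"))
    return entries


# Hypothetical fncache-like payload, one path per line.
sample = b"data/foo.i\ndata/bar/baz.i\ndata/bar/baz.d\n"
assert load_buffered(io.BytesIO(sample)) == load_streaming(io.BytesIO(sample))
```

Both produce the same set; only the peak memory during loading differs.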
While I'm here, the fncache file being a newline-delimited list of full file paths is kinda ridiculous. We could do much better by using compression and/or a more complicated data structure. It is kinda silly that we have to load this decoded data structure into memory. So if your file on disk is ~100MB, you are going to have a Python set that also consumes ~100MB. That's really not great.
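As a rough illustration of the compression idea only, the sketch below runs zlib over a made-up newline-delimited path list. The paths are invented for the example, zlib is just one possible choice, and this is not a proposal for an on-disk format. It also only addresses the size on disk; the decoded set in memory would be just as large.

```python
import zlib

# Invented paths that share long prefixes, as store paths tend to.
paths = [
    b"data/src/module%d/file%d.py.i" % (m, f)
    for m in range(20)
    for f in range(50)
]
raw = b"\n".join(paths) + b"\n"

packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# The round trip is lossless; compare the two sizes to see the saving.
assert restored == raw
print(len(raw), len(packed))
```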
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D5296
To: pulkit, #hg-reviewers
Cc: indygreg, yuja, mjpieters, mercurial-devel