How many flags are in filelog.flags() allowed?

Benoit Boissinot benoit.boissinot at ens-lyon.org
Sat Apr 24 17:33:33 CDT 2010


On Sun, Apr 25, 2010 at 12:13:50AM +0200, Klaus Koch wrote:
> Hello,
> 
> Thanks for the prompt answer.
> 
> Do I understand it correctly that you plan to implement a 'fake'
> filelog for bigfiles which would ensure that the data of bigfiles is
> not read entirely into memory and that their sha1 is not computed
> until the final commit?

No, the plan (no ETA for the moment, it's just some ideas flying around)
is to avoid storing data for bigfiles.

Not reading the data into memory and optimizing dirstate handling for
bigfiles are orthogonal to that.
> 
> This would fix the three problems with bigfiles: 
> 1) they are big which increases the repository storage and transfer time for pulls and pushes
> 2) all their data is read into memory, causing mmap exceptions
> 3) it needs some time to compute their sha1.
> 
> When Mercurial knows that some files should not be put into the repo,
> it can skip reading their data into memory and into the repository.
> My basic approach is (or was, I have to say now) to let dirstate.flagfunc
> add a flag 'S' for files bigger than a given threshold or matching
> given patterns.

That's cleanly doable using the remaining dirstate flags.
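
To make the "threshold or pattern" rule concrete, here is a minimal sketch
of the decision alone. It does not touch the dirstate; the names
is_bigfile and flags_for, the 10 MiB limit and the patterns are all
hypothetical and not part of Mercurial.

```python
import fnmatch
import os

THRESHOLD = 10 * 1024 * 1024   # hypothetical size limit (10 MiB)
PATTERNS = ["*.iso", "*.bin"]  # hypothetical name patterns


def is_bigfile(path, threshold=THRESHOLD, patterns=PATTERNS):
    """Return True if path matches a pattern or exceeds the size threshold."""
    name = os.path.basename(path)
    # Check the patterns first: a match needs no stat() call at all.
    if any(fnmatch.fnmatch(name, pat) for pat in patterns):
        return True
    # Only the size is consulted here; the file content is never read.
    return os.path.getsize(path) > threshold


def flags_for(path):
    """Return the hypothetical 'S' flag for big files, '' otherwise."""
    return "S" if is_bigfile(path) else ""
```

Both checks are cheap, so the flag could be decided without opening the
file.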

> The content of such files is then stored in more or less compressed
> files in a separate directory.  The data stored in the repository
> consists of a single string '.snap://<datastore_filename>.<sha1>\n'
> which acts as a pointer to the original data.

Using "fake" filelogs for that would be cleaner (and more transparent).
But of course, for that to be useful for you, you need the "don't put the
data in memory" mode implemented.
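
For illustration, the pointer string quoted above could be produced
without ever holding the whole file in memory. This is only a sketch:
sha1_of_file and snap_pointer are hypothetical helper names, the chunk
size is arbitrary, and nothing here is existing Mercurial or extension
code.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time (arbitrary choice)


def sha1_of_file(path, chunk_size=CHUNK_SIZE):
    """Hash a file incrementally so memory use stays bounded."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snap_pointer(datastore_filename, sha1_hex):
    """Build the '.snap://<datastore_filename>.<sha1>' pointer line."""
    return ".snap://%s.%s\n" % (datastore_filename, sha1_hex)
```

The pointer is all that would be stored in the repository; the data
itself would live in the separate datastore directory.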
> 
> Regarding the bigfile extension
> http://mercurial.selenic.com/wiki/BigfilesExtension:
> The basic approach is to log all bigfiles in a file .bigfiles and use
> that as another .hgignore, or with a patch from me to use the content
> as --ignore pattern (with the .hgignore approach you risk that files
> are shadowed, e.g. a bigfile a would shadow a normal file path1/a).
> These patterns in .hgbigfiles are then used for every normal Mercurial
> command so that they do not see the big files.
> 
> This causes several problems:
> The time to generate the ignore regular expression is at least linear
> in the number of files in .hgignore,

Reading the dirstate is linear in the number of tracked files too, so
that doesn't change much in the overall complexity.
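
As a rough model of both points (the linear cost and the shadowing
mentioned earlier), the sketch below builds one regular expression from a
list of paths. It is a simplified illustration with hypothetical function
names, not Mercurial's actual ignore matcher.

```python
import re


def build_anchored(paths):
    """Match only the exact listed paths, relative to the repo root."""
    # One alternative per listed file: the pattern grows linearly.
    alternatives = "|".join(re.escape(p) for p in paths)
    return re.compile(r"\A(?:%s)\Z" % alternatives)


def build_unanchored(paths):
    """Match the listed names at any directory level (shadowing risk)."""
    alternatives = "|".join(re.escape(p) for p in paths)
    return re.compile(r"(?:\A|/)(?:%s)\Z" % alternatives)


anchored = build_anchored(["a"])
unanchored = build_unanchored(["a"])

# The unanchored form also hides the unrelated file path1/a.
shadowed = unanchored.search("path1/a") is not None
not_shadowed = anchored.search("path1/a") is None
```

Either way the pattern is built once per listed file, which is the same
order of work as reading the dirstate.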

> and as long as a bigfile is not listed in .hgignore, Mercurial will
> compute its sha1 to compare it with a former smaller version already
> stored in the repo.

What do you mean? We already use some heuristics to avoid reading the
file.
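
The message above does not spell out the heuristics. One common approach
of this kind, assumed here purely for illustration, is to compare the
file's current stat data with the size and mtime recorded earlier, and to
read the content only when that comparison is inconclusive. Recorded and
quick_status are hypothetical names.

```python
import os
from collections import namedtuple

# Hypothetical record of what was known about a file at the last check.
Recorded = namedtuple("Recorded", "size mtime")


def quick_status(path, recorded):
    """Classify a file using stat data only, without reading it."""
    st = os.stat(path)
    if st.st_size != recorded.size:
        # A different size already proves the content changed.
        return "modified"
    if int(st.st_mtime) == recorded.mtime:
        # Same size and mtime: assume unchanged.
        return "clean"
    # Same size, different mtime: only now would the content be compared.
    return "unsure"
```

Under this assumption, a bigfile that replaced a smaller tracked version
would be reported as modified from its size alone.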

> Besides, it is no longer possible to see the history of a file by a
> simple 'hg log [file]'.

Indeed, bigfiles isn't "transparent".

Benoit
-- 
:wq

