Question about the mapping between tracked files and the filelogs

Thu May 29 05:03:08 CDT 2008

Hi there,

After reading the excellent "Behind the Scenes" chapter of the unofficial book,
I have a question about the mapping from tracked files in the manifest
to the file revlogs that actually contain the file data.
It seems that currently, there is exactly one revlog per tracked file,
or rather, per tracked file name, and the mapping is
done by the implicit path folding rule.

The current contents of each tracked file are identified by a node id
that references an entry in the file's revlog.

I was wondering what would break if one allowed for arbitrary mappings
between tracked files (file revisions) and revlogs.
The manifest would have to include the revlog name or number in
addition to the node id.
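
To make the difference concrete, here is a rough sketch of the two lookup schemes. It is only an illustration: `implicit_revlog_path` is a simplified stand-in for the real path encoding rule, and `ExplicitEntry`, the revlog number 7 and the short node ids are made up for the example.

```python
from typing import Dict, NamedTuple


def implicit_revlog_path(tracked_path: str) -> str:
    """Derive the revlog index path from the tracked file name alone.

    Simplified stand-in (assumption): only uppercase letters and
    underscores are escaped here; the real rule handles more cases.
    """
    out = []
    for ch in tracked_path:
        if ch == "_":
            out.append("__")
        elif ch.isupper():
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "data/" + "".join(out) + ".i"


class ExplicitEntry(NamedTuple):
    """Hypothetical manifest entry for the proposed scheme."""
    revlog_id: int  # which revlog holds the data
    node_id: str    # which entry inside that revlog


# Current scheme: the manifest stores only a node id per file name,
# and the revlog is found implicitly from the name.
current_manifest: Dict[str, str] = {
    "README": "a1b2",
    "docs/README": "c3d4",
}

# Proposed scheme: the manifest names the revlog explicitly, so two
# file names with identical contents can point at the same entry.
proposed_manifest: Dict[str, ExplicitEntry] = {
    "README": ExplicitEntry(revlog_id=7, node_id="a1b2"),
    "docs/README": ExplicitEntry(revlog_id=7, node_id="a1b2"),
}
```

In the second manifest a rename or copy only adds a manifest entry; no new revlog has to be created.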

The advantage of that approach is that similar or identical files
could share storage (rename would become cheaper).

The problem of long escaped revlog filenames could also be mitigated
(one could give up on using "real" filenames in the data store).

Long revlogs (files with many updates) could also be partitioned if the
revlog index file grew too big.

One problem that I can immediately spot is that the revlogs could no
longer be used to find the changesets associated with a given file
(since they would mix data for "unrelated" files and might not even be
complete for a given file).
Is this how "hg log myfile.txt" works now?
This could maybe be avoided by keeping the revlog index files on a
per-file basis, but sharing the associated data files.
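
As a toy model of that last variant (per-file index, shared data), something like the following is what I have in mind. All names here (`SharedStore`, `IndexEntry`, `share`) are hypothetical, deltas and compression are ignored, and I am assuming each index entry carries a link to its changeset revision, called `linkrev` in the sketch.

```python
from typing import Dict, List, NamedTuple


class IndexEntry(NamedTuple):
    offset: int   # where the content starts in the shared data
    length: int   # how many bytes it spans
    linkrev: int  # changeset revision that introduced this file revision


class SharedStore:
    """Toy model: one shared data blob, one index per tracked file."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.indexes: Dict[str, List[IndexEntry]] = {}

    def add(self, path: str, content: bytes, linkrev: int) -> None:
        """Append new content to the shared data and index it for path."""
        offset = len(self.data)
        self.data += content
        entry = IndexEntry(offset, len(content), linkrev)
        self.indexes.setdefault(path, []).append(entry)

    def share(self, src: str, src_rev: int, dst: str, linkrev: int) -> None:
        """Record a copy/rename: dst reuses the bytes already stored for src."""
        old = self.indexes[src][src_rev]
        entry = IndexEntry(old.offset, old.length, linkrev)
        self.indexes.setdefault(dst, []).append(entry)

    def read(self, path: str, rev: int) -> bytes:
        """Return the content of the given file revision."""
        entry = self.indexes[path][rev]
        return bytes(self.data[entry.offset:entry.offset + entry.length])

    def log(self, path: str) -> List[int]:
        """Changeset revisions touching path, read from its own index only."""
        return [entry.linkrev for entry in self.indexes[path]]


store = SharedStore()
store.add("a.txt", b"hello", linkrev=0)
store.add("b.txt", b"world", linkrev=1)
store.share("a.txt", 0, "c.txt", linkrev=2)  # rename/copy without new data
```

Because each file keeps its own index, the per-file history stays complete even though the bytes for several files live side by side in one data file.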

If there are any crucial flaws in this idea, please point them out
lest I get too excited about it...

Thanks,

Thilo