[PATCH] introduce filenamelog repository layout

Fri Jul 11 17:36:25 CDT 2008

On Fri, 2008-07-11 at 23:36 +0200, Adrian Buehlmann wrote:
> On 11.07.2008 20:21, Matt Mackall wrote:
> > On Fri, 2008-07-11 at 19:10 +0200, Adrian Buehlmann wrote:
> >> # HG changeset patch
> >> # User Adrian Buehlmann <adrian at cadifra.com>
> >> # Date 1215795701 -7200
> >> # Node ID 4c44bdd7f45f62a21feaab6e41a44dd8e8ec9151
> >> # Parent  2134d6c09432e4e3dbee18d93ec9242a332f7cdc
> >> introduce filenamelog repository layout
> >>
> >> * adds a new entry 'filenamelog' to .hg/requires for new repos
> >> * writes new file .hg/store/filenamelog
> > 
> > What's the format?
> > 
> 
> Very simple. Please read the code of the new filenamelog.py.

I glanced at your code, saw it was doing nonsensical things like
escaping null bytes and decided I must not understand it. Nulls can't
appear in filenames.

> Entries are \n separated. Each entry consists of two \0 separated
> paths. The first one being basically the name used by filelog
> (see filelog.encodedir), encoded by filenamelog.fnlogencode to mask
> things like zero bytes, the second one being basically the first one
> encoded by util.fnlogencode, which is the path of the filelog
> stored on disk.

> The second one could be omitted as it can be calculated from
> the first one. I haven't done so for two reasons:
> 
> * humans can look into filenamelog and see the encoded name

That's handy, but it's probably not worth more than doubling the disk
space and parsing time.

> * streamclone is faster, as it doesn't need to call
>   util.fnlogencode again

I'd be surprised if it were measurably faster, and it might in fact be
slower. Streamclone is reading all the other data in the repo, after all.
Also, it means reading and parsing >2x the data, versus reading and
calling splitlines() on 1x the data and then encoding.
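
To make the comparison concrete, here is a minimal sketch of the two
parsing strategies. It assumes only the entry format described above
(\n-separated entries, \0 between the two paths). The function names and
toy_encode are hypothetical stand-ins, not code from the patch.

```python
def parse_with_stored_paths(data):
    """Parse entries of the form name NUL storepath, one entry per line."""
    entries = []
    for line in data.split(b"\n"):
        if line:  # skip the empty piece after a trailing newline
            name, storepath = line.split(b"\x00", 1)
            entries.append((name, storepath))
    return entries


def parse_names_only(data, encode):
    """Parse one name per line and recompute each store path with encode()."""
    return [(name, encode(name)) for name in data.split(b"\n") if name]


def toy_encode(name):
    # Hypothetical placeholder for the real name-to-store-path encoding.
    return b"df/" + name + b".i"


two_column = b"a.c\x00df/a.c.i\nb/c.h\x00df/b/c.h.i\n"
one_column = b"a.c\nb/c.h\n"

# Both files describe the same mapping; the second is about half the size.
same = parse_with_stored_paths(two_column) == parse_names_only(one_column, toy_encode)
```

The one-column variant trades a call to the encoder per entry for a file
that is roughly half as large to read and split.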

> >> * hash-encodes filenames with long paths (issue839)
> > 
> > What's the format?

> data/FIRST/SECOND/THIRD/FOURTH/FIFTH/SIXTH/SEVENTH/EIGHTH/NINETH/TENTH/ELEVENTH/LOREM.TXT.i
> 
> it is written to:
> 
> dh/_f_i_r_s/_s_e_c_o/_t_h_i_r/_f_o_u_r/_f_i_f_t/_s_i_x_t/_s_e_v_e/_e_i_g_h/213bfeabe713cd5571ac605bbc0cf5de4e682b43.i
> 
> because the other encoding would result in a path longer than MAX_PATH_LEN_IN_HGSTORE,
> which I've defined as 120 (your fixed limit requirement of the parren encoding
> switching idea - the hybrid scheme).

Ok.
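
For readers following along, a rough sketch of a hashed layout that is
consistent with the example above. Everything not visible in that example
is an assumption: that the digest is SHA-1 over the unencoded path, that
each directory keeps the first 8 characters of its encoded name, and that
at most 8 directory levels are kept. It will not necessarily reproduce the
digest shown in the example.

```python
import hashlib
import os

MAX_PATH_LEN_IN_HGSTORE = 120  # limit quoted in the message
DIR_PREFIX_LEN = 8  # assumption: encoded characters kept per directory
MAX_DIRS = 8        # assumption: directory levels kept


def encode_upper(s):
    """Encode 'A' as '_a' and '_' as '__' (ASCII only, simplified)."""
    return "".join(
        "__" if c == "_" else ("_" + c.lower() if "A" <= c <= "Z" else c)
        for c in s
    )


def hash_encode(path):
    """Map 'data/<dirs>/<name><ext>' to 'dh/<short dirs>/<sha1><ext>'."""
    if not path.startswith("data/"):
        raise ValueError("expected a path under data/")
    # Assumption: the digest covers the full unencoded path.
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    ext = os.path.splitext(path)[1]
    dirs = path[len("data/"):].split("/")[:-1]
    short = [encode_upper(d)[:DIR_PREFIX_LEN] for d in dirs[:MAX_DIRS]]
    return "/".join(["dh"] + short + [digest + ext])


long_name = ("data/FIRST/SECOND/THIRD/FOURTH/FIFTH/SIXTH/SEVENTH/EIGHTH/"
             "NINETH/TENTH/ELEVENTH/LOREM.TXT.i")
hashed = hash_encode(long_name)
```

With these assumed constants the result is always well under the 120
character limit, since the directory part is capped and the digest has a
fixed length.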

> >> * encodes Windows reserved filenames (issue793)
> > 
> > What's the format?
> 
> data/aux.bla/bla.aux/prn/PRN/lpt/com3/nul/coma/foo.NUL/normal.c.i
> 
> is encoded as:
> 
> df/au~78.bla/bla.aux/pr~6e/_p_r_n/lpt/co~6d3/nu~6c/coma/foo._n_u_l/normal.c.i

Ok.
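
A minimal sketch of rules that reproduce the example above, inferred from
that example alone rather than taken from the patch: uppercase letters are
encoded first, then any path component whose part before the first dot is
a reserved device name gets its third character replaced by ~XX (hex). The
list of reserved names and the data/ to df/ prefix swap are assumptions.

```python
import re

# Assumed set of reserved device names, matched against the part of a
# component before its first dot, after uppercase encoding.
_RESERVED = re.compile(r"^(aux|con|prn|nul|com[1-9]|lpt[1-9])$")


def encode_upper(s):
    """Encode 'A' as '_a' and '_' as '__' (ASCII only, simplified)."""
    return "".join(
        "__" if c == "_" else ("_" + c.lower() if "A" <= c <= "Z" else c)
        for c in s
    )


def encode_reserved(path):
    """Encode a 'data/...' path so no component is a reserved name."""
    if not path.startswith("data/"):
        raise ValueError("expected a path under data/")
    parts = []
    for comp in path[len("data/"):].split("/"):
        comp = encode_upper(comp)
        base, dot, rest = comp.partition(".")
        if _RESERVED.match(base):
            # Mask the third character, e.g. 'aux' -> 'au~78'.
            base = base[:2] + "~%02x" % ord(base[2]) + base[3:]
        parts.append(base + dot + rest)
    return "df/" + "/".join(parts)


encoded = encode_reserved(
    "data/aux.bla/bla.aux/prn/PRN/lpt/com3/nul/coma/foo.NUL/normal.c.i")
```

Note that PRN needs no extra masking here because the uppercase encoding
already turns it into _p_r_n, which is not a reserved name.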

Here's how I'd like to see things evolve:

patch 1:
* move filename encoding functions out of util into filelog.py where they
  belong (util has no damn reason to know about encodings)
* this probably means adding a pair of filelogopeners for the two current
  layouts
* update localrepo appropriately
* no visible functional changes

patch 2:
* add functions in filelog to find all the files in a repo, one per layout
* teach localrepo about it
* teach streamclone to ask repo for the file list rather than digging
  around on its own
* no visible functional changes

patch 3:
* add a new filelogopener and filelist function for your new format
* update localrepo appropriately

Here all the groundwork is done before patch 3, making the actual new
layout patch much smaller and more self-contained.
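
A rough sketch of the shape this could take, with one encode/decode pair
per layout and the repo handing out the file list. All names here (Repo,
LAYOUTS, stream_files and the helper functions) are hypothetical, the
encodings are heavily simplified, and the list of stored files is passed
in directly instead of being read from disk.

```python
def encode_upper(s):
    """Encode 'A' as '_a' and '_' as '__' (ASCII only, simplified)."""
    return "".join(
        "__" if c == "_" else ("_" + c.lower() if "A" <= c <= "Z" else c)
        for c in s
    )


def decode_upper(s):
    """Inverse of encode_upper for well-formed input."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "_":
            out.append("_" if s[i + 1] == "_" else s[i + 1].upper())
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def plain_encode(name):
    return "data/" + name + ".i"


def plain_decode(stored):
    return stored[len("data/"):-len(".i")]


def store_encode(name):
    return "data/" + encode_upper(name) + ".i"


def store_decode(stored):
    return decode_upper(stored[len("data/"):-len(".i")])


# One (opener-side encoder, file-list decoder) pair per layout.
LAYOUTS = {
    "plain": (plain_encode, plain_decode),
    "store": (store_encode, store_decode),
}


class Repo:
    def __init__(self, requirements, stored_files):
        key = "store" if "store" in requirements else "plain"
        self.encode, self._decode = LAYOUTS[key]
        self._stored = stored_files

    def filelist(self):
        """Return tracked file names, whatever the on-disk layout is."""
        return [self._decode(p) for p in self._stored
                if p.startswith("data/") and p.endswith(".i")]


def stream_files(repo):
    """What a streamclone-like caller does: ask the repo, then encode."""
    return [repo.encode(name) for name in repo.filelist()]
```

With this split, a new layout only adds one more entry to the table, and
callers such as stream_files stay unchanged.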

-- 
Mathematics is the supreme nostalgia of our time.