[PATCH] introduce filenamelog repository layout

Fri Jul 11 18:05:02 CDT 2008

On 12.07.2008 00:36, Matt Mackall wrote:
> On Fri, 2008-07-11 at 23:36 +0200, Adrian Buehlmann wrote:
>> On 11.07.2008 20:21, Matt Mackall wrote:
>>> On Fri, 2008-07-11 at 19:10 +0200, Adrian Buehlmann wrote:
>>>> # HG changeset patch
>>>> # User Adrian Buehlmann <adrian at cadifra.com>
>>>> # Date 1215795701 -7200
>>>> # Node ID 4c44bdd7f45f62a21feaab6e41a44dd8e8ec9151
>>>> # Parent  2134d6c09432e4e3dbee18d93ec9242a332f7cdc
>>>> introduce filenamelog repository layout
>>>>
>>>> * adds a new entry 'filenamelog' to .hg/requires for new repos
>>>> * writes new file .hg/store/filenamelog
>>> What's the format?
>>>
>> Very simple. Please read the code of the new filenamelog.py.
> 
> I glanced at your code, saw it was doing nonsensical things like
> escaping null bytes and decided I must not understand it. Nulls can't
> appear in filenames.

That's exactly what current (unpatched) util.encodefilename does.
It encodes \0 as ~00.
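
For illustration, a simplified sketch of that kind of escaping (not the actual util code): control characters, including \0, are written as ~ followed by two hex digits. The name escape_sketch and the use of Python 3 str instead of byte strings are assumptions made for this example.

```python
def escape_sketch(path):
    """Escape control characters as ~XX (two lowercase hex digits).

    Hypothetical helper for illustration only. It covers just the
    characters below 0x20, which includes the null byte.
    """
    out = []
    for ch in path:
        if ord(ch) < 32:
            out.append("~%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, escape_sketch("a\0b") gives "a~00b".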

Ok. So I will not encode null bytes for .hg/store/filenamelog.

>> Entries are \n separated. Each entry consists of two \0 separated
>> paths. The first is basically the name used by filelog
>> (see filelog.encodedir), encoded by filenamelog.fnlogencode to mask
>> things like zero bytes. The second is basically the first one
>> encoded by util.fnlogencode, which gives the path of the filelog
>> as stored on disk.
> 
>> The second one could be omitted as it can be calculated from
>> the first one. I haven't done so for two reasons:
>>
>> * humans can look into filenamelog and see the encoded name
> 
> That's handy, but it's probably not worth more than doubling the disk
> space and parsing time.

Ok. I'll leave out that second entry.

>> * streamclone is faster, as it doesn't need to call
>>   util.fnlogencode again
> 
> I'd be surprised if it were measurably faster; it may in fact be slower.
> It is reading all the other data in the repo after all. Also, it means
> reading and parsing >2x the data vs reading 1x the data, calling
> splitlines() on it and encoding.

Ok.
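
To make the two variants concrete, here is a minimal sketch of reading both: the original entries (\n separated, with \0 between the two paths) and the reduced form with a single path per line. The function names are hypothetical and the code in filenamelog.py may look different. The sketch splits on "\n" explicitly because str.splitlines() in Python 3 also splits on other line-boundary characters.

```python
def parse_two_field(data):
    """Parse entries of the form '<name>\\0<storepath>', one per line."""
    entries = []
    for line in data.split("\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        name, storepath = line.split("\0", 1)
        entries.append((name, storepath))
    return entries


def parse_one_field(data):
    """Parse the reduced format: one filelog name per line."""
    return [line for line in data.split("\n") if line]
```

In the reduced form the on-disk path is no longer stored, so a reader has to recompute it from the name when it is needed.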

>>>> * hash-encodes filenames with long paths (issue839)
>>> What's the format?
> 
>> data/FIRST/SECOND/THIRD/FOURTH/FIFTH/SIXTH/SEVENTH/EIGHTH/NINETH/TENTH/ELEVENTH/LOREM.TXT.i
>>
>> it is written to:
>>
>> dh/_f_i_r_s/_s_e_c_o/_t_h_i_r/_f_o_u_r/_f_i_f_t/_s_i_x_t/_s_e_v_e/_e_i_g_h/213bfeabe713cd5571ac605bbc0cf5de4e682b43.i
>>
>> because the other encoding would result in a path longer than MAX_PATH_LEN_IN_HGSTORE,
>> which I've defined as 120 (your fixed limit requirement of the parren encoding
>> switching idea - the hybrid scheme).
> 
> Ok.
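
A rough sketch of the shortening shown in the example above, under assumptions read off that example: the leading "data" becomes "dh", at most 8 directory levels are kept, each is cut to the first 8 characters of its case-folded name, and the file name is replaced by a 40-digit hex digest plus the original extension. What exactly is fed to the hash is not stated in this thread; the sketch hashes the full unencoded path with SHA-1, which is a guess. All function and constant names except MAX_PATH_LEN_IN_HGSTORE are hypothetical.

```python
import hashlib

MAX_PATH_LEN_IN_HGSTORE = 120  # limit named in the message above
DIR_PREFIX_LEN = 8             # assumption, read off the example
MAX_DIRS = 8                   # assumption, read off the example


def fold_case(name):
    """Encode uppercase letters as '_' + lowercase and '_' as '__'."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def hash_encode_sketch(path):
    """Shorten 'data/<dirs>/<file>' to 'dh/<dir prefixes>/<digest><ext>'."""
    parts = path.split("/")
    dirs, filename = parts[1:-1], parts[-1]
    prefixes = [fold_case(d)[:DIR_PREFIX_LEN] for d in dirs[:MAX_DIRS]]
    # Assumption: the digest covers the full unencoded path.
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    ext = "." + filename.rsplit(".", 1)[1] if "." in filename else ""
    return "/".join(["dh"] + prefixes + [digest + ext])
```

For the example path this reproduces the directory prefixes and stays under the 120 limit (3 + 8 * 9 + 40 + 2 = 117 characters), but the digest will not necessarily match the one shown, since the hash input here is assumed.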
> 
>>>> * encodes Windows reserved filenames (issue793)
>>> What's the format?
>> data/aux.bla/bla.aux/prn/PRN/lpt/com3/nul/coma/foo.NUL/normal.c.i
>>
>> is encoded as:
>>
>> df/au~78.bla/bla.aux/pr~6e/_p_r_n/lpt/co~6d3/nu~6c/coma/foo._n_u_l/normal.c.i
> 
> Ok.
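
The example above can be reproduced with a small sketch. It assumes the rule is: case-fold each path component first, then, if the part before the first dot is a Windows reserved device name, replace its third character with ~ plus two hex digits; the leading "data" becomes "df". The rule and all names below are inferred from the example, not taken from the patch.

```python
# Windows reserved device names (lowercase); com/lpt need a digit 1-9.
RESERVED = {"con", "prn", "aux", "nul"}
RESERVED |= {"com%d" % i for i in range(1, 10)}
RESERVED |= {"lpt%d" % i for i in range(1, 10)}


def fold_case(name):
    """Encode uppercase letters as '_' + lowercase and '_' as '__'."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def encode_component(name):
    """Case-fold one path component, then mask a reserved base name."""
    folded = fold_case(name)
    base = folded.split(".", 1)[0]
    if base in RESERVED:
        # Escape the third character so the base is no longer reserved.
        folded = folded[:2] + "~%02x" % ord(folded[2]) + folded[3:]
    return folded


def encode_path_sketch(path):
    """Encode 'data/<components>' as 'df/<encoded components>'."""
    head, rest = path.split("/", 1)
    if head == "data":
        head = "df"  # assumption from the example
    return "/".join([head] + [encode_component(p) for p in rest.split("/")])
```

Because folding happens first, PRN becomes _p_r_n and is no longer a reserved name, while lpt and coma are left alone since they are not device names at all.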
> 
> Here's how I'd like to see things evolve:
> 
> patch 1:
> * move filename encoding functions out of util into filelog.py where they
>   belong (util has no damn reason to know about encodings)
> * this probably means adding a pair of filelogopeners for the two current
>   layouts
> * update localrepo appropriately
> * no visible functional changes
> 
> patch 2:
> * add functions in filelog to find all the files in a repo, one per layout
> * teach localrepo about it
> * teach streamclone to ask repo for the file list rather than digging
>   around on its own
> * no visible functional changes
> 
> patch 3:
> * add a new filelogopener and filelist function for your new format
> * update localrepo appropriately
> 
> Here all the groundwork is done before patch 3, making the actual new
> layout patch much smaller and more self-contained.
> 
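
As I understand the plan, the shape would be roughly the following. This is only a sketch with made-up names (PlainLayout, ListedLayout, Repo, storefiles, stream_file_list) and in-memory stand-ins, not existing Mercurial code: each layout knows how to list its files, the repo delegates to its layout, and streamclone only talks to the repo.

```python
class PlainLayout:
    """Hypothetical layout that discovers files by scanning its store."""

    def __init__(self, store):
        self.store = store  # maps store path -> file contents

    def files(self):
        return sorted(self.store)


class ListedLayout:
    """Hypothetical layout that keeps an explicit list of file names."""

    def __init__(self, store, names):
        self.store = store
        self.names = names  # stands in for a file such as filenamelog

    def files(self):
        return list(self.names)


class Repo:
    """Stand-in for localrepo: delegates to whichever layout it was given."""

    def __init__(self, layout):
        self.layout = layout

    def storefiles(self):
        return self.layout.files()


def stream_file_list(repo):
    """Stand-in for streamclone: asks the repo instead of walking the store."""
    return repo.storefiles()
```

With that split, adding the new format in patch 3 would mean adding one more layout, without touching the streamclone side.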

Thanks. I will try doing that.