[PATCH 3 of 3] introduce filenamelog repository layout

Mon Jul 21 18:04:27 CDT 2008

On 21.07.2008 23:16, Matt Mackall wrote:
> On Mon, 2008-07-21 at 21:43 +0200, Adrian Buehlmann wrote:
> 
>> +def debugfilenamelog(ui, repo, **opts):
>> +    """dumps the filenamelog, showing the encoded filename for each entry
>> +
>> +    Writes two lines of output per filenamelog entry: The first line is the
>> +    filenamelog entry itself, the second line is the encoded filename.
>> +
>> +    By default, outputs the hashed filenames only. Specify -f to output
>> +    only the full name (non-hashed) filenames, or -a to get both.
> 
> That's confusing.

I was trying to output only the interesting entries (the hashed
filenames) by default. I admit this might be overkill.

I'm not sure debugfilenamelog is worth dissecting that closely,
though; it's just a debugging feature.

But if you tell me what you'd prefer, I can change it.
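For reference, the selection rule in the patch's loop boils down to the following. `selected` is a hypothetical helper pulled out for illustration, with `all_` and `fullonly` standing in for the -a and -f flags:

```python
def selected(hashed, all_=False, fullonly=False):
    """Return True if debugfilenamelog would print this entry.

    Hypothetical helper mirroring the patch's condition:
      default -> only hashed (dh/...) entries
      -f      -> only non-hashed entries
      -a      -> everything
    """
    return all_ or (hashed and not fullonly) or (fullonly and not hashed)


# Print which kinds of entries each mode shows.
for flags, label in (({}, "default"), ({"fullonly": True}, "-f"), ({"all_": True}, "-a")):
    shown = [kind for kind, h in (("hashed", True), ("full", False)) if selected(h, **flags)]
    print("%-7s -> %s" % (label, ", ".join(shown)))
```

Note that -a wins when both -a and -f are given.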

>> +    Specify -q to suppress the statistic printed at the end.
>> +    """
>> +    all = opts["all"]
>> +    fullonly = opts["full"]
>> +    n = 0
>> +    nh = 0
>> +    for f in repo.store.readfnlog():
>> +        ef = repo.encodefn(f)
>> +        hashed = ef.startswith('dh/')
>> +        if hashed:
>> +            nh += 1
>> +        if all or (hashed and not fullonly) or (fullonly and not hashed):
>> +            ui.write("   '%s'\n-> '%s'\n" % (f, ef))
>> +        n += 1
>> +    ui.status("(filenamelog has %i filenames, %i hashed)\n" % (n, nh))
> 
> Couldn't we use repo.storefiles + repo.sjoin to do this without
> knowing/caring about what sort of store we were using? A similar comment
> applies to your verify addition.

repo.storefiles locks the repo again and also returns the metadata files
(changelog, manifest), which neither debugfilenamelog nor verify wants
to see.

Also, repo.storefiles stats each file to get its size, which neither
debugfilenamelog nor verify needs (although we could make that optional
with a parameter, of course).

verify already holds the lock, and debugfilenamelog doesn't need to lock
the repo at all.
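For comparison, if we did go the store-agnostic route, a walk that skips the changelog and manifest and only stats files on request might look like the sketch below. `datafiles` and its `withsize` flag are hypothetical (not the existing repo.storefiles signature), and the sketch assumes per-file revlogs live under `data/` and, for hashed names, `dh/`:

```python
import os
import tempfile


def datafiles(storepath, withsize=False):
    """Yield (relative path, size) for per-file revlogs in a store.

    Hypothetical sketch: assumes file revlogs live under 'data/' and
    'dh/', so 00changelog.i and 00manifest.i at the store root are
    never visited. os.stat() is only called when withsize=True;
    otherwise size is None.
    """
    for top in ("data", "dh"):
        base = os.path.join(storepath, top)
        for root, dirs, files in os.walk(base):  # missing dir: yields nothing
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, storepath).replace(os.sep, "/")
                size = os.stat(full).st_size if withsize else None
                yield rel, size


# Usage: build a tiny fake store and list its data files.
store = tempfile.mkdtemp()
for rel in ("00changelog.i", "00manifest.i", "data/foo.txt.i", "dh/ab/cdef.i"):
    path = os.path.join(store, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fp:
        fp.write(b"abc")
print(list(datafiles(store)))
```

Such a walk only ever sees encoded names, though; for entries under dh/ the original filename cannot be recovered from the hash, which is why debugfilenamelog reads the filenamelog itself.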

>> diff --git a/mercurial/filenamelog.py b/mercurial/filenamelog.py
>> new file mode 100644
>> --- /dev/null
>> +++ b/mercurial/filenamelog.py
>> @@ -0,0 +1,44 @@
>> +# filenamelog.py - logging all filenames of a Mercurial repository
>> +#
>> +# Copyright 2008 Matt Mackall <mpm at selenic.com>
>> +#
>> +# This software may be used and distributed according to the terms
>> +# of the GNU General Public License, incorporated herein by reference.
>> +
>> +from i18n import _
>> +import util
>> +
>> +LOGNAME = 'filenamelog'
>> +
>> +def abort(text, linenum = None):
>> +    lineinfo = ""
>> +    if linenum is not None:
>> +        lineinfo = ", line %i" % linenum
>> +    raise util.Abort("%s%s: %s" % (LOGNAME, lineinfo, text))
>> +
>> +def append(opener, entries, transaction):
>> +    if len(entries) == 0:
>> +        return
>> +    fp = opener(LOGNAME, mode='a+')
>> +    fp.seek(0, 2)
>> +    offset = fp.tell()
>> +    if transaction is not None:
>> +        transaction.add(LOGNAME, offset)
>> +    for p in entries:
>> +        fp.write(p + '\n') # assuming that filenames don't contain '\n'
>> +    fp.close()
>> +
>> +def entries(opener):
>> +    # yields: path, line number
>> +    n = 1
>> +    try:
>> +        fp = opener(LOGNAME, mode='rb')
>> +    except IOError:
>> +        # skip nonexisting file
>> +        return
>> +    n = 1
>> +    for line in fp:
>> +        if (len(line) < 2) or (line[-1] != '\n'):
>> +            abort(_('invalid entry'), n)
>> +        yield line[:-1], n
>> +        n += 1
> 
> Looks like this wants enumerate(). Or to not care about line numbers.
> And possibly a class?

The line numbers are handy because they let verify report where in the
filenamelog a corruption occurs.

You mean a filenamelog class?
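On the enumerate() point: keeping line numbers for verify and using enumerate() aren't mutually exclusive. A minimal sketch of a class-based reader, assuming a plain file path in place of Mercurial's opener and raising ValueError instead of util.Abort; the `FilenameLog` name is hypothetical:

```python
import os
import tempfile


class FilenameLog(object):
    """Hypothetical reader for a newline-separated filename log.

    Sketch only: takes a plain path instead of Mercurial's opener and
    raises ValueError where the patch would call util.Abort.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        """Yield (filename, line number) pairs, numbering from 1."""
        try:
            fp = open(self.path, "rb")
        except IOError:
            return  # a missing log simply has no entries
        with fp:
            for n, line in enumerate(fp, 1):
                if len(line) < 2 or not line.endswith(b"\n"):
                    raise ValueError("%s, line %i: invalid entry" % (self.path, n))
                yield line[:-1], n


# Usage: write a small log and iterate over it.
tmp = tempfile.mkdtemp()
logpath = os.path.join(tmp, "filenamelog")
with open(logpath, "wb") as fp:
    fp.write(b"foo.txt\nsub/bar.c\n")
for name, n in FilenameLog(logpath):
    print(n, name)
```

Because enumerate(fp, 1) supplies the line number, the separate counter (initialized twice in the patch) goes away, and the file is closed when iteration ends.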

>> @@ -910,6 +911,7 @@
>>  
>>              n = self.changelog.add(mn, changed + removed, text, trp, p1, p2,
>>                                     user, wctx.date(), extra)
>> +            self.store.addnewfiles(tr)
>>              self.hook('pretxncommit', throw=True, node=hex(n), parent1=xp1,
>>                        parent2=xp2)
>>              tr.close()
>> @@ -1981,6 +1983,8 @@
>>              # make changelog see real files again
>>              cl.finalize(trp)
>>  
>> +            self.store.addnewfiles(tr)
>> +
>>              newheads = len(self.changelog.heads())
>>              heads = ""
>>              if oldheads and newheads != oldheads:
>> @@ -2055,6 +2059,7 @@
>>              for chunk in util.filechunkiter(fp, limit=size):
>>                  ofp.write(chunk)
>>              ofp.close()
>> +        self.store.addnewfiles(None)
>>          elapsed = time.time() - start
>>          if elapsed <= 0:
>>              elapsed = 0.001
> 
> That's a bit magical. Let me see if I can figure it out..
> 
> Ok, that's all a little hairy, but I don't have any suggestions just
> yet.

That's pretty much the key point of the whole patch series. It stems
from filelog not being allowed to report newly created store files
back to the repo.

That restriction, in turn, comes from having to maintain a filenamelog
at all, which is needed because we're not allowed to store the filename
of a hashed file inside the file itself. Had we done that, streamclone
could simply have asked each filelog for the name of the working
directory file it tracks (djc called that a layering violation).
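The bookkeeping behind store.addnewfiles() can be pictured roughly as follows. This is a toy sketch, not the patch's code: it assumes the store's opener notes every file it creates, so filelogs never have to report anything back, and that `addnewfiles` then appends those names to the filenamelog, registering the log's previous size with the transaction so a rollback can truncate it (the idea behind `transaction.add(LOGNAME, offset)` in the patch). `Store`, `Transaction` and the `newfiles` list are hypothetical; passing `None` as the transaction mirrors the streamclone call site.

```python
import os
import tempfile


class Transaction(object):
    """Toy stand-in for a journaling transaction (hypothetical).

    add() remembers a file's size before it was appended to; abort()
    truncates each file back to that size. Only the filenamelog is
    journaled in this toy.
    """

    def __init__(self):
        self.entries = []

    def add(self, path, offset):
        self.entries.append((path, offset))

    def abort(self):
        for path, offset in reversed(self.entries):
            with open(path, "r+b") as fp:
                fp.truncate(offset)
        self.entries = []


class Store(object):
    """Hypothetical store that records which files its opener creates."""

    def __init__(self, root):
        self.root = root
        self.logpath = os.path.join(root, "filenamelog")
        self.newfiles = []  # names created since the last addnewfiles()

    def opener(self, name, mode="rb"):
        path = os.path.join(self.root, *name.split("/"))
        if "w" in mode or "a" in mode:
            if not os.path.exists(path):
                # Noted here, so the filelog never has to report it back.
                self.newfiles.append(name)
            os.makedirs(os.path.dirname(path), exist_ok=True)
        return open(path, mode)

    def addnewfiles(self, tr):
        """Append pending names to the filenamelog; tr may be None."""
        if not self.newfiles:
            return
        with open(self.logpath, "ab") as fp:
            fp.seek(0, 2)
            offset = fp.tell()
            if tr is not None:
                tr.add(self.logpath, offset)
            for name in self.newfiles:
                fp.write(name.encode("utf-8") + b"\n")
        self.newfiles = []


# Usage: one transaction that commits, one that is rolled back.
store = Store(tempfile.mkdtemp())
with store.opener("data/a.i", "ab") as fp:
    fp.write(b"rev0")
store.addnewfiles(Transaction())

tr = Transaction()
with store.opener("data/b.i", "ab") as fp:
    fp.write(b"rev0")
with store.opener("data/a.i", "ab") as fp:  # already exists: not logged again
    fp.write(b"rev1")
store.addnewfiles(tr)
tr.abort()  # filenamelog is truncated back to its pre-transaction size
```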

>> +    def hasfnlog(self):
>> +        return False
>> +
>> +    def readfnlog(self):
>> +        res = {}
>> +        return res
> 
> We may be able to kill these.

verify needs to know whether there is a filenamelog to verify (at least,
that was my thinking when I created these methods).
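To illustrate why an always-empty readfnlog() isn't enough on its own, here is a sketch of a verify-style check that every store file appears in the log. `fnlogstore` and `checkfnlog` are hypothetical, the check is an assumed example rather than the patch's actual verify code, and both sides are assumed to use the same (unencoded) names:

```python
class basicstore(object):
    """Store without a filenamelog; defaults copied from the patch."""

    def hasfnlog(self):
        return False

    def readfnlog(self):
        res = {}
        return res


class fnlogstore(basicstore):
    """Hypothetical filenamelog store holding its log in memory."""

    def __init__(self, logged):
        self.logged = list(logged)

    def hasfnlog(self):
        return True

    def readfnlog(self):
        return iter(self.logged)


def checkfnlog(store, storefiles):
    """Hypothetical verify step: every store file must appear in the log.

    Without the hasfnlog() guard, a basicstore would report every file
    as unlogged, because its readfnlog() is always empty.
    """
    if not store.hasfnlog():
        return []
    logged = set(store.readfnlog())
    return ["%s not in filenamelog" % f for f in sorted(storefiles) if f not in logged]


files = ["a.txt", "b.txt"]
print(checkfnlog(basicstore(), files))
print(checkfnlog(fnlogstore(["a.txt"]), files))
```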