[PATCH 3 of 3] introduce filenamelog repository layout

Mon Jul 21 19:20:21 CDT 2008

On Tue, 2008-07-22 at 01:04 +0200, Adrian Buehlmann wrote:
> On 21.07.2008 23:16, Matt Mackall wrote:
> > On Mon, 2008-07-21 at 21:43 +0200, Adrian Buehlmann wrote:
> > 
> >> +def debugfilenamelog(ui, repo, **opts):
> >> +    """dumps the filenamelog, showing the encoded filename for each entry
> >> +
> >> +    Writes two lines of output per filenamelog entry: The first line is the
> >> +    filenamelog entry itself, the second line is the encoded filename.
> >> +
> >> +    By default, outputs the hashed filenames only. Specify -f to output
> >> +    only the full name (non-hased) filenames, or -a to get both.
> > 
> > That's confusing.
> 
> I was trying to output only the interesting things by default (the
> hashed filenames). I admit, this might be overkill.
> 
> Not sure if debugfilenamelog is worth being dissected that closely
> though. It's just a debugging feature.
> 
> But if I know what's wanted, I can change it.
> 
> >> +    Specify -q to suppress the statistic printed at the end.
> >> +    """
> >> +    all = opts["all"]
> >> +    fullonly = opts["full"]
> >> +    n = 0
> >> +    nh = 0
> >> +    for f in repo.store.readfnlog():
> >> +        ef = repo.encodefn(f)
> >> +        hashed = ef.startswith('dh/')
> >> +        if hashed:
> >> +          nh += 1
> >> +        if all or (hashed and not fullonly) or (fullonly and not hashed):
> >> +            ui.write("   '%s'\n-> '%s'\n" % (f, ef))
> >> +        n += 1
> >> +    ui.status("(filenamelog has %i filenames, %i hashed)\n" % (n, nh))
> > 
> > Couldn't we use repo.storefiles + repo.sjoin to do this without
> > knowing/caring about what sort of store we were using? A similar comment
> > applies to your verify addition.
> 
> repo.storefiles locks the repo again, and also returns the meta data files
> as well (changelog, manifest), which neither debugfilenamelog nor verify want
> to see.

On the other hand, it's a debug function. If locking is an issue, we can
migrate the lock back to its one user.

> Also, repo.storefiles stats the files to get the lengths, which neither
> debugfilenamelog nor verify want to have (we could do that with a param,
> of course).
> 
> verify already locks the repo and debugfilenamelog doesn't need to lock it.
> 
> >> diff --git a/mercurial/filenamelog.py b/mercurial/filenamelog.py
> >> new file mode 100644
> >> --- /dev/null
> >> +++ b/mercurial/filenamelog.py
> >> @@ -0,0 +1,44 @@
> >> +# filenamelog.py - logging all filenames of a Mercurial repository
> >> +#
> >> +# Copyright 2008 Matt Mackall <mpm at selenic.com>
> >> +#
> >> +# This software may be used and distributed according to the terms
> >> +# of the GNU General Public License, incorporated herein by reference.
> >> +
> >> +from i18n import _
> >> +import util
> >> +
> >> +LOGNAME = 'filenamelog'
> >> +
> >> +def abort(text, linenum = None):
> >> +    lineinfo = ""
> >> +    if (linenum != None):
> >> +        lineinfo = ", line %i" % linenum
> >> +    raise util.Abort("%s%s: %s" % (LOGNAME, lineinfo, text))
> >> +
> >> +def append(opener, entries, transaction):
> >> +    if len(entries) == 0:
> >> +        return
> >> +    fp = opener(LOGNAME, mode='a+')
> >> +    fp.seek(0, 2)
> >> +    offset = fp.tell()
> >> +    if transaction != None:
> >> +        transaction.add(LOGNAME, offset)
> >> +    for p in entries:
> >> +        fp.write(p + '\n') # assuming that filenames don't contain '\n'
> >> +    fp.close()
> >> +
> >> +def entries(opener):
> >> +    # yields: path, line number
> >> +    n = 1
> >> +    try:
> >> +        fp = opener(LOGNAME, mode='rb')
> >> +    except IOError:
> >> +        # skip nonexisting file
> >> +        return
> >> +    n = 1
> >> +    for line in fp:
> >> +        if (len(line) < 2) or (line[-1] != '\n'):
> >> +            abort(_('invalid entry'), LOGNAME, n)
> >> +        yield line[:-1], n
> >> +        n += 1
> > 
> > Looks like this wants enumerate(). Or to not care about line numbers.
> > And possibly a class?
> 
> The line numbers were handy for verify being able to complain about
> filenamelog corruption.

> You mean a filenamelog class?

Yes.

> >> @@ -910,6 +911,7 @@
> >>  
> >>              n = self.changelog.add(mn, changed + removed, text, trp, p1, p2,
> >>                                     user, wctx.date(), extra)
> >> +            self.store.addnewfiles(tr)
> >>              self.hook('pretxncommit', throw=True, node=hex(n), parent1=xp1,
> >>                        parent2=xp2)
> >>              tr.close()
> >> @@ -1981,6 +1983,8 @@
> >>              # make changelog see real files again
> >>              cl.finalize(trp)
> >>  
> >> +            self.store.addnewfiles(tr)
> >> +
> >>              newheads = len(self.changelog.heads())
> >>              heads = ""
> >>              if oldheads and newheads != oldheads:
> >> @@ -2055,6 +2059,7 @@
> >>              for chunk in util.filechunkiter(fp, limit=size):
> >>                  ofp.write(chunk)
> >>              ofp.close()
> >> +        self.store.addnewfiles(None)
> >>          elapsed = time.time() - start
> >>          if elapsed <= 0:
> >>              elapsed = 0.001
> > 
> > That's a bit magical. Let me see if I can figure it out..
> > 
> > Ok, that's all a little hairy, but I don't have any suggestions just
> > yet.
> 
> That's pretty much the key point of the whole patch series.
> Resulting from the problem of filelog not being allowed to report
> new store files back to repo.
> 
> Which in turn results from having to maintain a filenamelog at
> all, which results from not being allowed to save the filenames
> of the hashed files inside themselves so that streamclone could
> have asked the filelog file for the name of the workspace file it
> tracks (which djc called a layering violation).

Yep, I'm aware. Here are some thoughts:

One thing I didn't point out is that your opener only needs to add to
the log when it's in write mode.

Also note that in the rare occassions when we walk the file list, we're
typically calling stat() to get the file list. This gives us the
opportunity to check that all the files mentioned actually exist, so we
don't actually have to worry about journalling this file. Because we
have the lock held, at the end of storefiles(), we can rewrite it if we
find errors. So I think we can drastically simplify things here:

- opener only appends the file in write/append mode if the file doesn't
exist
- storefiles() or filelog.files() skips reporting files that don't exist
- we rewrite the list if we find errors (extra files)

And I think that makes it all transparent to the higher levels?

> >> +    def hasfnlog(self):
> >> +        return False
> >> +
> >> +    def readfnlog(self):
> >> +        res = {}
> >> +        return res
> > 
> > We may be able to kill these.
> 
> verify needs to know if there is a filenamelog that needs to be verified.
> (at least that was my thinking when I created these methods.)

It doesn't, actually. It needn't know anything about the store. All it
needs to know is that for every file the changelog mentions, there is a
file returned by storefiles() and vice-versa. And that test can happen
on any store we can call storefiles() on.

-- 
Mathematics is the supreme nostalgia of our time.