[PATCH 3 of 3] introduce filenamelog repository layout

Matt Mackall mpm at selenic.com
Tue Jul 22 14:35:18 CDT 2008


On Tue, 2008-07-22 at 21:09 +0200, Adrian Buehlmann wrote:
> 
> On 22.07.2008 17:29, Matt Mackall wrote:
> > On Tue, 2008-07-22 at 10:12 +0200, Adrian Buehlmann wrote:
> >>> Also note that in the rare occassions when we walk the file list, we're
> >>> typically calling stat() to get the file list.
> >     ^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> >>> - storefiles() or filelog.files() skips reporting files that don't exist
> >> This assumes we stat for the entries to validate the entry before
> >> using it.
> > 
> > The only user of this thing that matters performance-wise is
> > store.walk() as called by streamclone, which is already statting all the
> > files to determine their size. 
> > 
> >> I currently fail to see what would justify doing such a drastic
> >> thing.
> > 
> > It removes all the transaction complexity and completely hides the whole
> > issue from everything outside of the opener for the new store.
> 
> There is no need to do this, as my patch demonstrates.

What your patch demonstrates is that yes, it can be made to work your
way, but that it's not very pretty. You've got to have hooks in opener,
store, commit, streamclone, and clone to deal with new files. And you've
got to deal with transactions. My approach hooks in only at opener.

> This does not justify to abandon the append-only nature
> of the filenamelog.

My approach is append-only until it's broken by a rollback or some other
disaster, just like yours. Yes, rewriting when we find breakage is
suboptimal. But a) it's dead simple b) it's infrequent and c) we're
going to have the lock held anyway.

> >>> - we rewrite the list if we find errors (extra files)
> >> This assumes we have a write lock when reading the filenamelog.
> >> This might not be true in the future. For example, in the future,
> >> we might be able to implement a lock-free streamclone.
> > 
> > It's not likely that streamclone can be made lock-free. But take a look
> > at how localrepo.status works, starting at:
> > 
> > 	        # update dirstate for files that are actually clean             
> > 
> >>> And I think that makes it all transparent to the higher levels?
> >>>
> >>>>>> +    def hasfnlog(self):
> >>>>>> +        return False
> >>>>>> +
> >>>>>> +    def readfnlog(self):
> >>>>>> +        res = {}
> >>>>>> +        return res
> >>>>> We may be able to kill these.
> >>>> verify needs to know if there is a filenamelog that needs to be verified.
> >>>> (at least that was my thinking when I created these methods.)
> >>> It doesn't, actually. It needn't know anything about the store. All it
> >>> needs to know is that for every file the changelog mentions, there is a
> >>> file returned by storefiles() and vice-versa. And that test can happen
> >>> on any store we can call storefiles() on.
> >> There's a new file 'filenamelog' in the store we want to trust to have
> >> the filenames of all files in the store.
> > 
> > And how do we do that? We check that for every file mentioned in the
> > changelog and manifest, there's a file mentioned by repo.storefiles(). 
> > Which is a good test regardless of what store type we're using.
> 
> You haven't replied to my argument that storefiles() needs to dir-walk
> the store directory for the non-filenamelog repos.

That's because it's so obviously a good thing! Of course verify should
verify that every revlog is mentioned in the changelog!

Similar arguments apply to debugfilenamelog: if we're going to have a
way to get a list of all the revlogs and their corresponding names, of
course it should work regardless of the store type.

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list