bfiles filename encoding

Benjamin Pollack benjamin at
Mon Jun 7 09:45:07 CDT 2010

On Jun 7, 2010, at 9:28 AM, Greg Ward wrote:

> On Sat, Jun 5, 2010 at 6:29 PM, Benjamin Pollack <benjamin at> wrote:
>> Greg: the more I play with this, and with bfiles on Windows, the more I'm thinking that at least the push destinations should be encoded using the fncache naming strategy.
> Well, I *know* that the structure of bfiles' central store will have
> to change the minute someone tries to bfput a file called "aux" to a
> central store running on Windows.  Or even "foo" and "Foo".  In fact
> the case-sensitivity issue will almost certainly bite on OS X just as
> soon as I write a test for it.  The only solution I can see is to
> encode filenames on the central store, and reusing Mercurial's code
> for doing that seems very desirable.
> But I don't understand what you mean by "the push destinations should
> be encoded".  Are you talking about wire protocol changes?  That seems
> unnecessary; this is all about dealing with filesystems that are not
> 100% traditional Unix filesystems: HFS+ and NTFS.

I didn't fully think through what I wrote, for which I apologize, but we're actually on the same page. Assuming you plan to accept Alex's patches to allow storing on an HTTP server, the idea would be to allow any old server that supports PUTing to the file system to support bfiles out of the box. If the client uses whatever encoding scheme we work out to determine the path it proposes to the server, we're fine either way. So the next step is to pick such a scheme.
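
To make the requirements concrete, here is a minimal sketch of a client-side encoding that keeps "foo" and "Foo" apart on a case-insensitive store and avoids a bare "aux" on Windows. It is only loosely modeled on the underscore and ~xx escaping ideas in Mercurial's store encoding; the names `encode_component`, `encode_path`, and `WINDOWS_RESERVED` are hypothetical, and this is not bfiles' or Mercurial's actual code.

```python
# Hypothetical sketch, not the real store encoding.
# Device names that Windows refuses as file names (with or without extension).
WINDOWS_RESERVED = (
    {"con", "prn", "aux", "nul"}
    | {f"com{i}" for i in range(1, 10)}
    | {f"lpt{i}" for i in range(1, 10)}
)


def encode_component(name):
    """Encode one path component so it is safe on case-insensitive stores."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")            # escape the escape character
        elif ch == "~":
            out.append("~7e")           # keep "~" free for hex escapes
        elif ch.isascii() and ch.isupper():
            out.append("_" + ch.lower())  # "F" becomes "_f"
        else:
            out.append(ch)
    encoded = "".join(out)

    # Windows matches reserved names against the part before the first dot.
    base = encoded.split(".", 1)[0]
    if base in WINDOWS_RESERVED:
        # Hex-escape the third character so the name is no longer reserved.
        encoded = encoded[:2] + "~%02x" % ord(encoded[2]) + encoded[3:]
    return encoded


def encode_path(path):
    """Encode a repo-relative, slash-separated path component by component."""
    return "/".join(encode_component(part) for part in path.split("/"))
```

Because the result depends only on the repo-relative path, every client proposes the same server path for the same file, which is the property the PUT-to-any-server idea needs.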

>> (There's an argument for .hgbfiles being that way, too, due to file name length limits on Windows, but I'm happy to discuss that issue separately.)
> Oh crap, I hadn't thought about that.  But is it really a problem?  I
> mean, if you have
>  .hgbfiles/really/long/deep/path/to/bigfile
> then that represents
>  really/long/deep/path/to/bigfile
> which is only slightly shorter than the path in .hgbfiles.  So
> mangling paths in .hgbfiles to work around Windows brain damage only
> buys, what, 10 more bytes of headroom in the path?  Not worth it,

If that were the only reason, I'd agree, but it seems to me we still want to use store.hybridencode or a variant for other reasons.
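
For reference, the "10 more bytes" figure is just the length of the ".hgbfiles/" prefix. A quick check, using the example path from the quoted message (the variable names are illustrative only):

```python
# The placeholder under .hgbfiles/ mirrors the tracked file's path,
# so the only extra length is the directory prefix itself.
PREFIX = ".hgbfiles/"
tracked = "really/long/deep/path/to/bigfile"
placeholder = PREFIX + tracked

overhead = len(placeholder) - len(tracked)
print(overhead)
```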

It did literally just occur to me while writing the previous sentence that we can't just use store.hybridencode, though, even with the other changes I was proposing to Adrian. When the reversible mapping is used, you're fine, but the non-reversible one won't work, because the file being looked up on the remote end could then change depending on where on your machine you cloned. I frakking hate Windows.
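
A toy illustration of that failure mode, under the assumption that the switch to the hashed form is triggered by the length of the absolute local path. The function `toy_hybrid`, the tiny `LIMIT`, and the "dh/" prefix are all illustrative; this is not store.hybridencode itself.

```python
import hashlib

LIMIT = 40  # hypothetical threshold, kept small so the effect is easy to see


def toy_hybrid(clone_root, relpath):
    """Return the name a client would look up remotely (toy model only)."""
    full = clone_root + "/" + relpath
    if len(full) <= LIMIT:
        # Short enough: keep the readable, reversible name.
        return relpath
    # Too long: fall back to a non-reversible hashed name.
    digest = hashlib.sha1(relpath.encode("utf-8")).hexdigest()
    return "dh/" + digest


shallow = toy_hybrid("/r", "data/big.bin")
deep = toy_hybrid("/home/someone/projects/deeply/nested", "data/big.bin")
print(shallow == deep)  # the same tracked file maps to two different names
```

Two clones of the same repository would then ask the server for different names, so any scheme used for the remote path has to depend on the repo-relative path alone.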

So apparently, yes, we need a new encoding scheme anyway.

> Perhaps we should cook up a new filename encoding algorithm for
> bfiles.  If it works, we could even propose it for core Mercurial once
> people have the appetite for yet another change there.

How would you feel about simply using the SHA1 checksum of the full path as the filename? It wouldn't be reversible, but it'd also always work--and that's already how you're storing the file contents anyway.
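
A minimal sketch of that proposal, assuming the path is repo-relative, normalized to forward slashes, and encoded as UTF-8 before hashing (none of which has been decided here; `store_name` is a hypothetical helper):

```python
import hashlib


def store_name(repo_relative_path):
    """Map a tracked file's path to a fixed-length, filesystem-safe name."""
    # Normalize separators so Windows and Unix clients agree on the input.
    normalized = repo_relative_path.replace("\\", "/")
    # SHA-1 hex digests are 40 lowercase hex characters: no reserved names,
    # no case collisions, and no path-length surprises.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


print(store_name("foo") != store_name("Foo"))
print(len(store_name("really/long/deep/path/to/bigfile")))
```

The trade-off is the one already noted: the server-side names are opaque, so anything that needs the original path has to get it from the client side rather than from the store.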


More information about the Mercurial-devel mailing list