[Gsoc - 2016] Allow largefiles to be at a different location

Matt Harbison mharbison72 at gmail.com
Fri Mar 4 22:08:38 EST 2016


On Fri, 04 Mar 2016 06:10:28 -0500, Piotr Listkiewicz  
<piotr.listkiewicz at gmail.com> wrote:

>>
>> I'm not sure if there's a lot of teaching value in this, but maybe
>> consider replacing some of the os.path.* code in largefiles with the vfs
>> layer.  There might be enough there that you can see how it looks up
>> files in the store, and so forth.  If nothing else, it will give you
>> something to do when looking at the code.
>>     https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>
>
> I would like to do it, but I need guidance.
>
> The function lfutil.storepath(
> https://selenic.com/hg/file/e00e57d83653/hgext/largefiles/lfutil.py#l175)
> returns the path to the largefile directory in .hg, and it is used nearly
> everywhere else as the base path.

I'm not sure I follow.  From a quick reading of the code, it looks like the
callers use this as-is, as an absolute path.

> This method could be refactored to return a vfs object, which would be
> used instead of invoking, for example, util.makedirs, but I have no idea
> how I should do it properly.

I'm a bit unclear on the plan for this.  The wiki says the plan is to get  
rid of util.* on repo _relative paths_.  Somewhere in the last few years,  
I got the impression from the mailing list that util.* methods were fine  
(I can't find a citation for this).  If this future unicode layer is to be  
used unconditionally on Windows, I'm not sure why the appropriate util.*  
methods can't be replaced, similar to how util is assigned methods from  
posix.py or windows.py.
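
The assignment pattern mentioned above can be sketched roughly as follows.
This is a toy illustration, not Mercurial's actual util.py: the two
namespaces merely stand in for posix.py and windows.py, and pick_platform,
posix_impl and windows_impl are hypothetical names.

```python
import ntpath
import os
import posixpath
import types

# Toy stand-ins for the per-platform modules (posix.py / windows.py).
posix_impl = types.SimpleNamespace(normpath=posixpath.normpath)
windows_impl = types.SimpleNamespace(normpath=ntpath.normpath)


def pick_platform(osname=os.name):
    """Return the implementation namespace for the given os.name value."""
    return windows_impl if osname == "nt" else posix_impl


# Chosen once at import time; callers only ever use the re-exported name.
platform = pick_platform()
normpath = platform.normpath
```

With this shape, swapping in a Unicode-aware Windows implementation would
presumably mean changing one namespace rather than every caller.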

I don't recall exactly why I didn't use a vfs object here.  But it may  
have been because you may need a vfs object relative to this repo's  
.hg/largefiles, or the share source's cache, or the standalone user cache  
directory.  I wasn't sure if it was good to have more (rarely used) vfs  
fields in localrepo for sharing and largefiles, so I guess I punted.

I think foozy did a bunch of vfs stuff in the last year or so, so I Cc'd  
him.

> (I also don't understand why this function returns
> repo.vfs.reljoin(repo.sharedpath, longname, hash)
> or repo.join(longname, hash); I just don't get the difference.)

repo.join() gives you an absolute path once you pass it path components
relative to the repo's .hg directory: "/path/to/repo/.hg/$longname/$hash".

repo.sharedpath is already an absolute path (to the .hg directory of the
other local repo that is the source of the sharing).  reljoin() just adds
the given parts together, without the implicit '/path/to/repo/.hg' prefix.
So you end up with "/path/to/shareparent/.hg/$longname/$hash".

The other thing to pay attention to as you read the vfs code is that  
repo.vfs is relative to "/path/to/repo/.hg" (repo data), and repo.wvfs is  
relative to "/path/to/repo" (working directory).
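
To make the path rules above concrete, here is a minimal sketch.  ToyVfs is
a hypothetical stand-in, not Mercurial's vfs class; it only mimics join()
(implicit base prefix) and reljoin() (plain join).  It assumes repo.join()
resolves against the same base as repo.vfs, and it uses posixpath so the
results look the same on every platform.

```python
import posixpath


class ToyVfs:
    """Toy stand-in for a vfs object: a base directory plus join helpers."""

    def __init__(self, base):
        self.base = base

    def join(self, *parts):
        # The base directory is prefixed implicitly.
        return posixpath.join(self.base, *parts)

    def reljoin(self, *parts):
        # Plain join of the given parts; no implicit base.
        return posixpath.join(*parts)


vfs = ToyVfs("/path/to/repo/.hg")   # repo data
wvfs = ToyVfs("/path/to/repo")      # working directory

longname = "largefiles"
filehash = "$hash"                  # placeholder, as in the text above
sharedpath = "/path/to/shareparent/.hg"

local = vfs.join(longname, filehash)
shared = vfs.reljoin(sharedpath, longname, filehash)
standin = wvfs.join(".hglf", "foo")
```

Here local lands under this repo's .hg, shared lands under the share
source's .hg, and standin lands in the working directory.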

> I would appreciate any hints or guidance.
>
> This is more advanced, but would be nice to have in some form:
>>     https://bz.mercurial-scm.org/show_bug.cgi?id=4242
>> (My recollection of this is that it wants to verify the files upstream
>> on the default path instead of touching anything locally.  I think I end
>> up using '--config paths.default=' to force it to verify locally.)
>
>
> I sent a patch for it; can you take a look and do a code review?

Sorry, I was too busy to get to it earlier.

Another thing I ran into today while testing that patch is that it keeps
prompting for a password when I verify against a password-protected https
server.  Whatever it is doing, it isn't reusing the same connection for  
each file it fetches.  Usually it isn't an issue for me, but the Windows  
source install doesn't know how to access the keyring extension shipped  
with thg.  That sort of connection management might be a good thing to  
understand for what you want to do.

>
>> Is there some specific area you are wondering about?  Maybe look for
>> commits that start with 'largefiles:'.  I fixed a series of bugs about a
>> year ago or so, and tried to leave enough in the commit comment to
>> explain what was wrong or how something works.
>
>
> Now I'm trying to understand how largefiles works in the big picture (and
> its inner workings), so I have no specific area at this moment.
>
> 2016-03-01 4:05 GMT+01:00 Matt Harbison <mharbison72 at gmail.com>:
>
>> On Mon, 29 Feb 2016 09:57:21 -0500, Piotr Listkiewicz <
>> piotr.listkiewicz at gmail.com> wrote:
>>
>>> Hello,
>>> I am Piotr Listkiewicz (nicknamed liscju), a Computer Science student
>>> from Cracow in Poland; maybe some of you remember me from the 3.6 Sprint
>>> in London.
>>>
>>> I am interested in working on the "Allow largefiles to be at a different
>>> location" project, but I need guidance.
>>>
>>> First of all, are there any easy bugs that you would recommend to a
>>> newcomer to the largefiles extension for familiarizing myself with the
>>> source code?
>>>
>>
>> I don't think there are any easy bugs left.  The wrapping done by the
>> extension can make things surprisingly complicated.  There are a few
>> additional archived (hidden) largefile bugs on bz, but I'm not sure they
>> are easy either.
>>
>> I'm not sure if there's a lot of teaching value in this, but maybe
>> consider replacing some of the os.path.* code in largefiles with the vfs
>> layer.  There might be enough there that you can see how it looks up
>> files in the store, and so forth.  If nothing else, it will give you
>> something to do when looking at the code.
>>
>>     https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>>
>> This is more advanced, but would be nice to have in some form:
>>
>>     https://bz.mercurial-scm.org/show_bug.cgi?id=4242
>>
>> (My recollection of this is that it wants to verify the files upstream
>> on the default path instead of touching anything locally.  I think I end
>> up using '--config paths.default=' to force it to verify locally.)
>>
>>> Secondly, are there any other documents not mentioned at
>>> https://www.mercurial-scm.org/wiki/SummerOfCode/Ideas2016 that would be
>>> helpful for familiarizing myself with largefiles and the project subject
>>> in general?
>>>
>>
>> The wiki is pretty good at a high level:
>>
>>     https://www.mercurial-scm.org/wiki/LargefilesExtension
>>
>> The "magic" of this extension is mostly that it patches up the matcher
>> object and hands off to core Mercurial to do most of the work, in most
>> situations.  e.g. if the user does `hg add --large foo`, the largefiles
>> add() code changes the matcher to contain '.hglf/foo', remove 'foo', and
>> then passes it into the core add() function.  (All without touching
>> normal file references, if any, of course.)
>>
>> Is there some specific area you are wondering about?  Maybe look for
>> commits that start with 'largefiles:'.  I fixed a series of bugs about a
>> year ago or so, and tried to leave enough in the commit comment to
>> explain what was wrong or how something works.
>>
>> Unfortunately, I don't know anything about the wire protocol to help you
>> there.
>>
>>
>>> I would be interested in any advice on what I should do, how to start
>>> working on the project, and any other relevant information as well.
>>>

