Bug 3584 - largefiles: hg is fetching largefiles over http when default path is network share
Summary: largefiles: hg is fetching largefiles over http when default path is network ...
Status: VERIFIED FIXED
Alias: None
Product: Mercurial
Classification: Unclassified
Component: largefiles
Version: earlier
Hardware: Macintosh Mac OS
Importance: normal bug
Assignee: Bugzilla
URL:
Keywords:
Depends on:
Blocks:
Reported: 2012-08-12 01:14 UTC by Peter Linss
Modified: 2017-11-01 18:05 UTC
CC List: 5 users

See Also:
Python Version: ---


Description Peter Linss 2012-08-12 01:14 UTC
I have a repository on a local server with a read-only file share. The same repository is served via Apache for push (and remote) access (with authentication).

In my workstation's local repository, the .hg/hgrc file sets the default path to the mount of the file share and default-push to an https URL on the Apache server. With previous versions of Mercurial, updating copied largefiles from the network share, which was reasonably fast. Now, with version 2.3, hg up copies the largefiles via https and is significantly slower (verified with hg up --debug: the debug output shows authentication to the https server and lists its URL).
Comment 1 Matt Mackall 2012-09-30 18:18 UTC
Where/how do you think Mercurial is getting the http URL if not from your config files?
Comment 2 Peter Linss 2012-09-30 19:04 UTC
(In reply to comment #1)

That's the curious part, isn't it? I figure it's either using the URL from default-push instead of default, or there's some cached state left over somewhere (due to bug 3160 I often get failures while fetching largefiles if I forget to increase the file handle limit).

I'll try to reproduce this with a smaller test repository that I can share; the repository in question has about 64 GB of largefiles. Note that I made a fresh clone of the repository, and that clone is behaving as expected.
Comment 3 Matt Mackall 2012-10-19 15:19 UTC
Perhaps 'hg debugconfig paths' will tell us something useful.
Comment 4 Peter Linss 2012-10-19 16:59 UTC
Ok, I just reproduced this with a fresh test repository, same result. On further digging, it appears that when updating the repository, the repository itself is fetched from the 'default' path, but the largefiles are fetched from the 'default-push' path.

FWIW, the only paths listed in hg debugconfig paths are 'default' and 'default-push', and this was reproduced with version 2.3.1.

The scenario is as follows (URLs modified for privacy): a server whose repository is served via https://hg.example.com/test and is also available as a read-only local file share (AFS), plus two workstations on the LAN, each with the server's file share mounted as /Volumes/test.

On the server:
hg init test
cd test
echo "one" > one.txt
hg add one.txt --large
hg commit

On workstation one:
hg clone /Volumes/test
cd test
(edit .hg/hgrc to contain:
[paths]
default = /Volumes/test
default-push = https://hg.example.com/test
)
echo "two" > two.txt
hg add two.txt --large
hg commit
hg push

Result: pushes over https as expected

On workstation two:
hg clone /Volumes/test
cd test
(edit .hg/hgrc to contain:
[paths]
default = /Volumes/test
default-push = https://hg.example.com/test
)
hg pull -u --debug --verbose

The output of the pull is:

pulling from /Volumes/test
query 1; heads
searching for changes
all local heads known remotely
1 changesets found
list of changesets:
f70fb8571ca4b0cbf89c6fc752f38eea9215180a
adding changesets
bundling: 1/1 changesets (100.00%)
bundling: 1/1 manifests (100.00%)
bundling: .hglf/two.txt 1/1 files (100.00%)
changesets: 1 chunks
add changeset f70fb8571ca4
adding manifests
manifests: 1/1 chunks (100.00%)
adding file changes
adding .hglf/two.txt revisions
files: 1/1 chunks (100.00%)
added 1 changesets with 1 changes to 1 files
listing keys for "phases"
updating the branch cache
calling hook changegroup.lfiles: <function checkrequireslfiles at 0x109cbade8>
checking for updated bookmarks
listing keys for "bookmarks"
 searching for copies back to rev 1
 unmatched files in other:
  .hglf/two.txt
resolving manifests
overwrite: False, partial: False
ancestor: 667923aeb35c, local: 667923aeb35c+, remote: f70fb8571ca4
.hglf/two.txt: remote created -> g
updating: .hglf/two.txt 1/1 files (100.00%)
getting .hglf/two.txt
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
getting changed largefiles
using https://hg.example.com/test
sending capabilities command
using auth.example.* for authentication
hg.example.com certificate successfully verified
[HgKeyring] Keyring URL: https://hg.example.com/
[HgKeyring] Username found in .hg/hgrc: plinss
[HgKeyring] Looking for password for user plinss and url https://hg.example.com/
[HgKeyring] Keyring password found. Url: https://hg.example.com/test, user: plinss, passwd: *******
using auth.example.* for authentication
hg.example.com certificate successfully verified
getting largefiles: 0/1 lfile (0.00%)
getting two.txt:7bbef45b3bc70855010e02460717643125c3beca
sending batch command
using auth.example.* for authentication
[HgKeyring] Keyring URL: https://hg.example.com/
[HgKeyring] Cached auth data found. Url: https://hg.example.com/test, user: plinss, passwd: *******
using auth.example.* for authentication
hg.example.com certificate successfully verified
sending getlfile command
using auth.example.* for authentication
[HgKeyring] Keyring URL: https://hg.example.com/
[HgKeyring] Cached auth data found. Url: https://hg.example.com/test, user: plinss, passwd: *******
using auth.example.* for authentication
hg.example.com certificate successfully verified
found 7bbef45b3bc70855010e02460717643125c3beca in store
1 largefiles updated, 0 removed
caching new largefiles
0 largefiles cached

As you can see, the pull came from the file mount, but the largefiles came over https.

I repeated the test (by committing a third largefile on workstation one), this time removing 'default-push' from workstation two's hgrc, and the largefiles were fetched directly from the file mount. So it looks quite clear that the largefiles are fetched from the 'default-push' path instead of the 'default' path.
Comment 5 Matt Mackall 2012-10-20 16:24 UTC
You're absolutely right.

downloadlfiles
 cachelfiles
  _openstore
   path = ui.expandpath('default-push', 'default')

In short, largefiles has no idea whether it's pushing or pulling by the time it reaches _openstore, and assumes it's pushing.
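To make the path choice at the bottom of that call chain concrete, here is a simplified, hypothetical Python model. It is not Mercurial's code: expandpath stands in for ui.expandpath() by looking names up in a plain dict of the [paths] section, and openstore_path keeps only the 'default-push'-then-'default' fallback quoted above.

```python
# Hypothetical, simplified model of the pre-fix path choice in _openstore().
# Not Mercurial's code: ui.expandpath() is approximated by a dict lookup
# over the [paths] section, and everything except path selection is dropped.

def expandpath(paths, loc, default=None):
    """Return paths[loc], else paths[default], else loc unchanged."""
    if loc in paths:
        return paths[loc]
    if default is not None and default in paths:
        return paths[default]
    return loc


def openstore_path(paths):
    # No pull/push context reaches this point, so the push target wins.
    return expandpath(paths, 'default-push', 'default')


hgrc_paths = {
    'default': '/Volumes/test',
    'default-push': 'https://hg.example.com/test',
}
print(openstore_path(hgrc_paths))                    # https://hg.example.com/test
print(openstore_path({'default': '/Volumes/test'}))  # /Volumes/test
```

With the hgrc from comment 4 the model picks the https URL; with default-push removed it falls back to the file share, which matches what the reporter observed.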

Instead, we should probably store two URLs in the store object and use one for get and one for put.
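A minimal sketch of the two-URL idea, assuming a hypothetical LargefileStoreURLs descriptor (largefiles has no class by that name). Gets read from 'default'; puts prefer 'default-push' and fall back to 'default', the same preference order hg push uses when no destination is given. The fix that eventually landed (comment 8) took a different route, so treat this only as an illustration of the suggestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LargefileStoreURLs:
    """Hypothetical store descriptor holding separate get and put URLs."""
    get_url: str
    put_url: str

    @classmethod
    def from_paths(cls, paths):
        # Downloads come from 'default'; uploads prefer 'default-push'
        # and fall back to 'default' when it is not configured.
        get_url = paths['default']
        put_url = paths.get('default-push', get_url)
        return cls(get_url=get_url, put_url=put_url)

    def url_for(self, operation):
        # Pick the URL by operation instead of assuming a push.
        if operation == 'get':
            return self.get_url
        if operation == 'put':
            return self.put_url
        raise ValueError('unknown operation: %r' % (operation,))


store = LargefileStoreURLs.from_paths({
    'default': '/Volumes/test',
    'default-push': 'https://hg.example.com/test',
})
print(store.url_for('get'))  # /Volumes/test
print(store.url_for('put'))  # https://hg.example.com/test
```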
Comment 6 Matt Harbison 2012-10-20 20:12 UTC
(In reply to comment #5)

I may be missing something here, but it looks like it does know whether it is pushing or pulling, sort of: 'lfpullsource' is set on the repo in overridepull() for use in _openstore().

It looks like what changed is that 'lfpullsource' is now run through ui.expandpath(), as of this change, which landed in 2.1:

changeset:   15943:f9efb325ea32
branch:      stable
user:        Na'Tosha Bard <natosha@unity3d.com>
date:        Fri Jan 20 11:56:12 2012 +0100
summary:     largefiles: fix caching largefiles from an aliased repo (issue3212)

overridepull() sets lfpullsource to whatever the 'source' parameter is (which is likely None in this case), and on the very next line sets 'source' to default and invokes the original pull command. Meanwhile, in _openstore(), 'lfpullsource' is None, so it expands 'default-push' and, failing that, 'default'. All of this corresponds to the report. Does 'hg pull -u default' work as expected?


What I don't get is how this ever worked.  The old code was:

        path = (getattr(repo, 'lfpullsource', None) or
                ui.expandpath('default-push', 'default'))

Assuming 'lfpullsource' is None, don't you still end up with an expansion of 'default-push'?  The other relevant code in overridepull() and _openstore() looks like the original import of largefiles.

So the fix looks easy: set lfpullsource to 'default' instead of None in overridepull(). But any thoughts on what a test for this should look like without the SSL setup?
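The sequence described above, and the proposed one-line fix, can be modelled in a few lines. This is a hypothetical, heavily simplified sketch rather than the real overridepull()/_openstore(): Repo, expandpath, openstore_path, overridepull_buggy and overridepull_fixed are illustrative names, and only the path bookkeeping is kept.

```python
class Repo:
    """Hypothetical stand-in: only carries the lfpullsource attribute."""


def expandpath(paths, loc, default=None):
    """Simplified model of ui.expandpath() over the [paths] section."""
    if loc in paths:
        return paths[loc]
    if default is not None and default in paths:
        return paths[default]
    return loc


def openstore_path(repo, paths):
    # Model of _openstore(): an explicit pull source wins; otherwise
    # fall back to 'default-push', then 'default'.
    lfpullsource = getattr(repo, 'lfpullsource', None)
    if lfpullsource:
        return expandpath(paths, lfpullsource)
    return expandpath(paths, 'default-push', 'default')


def overridepull_buggy(repo, paths, source=None):
    repo.lfpullsource = source  # None for a bare 'hg pull'
    if not source:
        source = 'default'
    return expandpath(paths, source)  # where the changesets come from


def overridepull_fixed(repo, paths, source=None):
    # Proposed fix: record 'default' rather than None.
    repo.lfpullsource = source or 'default'
    if not source:
        source = 'default'
    return expandpath(paths, source)


paths = {
    'default': '/Volumes/test',
    'default-push': 'https://hg.example.com/test',
}

for pull in (overridepull_buggy, overridepull_fixed):
    repo = Repo()
    changesets_from = pull(repo, paths)
    largefiles_from = openstore_path(repo, paths)
    print(pull.__name__, changesets_from, largefiles_from)
# overridepull_buggy /Volumes/test https://hg.example.com/test
# overridepull_fixed /Volumes/test /Volumes/test
```

In this model a bare pull reads changesets from the share but largefiles over https, while the fixed variant uses the share for both. Passing an explicit source (overridepull_buggy(repo, paths, 'default')) also yields the share, which is what the question about 'hg pull -u default' was probing; the model does not confirm what 2.3.1 actually did.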
Comment 7 Matt Harbison 2012-10-21 01:42 UTC
Never mind the test case; patch submitted here:

http://www.selenic.com/pipermail/mercurial-devel/2012-October/045564.html
Comment 8 HG Bot 2012-10-22 18:13 UTC
Fixed by http://selenic.com/repo/hg/rev/1e4eb1faba6e
Matt Harbison <matt_harbison@yahoo.com>
largefiles: use 'default' instead of 'default-push' when pulling (issue3584)

This only applies to downloading largefiles, and only when no source for the
pull is explicitly provided.  The repository itself was properly being pulled
via 'default' previously.

Using --all-largefiles is not necessary on a bare pull to test this (this
existing test is merely a convenience), but it is required to test pulling on
the rebase path.

Note that the errors generated in the --rebase case are because the repo
specified doesn't have the largefiles in its cache (though they are in the user
cache), so the errors are misleading.  Specifying --all-largefiles when cloning
to 'b' fixes this, but instead of errors, it reports caching only 5 largefiles
instead of the 9 that come up missing.  Likely this is because the largefile
download procedure tries to download missing files for each rev, and some of the
files have standins in more than one rev that gets pulled.

(please test the fix)
Comment 9 Na'Tosha Bard 2012-12-09 10:13 UTC
I have verified that this issue is fixed after Matt's patch.