D1606: phases: drop the list with phase of each rev, always comput phase sets

quark (Jun Wu) phabricator at mercurial-scm.org
Wed Dec 6 21:02:42 EST 2017


quark requested changes to this revision.
quark added a comment.
This revision now requires changes to proceed.


  I have been also wanting to remove this O(changelog) space usage for long. I noticed phasecache was the biggest memory consumer among all `repo.` components (dirstate, changelog, etc) when prototyping stateful chg early this year.
  
  Requesting changes mainly because the `if public in phases` branch in `getrevset` has issues. Otherwise it looks good to me.

INLINE COMMENTS

> revlog.c:627
>  
>  static PyObject *compute_phases_map_sets(indexObject *self, PyObject *args)
>  {

I guess the `map` here means a `rev` -> `phase` map, which was the list. Since we now return len(size). Maybe rename the function to `compute_phases_len_sets`.

> revlog.c:1965
>  	 "get an index entry"},
>  	{"computephasesmapsets", (PyCFunction)compute_phases_map_sets,
>  			METH_VARARGS, "compute phases"},

Also here, `computephaseslensets`

> phases.py:226
>          else:
> -            # slow path - enumerate all revisions
> -            phase = self.phase
> -            revs = (r for r in repo if phase(repo, r) in phases)
> -            return smartset.generatorset(revs, iterasc=True)
> +            assert not repo.changelog.filteredrevs
> +            phases = set(allphases).difference(phases)

The `assert` is incorrect - repo could be "filtered". The old code support filtered repo implicitly by using `for r in repo`, the iterator of `repo` will remove filtered revisions.

> phases.py:226-237
> +            assert not repo.changelog.filteredrevs
> +            phases = set(allphases).difference(phases)
> +            if not phases:
> +                return smartset.fullreposet()
> +            if len(phases) == 1:
> +                [p] = phases
> +                revs = self._phasesets[p]

Sorry - I made a mistake using `repo.revs('public()')` to test the logic here. Actually that exercises a different code path <https://www.mercurial-scm.org/repo/hg/file/aa905f9cdcda/mercurial/revset.py#l1616>.

There are practically no users exercising this `public in phases` code path. So issues like `fullreposet` requires a `repo` argument is not exposed.

I think we can either:

- Pass an optional `subset` here then replace the revset implementation to use it:

  def getrevset(self, repo, phases, subset=None):
      if subset is None:
          subset = smartset.fullreposet(repo)
      if ...
          # fast path
          ...
          return subset & smartset.baseset(revs)
      else:
          ...



- Remove support for calculating `public()` here by just `raise error.ProgrammingError('public() is unsupported')`

I guess the former is preferred eventually. But if you want to land this faster, the latter is easier. I can help writing a follow-up.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1606

To: joerg.sonnenberger, #hg-reviewers, quark
Cc: quark, mercurial-devel


More information about the Mercurial-devel mailing list