Skip to content

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented May 23, 2025

This commit removes the scaling of the basic heuristic of Sabre inversely proportional to the front-layer population in all of Qiskit's defaults.

Previously, the "basic" heuristic of Sabre had a scaling factor applied to it such that the total contribution of the front layer to the full score of a swap remained approximately the same magnitude (scaled by the sums of the distances), no matter how many gates were in the layer. However, since the implementation of relative scoring1, we realise that only the score delta caused by a swap matters. A swap can affect at most two gates in the front layer, so scaling down the impact of this based on the population of the front layer only has the effect of allowing other heuristics to entirely dominate the "basic" component, when really the basic heuristic is the most critical for forwards progress in the algorithm.

In the worst-case scenario of a front-layer populated with more than 40 gates, the extended set with a fixed size of 20 gates, a relative weight of 0.5 and similar population-based scaling would entirely override the front-layer heuristic, which would only be used as a tie-break. For such circuits, this caused the release-valve mechanism to trigger on almost every gate in the worst cases, completely destroying the performance and quality of the lookahead methods. This was observed in currently unrealistic circuits like a 1081-width, 10-depth volume circuit, and 400+ qubit QFTs, but is likely to have had some effect at smaller circuits.

Summary

Details and comments

Benchmarking a small selection of indicative quality trackers:

| Change   |   Before [eb52aa38] <sabre/better-defaults-old~1> |   After [b070e219]  |   Ratio | Benchmark (Parameter)                                                                            |
|----------|---------------------------------------------------|---------------------|---------|--------------------------------------------------------------------------------------------------|
| -        |                                            143441 |              107094 |    0.75 | qft.LargeQFTMappingTrackBench.track_depth_sabre_swap(1081, 'decay')                              |
| -        |                                             29042 |               23776 |    0.82 | qft.LargeQFTMappingTrackBench.track_depth_sabre_swap(409, 'decay')                               |
| -        |                                             12829 |                6360 |    0.5  | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'decay')     |
| -        |                                             13106 |                4880 |    0.37 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(1081, 10, 'lookahead') |
| -        |                                               537 |                 470 |    0.88 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'decay')      |
| -        |                                               552 |                 496 |    0.9  | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'lookahead')  |
| -        |                                              5647 |                4529 |    0.8  | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'decay')     |
| -        |                                              5754 |                4854 |    0.84 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 100, 'lookahead') |
| -        |                                              3345 |                1396 |    0.42 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'decay')      |
| -        |                                              3939 |                1927 |    0.49 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'lookahead')  |
| -        |                                            258273 |              149650 |    0.58 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'decay')      |
| -        |                                            223036 |              147112 |    0.66 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'lookahead')  |
| -        |                                             50178 |               31920 |    0.64 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'decay')       |
| -        |                                             60422 |               32892 |    0.54 | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'lookahead')   |
| -        |                                              2295 |                2053 |    0.89 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                       |
| -        |                                              2295 |                2053 |    0.89 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                       |
| -        |                                              2295 |                2053 |    0.89 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                      |
| Change   |   Before [eb52aa38] <sabre/better-defaults-old~1> |   After [b070e219]  |   Ratio | Benchmark (Parameter)                                                     |
|----------|---------------------------------------------------|---------------------|---------|---------------------------------------------------------------------------|
| +        |                                            132940 |              148497 |    1.12 | qft.LargeQFTMappingTrackBench.track_depth_sabre_swap(1081, 'lookahead')   |
| +        |                                              1680 |                1867 |    1.11 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')               |
| +        |                                              1687 |                1870 |    1.11 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')               |
| +        |                                              1687 |                1870 |    1.11 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')              |
| +        |                                               375 |                 432 |    1.15 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')  |
| +        |                                               375 |                 432 |    1.15 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')  |
| +        |                                               375 |                 432 |    1.15 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |

The ~15% regression in a coupling of the utility-scale benchmarks are not concerning; this is single-seed runs, so there's some variation. What's we care about is the improvement of lookahead (and to a lesser degree decay, whose flaws are now more exposed) on the large-scale benchmarks.

qv

This graph is showing where the improvement is coming from. for something approximately equivalent to the LargeQuantumVolumeMapping suite. Top graph is this PR, bottom is its parent. The x-axis is "iterations", which is the number of times the swap-scorer is invoked (so it's a measure of progress through the algorithm). Solid traces are the number of candidate swaps (that line was me tracing information for another change I want to make). The dashed traces are the number of gates remaining to route. Vertical dotted lines mark points where the release valve triggers.

The takeaway is that the previous lookahead and decay defaults were completely broken at this scale; they stymie the algorithm from actually making forwards progress. The new form fixes them.

I have several more takeaways and improvements to follow on from this preliminary analysis, but this is a first step that fixes our super-scaling.

Footnotes

  1. 02a1939: Calculate relative, not absolute, scores in SabreSwap

This commit removes the scaling of the basic heuristic of Sabre
inversely proportional to the front-layer population in all of
Qiskit's defaults.

Previously, the "basic" heuristic of Sabre had a scaling factor applied
to it such that the _total_ contribution of the front layer to the
full score of a swap remained approximately the same magnitude (scaled
by the sums of the distances), no matter how many gates were in the
layer.  However, since the implementation of relative scoring[^1], we
realise that only the score delta caused by a swap matters.  A swap can
affect at most two gates in the front layer, so scaling down the impact
of this based on the population of the front layer only has the effect
of allowing other heuristics to entirely dominate the "basic" component,
when really the basic heuristic is the most critical for forwards
progress in the algorithm.

In the worst-case scenario of a front-layer populated with more than 40
gates, the extended set with a fixed size of 20 gates, a relative weight
of 0.5 and similar population-based scaling would entirely override the
front-layer heuristic, which would only be used as a tie-break.  For
such circuits, this caused the release-valve mechanism to trigger on
almost every gate in the worst cases, completely destroying the
performance and quality of the lookahead methods.  This was observed in
currently unrealistic circuits like a 1081-width, 10-depth volume
circuit, and 400+ qubit QFTs, but is likely to have had some effect at
smaller circuits.

[^1]: 02a1939: Calculate relative, not absolute, scores in SabreSwap
@jakelishman jakelishman requested a review from a team as a code owner May 23, 2025 16:46
@jakelishman jakelishman added performance Changelog: API Change Include in the "Changed" section of the changelog mod: transpiler Issues and PRs related to Transpiler labels May 23, 2025
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish mtreinish added this to the 2.1.0 milestone May 23, 2025
@coveralls
Copy link

coveralls commented May 23, 2025

Pull Request Test Coverage Report for Build 15309357816

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 7 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.05%) to 87.866%

Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 1 93.48%
crates/qasm2/src/parse.rs 6 97.61%
Totals Coverage Status
Change from base Build 15309125608: 0.05%
Covered Lines: 79570
Relevant Lines: 90558

💛 - Coveralls

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. the change makes sense to me. I'll spin up a benchpress run with this and compare it against the 2.0.2 data I generated for testing it fixed #14402

heuristic = (
Heuristic(attempt_limit=10 * coupling_map.size())
.with_basic(1.0, SetScaling.Size)
.with_basic(1.0, SetScaling.Constant)
.with_lookahead(0.5, 20, SetScaling.Size)
.with_decay(0.001, 5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still using the decay here? I'm surprised you didn't want to remove it because I thought it didn't help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done the research legwork to properly make that claim, so I'd rather leave it and do it in a separate PR once I can back it up.

@mtreinish mtreinish self-assigned this May 27, 2025
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a benchpress run locally and compared it to my 2.0.2 run from last week and compared to the 2q gate count:

test (2)

For the most part this looks like things are behaving the same, it's hard to say for certain though without a statistical sample over multiple seeds which benchpress doesn't provide. But, I think the analysis in the commit message and PR summary is convincing that this is a good change to make even if it doesn't impact the swap count substantially in some benchmarks. There isn't a clear regression highlighted and that's more important.

@mtreinish mtreinish added this pull request to the merge queue May 28, 2025
@jakelishman
Copy link
Member Author

The most outlying improvement dot, in line ~1000 2q gates in this PR and ~2600 gates in 2.0.2, is test_hamiltonians[ham_fh-graph-1D-grid-nonpbc-qubitnodes_Lx-90_U-2_enc-jw-linear], which is a test that may have previously been triggering the release valve, but benchpress sets no seeds, so I wouldn't actually be able to reproduce if it did so in that test run.

@jakelishman
Copy link
Member Author

The other notable outlier (at ~75k 2q gates in this PR, ~150k 2q gates in 2.0.2) is test_QASMBench_large[qft_n320-linear], which is another that may have been getting stuck in the release valve again, but it's hard to say.

Merged via the queue into Qiskit:main with commit 732e2f7 May 28, 2025
26 checks passed
@jakelishman jakelishman deleted the sabre/better-defaults-old branch May 28, 2025 21:51
rahaman-quantum pushed a commit to rahaman-quantum/qiskit that referenced this pull request Jun 20, 2025
* Improve Sabre heuristic defaults

This commit removes the scaling of the basic heuristic of Sabre
inversely proportional to the front-layer population in all of
Qiskit's defaults.

Previously, the "basic" heuristic of Sabre had a scaling factor applied
to it such that the _total_ contribution of the front layer to the
full score of a swap remained approximately the same magnitude (scaled
by the sums of the distances), no matter how many gates were in the
layer.  However, since the implementation of relative scoring[^1], we
realise that only the score delta caused by a swap matters.  A swap can
affect at most two gates in the front layer, so scaling down the impact
of this based on the population of the front layer only has the effect
of allowing other heuristics to entirely dominate the "basic" component,
when really the basic heuristic is the most critical for forwards
progress in the algorithm.

In the worst-case scenario of a front-layer populated with more than 40
gates, the extended set with a fixed size of 20 gates, a relative weight
of 0.5 and similar population-based scaling would entirely override the
front-layer heuristic, which would only be used as a tie-break.  For
such circuits, this caused the release-valve mechanism to trigger on
almost every gate in the worst cases, completely destroying the
performance and quality of the lookahead methods.  This was observed in
currently unrealistic circuits like a 1081-width, 10-depth volume
circuit, and 400+ qubit QFTs, but is likely to have had some effect at
smaller circuits.

[^1]: 02a1939: Calculate relative, not absolute, scores in SabreSwap

* Restore test coverage and comments

* Change release note category
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: API Change Include in the "Changed" section of the changelog mod: transpiler Issues and PRs related to Transpiler performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants