
mtreinish
Member

Summary

In #13921 a check was added to `circuit_to_instruction()` to determine whether the circuit contains a control flow op. This check added significant overhead to the function: the nightly asv benchmarks show up to a ~9x slowdown. #13921 attempted to fix this regression, but did so by adding a new API, which isn't appropriate this late in the 2.0 release cycle. That PR could (and should) still be used in 2.1. In the meantime, this commit mitigates the runtime regression for 2.0 by changing how the check is performed.
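A minimal sketch of a name-based guard of this kind, assuming the check boils down to intersecting a fixed set of control-flow op names with the names of the ops in the circuit. `ToyCircuit`, `check_no_control_flow` and `CONTROL_FLOW_NAMES` are hypothetical stand-ins, not Qiskit's actual code; the toy `count_ops()` only mimics the name-to-count shape returned by Qiskit's real `QuantumCircuit.count_ops()`.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical set of control-flow op names for this sketch; Qiskit's own
# constant may differ.
CONTROL_FLOW_NAMES = frozenset({"if_else", "while_loop", "for_loop", "switch_case"})


@dataclass
class ToyCircuit:
    """Hypothetical toy circuit: just an ordered list of operation names."""

    op_names: list = field(default_factory=list)

    def count_ops(self):
        # Same shape as QuantumCircuit.count_ops(): a name -> count mapping.
        return Counter(self.op_names)


def check_no_control_flow(circuit):
    """Raise ValueError if any op name in the circuit is a control-flow name."""
    # Intersect the small, fixed set of control-flow names with the circuit's
    # distinct op names (iterating a Counter yields its keys).
    found = CONTROL_FLOW_NAMES.intersection(circuit.count_ops())
    if found:
        raise ValueError(
            f"circuit with control flow cannot be converted: {sorted(found)}"
        )


check_no_control_flow(ToyCircuit(["h", "cx", "measure"]))  # passes silently
```

The intersection only touches each distinct op name once, rather than asking every instruction individually whether it is control flow.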

@mtreinish added the performance and Changelog: None (Do not include in changelog) labels on Mar 1, 2025
@mtreinish added this to the 2.0.0 milestone on Mar 1, 2025
@mtreinish requested a review from a team as a code owner on March 1, 2025 23:55
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@mtreinish
Member Author

I ran some benchmarks locally on my laptop to test the performance improvement here:

Benchmarks that have improved:

| Change   | Before [106864c6] <fix-pref-reg-to-inst~1>   | After [53b142e9] <fix-pref-reg-to-inst>   |   Ratio | Benchmark (Parameter)                                                |
|----------|----------------------------------------------|-------------------------------------------|---------|----------------------------------------------------------------------|
| -        | 70.8±6μs                                     | 60.8±3μs                                  |    0.86 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8)     |
| -        | 102±0.9μs                                    | 81.1±1μs                                  |    0.8  | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8)    |
| -        | 340±8μs                                      | 252±8μs                                   |    0.74 | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 8)    |
| -        | 162±20μs                                     | 107±0.5μs                                 |    0.66 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8)    |
| -        | 250±30μs                                     | 160±2μs                                   |    0.64 | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 8)    |
| -        | 76.2±4μs                                     | 33.2±0.7μs                                |    0.44 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 128)   |
| -        | 98.6±8μs                                     | 39.9±0.5μs                                |    0.41 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 128)   |
| -        | 196±7μs                                      | 61.9±4μs                                  |    0.32 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 128)   |
| -        | 301±20μs                                     | 89.2±5μs                                  |    0.3  | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 128)   |
| -        | 452±20μs                                     | 129±10μs                                  |    0.29 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 128)  |
| -        | 1.70±0.09ms                                  | 466±100μs                                 |    0.27 | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 128)  |
| -        | 678±80μs                                     | 177±6μs                                   |    0.26 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 128)  |
| -        | 1.12±0.1ms                                   | 276±30μs                                  |    0.25 | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 128)  |
| -        | 827±30μs                                     | 114±3μs                                   |    0.14 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 2048)  |
| -        | 1.20±0.06ms                                  | 157±2μs                                   |    0.13 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 2048)  |
| -        | 3.19±0.3ms                                   | 375±6μs                                   |    0.12 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8192)  |
| -        | 4.60±0.5ms                                   | 537±10μs                                  |    0.12 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8192)  |
| -        | 6.25±0.4ms                                   | 704±10μs                                  |    0.11 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 2048) |
| -        | 2.43±0.1ms                                   | 279±10μs                                  |    0.11 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 2048)  |
| -        | 3.77±0.1ms                                   | 420±8μs                                   |    0.11 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 2048)  |
| -        | 9.56±1ms                                     | 942±10μs                                  |    0.1  | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8192)  |
| -        | 15.0±0.7ms                                   | 1.35±0.03ms                               |    0.09 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8192)  |
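The Ratio column is the After time divided by the Before time, so values below 1 are speedups. A quick check against two rows of the table above (the helper name `asv_ratio` is only for illustration):

```python
def asv_ratio(before, after):
    """Ratio as shown in the asv table: after / before (same units).

    Values < 1 mean the new code is faster.
    """
    return after / before


# Two rows from the table above, both converted to microseconds.
rows = {
    "(8, 8)": (70.8, 60.8),
    "(8, 8192)": (15.0e3, 1.35e3),
}
for params, (before_us, after_us) in rows.items():
    r = asv_ratio(before_us, after_us)
    print(f"{params}: ratio {r:.2f}, {1 / r:.1f}x faster")
```

So the 0.09 in the largest case corresponds to roughly an 11x speedup over the regressed code.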

Benchmarks that have stayed the same:

| Change   | Before [106864c6] <fix-pref-reg-to-inst~1>   | After [53b142e9] <fix-pref-reg-to-inst>   | Ratio   | Benchmark (Parameter)                                                |
|----------|----------------------------------------------|-------------------------------------------|---------|----------------------------------------------------------------------|
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8192) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 2048) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8192) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 2048) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 8192) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 2048) |
|          | n/a                                          | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 8192) |
|          | 27.6±1μs                                     | 26.8±0.5μs                                | 0.97    | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8)     |
|          | 34.3±1μs                                     | 31.9±0.8μs                                | 0.93    | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8)     |
|          | 51.9±0.8μs                                   | 47.9±3μs                                  | 0.92    | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8)     |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

I did try using `any(instruction.is_control_flow() for instruction in circuit.data)` instead of the set intersection, but it was significantly slower: asv reported a ratio of 5.74 for the `any()` version relative to the approach in this commit (i.e. roughly 5.7x slower).
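The two strategies can be modeled in plain Python to show that they answer the same question. This is a toy sketch: `ToyInstruction`, `has_cf_any`, `has_cf_intersection` and `CONTROL_FLOW_NAMES` are hypothetical, and it assumes the set intersection is taken over op names. The toy does not reproduce Qiskit's performance characteristics; the 5.74 ratio above comes from asv runs against Qiskit itself.

```python
from collections import Counter

# Hypothetical set of control-flow op names (an assumption for this sketch).
CONTROL_FLOW_NAMES = frozenset({"if_else", "while_loop", "for_loop", "switch_case"})


class ToyInstruction:
    """Hypothetical stand-in for a circuit instruction with a name."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

    def is_control_flow(self):
        return self.name in CONTROL_FLOW_NAMES


def has_cf_any(data):
    # Per-instruction check: one method call per instruction until a hit,
    # so a circuit without control flow is always scanned in full.
    return any(inst.is_control_flow() for inst in data)


def has_cf_intersection(data):
    # Count op names once, then intersect the distinct names with the
    # fixed set of control-flow names.
    counts = Counter(inst.name for inst in data)
    return bool(CONTROL_FLOW_NAMES.intersection(counts))


plain = [ToyInstruction(n) for n in ["h", "cx", "rz"] * 1000]
with_cf = plain + [ToyInstruction("if_else")]
for data in (plain, with_cf):
    # Both checks must agree on every input.
    assert has_cf_any(data) == has_cf_intersection(data)
```

The sketch only demonstrates that the two checks are equivalent; it says nothing about which one is faster in Qiskit.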

@coveralls

Pull Request Test Coverage Report for Build 13610021201

Details

  • 3 of 3 (100.0%) changed or added relevant lines in 1 file are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.008%) to 86.968%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/accelerate/src/unitary_synthesis.rs | 2 | 94.29% |
| crates/qasm2/src/lex.rs | 3 | 92.23% |

| Totals | |
|---|---|
| Change from base Build 13608179063 | 0.008% |
| Covered Lines | 76075 |
| Relevant Lines | 87475 |

💛 - Coveralls
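The overall coverage figure is simply covered lines divided by relevant lines. A one-line check reproduces the 86.968% reported above (`coverage_percent` is an illustrative helper, not part of Coveralls):

```python
def coverage_percent(covered, relevant):
    """Line coverage as a percentage of relevant lines."""
    return 100.0 * covered / relevant


# Totals from the Coveralls report above.
print(f"{coverage_percent(76075, 87475):.3f}%")
```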

@jakelishman
Member

How does the performance compare to before the original PR?

@mtreinish
Member Author

> How does the performance compare to before the original PR?

There's still a regression, but it's less severe:

Benchmarks that have stayed the same:

| Change   | Before [b89dd7ff] <lint_incr_latest~8>   | After [53b142e9] <fix-pref-reg-to-inst>   | Ratio   | Benchmark (Parameter)                                                |
|----------|------------------------------------------|-------------------------------------------|---------|----------------------------------------------------------------------|
|          | 32.5±0.6μs                               | 39.7±0.3μs                                | ~1.22   | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 128)   |
|          | 51.9±6μs                                 | 60.9±1μs                                  | ~1.18   | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 128)   |
|          | 392±10μs                                 | 459±20μs                                  | ~1.17   | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 128)  |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8192) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 2048) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8192) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 2048) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 8192) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 2048) |
|          | n/a                                      | n/a                                       | n/a     | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 8192) |
|          | 29.7±2μs                                 | 32.4±0.3μs                                | 1.09    | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 128)   |
|          | 24.7±2μs                                 | 26.2±0.07μs                               | 1.06    | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8)     |
|          | 270±20μs                                 | 276±20μs                                  | 1.02    | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 128)  |

Benchmarks that have got worse:

| Change   | Before [b89dd7ff] <lint_incr_latest~8>   | After [53b142e9] <fix-pref-reg-to-inst>   |   Ratio | Benchmark (Parameter)                                                |
|----------|------------------------------------------|-------------------------------------------|---------|----------------------------------------------------------------------|
| +        | 303±9μs                                  | 415±20μs                                  |    1.37 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 2048)  |
| +        | 980±30μs                                 | 1.31±0.01ms                               |    1.34 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8192)  |
| +        | 527±20μs                                 | 694±20μs                                  |    1.32 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 2048) |
| +        | 272±6μs                                  | 357±10μs                                  |    1.31 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8192)  |
| +        | 685±20μs                                 | 896±30μs                                  |    1.31 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8192)  |
| +        | 391±10μs                                 | 506±8μs                                   |    1.3  | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8192)  |
| +        | 114±5μs                                  | 145±8μs                                   |    1.28 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 128)  |
| +        | 210±7μs                                  | 264±2μs                                   |    1.25 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 2048)  |
| +        | 120±1μs                                  | 150±3μs                                   |    1.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 2048)  |
| +        | 88.6±3μs                                 | 109±3μs                                   |    1.23 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 2048)  |
| +        | 154±1μs                                  | 188±20μs                                  |    1.23 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 128)  |
| +        | 74.6±0.6μs                               | 87.2±8μs                                  |    1.17 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8)    |
| +        | 234±2μs                                  | 274±10μs                                  |    1.17 | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 8)    |
| +        | 98.5±0.4μs                               | 114±6μs                                   |    1.16 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8)    |
| +        | 71.9±2μs                                 | 83.7±2μs                                  |    1.16 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 128)   |
| +        | 148±0.6μs                                | 170±20μs                                  |    1.15 | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 8)    |
| +        | 26.9±0.3μs                               | 30.8±0.2μs                                |    1.14 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8)     |
| +        | 51.2±0.6μs                               | 57.7±0.3μs                                |    1.13 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8)     |
| +        | 40.1±0.7μs                               | 44.2±0.3μs                                |    1.1  | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8)     |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
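To put "less severe" in rough numbers for the largest case, `time_circuit_to_instruction(8, 8192)`, the figures from both comparisons can be combined. Because they come from two separate local asv runs, treat the result as approximate; `recovered_fraction` is a hypothetical helper name.

```python
def recovered_fraction(baseline, regressed, fixed):
    """Fraction of the added overhead removed by the fix (1.0 = fully recovered)."""
    return (regressed - fixed) / (regressed - baseline)


# time_circuit_to_instruction(8, 8192), in microseconds, from the two
# separate asv comparisons above (so only a rough estimate):
baseline_us = 980     # b89dd7ff baseline (second comparison)
regressed_us = 15000  # with the original check (first comparison)
fixed_us = 1310       # with this PR (second comparison)

print(f"recovered: {recovered_fraction(baseline_us, regressed_us, fixed_us):.1%}")
print(f"remaining slowdown vs. baseline: {fixed_us / baseline_us:.2f}x")
```

By this estimate most of the added overhead is gone, while the remaining ~1.34x matches the ratio asv reports for that row.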

Member

@jakelishman left a comment

Thanks, it's not ideal, but it's a better situation than before.

@jakelishman added this pull request to the merge queue on Mar 2, 2025
Merged via the queue into Qiskit:main with commit 1f825d6 Mar 2, 2025
21 checks passed
@mtreinish deleted the fix-pref-reg-to-inst branch on March 2, 2025 11:40