-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Make internal Layout and CouplingMap attrs slotted and adjust passes for fast access #7036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit updates the layout and coupling map classes to tune them for performance when used by transpiler passes. Right now the user facing api is doing a bunch of extra validation to provide helpful user messages and the internal data structures of both classes are hidden behind this api. This can add a bunch of overhead to transpiler passes (specifically routing passes) that are using these APIs in a loop. To address this commit changes these internal properties to public attributes and adds slots to the classes to speed up direct attribute access. This will enable us to tune transpiler passes to access these attributes more easily and faster. The follow on step here is to update any transpiler passes using the user facing api to access the attributes directly. Fixes Qiskit#7035
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do some more investigation before we go down this particular path. Directly renaming the internal attributes leaves us in a state where the objects less well documented and to some degree harder to understand. (I always found ._p2v
and ._v2p
confusing as attribute names.)
e.g. Layout already has Layout.get_virtual_bits
for direct access to ._v2p
. Might changing the access pattern in sabre swap to use that instead of the __getitem__
resolve the performance issue? (__getitem__
is also not great for this use-case because it checks ._p2v
first on every call, prior to checking ._v2p
.)
I still need to benchmark things here to see the difference this causes. But I expect it to have a modest improvement in most of the passes I've updated so far. I think the profiling from @georgios-ts in #7035 showed pretty clearly the cumulative overhead of wrapping an attribute with a pyfunction in an inner loop like in sabre (this also lines up with what we looked at in #6493 and the performance improvements that resulted from #6567). I think what it comes down to for me is more how we name things, I don't mind keeping things private (I was just trying to avoid the _append/append thing like in QuantumCircuit) but I think in passes where we know that the lookup is valid and/or the structure is correctly formed we should be accessing slotted attributes directly, especially as things scale up. So for me it was more a matter of having a bunch of things like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have anything to add on the discussion of naming things (personally I've always disliked the a2b
form in any library), but I'm always in favour of defining __slots__
for objects that are used a lot, and having an "I know what I'm doing, skip the checks" access pattern available for internal use, provided it's well documented that it's intended to be read-only.
Agree that @georgios-ts 's numbers show there is room for improvement in the sabre case (but not whether that's the fault of our sabre implementation our of our
My point was more that we should try to build APIs that are both fast and safe by design (so that we don't end up just pushing these mutability and safety concerns out of our code and into our user's code) and that we document these safety and performance characteristics. That will help both us and our users develop best practices that eliminate some of the low hanging fruit like the above.
Agree, and I think the |
I've updated this to not rename the attributes and leave them as private, just add the slotting and adjust the access pattern for speed in the passes where it was relevant (I might have missed some though). |
I ran some benchmarks with this PR:
I ran the qualitative compiler benchmarks mostly because it's the only one that runs with sabre and stochastic swap, but I don't think they're large enough to show the real benefits on sabre here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks fine to me, and the speedups are good. Two-way dicts really feel like they should be a part of the standard library, or at least have a rock-solid external dependency, but that's not really related to this PR.
@@ -134,7 +134,6 @@ def _trivial_not_perfect(property_set): | |||
# layout so we need to clear it before contuing. | |||
if property_set["trivial_layout_score"] is not None: | |||
if property_set["trivial_layout_score"] != 0: | |||
property_set["layout"]._wrapped = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These things are (presumably) part of the old FencedObject
stuff that appeared at the start of the transpiler. I'm guessing it's being removed because the Layout
became slotted and you didn't want to add the extra slot? Do we need the FencedPropertySet
stuff at all really - is it useful for error checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I removed this because slots meant I couldn't add an arbitrary attribute to the Layout
object and it didn't seem worth the slot especially since it wasn't universally used.
I think we can look into deprecating the FencedPropertySet
. At one time it was used to ensure that a TransformationPass
could not modify the property set but that restriction was removed in #4387 so I'm not sure anything needs it anymore.
(edit: looking at the running passmanager object the flow controller gets a fenced property set to prevent a condition callable from modifying it. I'm not sure how important that is in practice, I've never seen an error like that come up before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds sensible to me!
This commit switches the default routing and layout method for optimization level 3 to use SabreSwap and Sabre layout passes. The quality of results is typically better with sabre than the default stochastic swap and dense layout we're using now. For optimization level 3 where we try to produce the best quality result and runtime is of secondary concern using sabre makes the most sense (especially after Qiskit#7036 which improves the runtime performance). Also for optimization level 3 currently sabre is typically faster because we increase the number of trials for stochastic swap (which is generally significantly faster) which slows it down as it's doing more work. This should improve the quality and speed of the results in general when running with optimization level 3. In the future we can do more work to improve the runtime performance of the sabre passes and hopefully make it fast enough to use for all the optimization levels which have more constraints on the run time performance than level 3. Related to Qiskit#7112 and Qiskit#7200
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates here. Out of curiosity, do you have benchmark numbers for accessing Layout._{v2p,p2v}
via their public methods (get_{virtual,physical}_bits
), perhaps once outside the hot loops? I'd be interested to see exactly how much direct attribute access is gaining us, and whether or not a broader refactor of the Layout
object for performance should be considered.
(I'm not a fan of updating all of our passes to access internal attributes because then our passes, and the passes our users develop based on them, will depend on something other than the stable API, and arguably would make a future refactor of Layout
harder.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. Few more comments.
Co-authored-by: Kevin Krsulich <kevin@krsulich.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
* Switch default routing/layout method to sabre for opt level 3 This commit switches the default routing and layout method for optimization level 3 to use SabreSwap and Sabre layout passes. The quality of results is typically better with sabre than the default stochastic swap and dense layout we're using now. For optimization level 3 where we try to produce the best quality result and runtime is of secondary concern using sabre makes the most sense (especially after #7036 which improves the runtime performance). Also for optimization level 3 currently sabre is typically faster because we increase the number of trials for stochastic swap (which is generally significantly faster) which slows it down as it's doing more work. This should improve the quality and speed of the results in general when running with optimization level 3. In the future we can do more work to improve the runtime performance of the sabre passes and hopefully make it fast enough to use for all the optimization levels which have more constraints on the run time performance than level 3. Related to #7112 and #7200 * Clarify punctuation in release note Co-authored-by: Jake Lishman <jake@binhbar.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Summary
This commit updates the layout and coupling map classes to tune them for
performance when used by transpiler passes. Right now the user facing
api is doing a bunch of extra validation to provide helpful user
messages and the internal data structures of both classes are hidden
behind this api. This can add a bunch of overhead to transpiler passes
(specifically routing passes) that are using these APIs in a loop. To
address this commit changes these internal properties to public
attributes and adds slots to the classes to speed up direct attribute
access. This will enable us to tune transpiler passes to access these
attributes more easily and faster. The follow on step here is to update
any transpiler passes using the user facing api to access the attributes
directly.
Details and comments
Fixes #7035
TODO: