-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix performance of Sabre rust<->Python boundary #10652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In Qiskit#10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct SabreResult was created to store the nested results for each block. This new class however resulted in PyO3 cloning the full swap map on every access. Since we have at least 1 access per gate in the circuit (and another one for each swap) this extra clone() adds a lot of extra overhead for deep circuits. In extreme cases this regression could be quite extreme. To address this the result format is changed to be a tuple (as it was originally), while this is less ergonomic the advantage it provides is that for nested objects it moves the rust object to the pyo3 interface so we avoid a copy as Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class as it simplifies the internal implementation (which is why Qiskit#10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by only accessing the SabreResult object's attribute on each recursive call. However, if it is an issue in the future we can work on flattening the nested structure before returning it to python to avoid any clones. Fixes Qiskit#10650
One or more of the the following people are requested to review this:
|
I reran the reproduce script from #10650 and with this PR applied |
Pull Request Test Coverage Report for Build 5889966418
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, overall!
Thanks for the fix. Just one small comment that may be worth addressing.
This commit simplifies the logic in the recursive call logic in _apply_sabre_result to always use a tuple so we avoid an isinstance check. Co-authored-by: Kevin Hartman <kevin@hart.mn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* Fix performance of Sabre rust<->Python boundary In #10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct SabreResult was created to store the nested results for each block. This new class however resulted in PyO3 cloning the full swap map on every access. Since we have at least 1 access per gate in the circuit (and another one for each swap) this extra clone() adds a lot of extra overhead for deep circuits. In extreme cases this regression could be quite extreme. To address this the result format is changed to be a tuple (as it was originally), while this is less ergonomic the advantage it provides is that for nested objects it moves the rust object to the pyo3 interface so we avoid a copy as Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class as it simplifies the internal implementation (which is why #10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by only accessing the SabreResult object's attribute on each recursive call. However, if it is an issue in the future we can work on flattening the nested structure before returning it to python to avoid any clones. Fixes #10650 * Simplify recursive call logic in _apply_sabre_result This commit simplifies the logic in the recursive call logic in _apply_sabre_result to always use a tuple so we avoid an isinstance check. Co-authored-by: Kevin Hartman <kevin@hart.mn> --------- Co-authored-by: Kevin Hartman <kevin@hart.mn> (cherry picked from commit e6c431e)
* Fix performance of Sabre rust<->Python boundary In #10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct SabreResult was created to store the nested results for each block. This new class however resulted in PyO3 cloning the full swap map on every access. Since we have at least 1 access per gate in the circuit (and another one for each swap) this extra clone() adds a lot of extra overhead for deep circuits. In extreme cases this regression could be quite extreme. To address this the result format is changed to be a tuple (as it was originally), while this is less ergonomic the advantage it provides is that for nested objects it moves the rust object to the pyo3 interface so we avoid a copy as Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class as it simplifies the internal implementation (which is why #10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by only accessing the SabreResult object's attribute on each recursive call. However, if it is an issue in the future we can work on flattening the nested structure before returning it to python to avoid any clones. Fixes #10650 * Simplify recursive call logic in _apply_sabre_result This commit simplifies the logic in the recursive call logic in _apply_sabre_result to always use a tuple so we avoid an isinstance check. Co-authored-by: Kevin Hartman <kevin@hart.mn> --------- Co-authored-by: Kevin Hartman <kevin@hart.mn> (cherry picked from commit e6c431e) Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
* Fix performance of Sabre rust<->Python boundary In Qiskit#10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct SabreResult was created to store the nested results for each block. This new class however resulted in PyO3 cloning the full swap map on every access. Since we have at least 1 access per gate in the circuit (and another one for each swap) this extra clone() adds a lot of extra overhead for deep circuits. In extreme cases this regression could be quite extreme. To address this the result format is changed to be a tuple (as it was originally), while this is less ergonomic the advantage it provides is that for nested objects it moves the rust object to the pyo3 interface so we avoid a copy as Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class as it simplifies the internal implementation (which is why Qiskit#10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by only accessing the SabreResult object's attribute on each recursive call. However, if it is an issue in the future we can work on flattening the nested structure before returning it to python to avoid any clones. Fixes Qiskit#10650 * Simplify recursive call logic in _apply_sabre_result This commit simplifies the logic in the recursive call logic in _apply_sabre_result to always use a tuple so we avoid an isinstance check. Co-authored-by: Kevin Hartman <kevin@hart.mn> --------- Co-authored-by: Kevin Hartman <kevin@hart.mn>
Summary
In #10366 the SabreLayout and SabreSwap passes were refactored to support nested control flow blocks. As part of that refactor a new struct SabreResult was created to store the nested results for each block. This new class however resulted in PyO3 cloning the full swap map on every access. Since we have at least 1 access per gate in the circuit (and another one for each swap) this extra clone() adds a lot of extra overhead for deep circuits. In extreme cases this regression could be quite extreme. To address this the result format is changed to be a tuple (as it was originally), while this is less ergonomic the advantage it provides is that for nested objects it moves the rust object to the pyo3 interface so we avoid a copy as Python owns the object on the return. However, for control flow blocks we're still using the SabreResult class as it simplifies the internal implementation (which is why #10366 introduced the struct). The potential impact of this is mitigated by switching to only a single clone per control flow block, by only accessing the SabreResult object's attribute on each recursive call. However, if it is an issue in the future we can work on flattening the nested structure before returning it to python to avoid any clones.
Details and comments
Fixes #10650