Skip to content

Rust implementation of the QPY module #14166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 50 commits into
base: main
Choose a base branch
from
Open

Conversation

gadial
Copy link
Contributor

@gadial gadial commented Apr 3, 2025

Summary

Adds a rust-based implementation of qpy serialization.

Addresses #13131

Details and comments

We use rust structs along with the binrw library to specify and write the byte representation of data (this turns out to be somewhat simpler than how it's currently done in python). There are still many python-dependent elements we need to take into consideration:

  1. Backwards compatibility is currently not supported; whenever an older version is passed, the Python version is used instead. The goal of this PR is to establish a baseline from which it would be relatively easy to add support for previous versions.
  2. File header is still created in Python space; the entrypoint for the rust code is write_circuit and read_circuit (this is being kept until a later PR adds backward compatability)
  3. For every instruction, QPY stores the name of the corresponding python class, e.g. for X gate it stores the string XGate. Currently all the class names are obtained by accessing the __name__ field of the python class; for standard gates and instructions we'll need to switch to a python-independent solution (an ad-hoc dictionary is currently used for the reader component; there may be a better solution).
  4. Since Expr was only recently moved to rust, after most of this PR was complete, we do not have a native Rust serialization yet; rather, we rely on the python serialization code (the _write_expr and _read_expr methods). There are many more places where Rust might be used but Python is used instead.
  5. numpy object serialization is done via python call to numpy's save.

In this PR we make the distinction between packing an object and serializing it:

  • serializing an object means generating a bytes representation for it (a Vec<u8> object, aliased to Bytes in the module.
  • packing an object means extracting its data fields and generating a corresponding rust struct, taken from the formats.rs files. This might involve serializing some of the object's fields, and packing other fields.

Our goal is to minimize serialization as much as possible, allowing us to generate a large packed struct reflecting the structure of the qpy file as much as possible before serializing; the code can be further improved to reflect this, although I think this can wait to a future PR.

@gadial gadial added mod: qpy Related to QPY serialization Rust This PR or issue is related to Rust code in the repository labels Apr 3, 2025
@gadial gadial added this to the 2.1.0 milestone Apr 3, 2025
@gadial gadial self-assigned this Apr 3, 2025
@gadial gadial requested a review from a team as a code owner April 3, 2025 09:47
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @mtreinish
  • @nkanazawa1989

@gadial gadial marked this pull request as draft April 3, 2025 09:47
@eliarbel eliarbel linked an issue Apr 24, 2025 that may be closed by this pull request
@eliarbel
Copy link
Member

eliarbel commented May 4, 2025

Hi @gadial, this is still in draft mode so I just gave it a brief look. I have a few general questions and comments at this stage:

  1. With the way the code is written now it's geared towards being called from Python, e.g. with the file header written in interface.py, the Python calls in py_write_circuit, pack_circuit_header and more. Maybe it's just an artifact of this PR still being WIP, but with the C-API for circuit construction (this Add initial C API for circuit construction #14006 and follow-ons) we should consider a C-api equivalent for the dump workflow or some aspects of it.
  2. For the circuit header handling in pack_circuit_header maybe QuantumCircuitData in converters or a similar approach might be useful here instead of all the getattr calls.
  3. Generally speaking about 1 & 2: assuming we'll still need some Python calls, can we make the Rust-logic as Python-free as possible, say by having a layer to extract as much info as possible from the Python objects before passing it to the actual pack functions? I realize this might not be that straightforward, but still worth trying. Eventually we should have enough of the data model Python-free so it makes sense to design for that already.
  4. Is use_rust in dump just a WIP thing for debug or do you plan to keep Python's write_circuit as an option? If so why? Otherwise it would be good to have the formats defined only in one place. Currently formats.rs mirrors formats.py which makes sense for the WIP status but the duplication is not good for the longer term. Just mentioning.

@gadial
Copy link
Contributor Author

gadial commented May 4, 2025

Thanks @eliarbel

  1. That's mostly at artifact of originally planning a smaller scope for this PR (in order to proceed incrementally instead of one huge PR) - I'm still trying to aim for the MVP, although this can be changed if needed since the header is a relatively easy part.
  2. Thanks, I'll try looking into it once the PR is passing.
  3. It's a little tricky since we deal with many different data types, each with its own Pythonic quirks, and some (e.g. parameters) are already transitioning to Rust. My priorities in this PR is to flush out the QPY structure as much as possible (i.e. by using detailed structs) and keeping the actual data extraction in specific serialize_X functions which can be amended on demand as we move more logic from Python to Rust. But I thing adding new extractors is out of scope for this PR which aims to establish the baseline.
  4. That's most definitely a WIP thing - I use it to test both Rust and Python generate the same output. At the end I believe we should remove all the Python code related to QPY dumping, but we'll have to keep the formats until we handle reading as well.

Hi @gadial, this is still in draft mode so I just gave it a brief look. I have a few general questions and comments at this stage:

  1. With the way the code is written now it's geared towards being called from Python, e.g. with the file header written in interface.py, the Python calls in py_write_circuit, pack_circuit_header and more. Maybe it's just an artifact of this PR still being WIP, but with the C-API for circuit construction (this Add initial C API for circuit construction #14006 and follow-ons) we should consider a C-api equivalent for the dump workflow or some aspects of it.
  2. For the circuit header handling in pack_circuit_header maybe QuantumCircuitData in converters or a similar approach might be useful here instead of all the getattr calls.
  3. Generally speaking about 1 & 2: assuming we'll still need some Python calls, can we make the Rust-logic as Python-free as possible, say by having a layer to extract as much info as possible from the Python objects before passing it to the actual pack functions? I realize this might not be that straightforward, but still worth trying. Eventually we should have enough of the data model Python-free so it makes sense to design for that already.
  4. Is use_rust in dump just a WIP thing for debug or do you plan to keep Python's write_circuit as an option? If so why? Otherwise it would be good to have the formats defined only in one place. Currently formats.rs mirrors formats.py which makes sense for the WIP status but the duplication is not good for the longer term. Just mentioning.

@coveralls
Copy link

coveralls commented May 11, 2025

Pull Request Test Coverage Report for Build 16703840999

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 2739 of 3196 (85.7%) changed or added relevant lines in 14 files are covered.
  • 510 unchanged lines in 7 files lost coverage.
  • Overall coverage decreased (-0.6%) to 87.07%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/qpy/binary_io/value.py 0 1 0.0%
crates/qpy/src/formats.rs 87 89 97.75%
crates/qpy/src/annotations.rs 78 102 76.47%
crates/qpy/src/bytes.rs 73 113 64.6%
crates/qpy/src/value.rs 344 426 80.75%
crates/qpy/src/circuits.rs 1588 1712 92.76%
crates/qpy/src/params.rs 489 673 72.66%
Files with Coverage Reduction New Missed Lines %
crates/circuit/src/parameter/parameter_expression.rs 1 83.59%
crates/qasm2/src/lex.rs 3 92.27%
crates/circuit/src/parameter/symbol_expr.rs 9 73.93%
qiskit/qpy/type_keys.py 16 70.59%
qiskit/qpy/common.py 18 38.68%
qiskit/qpy/binary_io/value.py 79 54.67%
qiskit/qpy/binary_io/circuits.py 384 41.46%
Totals Coverage Status
Change from base Build 16661257663: -0.6%
Covered Lines: 84013
Relevant Lines: 96489

💛 - Coveralls

@eliarbel eliarbel modified the milestones: 2.1.0, 2.2.0 May 29, 2025
@raynelfss raynelfss added the Changelog: None Do not include in changelog label Jul 28, 2025
Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gadial, thanks for doing all of this.

This isn't a complete review by any means, but I just wanted to give you some comments to work through ahead of your holiday. I'll continue reviewing this coming week.

The general questions I have so far are:

  1. Are there any changes to the current QPY format to support any of this, or is it compatible without introducing a new version?
  2. This PR is quite a bit larger than I was expecting. Are there any more meaningful chunks that you might be able to split this into to make reviewing it easier to understand architecturally? I think when we spoke about it before, you mentioned that testing was easier if both reader and writer were in place. But that leads me to the third point:
  3. Should we have Rust-level unit testing for any of this? Or, is there some amount of stress testing we could do, i.e. generating complex circuits with both the old and new serializers and making sure the outputs are equivalent? I have some doubt that our existing QPY test coverage is sufficient to detect potential differences introduced here (but perhaps I am wrong). How confident are you that the new code is sufficiently exercised by what we've got?

}
}

impl TryFrom<&Bytes> for f64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit suspect to me. What use-case do we have where we need to interpret a raw byte sequence as a big endian float? Shouldn't binrw handle that automatically for us if one of our storage structs has an f64 field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's used when converting a general dumped value to python: https://github.com/gadial/qiskit/blob/cc1f0fbd07cb6c38c123c078523fcf4636d18fc5/crates/qpy/src/value.rs#L554

I do think this can be avoided with smarter handling of DumpedValue as a binrw enum, but I think it's best to save this for a future PR.

Comment on lines 225 to 226
pack_param(py, &target.extract::<Param>()?, qpy_data)?,
pack_param(py, &cases_tuple.extract::<Param>()?, qpy_data)?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful here since these aren't actually Params. We're trying to get rid of Param::Obj (the control flow PR gets us most of the way there). I think they'd need to just be PyObject for now here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed pack_param to handle Bound<PyAny> objects instead. I still need to handle Param objects here, though, so I renamed the previous pack_param to pack_param_obj.

@gadial
Copy link
Contributor Author

gadial commented Aug 2, 2025

Hey @gadial, thanks for doing all of this.

This isn't a complete review by any means, but I just wanted to give you some comments to work through ahead of your holiday. I'll continue reviewing this coming week.

The general questions I have so far are:

  1. Are there any changes to the current QPY format to support any of this, or is it compatible without introducing a new version?
  2. This PR is quite a bit larger than I was expecting. Are there any more meaningful chunks that you might be able to split this into to make reviewing it easier to understand architecturally? I think when we spoke about it before, you mentioned that testing was easier if both reader and writer were in place. But that leads me to the third point:
  3. Should we have Rust-level unit testing for any of this? Or, is there some amount of stress testing we could do, i.e. generating complex circuits with both the old and new serializers and making sure the outputs are equivalent? I have some doubt that our existing QPY test coverage is sufficient to detect potential differences introduced here (but perhaps I am wrong). How confident are you that the new code is sufficiently exercised by what we've got?
  1. No changes.
  2. There are three conceptual parts for the PR: First is the formats file which defines the structure of the qpy file as much as possible (there are still some Bytes fields e.g. for storing multiple possible values types; this may be further refined in a future PR). Next are the writer and the reader parts. The writer takes the QuantumCircuit object and generates a struct; the reader takes a struct and generates a QuantumCircuit. The reader and the writer obviously depend on the formats, but not on each other. However, not adding all three together would result in some dead code in the PR, which is not preferred.
  3. In the long run we definitely should have Rust-level testing (which if I'm not mistaken is currently missing from most of our Rust code?). I did some stress testing during development (which indeed encountered some problems which were fixed) and will also do it as part of the benchmarking, but I have relatively good confidence in the current tests, due to the large amounts of problems they managed to detect in relatively niche situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: qpy Related to QPY serialization Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port QPY serialization to Rust
6 participants