Update pulse benchmarking #1607
Conversation
- Add pulse defaults loading test. This mainly measures instmap construction.
- Add lowering test. This measures conversion of block -> schedule.
- Add ECR building tests with three major approaches.
- Add parameterized block test with parameter scan, assuming calibration experiments.
These benchmarks look great to me, thanks for doing this. I have one question inline about the pulse defaults benchmarks, but it's not really a blocker; we can always add it as a follow-up too.
def setup(self, num_random_gate):
    self.source = gen_source(num_random_gate)
So the slowest of these takes ~500 ms for me running the benchmarks locally. Since this is a critical concern for performance, I'm wondering if this is large enough to showcase the issues we're hitting. Does the conversion scale with the number of gate definitions in absolute terms, or is it also a function of the number of qubits?

I'm wondering if we should just use FakeWashington too. I saw in the commit message where you mentioned keeping a stable base, and I normally agree with that (which is why some fake backends are vendored in the code here). But in practice I don't think we're likely to ever change the snapshot of FakeWashington unless there was a big backwards-incompatible change made to the configuration of the device that we wanted to ensure we tested (the only time that happened historically was the move from u1, u2, u3 to sx, x, rz).
I found that these backends still report u* gates, and parsing of string parameters is the heavy overhead. I also included U3 gates and random_gates with string parameters (this also checks the efficiency of the pulse library) in this benchmark, so I think it is sensitive enough to improvements in the parser (currently we are doing a sort of over-engineering there). I don't know why the backends still report u* gate calibrations, but they could be removed at some point, which would drastically improve the measured numbers without any actual performance change in our code. This is why I hesitate to use the fake backends in the benchmark. The test time increases with the number of qubits and instructions, but I think the scaling is linear unless the machine runs out of memory.
That's fine with me; then I'm all good with these benchmarks. (We can also expand things in the future if needed.)
Summary
This PR renews the pulse benchmarks with modern grammar. This invalidates the old performance history.
The current pulse benchmarks target pulse programs written in the form of Schedule, but this approach will shortly be discouraged. Pulse programs are usually built with the pulse builder, which outputs a ScheduleBlock. Such programs are then lowered to a Schedule at execution time, so measuring performance on Schedule alone doesn't give us a practical measure of our pulse SDK.
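For illustration, a minimal sketch of this builder workflow, assuming the current qiskit.pulse API (the channel and pulse parameters are arbitrary):

from qiskit import pulse

# Building with the pulse builder yields a ScheduleBlock, not a Schedule.
d0 = pulse.DriveChannel(0)
with pulse.build(name="x90") as block:
    pulse.play(pulse.Gaussian(duration=160, amp=0.1, sigma=40), d0)

print(type(block))  # qiskit.pulse.schedule.ScheduleBlock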
Details and comments
In the new benchmark, the following tests are added.
This benchmark measures the speed of loading calibration data from JSON, which is usually provided by a backend as command definitions. With the recent increase in qubit numbers, the loading speed of the calibration data is becoming critical, so this test is newly added to track improvements to that logic.
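A rough sketch of the pattern being measured, assuming the PulseDefaults model from qiskit.providers.models (the JSON path is a placeholder):

import json
from qiskit.providers.models import PulseDefaults

# Deserialize the command definitions; accessing instruction_schedule_map
# triggers construction of the InstructionScheduleMap, which is the
# dominant cost this benchmark tracks.
with open("defaults.json") as fd:  # placeholder path
    defaults = PulseDefaults.from_dict(json.load(fd))
instmap = defaults.instruction_schedule_map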
To prevent artifacts due to fake provider updates (especially to calibration data) in Terra, dedicated fake data is introduced in the file. The generator assumes a 2Q device, but can add an arbitrary number of random_gate instructions, each consisting of a single waveform with frame changes. PulseDefaultsBench measures loading speed while varying the number of random gates, and CircuitSchedulingBench measures circuit -> schedule conversion speed on top of the new fake data. The latter test replaces ScheduleToInstructionBench, which depended on the fake 2Q pulse backend in Terra.
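As an illustration, an asv benchmark of this shape might look like the following skeleton; the class name and gen_source appear in this PR, while the parameter values and the timing method are hypothetical:

class PulseDefaultsBench:
    # asv runs setup and each time_* method once per value in params.
    params = [10, 100, 1000]  # hypothetical numbers of random gates
    param_names = ["num_random_gate"]

    def setup(self, num_random_gate):
        # gen_source builds the fake JSON payload defined in the benchmark file.
        self.source = gen_source(num_random_gate)

    def time_building_defaults(self, num_random_gate):
        # Measure deserialization and instmap construction.
        PulseDefaults.from_dict(self.source)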
Tests inside this file are renewed. The new tests consist of EchoedCrossResonanceConstructionBench and ParameterizedScheduleBench. These tests aim at benchmarking ScheduleBlock performance rather than Schedule. The reference mechanism (a block can manage an external reference, i.e. a subroutine, as if it were managing parameters) is also tested; see the sketch below.
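A minimal sketch of that reference mechanism, assuming the current qiskit.pulse builder API (the subroutine contents are arbitrary):

from qiskit import pulse

# The main program declares a named reference instead of inlining the subroutine.
with pulse.build() as main:
    pulse.reference("x_gate", "q0")

# The subroutine is built separately and attached later, much like
# assigning a value to a parameter.
with pulse.build() as x_gate:
    pulse.play(pulse.Gaussian(duration=160, amp=0.1, sigma=40), pulse.DriveChannel(0))

main.assign_references({("x_gate", "q0"): x_gate}, inplace=True)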
ParameterizedScheduleBench assumes the situation of calibration experiments, where we scan a particular parameter of a pulse in inplace=False mode, and usually the pulse schedule is fully parameterized. Parameters are assigned to a flat schedule, a referenced schedule, and a pulse gate to cover the various situations.
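For example, the parameter-scan pattern this benchmark assumes could be sketched as follows (the amplitude parameter and scan values are illustrative):

from qiskit import pulse
from qiskit.circuit import Parameter

amp = Parameter("amp")
with pulse.build(name="rabi") as template:
    pulse.play(pulse.Gaussian(duration=160, amp=amp, sigma=40), pulse.DriveChannel(0))

# inplace=False returns a fresh ScheduleBlock per value, leaving the
# fully parameterized template untouched (the typical calibration scan).
scanned = [template.assign_parameters({amp: v}, inplace=False) for v in (0.05, 0.10, 0.15)]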
Execution of a program requires conversion from ScheduleBlock to Schedule, and this file includes such a test. A random and sufficiently complicated pulse program is prepared and digested by target_qobj_transform. Note that while this is the standard transformer function, it is not well designed and the logic itself could change in the future.
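The lowering being measured is roughly the following, using the public target_qobj_transform from qiskit.pulse.transforms (the toy program stands in for the random one generated by the benchmark):

from qiskit import pulse
from qiskit.pulse.transforms import target_qobj_transform

with pulse.build() as block:
    with pulse.align_sequential():
        pulse.play(pulse.Constant(duration=100, amp=0.1), pulse.DriveChannel(0))
        pulse.shift_phase(1.57, pulse.DriveChannel(0))

# Lower the ScheduleBlock into the flat Schedule representation used at execution time.
schedule = target_qobj_transform(block)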