Use `writev` for writing guest outgoing packets to the tap device #2958
Leaving here, for reference, some initial performance measurements for the TX path. The baseline is the latest main.
*kworker load numbers include a lot of noise, since we are measuring the load of all kworker threads on the system.
I also tried modifying the RX path to use vectored reads as well. This required slightly changing the order in which we do things in the RX path. The order roughly goes like:
The implementation works (here is a prototype), but performance degrades significantly. To track down the reason, I took flame graphs of runs with the test implementation and compared them against the version with only vectored writes enabled (this PR).
The flame graph when using only and the one when using both In the second case the majority of the time for receiving one packet (
This made me realize that in the original code we do not parse the whole chain if it is not needed. For example, if we are receiving an ICMP frame (54 bytes long), we parse only one descriptor in the chain. With the new workflow we do not know the size of the packet in advance (tap does not provide us
At the moment, I do not see any way to avoid this issue. The only workaround I could think of is asking the tap how many bytes are available for reading, but this functionality doesn't seem to exist.
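To make the cost difference concrete, here is a rough sketch of how many descriptors each RX strategy has to walk. It is a simplified model, not Firecracker code: the function names and the descriptor lengths in the usage note are made up for illustration.

```rust
/// Number of descriptors that must be walked to hold `packet_len` bytes when
/// the packet size is known before the chain is touched (the original RX
/// flow: read the frame first, then copy it into the chain).
fn descriptors_needed(desc_lens: &[usize], packet_len: usize) -> usize {
    let mut remaining = packet_len;
    let mut walked = 0;
    for &len in desc_lens {
        if remaining == 0 {
            break;
        }
        walked += 1;
        remaining = remaining.saturating_sub(len);
    }
    walked
}

/// With a vectored read the packet size is unknown up front, so the whole
/// chain has to be parsed before the read is issued.
fn descriptors_for_vectored_read(desc_lens: &[usize]) -> usize {
    desc_lens.len()
}
```

For a hypothetical chain with descriptor lengths `[256, 2048, 4096]`, a 54-byte frame needs one descriptor in the first model, while the vectored-read model always walks all three.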
Awesome work! This is not only more efficient, but I think it also simplifies the emulation code. 💯
leaving a few comments below
I used @zulinx86's tool to quantify the performance change introduced by this PR on all the platforms we support. TCP throughput results:
Network latency results:
Buy this man a beer! 🍻
The standard library handles vectored writes and reads using `IoSlice` and `IoSliceMut`, respectively. We introduce a new type, `IoVecBuffer`, which is essentially a vector of `IoSlice` and can be instantiated from a `DescriptorChain`. `IoVecBuffer` provides the necessary bits for using `write_vectored` to transmit packets to the tap. Signed-off-by: Babis Chalios <bchalios@amazon.es>
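A minimal sketch of what such a type could look like, assuming plain byte slices in place of the guest memory regions that a `DescriptorChain` describes. The constructor and method names (`from_ranges`, `write_to`) are hypothetical and not taken from the PR.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical stand-in for the commit's `IoVecBuffer`: a list of borrowed
/// byte ranges that can be handed to a vectored write without copying.
pub struct IoVecBuffer<'a> {
    slices: Vec<IoSlice<'a>>,
    len: usize,
}

impl<'a> IoVecBuffer<'a> {
    /// Builds the buffer from already-resolved byte ranges. In the PR the
    /// ranges come from a `DescriptorChain`; plain slices stand in here.
    pub fn from_ranges(ranges: &[&'a [u8]]) -> Self {
        let slices: Vec<IoSlice<'a>> = ranges.iter().map(|&r| IoSlice::new(r)).collect();
        let len = ranges.iter().map(|r| r.len()).sum();
        Self { slices, len }
    }

    /// Total number of bytes described by all ranges.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Issues one vectored write covering every range. Like any write, it
    /// may be short; the return value is the number of bytes written.
    pub fn write_to<W: Write>(&self, dst: &mut W) -> io::Result<usize> {
        dst.write_vectored(&self.slices)
    }
}
```

On Unix, `write_vectored` on a `std::fs::File` is backed by `writev`, so the same call works against a file descriptor as well as against an in-memory `Vec<u8>` used for testing.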
This allows us to avoid an extra user-space copy of the packet we are transmitting to the tap. We directly use the buffers described by the `DescriptorChain` by means of `IoVecBuffer`s. Signed-off-by: Babis Chalios <bchalios@amazon.es>
Move the tap mock object from the `Net` device to the `Tap` itself and extend said object to allow us to also mock tap write failures. Also, add a few extra unit tests in the mmds network stack. Signed-off-by: Babis Chalios <bchalios@amazon.es>
Update baselines to reflect the improvements in the TX path due to vectored writes. Signed-off-by: Babis Chalios <bchalios@amazon.es>
Update baselines to reflect the (small) improvements in the TX path due to vectored writes. Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Added missing entries for 1.3 release - bchalios@'s net scatter-gather improvements (firecracker-microvm#2958) - seccompiler change to make builds reproducible (firecracker-microvm#3445) Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Avoid extra memory copy when writing to tap device in virtio-net TX path
Currently, when performing TX, a network frame might be scattered across various memory regions described by a `DescriptorChain` that we receive from the guest. In order to write the frame to the tap device, we first perform a number of copies from said memory buffers to a single buffer, which we then write to the `Tap` file descriptor.
This PR changes that to use scatter-gather IO via the `writev` system call, which avoids the intermediate memory copies.
Fixes #420
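The difference between the two approaches can be sketched as follows. This is an illustration over any `std::io::Write` sink, not the device code; `send_with_copy` and `send_vectored` are hypothetical names, and the pieces stand in for the guest buffers of one frame.

```rust
use std::io::{self, IoSlice, Write};

/// Previous approach (sketch): gather the scattered pieces into one
/// contiguous buffer, then issue a plain write. Every byte is copied once
/// in user space before the kernel sees it.
fn send_with_copy<W: Write>(tap: &mut W, pieces: &[&[u8]]) -> io::Result<usize> {
    let total: usize = pieces.iter().map(|p| p.len()).sum();
    let mut frame = Vec::with_capacity(total);
    for piece in pieces {
        frame.extend_from_slice(piece);
    }
    tap.write(&frame)
}

/// Scatter-gather approach (sketch): only describe where the pieces live
/// and let a single vectored write gather them.
fn send_vectored<W: Write>(tap: &mut W, pieces: &[&[u8]]) -> io::Result<usize> {
    let iov: Vec<IoSlice<'_>> = pieces.iter().map(|p| IoSlice::new(*p)).collect();
    tap.write_vectored(&iov)
}
```

Both functions put the same bytes on the wire; the second one skips building the intermediate `frame` buffer.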
Description of Changes
The main blocker for using the `writev` system call in our device is that we need to be able to inspect the outgoing frame to check whether it is destined for `mmds`. This requires us to read at least the frame headers, which makes the code quite convoluted.
In order to do this in a clean way, I introduce an `IoVec` struct, which is essentially a `Vec` of `std::io::IoSlice`. The struct can be created from a `DescriptorChain` (without performing any copies) and can be used directly to perform the `writev` system call.
Moreover, it provides methods for reading ranges of bytes from it (which at the moment perform a copy). This functionality is used to inspect the ranges corresponding to the destination addresses in order to check whether the frame is destined for `mmds`.
`rust-vmm`
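Reading a byte range out of a scattered frame could look roughly like the sketch below. The helper names (`read_at`, `range_equals`) and the use of plain slices are assumptions for illustration; the PR's actual methods and the offsets it inspects are not shown here.

```rust
/// Copies up to `out.len()` bytes, starting at `offset` of the logical
/// frame, into `out`, crossing piece boundaries as needed. Returns the
/// number of bytes actually copied (fewer if the frame is too short).
fn read_at(pieces: &[&[u8]], mut offset: usize, out: &mut [u8]) -> usize {
    let mut copied = 0;
    for piece in pieces {
        if copied == out.len() {
            break;
        }
        if offset >= piece.len() {
            // The requested range starts after this piece; skip it.
            offset -= piece.len();
            continue;
        }
        let n = (piece.len() - offset).min(out.len() - copied);
        out[copied..copied + n].copy_from_slice(&piece[offset..offset + n]);
        copied += n;
        offset = 0;
    }
    copied
}

/// Hypothetical header check: do the bytes at `offset` equal `expected`?
/// Only `expected.len()` bytes are copied, never the whole frame.
fn range_equals(pieces: &[&[u8]], offset: usize, expected: &[u8]) -> bool {
    let mut buf = vec![0u8; expected.len()];
    read_at(pieces, offset, &mut buf) == expected.len() && buf == expected
}
```

A caller would pass the offset of the header field it cares about and the address to compare against, so the copy is limited to a few header bytes while the payload stays in place for the vectored write.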
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.
PR Checklist
[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]
`git commit -s`).
`unsafe` code is properly documented.
`CHANGELOG.md`.