-
Notifications
You must be signed in to change notification settings - Fork 2.1k
sys/net/gcoap: Use socket _buf API to recognize truncated requests #16378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sys/net/gcoap: Use socket _buf API to recognize truncated requests #16378
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. Some style suggestions.
I did a short test: a large coap put:
(there is nothing written for success unless verbosity 6 the value is 2.04) from my POV this works as expected, (prior this patch the transfer without blocks just got stuck) 👍 |
541bda9
to
3fcfa27
Compare
Style issues addressed (most did indeed fit the 100 char limit). The TBD Size1 I'm leaving out right now as there are open questions as to what should be indicated there; not sending one is violating a SHOULD of 7252 but all still better than not responding at all, and IMO also better than sending the wrong value. Are you happy enough with the state of things that one of you can ACK this? [edit: provided a test was done on non-GNRC, see below] |
@kfessel, I'm not quite sure reading your comment what you tested. Did you find an example where gcoap-on-lwIP or similar can be run, or just tested the default GNRC gcoap case? |
i tested my app (the one i had the issue with that lead me to talk about that in matrix) i tested with native the app takes wasm bytecode with is often more than 100 byte and therefor has that problem but works well with block transfers this PR is a solution for my case (error response enables me to react and use block transfers no error just silence is bad) |
couldn't we just send the limit by buffersize and if the app requires a smaller size it creates its own 4.13 error |
What I think 4.13 with Size1 should is to report the payload size the client could use with this request. That means that the message needs to be parsed to the end of options, and then it needs to be determined how much space is left. Doing this is relatively complex, and may need additional error reporting to reliably see if the message was maybe even truncated in the options (in which case ... I don't know if anything sensible can be reported at all). But I'm not even sure that is the right interpretation, whereas not sending a Size1 is bad but at least debuggable. |
There are now proper testing descriptions, and I ran them. A few things worth mentioning happened there, and are expressed in follow-up commits:
|
Thank you I just stumbled upon this as well. Why can't we use a user supplied buffer here instead of the local |
I saw a message in my email from this PR. It's been a while but my recollection is below.
See the kb2ma/RIOT/lwip* branches.
IIRC this is the basic design of gcoap, as a messaging hub running in its own thread and for simplicity. See the wiki page for contrast with nanocoap. |
Any incoming message needs to go through gcoap's buffers at one point.
At the time we're reading into that, we don't even know whether what's coming in is a response or a request, let alone which request this belongs to. |
If a response is received through gcoap it is, as it always has been, copied into gcoap's receive buffer, and freed from the socket. When such a truncated response arrives, the tail is gone for good already. Sure, we could offer all kinds of "yeah but there's a pointer you can follow up on and need to free" stuff, but the beauty of nanocoap and gcoap is that the user doesn't need to gather data from anywhere, and the beauty of CoAP is that this is not needed in practice. The point of this is to allow the stack to fail gracefully (allowing the protocol to continue rather than deadlocking it) when overly long messages arrive, which generally means to ask for smaller bites of data. An application will pick its buffer in accordance with the messages it expects, so if someone wants to haul 1KiB messages, they can up their buffer. |
Mh, so you are basically "misusing" the
Mh, but consider this: we only put the header of a packet into the internal buffer and for the rest an application may choose to use: /* just a sketch! */
static uint8_t payload[SOME_ARRAY_SIZE];
int pos = 0, res;
while ((res = gcoap_get_payload(pkt, &payload[pos], sizeof(payload) - pos) > 0) {
pos += res;
} this way, a developer does not need to beat themselves over the head of configuring some module they may or may non know about and can do all the necessary compile-time config in the app. No '"get follow-up or "yeah but there's a pointer you can follow up on and need to free" stuff' required, if we establish that this is the way to get a packet's payload. |
Yes. I'm only using _buf because without it I we couldn't even see whether truncation happened, and right here this is all I care about.
Sure, that's a change we can make. But it's something that needs coordination across nanocoap and gcoap (which I start suspecting is part of what's holding RIOT's CoAP ecosystem back). In the end, this can get us rid of the need for a payload-sized buffer for incoming messages at all (though so far our handlers place their responses where the request came in, so...) I'm happy to work with this as a proposal, but that'll take several changes and iterations of design, while this here is only about getting the CoAP machinery unstuck when large packages arrive, without any need for applications to alter any behavior. (Unless they specifically checked for whether |
Does it oO? If the |
nanocoap is used as message parser and composer, but exceeding that, gcoap and nanocoap also share the coap_pkt_t structure as well as the coap_handler_t -- which means that untangling them entails a type change across all user visible CoAP functions. BTW, Murdock found that I had disregarded the new DTLS code -- that's now "fixed" easily (because if truncation happens to DTLS, tough luck, crypto won't open, and the message can't be acted on sensibly). |
Ok, but so (as far as I can judge) nothing that couldn't be handled with some good old inheritance and type juggling. In the long run, I think it is a good idea to boilerplate the whole nanocoap underbelly away from the user. But for now, I think we can live with just warning the user about truncated payloads. |
Thanksl. Murdock passes too now. @maribu has done some review already, do these still hold? If so, testing is what is missing. For me with the latest commit, tests on native passed as described above for GNRC; LWIP passed after commenting out sys/net/netif/netif.c line 72 (netif_get_name; that's undefined with this setup both in master and this branch and an unrelated yet-to-be-opened issue). Please let me know if there are any more issues, and whether I can squash. (I'd keep it as the 3 commits, and just let the fixup autosquash). |
See #16741 ;-). |
May I rebase this? |
Yes, and squash as well, if you like :-). |
Closes: RIOT-OS#14167 Co-authored-by: Marian Buschsieweke <maribu@users.noreply.github.com>
6f8baf2
to
b9a8652
Compare
I'm configured to autosquash ;-) Rebased, and all described tests still work. |
@chrysn |
I did test my application with this patch, with GNRC and LWIP it works |
@chrysn: Do you want to add |
I tested:
|
I'd prefer to do any Size1 in a follow-up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything is green (review steps 1-5 and ci tests) -> 🚀
Contribution description
Gcoap used to behave quite bad with large requests (see #14167); this uses the experimental _buf API to detect overflows and react accordingly.
Testing procedure
./server.py
.coap get 2a02:b18:c13b:8010::907 5683 /other/block
-c
and/other/separate
, a very annoyed server because it's server that retries several times in vain);./aiocoap-client 'coap://[2a02:b18:c13b:8012:3c63:beff:fe85:ca96]/.well-known/core' -m POST --payload ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
To test with lwIP (where sock_udp_recv_buf may actually have scatter-gather behavior, so it's a good test), I liberally copied USEMODULEs from tests/lwip over to examples/gcoap and ripped out anything that looked like gnrc (without any claims of necessity):
I didn't manage to send requests in for reasons out of the scope of this PR, but getting (
coap get 2a02:b18:c13b:8010::907 5683 /other/block
) produces the expected truncated result, and tracing through the newly introduced copy loop shows that it is indeed used as a loop and not as an if. When increasing CONFIG_GCOAP_PDU_BUF_SIZE to 1280, we get out of the "buffer too short" domain but into an area where the correctness of the reassembled buffer can be seen. (Squeeze your eyes and you see the pattern of the hex dump is uninterrupted).(I do see that the blockwise transfer causes trouble of its own, but AFAICT it's unrelated to this issue.)
Issues/PRs references
Closes: #14167
See also #16377 for the worse alternative resolution, and a hint at how often this comes up.
[edit: Testing procedure filled out]