sys/net/gcoap: Use socket _buf API to recognize truncated requests #16378

chrysn · 2021-04-22T21:00:54Z

Contribution description

Gcoap used to behave quite bad with large requests (see #14167); this uses the experimental _buf API to detect overflows and react accordingly.

Testing procedure

Have a CoAP server on the host that serves something large, eg. aiocoap's ./server.py.
Run the gcoap example.
coap get 2a02:b18:c13b:8010::907 5683 /other/block
- previously have given a timeout, and (when using -c and /other/separate, a very annoyed server because it's server that retries several times in vain);
- now a part of the response is shown, and a warning about the truncation is shown.
From outside, send a large request (TBD: didn't get this to run on gcoap yet): ./aiocoap-client 'coap://[2a02:b18:c13b:8012:3c63:beff:fe85:ca96]/.well-known/core' -m POST --payload ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
- Previously, this would just time out.
- Now, you get "4.13 Request Entity Too Large" written in friendly, red letters.

To test with lwIP (where sock_udp_recv_buf may actually have scatter-gather behavior, so it's a good test), I liberally copied USEMODULEs from tests/lwip over to examples/gcoap and ripped out anything that looked like gnrc (without any claims of necessity):

+USEMODULE += ipv6_addr
+USEMODULE += lwip_ipv6_autoconfig
+USEMODULE += lwip_netdev
+USEMODULE += lwip_ipv6
+USEMODULE += lwip
+USEMODULE += lwip_udp
+USEMODULE += lwip_tcp
+USEMODULE += sock_async_event
+USEMODULE += sock_ip
+USEMODULE += sock_tcp
+USEMODULE += sock_udp
+USEMODULE += sock_util
+USEMODULE += od
+USEMODULE += sock_aux_local
+USEMODULE += netdev_default

I didn't manage to send requests in for reasons out of the scope of this PR, but getting (coap get 2a02:b18:c13b:8010::907 5683 /other/block) produces the expected truncated result, and tracing through the newly introduced copy loop shows that it is indeed used as a loop and not as an if. When increasing CONFIG_GCOAP_PDU_BUF_SIZE to 1280, we get out of the "buffer too short" domain but into an area where the correctness of the reassembled buffer can be seen. (Squeeze your eyes and you see the pattern of the hex dump is uninterrupted).

(I do see that the blockwise transfer causes trouble of its own, but AFAICT it's unrelated to this issue.)

Issues/PRs references

Closes: #14167

See also #16377 for the worse alternative resolution, and a hint at how often this comes up.

[edit: Testing procedure filled out]

maribu

Code looks good to me. Some style suggestions.

sys/net/application_layer/gcoap/gcoap.c

kfessel · 2021-04-26T08:48:36Z

I did a short test: a large coap put: time.wasm 566 byte

$coap-client -m put -p 5683 -f time.wasm coap://[fe80::xxxx%tapbr0]/WASM
4.13

coap-client -m put -p 5683 -f time.wasm -b 100 coap://[fe80::xxxx%tapbr0]/WASM

(there is nothing written for success unless verbosity 6 the value is 2.04)

from my POV this works as expected, (prior this patch the transfer without blocks just got stuck)

👍

chrysn · 2021-04-28T11:35:47Z

Style issues addressed (most did indeed fit the 100 char limit).

The TBD Size1 I'm leaving out right now as there are open questions as to what should be indicated there; not sending one is violating a SHOULD of 7252 but all still better than not responding at all, and IMO also better than sending the wrong value.

Are you happy enough with the state of things that one of you can ACK this? [edit: provided a test was done on non-GNRC, see below]

chrysn · 2021-04-28T11:37:04Z

@kfessel, I'm not quite sure reading your comment what you tested. Did you find an example where gcoap-on-lwIP or similar can be run, or just tested the default GNRC gcoap case?

kfessel · 2021-04-28T11:40:35Z

i tested my app (the one i had the issue with that lead me to talk about that in matrix) i tested with native
-> i only tested gnrc

the app takes wasm bytecode with is often more than 100 byte and therefor has that problem but works well with block transfers

this PR is a solution for my case (error response enables me to react and use block transfers no error just silence is bad)

kfessel · 2021-04-28T15:52:18Z

The TBD Size1 I'm leaving out right now as there are open questions as to what should be indicated there; not sending one is violating a SHOULD of 7252 but all still better than not responding at all, and IMO also better than sending the wrong value.

couldn't we just send the limit by buffersize and if the app requires a smaller size it creates its own 4.13 error

chrysn · 2021-04-28T18:02:31Z

What I think 4.13 with Size1 should is to report the payload size the client could use with this request. That means that the message needs to be parsed to the end of options, and then it needs to be determined how much space is left. Doing this is relatively complex, and may need additional error reporting to reliably see if the message was maybe even truncated in the options (in which case ... I don't know if anything sensible can be reported at all). But I'm not even sure that is the right interpretation, whereas not sending a Size1 is bad but at least debuggable.

chrysn · 2021-04-29T08:18:25Z

There are now proper testing descriptions, and I ran them.

A few things worth mentioning happened there, and are expressed in follow-up commits:

If the message is truncated before the options end, things will still fail. This is because we run the message through the nanocoap coap_parse, and can't (easily and compatibly, at least) tell whether parsing was "successful enough" that we at least get the header populated. So for the time being, requests truncated already in the options will behave as before. This is now documented in the code, but honestly it's not where I expect the trouble to come from.
The gcoap client had a nasty behavior where rather than checking whether memo->state == GCOAP_MEMO_RESP, it checked for all its known error conditions. I'd say this is OK and call it bad judgement on the application developer side (error checking in enums should always be "accept what's known", not "reject specific errors"), but others may call this an incompatible change.
Given that we know at response handling time that only the payload was trimmed, the example response handler could be smart and not even complain but just start blockwising (even if the server didn't announce it). I prefer to leave that to later, as it may need meddling with more block-wise code, and isn't essential for getting a gcoap version that doesn't spontaneously break on large messages.

miri64 · 2021-05-03T08:35:18Z

@kfessel, I'm not quite sure reading your comment what you tested. Did you find an example where gcoap-on-lwIP or similar can be run, or just tested the default GNRC gcoap case?

Sidenote: There is a branch by @kb2ma somewhere, that should make that possible.

benpicco · 2021-05-14T12:41:20Z

Thank you I just stumbled upon this as well.
I wrongly assumed the buffer passed to gcoap_req_init() was also used to store the response and wondered why I did not receive anything even though I used a large enough buffer here.

Why can't we use a user supplied buffer here instead of the local _listen_buf?

kb2ma · 2021-05-15T14:26:29Z

I saw a message in my email from this PR. It's been a while but my recollection is below.

Sidenote: There is a branch by @kb2ma somewhere, that should make that possible.

See the kb2ma/RIOT/lwip* branches.

Why can't we use a user supplied buffer here instead of the local _listen_buf?

IIRC this is the basic design of gcoap, as a messaging hub running in its own thread and for simplicity. See the wiki page for contrast with nanocoap.

chrysn · 2021-08-24T16:51:32Z

I wrongly assumed the buffer passed to gcoap_req_init() was also used to store the response and wondered why I did not receive anything even though I used a large enough buffer here.

Any incoming message needs to go through gcoap's buffers at one point.

Why can't we use a user supplied buffer here instead of the local _listen_buf?

At the time we're reading into that, we don't even know whether what's coming in is a response or a request, let alone which request this belongs to.

chrysn · 2021-08-25T14:28:29Z

If a response is received through gcoap it is, as it always has been, copied into gcoap's receive buffer, and freed from the socket. When such a truncated response arrives, the tail is gone for good already.

Sure, we could offer all kinds of "yeah but there's a pointer you can follow up on and need to free" stuff, but the beauty of nanocoap and gcoap is that the user doesn't need to gather data from anywhere, and the beauty of CoAP is that this is not needed in practice.

The point of this is to allow the stack to fail gracefully (allowing the protocol to continue rather than deadlocking it) when overly long messages arrive, which generally means to ask for smaller bites of data. An application will pick its buffer in accordance with the messages it expects, so if someone wants to haul 1KiB messages, they can up their buffer.

miri64 · 2021-08-25T14:46:11Z

Mh, so you are basically "misusing" the _buf API for a different than the intended use case, gotcha!

The point of this is to allow the stack to fail gracefully (allowing the protocol to continue rather than deadlocking it) when overly long messages arrive, which generally means to ask for smaller bites of data. An application will pick its buffer in accordance with the messages it expects, so if someone wants to haul 1KiB messages, they can up their buffer.

Mh, but consider this: we only put the header of a packet into the internal buffer and for the rest an application may choose to use:

/* just a sketch! */
static uint8_t payload[SOME_ARRAY_SIZE];
int pos = 0, res;
while ((res = gcoap_get_payload(pkt, &payload[pos], sizeof(payload) - pos) > 0) {
    pos += res;
}

this way, a developer does not need to beat themselves over the head of configuring some module they may or may non know about and can do all the necessary compile-time config in the app. No '"get follow-up or "yeah but there's a pointer you can follow up on and need to free" stuff' required, if we establish that this is the way to get a packet's payload.

chrysn · 2021-08-25T14:54:38Z

Yes. I'm only using _buf because without it I we couldn't even see whether truncation happened, and right here this is all I care about.

consider this

Sure, that's a change we can make. But it's something that needs coordination across nanocoap and gcoap (which I start suspecting is part of what's holding RIOT's CoAP ecosystem back). In the end, this can get us rid of the need for a payload-sized buffer for incoming messages at all (though so far our handlers place their responses where the request came in, so...)

I'm happy to work with this as a proposal, but that'll take several changes and iterations of design, while this here is only about getting the CoAP machinery unstuck when large packages arrive, without any need for applications to alter any behavior. (Unless they specifically checked for whether memo->state indicates some particular form of error rather than for success). Let's do both, but please the good before the better.

miri64 · 2021-08-25T15:03:39Z

Sure, that's a change we can make. But it's something that needs coordination across nanocoap and gcoap (which I start suspecting is part of what's holding RIOT's CoAP ecosystem back). In the end, this can get us rid of the need for a payload-sized buffer for incoming messages at all (though so far our handlers place their responses where the request came in, so...)

Does it oO? If the coap_pkt_t struct needs to be extended for gcoap's sake, it might make sense to instead provide an extended, gcoap-specific type. But granted, I am not as deep into the gcoap/nanocoap-internals (yet), to provide solid feedback on this. Up until now I understood that nanocoap is only used as a message parser and composer for gcoap, but if there is deeper entanglement, maybe some cleanup is indeed required.

chrysn · 2021-08-25T15:10:52Z

nanocoap is used as message parser and composer, but exceeding that, gcoap and nanocoap also share the coap_pkt_t structure as well as the coap_handler_t -- which means that untangling them entails a type change across all user visible CoAP functions.

BTW, Murdock found that I had disregarded the new DTLS code -- that's now "fixed" easily (because if truncation happens to DTLS, tough luck, crypto won't open, and the message can't be acted on sensibly).

miri64 · 2021-08-25T16:00:17Z

nanocoap is used as message parser and composer, but exceeding that, gcoap and nanocoap also share the coap_pkt_t structure as well as the coap_handler_t -- which means that untangling them entails a type change across all user visible CoAP functions.

Ok, but so (as far as I can judge) nothing that couldn't be handled with some good old inheritance and type juggling. In the long run, I think it is a good idea to boilerplate the whole nanocoap underbelly away from the user. But for now, I think we can live with just warning the user about truncated payloads.

chrysn · 2021-08-25T16:41:17Z

Thanksl. Murdock passes too now. @maribu has done some review already, do these still hold? If so, testing is what is missing. For me with the latest commit, tests on native passed as described above for GNRC; LWIP passed after commenting out sys/net/netif/netif.c line 72 (netif_get_name; that's undefined with this setup both in master and this branch and an unrelated yet-to-be-opened issue).

Please let me know if there are any more issues, and whether I can squash. (I'd keep it as the 3 commits, and just let the fixup autosquash).

miri64 · 2021-08-25T17:19:37Z

(netif_get_name; that's undefined with this setup both in master and this branch and an unrelated yet-to-be-opened issue).

See #16741 ;-).

chrysn · 2021-09-02T14:34:02Z

May I rebase this?

miri64 · 2021-09-02T14:48:37Z

Yes, and squash as well, if you like :-).

Closes: RIOT-OS#14167 Co-authored-by: Marian Buschsieweke <maribu@users.noreply.github.com>

chrysn · 2021-09-02T14:52:52Z

I'm configured to autosquash ;-)

Rebased, and all described tests still work.

kfessel · 2021-09-06T18:49:34Z

What I think 4.13 with Size1 should is to report the payload size the client could use with this request. That means that the message needs to be parsed to the end of options, and then it needs to be determined how much space is left. Doing this is relatively complex, and may need additional error reporting to reliably see if the message was maybe even truncated in the options (in which case ... I don't know if anything sensible can be reported at all). But I'm not even sure that is the right interpretation, whereas not sending a Size1 is bad but at least debuggable.

@chrysn
the value is already calculated by the nanocoap coap_parse it is pdu->payload_len of the truncated pkg, if that one is 0 there are too many options

kfessel · 2021-09-06T22:04:02Z

I did test my application with this patch, with GNRC and LWIP it works
my application is a gcoap server that take big PUT messages without this patch i get no response, if i do not use blocktransfer, with this patch i get the information -> then I do the transfer again with blocktransfer and it is working.
i will try to do further testing (the example) later

kfessel · 2021-10-15T17:04:57Z

@chrysn: Do you want to add SIZE1 reporting with what coap_parse provides in pdu->payload_len? or just have it as it is?

kfessel · 2021-10-25T18:14:02Z

I tested:

my own server (takes message bigger than the buffer via put and processes them) with GNRC ✔️ and LWIP ✔️ (in both cases i got an appropriate resonse -> resend with small blocks working ✔️ )
the gcoap example for the client path with GNRC ✔️ and LWIP ✔️
(got a warning as described in the PR ✔️ )

chrysn · 2021-10-26T18:31:41Z

I'd prefer to do any Size1 in a follow-up.

kfessel

everything is green (review steps 1-5 and ci tests) -> 🚀

chrysn requested review from cgundogan, haukepetersen, miri64 and PeterKietzmann as code owners April 22, 2021 21:00

chrysn marked this pull request as draft April 22, 2021 21:01

chrysn added Area: CoAP Area: Constrained Application Protocol implementations Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Apr 22, 2021

maribu reviewed Apr 23, 2021

View reviewed changes

sys/net/application_layer/gcoap/gcoap.c Outdated Show resolved Hide resolved

sys/net/application_layer/gcoap/gcoap.c Outdated Show resolved Hide resolved

sys/net/application_layer/gcoap/gcoap.c Outdated Show resolved Hide resolved

chrysn removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Apr 28, 2021

chrysn force-pushed the gcoap-handle-truncation branch from 541bda9 to 3fcfa27 Compare April 28, 2021 11:32

chrysn marked this pull request as ready for review April 29, 2021 08:18

MrKevinWeiss added this to the Release 2021.07 milestone Jun 22, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

chrysn mentioned this pull request Aug 24, 2021

sys/net/gcoap: Increase default GCOAP_PDU_BUF_SIZE #16377

Closed

chrysn mentioned this pull request Aug 25, 2021

pkg/lwip: implement netif_get_name() #16741

Merged

miri64 mentioned this pull request Aug 26, 2021

gcoap: multi-transport support #16688

Merged

2 tasks

chrysn and others added 3 commits September 2, 2021 16:49

sys/net/gcoap: Use socket _buf API to recognize truncated requests

2e70f29

Closes: RIOT-OS#14167 Co-authored-by: Marian Buschsieweke <maribu@users.noreply.github.com>

examples/gcoap: Don't assume success after exhausing known errors

e9736e9

examples/gcoap: Do display truncated messages, but with a warning

b9a8652

chrysn force-pushed the gcoap-handle-truncation branch from 6f8baf2 to b9a8652 Compare September 2, 2021 14:52

kfessel self-requested a review September 6, 2021 16:15

kfessel added the Reviewed: 3-testing The PR was tested according to the maintainer guidelines label Oct 25, 2021

kfessel approved these changes Oct 26, 2021

View reviewed changes

chrysn merged commit a92cdb5 into RIOT-OS:master Oct 27, 2021

chrysn deleted the gcoap-handle-truncation branch October 27, 2021 06:56

kfessel mentioned this pull request Nov 3, 2021

example/gcoap: add LWIP make option #17130

Merged

fjmolinas added this to the Release 2022.01 milestone Nov 18, 2021

cgundogan mentioned this pull request May 11, 2022

gcoap: forward_proxy: forward truncated responses #18091

Closed

sys/net/gcoap: Use socket _buf API to recognize truncated requests #16378

sys/net/gcoap: Use socket _buf API to recognize truncated requests #16378

Uh oh!

Conversation

chrysn commented Apr 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution description

Testing procedure

Issues/PRs references

Uh oh!

maribu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kfessel commented Apr 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrysn commented Apr 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrysn commented Apr 28, 2021

Uh oh!

kfessel commented Apr 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kfessel commented Apr 28, 2021

Uh oh!

chrysn commented Apr 28, 2021

Uh oh!

chrysn commented Apr 29, 2021

Uh oh!

miri64 commented May 3, 2021

Uh oh!

benpicco commented May 14, 2021

Uh oh!

kb2ma commented May 15, 2021

Uh oh!

chrysn commented Aug 24, 2021

Uh oh!

chrysn commented Aug 25, 2021

Uh oh!

miri64 commented Aug 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrysn commented Aug 25, 2021

Uh oh!

miri64 commented Aug 25, 2021

Uh oh!

chrysn commented Aug 25, 2021

Uh oh!

miri64 commented Aug 25, 2021

Uh oh!

chrysn commented Aug 25, 2021

Uh oh!

miri64 commented Aug 25, 2021

Uh oh!

chrysn commented Sep 2, 2021

Uh oh!

miri64 commented Sep 2, 2021

Uh oh!

chrysn commented Sep 2, 2021

Uh oh!

kfessel commented Sep 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kfessel commented Sep 6, 2021

Uh oh!

kfessel commented Oct 15, 2021

Uh oh!

kfessel commented Oct 25, 2021

Uh oh!

chrysn commented Oct 26, 2021

Uh oh!

kfessel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chrysn commented Apr 22, 2021 •

edited

Loading

kfessel commented Apr 26, 2021 •

edited

Loading

chrysn commented Apr 28, 2021 •

edited

Loading

kfessel commented Apr 28, 2021 •

edited

Loading

miri64 commented Aug 25, 2021 •

edited

Loading

kfessel commented Sep 6, 2021 •

edited

Loading