Skip to content

nanocoap: add coap_next_msg_id() #18991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

benpicco
Copy link
Contributor

Contribution description

Both nanoCoAP and GCoAP have the need for getting (somewhat) monotonic message IDs.
Currently both implement this as an internal function that they then kinda need to re-repose after all.

Clean this up by just adding a single public coap_next_msg_id() function that can be used by anyone needing CoAP message IDs.

Testing procedure

CoAP should function as before.
(tests/gcoap_fileserver will run integration test on CI)

Issues/PRs references

@benpicco benpicco requested a review from fabian18 November 29, 2022 01:02
@github-actions github-actions bot added Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Area: sys Area: System labels Nov 29, 2022
@benpicco benpicco added Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Nov 29, 2022
@riot-ci
Copy link

riot-ci commented Nov 29, 2022

Murdock results

✔️ PASSED

d45ae80 nanocoap_sock: make use of coap_next_msg_id()

Success Failures Total Runtime
117858 0 117858 01h:56m:58s

Artifacts

@benpicco benpicco removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Nov 29, 2022
@benpicco
Copy link
Contributor Author

@kfessel 074b2e2 will now randomize the ID on init if it happens to be 0.

Copy link
Contributor

@kfessel kfessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me (randomize if the no initialized memory is 0 (power loss) according to should match the coap-standards strong suggestion randomize the initial id)

@benpicco benpicco requested review from chrysn and maribu November 29, 2022 10:27
Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see inline

*
* @return A new message ID that can be used for a request or response.
*/
uint16_t coap_next_msg_id(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change so that the src and dst endpoint are available to the implementation?

The standard suggest three levels of granularity:

  1. Having one counter for each distinct src <--> dst pair
  2. Having one counter per network interface / subnet /
  3. Having one single globel counter

The reason for having more counter is that message IDs must not be reused within some time. It easily becomes the limiting factor on 802.3 or 802.11 networks. So, having multiple counters can increase throughput.

I think it is fine to keep using a single counter. But having a way to extend would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we better keep the separate implementations.
With nanocoap_sock it's trivial to attach such a counter to the nanocoap_sock_t struct.

If we want a generalized function we would need to work with arrays to look up the message ID, which sounds like a disproportionate overhead to me.


static void auto_init_coap_random(void)
{
if (_msg_id == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this assume that SRAM has a value of 0 on the first boot? To me understanding this is not necessarily the case and this us being relied upon by the PUF modules.

The PUF random seeding utility does track whether it is a cold boot. Maybe we should split this out an own (trivial) module and use that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's why initially only wanted to use the .noinit section as it was done in nanocoap_sock.

@kfessel was worried that some MCUs would then always start with 0 on power-on-reset, so I added the random.

I wanted to keep the .noinit to keep sequence numbers sequential across reboots as I had issues with CoAP servers discarding old message IDs in the past (when they were reset after the reboot).

With always randomizing there is a small chance of this happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the standard actually wants the first msg IDs to be randomized. I do agree with .noinit, but I doubt that checking it to be zero is a good way to determine it is a cold boot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With always randomizing there is a small chance of this happening.

especially since random seed might not be random but RANDOM_SEED_DEFAULT or cpuid in these cases .noinit seems a better bet to not reuse the same msg_id.

checking for 0 we catch the cpus that have their memory cells drift to 0 if powered off .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking for 0 we catch the cpus that have their memory cells drift to 0 if powered off .

Note: Production variance has a significant impact on whether a given SRAM cell tends ti drift to zero or one. A few cells end up being randomly zero or one. With large enough SRAM this is a viable source of entropy.

So, there really is no sense in checking for it being zero. One could profile the memory at the address this gets linked to for a specific SRAM and end up with a classification that would yield a probability for it being a cold boot.

We do have reliable ways to reliably tell the reset cause: Some MCUs readily tell so. For others, we can store a magic cookie value into a .noinit variable. If it is still stored there on boot, it was a warm boot.

We may already have a module for that, I think this is ringing a bell.

@chrysn
Copy link
Member

chrysn commented Nov 29, 2022

There has been a similar proposal in #16730.

My preference would be that that either RIOT bless some mechanism of persisting data across reboots (which comes in various, possibly conflicting, sets of requirements1), or it be up to the application to do that persistence. Either way, while message-ID is one thing to persist, I'd rather have an opaque thing that the CoAP stack (or even the full OS) takes at startup and produces at any time, which may easily be just the MID for now (but may become larger as the stack takes on responsibilities, eg. when it enforces congestion avoidance, or when we want to enhance our RNG by keeping entropy across reboots).

Footnotes

  1. I think the main conflicting goals are "works reliably" vs "best effort is OK", which is connected to whether a clean shutdown is required, or whether it is cheap to do these updates often enough to be meaningfully usable even after a crash.

@benpicco
Copy link
Contributor Author

I still don't fully understand why randomness of the initial message ID is so important. All message IDs after that are sequential (even across requests) so why is it important that the first ID is randomized?

@chrysn
Copy link
Member

chrysn commented Nov 29, 2022 via email

@miri64
Copy link
Member

miri64 commented Nov 29, 2022

I still don't fully understand why randomness of the initial message ID is so important. All message IDs after that are sequential (even across requests) so why is it important that the first ID is randomized?

The growth could also be randomized IMHO (as long as it does not contradict the standard).

@chrysn
Copy link
Member

chrysn commented Nov 29, 2022 via email

@maribu
Copy link
Member

maribu commented Nov 29, 2022

Note that using an upcounting counter is the suggested implementation in the standard and everybody uses this. Some CoAP implementations rely one a counter being used to optimize duplicate detection. The QoS would drop when interacting with such implementations.

The 32 bit entropy in the CoAP token required by the standard should help more than the 16 bit msg ID in annoying attackers anyway.

@chrysn
Copy link
Member

chrysn commented Nov 29, 2022 via email

@benpicco
Copy link
Contributor Author

So to salvage this:

  • add msg_id field to nanocoap_sock_t, leave GCoAP alone
  • randomize the msg_id on connect

did I get that right?

@maribu
Copy link
Member

maribu commented Jan 20, 2023

A bit OT, but there's no guarantee of 32bit in the token. Message size sensitive applications will use the 0-length token as often as possible.

Hmm, I read the standard as that the Token is where you should put in your entropy against spoofing, if you want this (section 5.3.1):

A client that is connected to the general Internet SHOULD use at least 32 bits of randomness, keeping in mind that not being directly connected to the Internet is not necessarily sufficient protection against spoofing.

But then, I do wonder what the use cases are where adding 32 bit of entropy to your message will make a lot of difference. I honestly see no use case were using CoAP over plain UDP is sensible except for networks with L2 security and a firewall that prevents plain UDP based CoAP communication with nodes outside of that network (such as nodes in the Internet). If you have a device capable enough to act as a border router, I would expect that to also to be capable enough to act as secure endpoint of CoAP communication (such as CoAP over OSCORE or DTLS). And that could expose any resource provided by the constrained devices incapable of DTLS or OSCORE in the L2 network with added security.

@benpicco
Copy link
Contributor Author

closed in favor of #19178

@benpicco benpicco closed this Jan 26, 2023
@benpicco benpicco deleted the coap_next_msg_id branch January 26, 2023 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: CoAP Area: Constrained Application Protocol implementations Area: network Area: Networking Area: sys Area: System Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants