Skip to content

Conversation

maribu
Copy link
Member

@maribu maribu commented Jul 30, 2020

Contribution description

Changed the API of netdev_driver_t:

  • The send() function should no longer return the number of bytes and should not block
  • The upper layer now must call the new confirm_send() function after calling send(); either busy waiting until something different to -EBUSY is returned, or after NETDEV_EVENT_TX_COMPLETE was signaled

During transition to the new API, the upper layer must remain backward compatible and must assume the legacy API if netdev_driver_t::confirm_send() is NULL.

Testing procedure

Just reading the doc and Murdock should be sufficient. Without any driver and no upper layer code being touched, this PR will just increase RAM consumption by one function pointer per network driver.

Issues/PRs references

Alternative to: #14625

@maribu maribu added Area: drivers Area: Device drivers Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Process: needs >1 ACK Integration Process: This PR requires more than one ACK State: waiting for other PR State: The PR requires another PR to be merged first labels Jul 30, 2020
Comment on lines +349 to +366
* @return Number of bytes transmitted. (The size of the transmitted
* frame including all overhead, such as frame check sequence,
* bit stuffing, escaping, headers, trailers, preambles, start of
* frame delimiters, etc. May be an estimate for performance
* reasons.)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miri64: We had a discussion on what this size should be in an issue. This interpretation is different to what most drivers implement (in the send(), not the confirm_send() function), but as porting would result in work anyway, I figured we should rather go for the best solution than the lazy solution.

IMO there is little reason to just return the result of iolist_size(iolist) on what was passed in send(), as the upper layer could just call iolist_size() directly to get this. So if we really want additional data for statistics, it should be IMO a value that has the most value when the value of iolist_size(iolist) is known anyway. And to me this is the number of bytes physically transmitted; as this would allow to calculate the layer 2 throughput, while iolist_size(iolist) can provide the layer 2 goodput.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence is wrong. The netif can easily calculate the sizes needed for goodput calculation, but it would need to make sure to call iolist_size(iolist) prior to adding the L2 header (and trailer, if applicable). But the reasoning should still be valid.

Comment on lines 315 to 330
* @param[in] iolist IO vector list to send. Elements of this list may
* have iolist_t::iol_size == 0 and (in this case only)
* iolist_t::iol_data == 0.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this change. Previously the API doc read as if a driver would need to expect elements in the iolist that have iolist_t::iol_data == NULL but iolist_t::iol_size > 0. I think this was just a mistake in formulation and not intentionally.

@jia200x
Copy link
Member

jia200x commented Jul 31, 2020

nice one!

IMO we could also specify that the frame from the send function is always a PSDU as well (Full L2 frame). What do you think?

@maribu
Copy link
Member Author

maribu commented Jul 31, 2020

IMO we could also specify that the frame from the send function is always a PSDU as well (Full L2 frame). What do you think?

I mostly agree, but I would like to leave the FCS calculation to the driver, as hardware support for that is very common.

@maribu
Copy link
Member Author

maribu commented Jul 31, 2020

IMO we could also specify that the frame from the send function is always a PSDU as well (Full L2 frame). What do you think?

I mostly agree, but I would like to leave the FCS calculation to the driver, as hardware support for that is very common.

And stuff like escaping (e.g. slipdev) or bitstuffing is also be better left to the driver, as doing this on the fly can be implemented quite efficiently, while doing so in advance involves a lot of memmove(), and resizing of the buffer.

@jia200x
Copy link
Member

jia200x commented Aug 4, 2020

I guess this should be rebased on top of #14657

@maribu maribu force-pushed the netdev_confirm_send branch from a272b86 to 64b85cd Compare August 4, 2020 10:08
@maribu
Copy link
Member Author

maribu commented Aug 4, 2020

I guess this should be rebased on top of #14657

Yes. Done :-)

* the data pending info in netdev_driver_t::confirm_send
* via the `info` parameter
*/
NETDEV_EVENT_TX_COMPLETE_DATA_PENDING,
NETDEV_EVENT_TX_NOACK, /**< ACK requested but not received */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoudln't NOACK and TX_MEDIUM_BUSY be included as well? They are resolved at the same time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I changed the return values to have -EGAIN in confirm_send() when called to early. This matches the semantics of the error code better. And this allows to return -EBUSY when the medium was busy; which is consistent with the interpretation of the same error code issued by send().

@jia200x
Copy link
Member

jia200x commented Aug 4, 2020

I'm +1 with the fundamentals. Before going into style/wording reviews, @miri64 @bergzand do you agree with the proposed changes?

@maribu maribu removed the State: waiting for other PR State: The PR requires another PR to be merged first label Aug 12, 2020
@maribu
Copy link
Member Author

maribu commented Aug 12, 2020

It would be nice to get some feedback on this.

I have now implemented a PTP client on top of the sock_udp that uses the timestamp in the auxiliary data for synchronization. The integration for the RX timestamp was pretty straight forward:

  1. Use the void *rx_info in netdev_driver_t::recv() to pass the timestamp from the netdev to the netif
  2. Extend gnrc_netif_hdr_t to contain a timestamp (if a pseudo-module is used, so that overhead only applies if the timestamping is actually needed)
  3. Extend gnrc_sock_recv() to also extract the timestamp from the gnrc_netif_hdr_t
  4. Extend the gnrc_sock_udp & gnrc_sock_ip to pass the timestamp from gnrc_sock_recv() to the user

With that, PTP synchronization does already work. Without TX timestamps, the network delay cannot yet be estimated and thus not yet be compensated.

I would now like to also implement the missing blocks for TX timestamp to also estimate the network delay. However, I would like to have a rough agreement on whether this approach or #14625 will make the race.

@jia200x
Copy link
Member

jia200x commented Aug 12, 2020

I totally agree with the proposed change. We need more feedback though since it's an API change.

@miri64 @bergzand @PeterKietzmann @kaspar030 ?

@miri64
Copy link
Member

miri64 commented Aug 26, 2020

If it helps, I am all for it :-).

* NETDEV_EVENT_TX_COMPLETE event. This function must not return
* `-EAGAIN` after that event was received.
*/
int (*confirm_send)(netdev_t *dev, void *info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is going to call that function and what are they going to do with the number of bytes sent?

The 802.15.4 sub-mac will call a callback on TX done, can we do this here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is going to call that function

E.g. in the current GNRC network stack, I would implement this in the gnrc_netif. That way the gnrc_netif could turn the asynchronous non-blocking API into a "blocking" API. (I'm not sure if blocking is the right word, but the gnrc_netif will only act upon a message when the transmission is fully complete. That way if e.g. the 6lowpan implementation would send 3 frames as fragments of a single IPv6 packet, the gnrc_netif would send them one by one. That way the driver does no longer need to block on send() for link layer fragmentation to work.)

and what are they going to do with the number of bytes sent?

The number of bytes sent is only useful to gather L2 stats IMO. So unless the l2 stats module is used, the caller will only check for the return value to be non-negative.

The 802.15.4 sub-mac will call a callback on TX done, can we do this here too?

In the long term I would like to adopt those ideas of the 802.15.4 radio HAL that can be generically applied to all network devices for the netdev API. As said, my personal goal is that in the end a compatible subset of the radio HAL becomes the "generic network HAL". That way code duplication between different network technologies could be avoided, while still being able to exploit technology specific features through the technology specific API additions on top of the "generic network HAL".

But I think it would be better to change the netdev API in small steps.

Copy link
Contributor

@benpicco benpicco Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would this do

    dev->send();
    while (dev->confirm_send() == -EAGAIN) {}

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But more likely you want to block until NETDEV_EVENT_TX_COMPLETE is issued.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would we get NETDEV_EVENT_TX_COMPLETE but have confirm_send() return -EAGAIN?

Shouldn’t one be enough?
I see also have .confirm_tx in the radio HAL.
I’d like to have @jia200x‘s 2 ct on this so we don’t end up with two slightly incompatible mechanisms for the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only mechanism that tells the upper layer that a packet was received is when confirm_transmit() returns 0.
The thing is, some radios have an "TX DONE" IRQ (and impressively, some like CC2420 don't). If the cap IEEE802154_CAP_IRQ_TX_DONE is available, the upper layer knows that it can e.g lock a mutex and unlock it on IEEE802154_CONFIRM_TX_DONE. But e.g for CC2420 the lower layer needs to poll until confirm_transmit return 0.

So in the Radio HAL officially there's only one mechanism (the confirm function).

Maybe we should update the documentation of NETDEV_EVENT_TX_COMPLETE to reflect that's not a mandatory event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we instead use a timer in the CC2420 driver and keep the API easier instead? At least the driver should be in an excellent position to optimize the timeout duration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocks the driver to implement such a feature. however, the timeout should make sure that confirm_send returns 0. Otherwise the API is not respected

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for taking the burden of blocking .send away from the driver.
I take that this is not in conflict with the radio HAL work, so have my ACK.

@jia200x
Copy link
Member

jia200x commented Sep 11, 2020

I'm all for taking the burden of blocking .send away from the driver.
I take that this is not in conflict with the radio HAL work, so have my ACK.

+1

Note that although not ideal, now it's possible to make all radios that return -EBUSY on send work with GNRC without needing gnrc_netif_pktq. This could quickly enable sending 6lowpan fragments without adding the overhead of the queue. But note this is absolutely not intended for the long term.

@benpicco
Copy link
Contributor

Yea ideally the thread that produces those fragments should block instead of filling up a queue.

@benpicco
Copy link
Contributor

+1

Is this an ACK 😉 ?

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Dec 21, 2020
@maribu
Copy link
Member Author

maribu commented Dec 21, 2020

Let me squash and fix the whitespace errors. By now, the history of fixups is no useful information anymore, as I assume the memory of the reviewers of this PR has faded to much, so that they need to re-review the whole PR anyway.

@maribu maribu force-pushed the netdev_confirm_send branch from 0e97732 to da15774 Compare December 21, 2020 16:28
@maribu maribu mentioned this pull request Dec 23, 2020
2 tasks
@jia200x
Copy link
Member

jia200x commented Jan 5, 2021

what's the status of this one? is it close to go? :)

@maribu
Copy link
Member Author

maribu commented Jan 5, 2021

what's the status of this one? is it close to go? :)

I'd hit the ACK button right away once you provide the second ACK ;-)

@maribu maribu added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jan 5, 2021
@jia200x
Copy link
Member

jia200x commented Jan 5, 2021

I think it would be nice to add the deprecation release (21.10?). Otherwise I'm fine.
For me it's ok if you want to add it now or directly after in a follow up. As you wish :)

Otherwise, I ACK upon reply :)

Changed the API of `netdev_driver_t`:

- The `send()` function should no longer return the number of bytes and should
  not block
- The upper layer now must call the new `confirm_send()` function after
  calling `send()`; either busy waiting until something different to `-EBUSY`
  is returned, or after `NETDEV_EVENT_TX_COMPLETE` was signaled

During transition to the new API, the upper layer must remain backward
compatible and must assume the legacy API if `netdev_driver_t::confirm_send()`
is `NULL`.
@maribu maribu force-pushed the netdev_confirm_send branch from da15774 to 53fcb97 Compare January 5, 2021 14:34
@maribu
Copy link
Member Author

maribu commented Jan 5, 2021

I think it would be nice to add the deprecation release (21.10?). Otherwise I'm fine.

My idea was to update gnrc_netif as a follow up to make use of the new behavior and add a pseudo-module that enables backward compatibility. Without this module presence of send_confirm could be assert()ed to ease debugging. And then we could deprecate the backward compatibility pseudo-module. Via makefiles/deprecated_modules.inc.mk out of tree users would get a motivating warning if they use drivers with legacy behavior :-)

@maribu
Copy link
Member Author

maribu commented Jan 5, 2021

I directly amended a typo fix ("An this case" --> "In this case")

Copy link
Member

@jia200x jia200x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! ACK

@maribu maribu added Process: needs >1 ACK Integration Process: This PR requires more than one ACK and removed Process: needs >1 ACK Integration Process: This PR requires more than one ACK labels Jan 5, 2021
@maribu maribu merged commit e5a0154 into RIOT-OS:master Jan 5, 2021
@maribu
Copy link
Member Author

maribu commented Jan 5, 2021

Thanks everyone :-)

@Teufelchen1
Copy link
Contributor

Just a short reminder that the deprecated enums are still in use within RIOT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: API change Integration Process: PR contains or issue proposes an API change. Should be handled with care. Process: needs >1 ACK Integration Process: This PR requires more than one ACK
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants