Skip to content

SX126x HAL (restored) #19172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 25 commits into
base: master
Choose a base branch
from
Draft

SX126x HAL (restored) #19172

wants to merge 25 commits into from

Conversation

ansocket
Copy link

@ansocket ansocket commented Jan 19, 2023

Contribution description

It's a HAL implementation for Semtech SX126x driver. Basically created for RAK3172 module (STM32WLE5CC).

Testing procedure

gnrc_networking:
All basic functions work correctly.

tests/ieee802154_submac:
All should work correctly, if SF<=7 and BW>=125KHz, because ack timer can't wait more than ~65ms.

tests/ieee802154_hal:
Run and you should see:

  • Correct ACK replies after transmission
  • Correct address filter
  • CCA
ieee802154_hal transmission
main(): This is RIOT! (Version: 1fd33c-lora_rak3172)
Trying to register sx126x.
Success
Initialization successful - starting the shell now
> print_addr

A6:DE:12:64:DA:C5:B4:2B

> Size = 31
Frame received:
00000000  61  DC  00  23  00  2B  B4  C5  DA  64  12  DE  A6  05  84  4E  a..#.+...d.....N
00000010  08  50  68  EE  DE  4C  6F  72  65  6D  20  69  70  73  75      .Ph..Lorem ipsu
LQI: 255, RSSI: 200

Size = 31
Frame received:
00000000  61  DC  01  23  00  2B  B4  C5  DA  64  12  DE  A6  05  84  4E  a..#.+...d.....N
00000010  08  50  68  EE  DE  4C  6F  72  65  6D  20  69  70  73  75      .Ph..Lorem ipsu
LQI: 255, RSSI: 197
main(): This is RIOT! (Version: 1fd33c-lora_rak3172)
Trying to register sx126x.
Success
Initialization successful - starting the shell now
> txtsnd A6:DE:12:64:DA:C5:B4:2B 10 ackreq

Transmission succeeded

> Size = 3
Received valid ACK with sqn 0
00000000  02  00  00                                                      
LQI: 255, RSSI: 194

txtsnd A6:DE:12:64:DA:C5:B4:2B 10 ackreq

Transmission succeeded


> Size = 3
Received valid ACK with sqn 1
00000000  02  00  01                                                      
LQI: 255, RSSI: 194

Problems

Issues/PRs references

#19198

UDP: I accidently wiped the repo, so I create a new branch. Code is not changed.

@ansocket ansocket mentioned this pull request Jan 19, 2023
Copy link
Member

@jia200x jia200x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The driver implementation makes sense. It requires a round of functional changes and the rest should be only integration/cosmetic changes. Great job!

Here's the initial review.

#define MAC_TIMER_CHAN_ACK (0U) /**< MAC timer channel for transmitting an ACK frame */
#define MAC_TIMER_CHAN_IFS (1U) /**< MAC timer channel for handling IFS logic */

static uint8_t rxbuf[IEEE802154_FRAME_LEN_MAX]; /* len PHR + PSDU + LQI */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably remove these RX and TX buffers, since the _read function can copy directly from the internal framebuffer (by calling sx126x_reead_buffer) and the _write function can write directly to the TX framebuffer.


static uint8_t rxbuf[IEEE802154_FRAME_LEN_MAX]; /* len PHR + PSDU + LQI */
static uint8_t txbuf[IEEE802154_FRAME_LEN_MAX]; /* len PHR + PSDU + LQI */
static uint8_t ack[IEEE802154_ACK_FRAME_LEN];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also allocate uint8_t ack[IEEE802154_ACK_FRAME_LEN in the stack, instead of adding it as a global variable.

static uint8_t rxbuf[IEEE802154_FRAME_LEN_MAX]; /* len PHR + PSDU + LQI */
static uint8_t txbuf[IEEE802154_FRAME_LEN_MAX]; /* len PHR + PSDU + LQI */
static uint8_t ack[IEEE802154_ACK_FRAME_LEN];
static uint8_t _size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for all these global variables: please try to add them to the device descriptor (sx126x_t). Otherwise this only allows one instance of the driver.

static uint8_t sx126x_long_addr[IEEE802154_LONG_ADDRESS_LEN];
static uint16_t sx126x_pan_id;

#define SX126X_TIMER TIMER_DEV(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, prefer ztimer here. Although it might be overkill to have a high level timer API here, it makes this driver 100% hardware independent. See e.g https://github.com/RIOT-OS/RIOT/blob/master/drivers/sx127x/sx127x_netdev.c#L86.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I suspect this is the reason of the "NO ACK" you receive.
The timer frequency is in the order of a IEEE 802.15.4 2.4 GHz (O-QPSK) symbol time (16 us).
The symbol time of LoRa SF7/BW125 is 1 ms. Since all timeouts are in symbols, the radio tries to send an ACK immediately.

Since the symbol time vary depending on the PHY configuration, this timeouts should probably be calculated on demand.

bool ack_filter : 1; /**< whether the ACK filter is activated or not */
bool promisc : 1; /**< whether the device is in promiscuous mode or not */
bool pending : 1; /**< whether there pending bit should be set in the ACK frame or not */
} cfg = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add this cfg to the device descriptor (sx126x).
The radio that initially used this pattern (nrf802154) it's a periph radio, so it's ensured there will be only one instance. That's not the case for sx126x.

uint8_t timeout;
cfg.ifs = true;
if (lifs) {
timeout = IEEE802154_LIFS_SYMS;
Copy link
Member

@jia200x jia200x Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For typical IEEE 802.15.4 transceivers the IFS is meant to be a processing time. For LoRa transceivers (whose transmission time is very large compared to common IEEE 802.15.4 transceivers) we can probably reduce these values. I propose to use something like:

#ifndef LORA_SIFS_SYMS
#define LORA_SIFS_SYMS ...
#endif

#ifndef LORA_LIFS_SYMS
#define LORA_LIFS_SYMS ...
#endif


/* Ignore send if packet size is 0 */
if (!pos) {
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Radio HAL assumes the packet is valid (e.g not null). Therefore you can skip these lines

return -EBADMSG;
}

memcpy(buf, rxbuf, size-2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, you can just call sx126x_read_buffer here. Then you don't need an extra buffer.

for (const iolist_t *iol = iolist; iol; iol = iol->iol_next) {
if (iol->iol_len > 0) {
sx126x_write_buffer(dev, pos, iol->iol_base, iol->iol_len);
memcpy((&txbuf[pos]), iol->iol_base, iol->iol_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just drop this TX buffer, as you are already copying the content to the internal framebuffer.

(void)cmd;
(void)value;

return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 0;
return -ENOTSUP;

@jia200x
Copy link
Member

jia200x commented Jan 19, 2023

BTW we will need to find a way to allow both interfaces (netdev, Radio HAL) simultaneously. Otherwise it's not possible to use LoRaWAN with this radio.

@ansocket
Copy link
Author

@jia200x Wow, I didn't expect that your review will be so detailed. Thank you very much!
I'll check all your suggestions and make changes as soon as possible :)

@ansocket
Copy link
Author

ansocket commented Jan 24, 2023

Hello!

I made some changes into this implementation:

  • ACK correctly worked on my rak3172 board with ieee802154_submac or gnrc_networking examples. This mechanism reduced losses to 0%!
  • Added ztimer instead hardware timer. So @jdavid I hope you can run examples on your board successfully.
  • Some global variables deleted or moved in sx126x struct.
  • I changed ACK_TIMEOUT in submac.c to 50000us. It helps to receive every ACK. Probably, this value can be less.

P.S.
Thank you @jia200x for your help!

@jia200x
Copy link
Member

jia200x commented Jan 24, 2023

Great to hear!! Thanks to you for the contribution.

I changed ACK_TIMEOUT in submac.c to 50000us. It helps to receive every ACK. Probably, this value can be less.

So far this has been hardcoded because there SubMAC only worked for O-QPSK radios. But now that there are more alternatives, we should probably calculate it on demand. Otherwise, just changing this value will break existing O-QPSK radios.

We can probably get it from CAPS for now.

@jia200x
Copy link
Member

jia200x commented Jan 24, 2023

btw could you rebase this branch on top of lora_24012023 to bring the new commits here?

git checkout lora_rak3172
git rebase lora_24012023
git push -f

@github-actions github-actions bot added Area: boards Area: Board ports Area: cpu Area: CPU/MCU ports Area: doc Area: Documentation Area: drivers Area: Device drivers Area: Kconfig Area: Kconfig integration Area: LoRa Area: LoRa radio support Area: network Area: Networking Area: sys Area: System Area: tests Area: tests and testing framework Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Jan 25, 2023
@@ -37,7 +37,7 @@
#include "test_utils/expect.h"
#include "xtimer.h"

#define SYMBOL_TIME (256U) /**< 16 us */
#define SYMBOL_TIME (1024U) /**< 16 us */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this application should also work for O-QPSK radios, we will need to calculate this on demand. I plan to propose a PR with a mechanism to do that (since we also need it for other SubGHz radios).

Copy link
Member

@jia200x jia200x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing the latest changes!
Here's a more detailed review of the technical aspects. After addressing these changes we should then focus on:

  1. Be able to run the HAL version as well as the netdev version of this radio
  2. Fix style issues/missing documentation
  3. Create semantic commits: one for the radio, one for the board, etc.

As mention before, I will propose in the meantime a mechanism to get the symbol time depending on the PHY.


#include "net/ieee802154/radio.h"

#include "event/thread.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "event/thread.h"

There shouldn't be a dependency to the event_thread module, because the upper layer may decide which queue should be used.

#define ENABLE_DEBUG 0
#include "debug.h"

#include "net/netdev/lora.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there shouldn't be any netdev dependency on this radio. If there's an existing macro in this file, we should consider moving it ti net/lora.h since it might indicate it's not specific to netdev.

#include <errno.h>
#include <stdio.h>

#include "net/gnrc.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a dependency to GNRC here, because this device can be used with other network stacks.

Suggested change
#include "net/gnrc.h"

#include "ztimer.h"

#include "thread.h"
#include "periph/timer.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "periph/timer.h"

Since you are already using ztimer, you should remove this line too.

Comment on lines 24 to 25
const uint8_t llcc68_max_sf = LORA_SF11;
const uint8_t sx126x_max_sf = LORA_SF12;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint8_t llcc68_max_sf = LORA_SF11;
const uint8_t sx126x_max_sf = LORA_SF12;

This lines are not being used in the code. Since you are already using a fixed PHY setting, I would propose to simply remove these.

(void)hal;
(void)threshold;

return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not implemented:

Suggested change
return 0;
return -ENOTSUP;

sx126x_set_channel(dev, (channel-11)*200000LU + 865500000LU);
}

assert(pow >= -14 && pow <= 22);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asserting, you should return -EINVAL and don't apply any PHY settings. That gives the upper layer the opportunity to react.

assert(channel >= 11 && channel <= 26);

if (channel == 26) {
sx126x_set_channel(dev, 869525000LU);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use macros for these frequency, channel and power values. You can do something like:

#define SX126X_HAL_CHAN_26 (869525000LU)
#define SX126X_HAL_CHAN_BASE (865500000LU)
#define SX126X_HAL_CHAN_SPACING (200000LU)

#define SX126X_CHAN_MIN (11)
#define SX126X_CHAN_MAX (26)

#define SX126X_POWER_MIN (-14)
#define SX126X_POWER_MAX (22)

| IEEE802154_CAP_IRQ_RX_START
| IEEE802154_CAP_IRQ_TX_DONE
| IEEE802154_CAP_IRQ_CCA_DONE
//| IEEE802154_CAP_IRQ_ACK_TIMEOUT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//| IEEE802154_CAP_IRQ_ACK_TIMEOUT

@@ -0,0 +1,37 @@
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need this file for the netdev implementation of this driver

@@ -0,0 +1,647 @@
#include <assert.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will also need to add the copyright header here. You can copy it from any RIOT C file and adjust it as needed. Don't forget to add your name to the author list 😉

@ansocket
Copy link
Author

Hello @jia200x

Thanks for your review! All changes accepted, I will work on it :)

#define ACK_TIMEOUT_US (864U)

//#define ACK_TIMEOUT_US (864U)
#define ACK_TIMEOUT_US (50000U)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already adapted the SubMAC in #19198 to calculate the timeout on demand.
You would just need to define a channel page for LoRa and adapt the ieee802154_get_symbol_time function. For the LoRa channel page, I would try to define a very high value to avoid colliding with future 802.15.4 PHY specifications... let's say something in the range of 0xFF00 to 0xFFFF.

@jia200x
Copy link
Member

jia200x commented Apr 4, 2023

hi @3JIouHoCoK ,

thank you so much for the last commits.
I was off for some weeks due to paper work, but in the meantime I could test the radio and it works quite fine. I only found small issues (that seem to be also present in the netdev version, so it's not related to this PR).
I also found some stuff that we should probably change before merging (e.g sending the transceiver to COLD_SLEEP erases RAM and therefore the transceiver losses its state).

I will try to come back during the week.

Copy link
Member

@jia200x jia200x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I'm back with some change requests.
I tested the driver on a nucleo-.wl55jc and it works fine, so we should probably focus on code style and nits.
For the final PR there shouldn't be file changes that don't belong to IEEE 802.15.4 (e.g the RAK board should be on its own PR).

#include "sx126x_params.h"
#include "sx126x_internal.h"

#define LORA_ACK_REPLY_US 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to align it with IEEE 802.15.4, this should be 54 symbols.
If using SF7BW125, this means 54 * 1024[us] = 55.3 [ms]

}
}

sx126x_set_lora_payload_length(dev, pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current SubMAC assumes the radio adds the CRC.
This means, the write function should calculate it:

 static int _write(ieee802154_dev_t *hal, const iolist_t *iolist){
 
+
     sx126x_t *dev = hal->priv;
     (void)dev;
     size_t pos = 0;
+    uint16_t chksum = 0;
     /* Full buffer used for Tx */
     sx126x_set_buffer_base_address(dev, 0x80, 0x00);
     /* Write payload buffer */
     for (const iolist_t *iol = iolist; iol; iol = iol->iol_next) {
         if (iol->iol_len > 0) {
             sx126x_write_buffer(dev, pos + 0x80, iol->iol_base, iol->iol_len);
+            memcpy(buffer + pos, iol->iol_base, iol->iol_len);
             DEBUG("[sx126x] netdev: send: wrote data to payload buffer.\n");
             pos += iol->iol_len;
+            chksum = ucrc16_calc_le(iol->iol_base, iol->iol_len,
+                                UCRC16_CCITT_POLY_LE, chksum);
         }
     }
-
+    chksum = byteorder_htols(chksum).u16;
+    /* Include CRC */
+    sx126x_write_buffer(dev, pos + 0x80, (uint8_t*) &chksum, sizeof(chksum));
+    memcpy(buffer + pos, &chksum, sizeof(chksum));
+    pos += 2;
     sx126x_set_lora_payload_length(dev, pos);
     DEBUG("[sx126x] writing (size: %d).\n", (pos));
     return 0;
 }

(void)hal;
sx126x_t *dev = hal->priv;
sx126x_rx_buffer_status_t rx_buffer_status;
sx126x_get_rx_buffer_status(dev, &rx_buffer_status);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the transceiver build the CRC footer (see above), this function should return the length of the packet without the 2 bytes of the CRC:

     sx126x_get_rx_buffer_status(dev, &rx_buffer_status);
-    dev->size = rx_buffer_status.pld_len_in_bytes;
+    /* Exclude CRC */
+    dev->size = rx_buffer_status.pld_len_in_bytes - 2;

return rx_buffer_status.pld_len_in_bytes;
}

static int _read(ieee802154_dev_t *hal, void *buf, size_t max_size, ieee802154_rx_info_t *info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for the CRC: here we calculate the CRC and compare it with the one from the frame.

You can base on this snippet (please ignore the debug messages, and the code is a bit old)

@@ -450,21 +351,34 @@ static int _read(ieee802154_dev_t *hal, void *buf, size_t max_size, ieee802154_r
     (void)hal;
     (void)buf;
     (void)info;
+    uint16_t chksum = 0;
+    uint16_t exp_chksum;
 
+    DEBUG("[sx126x] _read\n");
     sx126x_t* dev = hal->priv;
 
         /* Getting information about last received packet */
-    netdev_lora_rx_info_t *packet_info = (void*)info;
+    //netdev_lora_rx_info_t *packet_info = (void*)info;
     sx126x_rx_buffer_status_t rx_buffer_status;
     sx126x_pkt_status_lora_t pkt_status;
     sx126x_get_rx_buffer_status(dev, &rx_buffer_status);
 
+    /* Size including CRC */
     dev->size = rx_buffer_status.pld_len_in_bytes;
+
     sx126x_get_lora_pkt_status(dev, &pkt_status);
+
+    /* TODO */
+    if (info) {
+        info->lqi = 255;
+        info->rssi = 100;
+    }
+    /*
     if (packet_info) {
         packet_info->snr = pkt_status.snr_pkt_in_db;
         packet_info->rssi = pkt_status.rssi_pkt_in_dbm;
     }
+    */
 
       /* Put PSDU to the output buffer */
     
@@ -472,14 +386,26 @@ static int _read(ieee802154_dev_t *hal, void *buf, size_t max_size, ieee802154_r
         return dev->size;
     }
 
-    if (dev->size > max_size) {
+    /* Exclude CRC */
+    if (dev->size > max_size + 2) {
         return -ENOBUFS;
     }
 
     if (dev->size < 3) {
         return -EBADMSG;
     }
-    sx126x_read_buffer(dev, rx_buffer_status.buffer_start_pointer, (uint8_t*)buf, dev->size);
+    /* Copy to buffer */
+    sx126x_read_buffer(dev, rx_buffer_status.buffer_start_pointer, (uint8_t*)buf, dev->size-2);
+    sx126x_read_buffer(dev, rx_buffer_status.buffer_start_pointer + dev->size-2, (uint8_t*)&exp_chksum, sizeof(exp_chksum));
+    chksum = ucrc16_calc_le(buf, dev->size-2,
+                        UCRC16_CCITT_POLY_LE, chksum);
+    chksum = byteorder_htols(chksum).u16;
+    /* Validate checksum */
+    if (chksum != exp_chksum) {
+        puts("CRC_F");
+        return -EBADMSG;
+    }
+
     DEBUG("[sx126x] first 3 bytes of received packet: %d %d %d\n", *(uint8_t*)buf, *((uint8_t*)buf+1), *((uint8_t*)buf+2));
     return dev->size;
 }

{
(void)hal;
sx126x_t *dev = hal->priv;
sx126x_set_sleep(dev, SX126X_SLEEP_CFG_COLD_START);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sx126x_set_sleep(dev, SX126X_SLEEP_CFG_COLD_START);
sx126x_set_sleep(dev, SX126X_SLEEP_CFG_WARM_START);

Cold start does not have RAM retention, which means the transceiver losses its state every time it sleeps. It is possible to implement a wake up procedure that recovers the radio, but it's probably easier to simply go to warm sleep.

dev->cad_params.cad_detect_min = 10,
dev->cad_params.cad_detect_peak = 22,
dev->cad_params.cad_symb_nb = SX126X_CAD_02_SYMB,
dev->cad_params.cad_timeout = 0x000F00;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is cad_timeout in symbols? If so, this value is too big. In case of IEEE 802.15.4, I would set it to 0x08

| IEEE802154_CAP_IRQ_RX_START
| IEEE802154_CAP_IRQ_TX_DONE
| IEEE802154_CAP_IRQ_CCA_DONE
| IEEE802154_CAP_PHY_BPSK,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This driver exposes only PHY. We should consider adding a IEEE802154_CAP_PHY_LORA for this.

if (sx126x_is_stm32wl(dev)) {
#if IS_USED(MODULE_SX126X_STM32WL)
_dev = netdev;
event_callback_init(&sx126x_ev_callback, (void*)_sx126x_handler, (void*)netdev);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the netdev implementation already has its mechanism to offload IRQ. Therefore, this could break such mechanism. I propose to leave it as it was before

Comment on lines 43 to 51
void isr_subghz_radio(void)
{
/* Disable NVIC to avoid ISR conflict in CPU. */
NVIC_DisableIRQ(SUBGHZ_Radio_IRQn);
NVIC_ClearPendingIRQ(SUBGHZ_Radio_IRQn);
netdev_trigger_event_isr(_dev);
cortexm_isr_end();
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should still be there, otherwise the IRQ offloading mechanism of netdev won't work.
Since the HAL and netdev implementation will never be compiled together, you can just leave this function there

@ansocket
Copy link
Author

Hello @jia200x
Thanks for your review! I'll make changes in the next few days.

@ansocket
Copy link
Author

ansocket commented Apr 25, 2023

Hello @jia200x
I made some new changes as you wrote at your review. But one old problem disturbs me a little.
I think, my implementation of ACK replies needs tests. Sometimes I catch a problem when I send "ping" packets. Console sends me:

gnrc_netif: can't queue packet for sending.

I think, multiple threads try to use the driver when it sends a reply. Maybe I have an issue in interrupt context when ACK is received (or not).
Could you (or someone else, who is using this driver ;) ) check it, please?

@ansocket ansocket closed this Apr 26, 2023
@ansocket ansocket deleted the lora_rak3172 branch April 26, 2023 03:16
@ansocket ansocket restored the lora_rak3172 branch April 26, 2023 03:16
@ansocket
Copy link
Author

Hello @jia200x !
I closed this PR accidently again! I didn't know that renaming a branch can close a pr :(
But as I understand correctly, I should make a pr with only sx126x driver without any board files. Should I open a new PR or change something in this?

@ansocket ansocket reopened this Apr 26, 2023
}

#if IS_ACTIVE(SX126X_SPI)
static void _dio1_isr(void *arg)
{
netdev_trigger_event_isr(arg);
_sx126x_handler(arg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is calling the handler on interrupt context

@jia200x
Copy link
Member

jia200x commented Apr 26, 2023

Hello,

I think, my implementation of ACK replies needs tests. Sometimes I catch a problem when I send "ping" packets. Console sends me:

gnrc_netif: can't queue packet for sending.

Could you please send around the ping parameters?
This usually occurs when the stack transmit faster than the transceiver, which in case of LoRa is likely. Note that LoRa transceivers are orders of magnitude slower than standard IEEE 802.15.4 transceivers. Sending 8 bytes takes around 60 ms with SF7BW125. The same packet in a IEEE 802.15.4 O-QPSK transceiver would probably take less than 1 ms.
When adding ACK logic on top, these times on air add up considerably.

What is important to check is whether there are pktbuf leaks or not. You can use the pktbuf command for that (add USEMODULE += shell_cmd_gnrc_pktbuf to your application Makefile).

I think, multiple threads try to use the driver when it sends a reply. Maybe I have an issue in interrupt context when ACK is received (or not).

I will give it a look.

@jia200x
Copy link
Member

jia200x commented Apr 26, 2023

But as I understand correctly, I should make a pr with only sx126x driver without any board files. Should I open a new PR or change something in this?

Hmmm it might be easier to open a new PR and use this one as a reference.
You should just copy the SX126X related files + the tests (hal and submac)

@ansocket
Copy link
Author

ansocket commented Apr 26, 2023

Could you please send around the ping parameters?

Default 1s ping . The issue shows randomly (mb). So I could get this after 5-7 packets and didn't after 10000. SF7BW125 ACK on.

@jia200x
Copy link
Member

jia200x commented Apr 27, 2023

Default 1s ping . The issue shows randomly (mb). So I could get this after 5-7 packets and didn't after 10000. SF7BW125 ACK on.

What does the output of pktbuf show? Does the node recover after some time?

@ansocket
Copy link
Author

What does the output of pktbuf show? Does the node recover after some time?

I'll try to check it tomorrow.

@ansocket
Copy link
Author

ansocket commented Apr 28, 2023

Hello, @jia200x

What does the output of pktbuf show

pktbuf after the problem:
PING ACK pktbuf.txt

Does the node recover after some time?

I had waited about 15 minutes and the board didn't recover. But the board still have RF switch staying in TX state.

@jia200x
Copy link
Member

jia200x commented Apr 28, 2023

hmmmm ok, there's definitely something wrong... I suspect there could be an issue with the IRQ processing, as this usually happen when the SubMAC is not informed about a TX Done event.
I will give it a try during next week.

@ansocket
Copy link
Author

ansocket commented Apr 28, 2023

I'm not sure but current ack realization by ztimer doesn't look safe :)
I think ack reply should run in the same thread with other sx126x stuff. If we are starting to make an ack reply, but at this moment ack timeout is fired, what should be happened? I should think about similar situations.

Today I tested TCP connection by Sock API without ACK replies by sending a simple string with 2s delay (gnrc_border_router connected to the host machine). After a couple of hours TCP connection was broken and global address (given by dpcp6) was deprecated. And I'm not sure what happened.
Upd. border router just stop sending any messages to ethos after some time. Probably this issue described in #16398, so it doesn't depend on this driver :)

/* Disable NVIC to avoid ISR conflict in CPU. */
NVIC_DisableIRQ(SUBGHZ_Radio_IRQn);
NVIC_ClearPendingIRQ(SUBGHZ_Radio_IRQn);
event_post(EVENT_PRIO_MEDIUM, (event_t*)&sx126x_ev_callback);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this is being posted to a queue running in a thread that has different priority than the thread that runs the driver (gnrc_netif_ieee802154).
To solve that, you should probably set an event_queue_t pointer (e.g in sx126x_init) and do:

Suggested change
event_post(EVENT_PRIO_MEDIUM, (event_t*)&sx126x_ev_callback);
event_post(sx126x_dev->evq, (event_t*)&sx126x_ev_callback);

This sx126x_dev variable should point to the (unique) device descriptor (available only when using the perioph version of the driver)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you check if that solves the issue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @jia200x
Yes, I check it in a couple of days.

Copy link
Author

@ansocket ansocket May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @jia200x

Sorry for the delay.
I changed priority of the event thread by USEMODULE += event_thread_highest but it didn't solve the problem with acks. Probably I don't understand the way that you suggest. I just don't know how I should put the hal pointer to ISR handler except the event_callback mechanism. And I have not periph version of this module to check this idea with queue :(

#endif
}

static void ack_timer_cb(void *arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transceiver is so slow, that you might consider offloading this using event_post.
This was not possible with e.g the nrf802154, as the reply should take less than 192 us.
Doing that will definitely prevent any synchronization issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: boards Area: Board ports Area: cpu Area: CPU/MCU ports Area: doc Area: Documentation Area: drivers Area: Device drivers Area: examples Area: Example Applications Area: Kconfig Area: Kconfig integration Area: LoRa Area: LoRa radio support Area: network Area: Networking Area: sys Area: System Area: tests Area: tests and testing framework Platform: ARM Platform: This PR/issue effects ARM-based platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants