-
Notifications
You must be signed in to change notification settings - Fork 2.1k
drivers/periph: Added gpio_abc (GPIO Advanced Bitbanging Capabilities) #12015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, out of habit I seem to have put the OVGU in the copyright here and there, but this is actually a spare time project. I'll check and fix this on the weekend |
gpio_abc
(GPIO Advances Bitbanging Capabilities)
I did this with the usual API once, by setting one timer to do PWM on some unconected output pin with a fixed frequency. Then I used that frequency pin to get delays:
This worked well for bitbanging at ~2MHz IIRC, on an STM32F401 running at 80MHz. |
This does not provide the required accuracy for my use case (driving NeoPixel LEDs), as the overhead of a call to By implementing the API proposed in this PR on a STM32F103C8 (@ 76MHz) I get a worst case accuracy of +/- 55ns - so well within the tolerance. So I assume this approach to work provide the required timing for driving NeoPixels on most ARM platforms. |
@PeterKietzmann and @gschorcht: I saw that you have provided a bit-banging SPI and I2C driver. I think the API of this PR could be easily extended to be useful for protocols with a clock and a data line (e.g. I2C) by adding this: /**
* @brief Read the GPIO after waiting for the specified delay
* @return Returns `0` if pin is low, and `>0` if pin is high
*/
int gpio_read_at(gpio_t pin, int delay); (The clock and data signals are never changed at the same time, as the data signal must already be in place when the edge of the clock comes. Therefore, no function to handle two pins simultaneously is needed here.) And the following extension should be enough for a full-duplex bit-banging solution (e.g. SPI): /**
* @brief Sets the pin in @p out_pin, waits for @p delay and then reads the pin @p in_pin
* @return Returns `0` if pin is low, and `>0` if pin is high
*/
int gpio_set_and_read_at(gpio_t out_pin, gpio_t in_pin, int delay);
/**
* @brief Clears the pin in @p out_pin, waits for @p delay and then reads the pin @p in_pin
* @return Returns `0` if pin is low, and `>0` if pin is high
*/
int gpio_clear_and_read_at(gpio_t out_pin, gpio_t in_pin, int delay); For SPI the MOSI pin should be set/clear prior to the clock edge, then the clock edge, and after the clock edge the MISO pin should be sampled. The above functions could be used to provide the clock edge followed by the sampling. (During testing I experienced that the function call adds the biggest overhead in time. For that reason using Would you be interested in having the API extended to be usable for those cases? If so, I'd happily do this. |
Did you try "LTO=1"? |
That's pretty good. I'll try to measure how the PWM pin accuracy ended up being. |
00000000 <gpio_set>:
0: 2301 movs r3, #1
2: f020 020f bic.w r2, r0, #15
6: f000 000f and.w r0, r0, #15
a: fa03 f000 lsl.w r0, r3, r0
e: 6110 str r0, [r2, #16]
10: 4770 bx lr I didn't. If the compiler inlines that function, I'd expect the overhead to go down from e.g. 333ns (if I remember correctly) on the BluePill to something like 70ns (5 CPU cycles, see disassembly above). But we also need to take the gpio_read() into account, which likely adds about another 5 cycles. My wild guess would be that the resolution would be about 140ns if LTO is enabled and the compiler does indeed inline the function. (That would be an accuracy of +-70ns, so good enough for my use case, but not as good as counting CPU cycles. On SAM3 CPUs the accuracy btw. is +-18ms, as the CPU only takes 3 cycles per delay loop. That accuracy will be hard to reach with PWM.) But having LTO on is no guarantee that a compiler indeed inlines specific functions. Maybe choosing not to inline results in a smaller binary and the compiler's policy is to prefer that over speed. |
I just noticed that the calculation of the accuracy was incorrect. I updated it. The correct accuracy for the BluePill is +-49ns. |
Can the ABC be ported to a new GPIO peripheral implementation without the help of an oscilloscope, for example by running long loops and comparing them to an established timer? How much fluctuation do you expect from compiler version changes? How does the time needed to decide the next bit (eg. Did you consider busy-waiting for a high-speed timer like SystemCoreClk on fixed times? (I'm thinking along the lines of |
Likely yes. Most likely this approach could be used to automatically obtain the two parameters using in the ABC implementations. I can provide an implementation for that. On the other hand cheap logic analyzers (10€ to 20€) already have a sample rate of 24MHz, which is enough to measure the parameters. If I had to add another GPIO ABC implementation, I would likely still use the logical analyzer to verify the automatically obtained values from the long loop timings.
In the delay loop: None, as there inline assembly is used (for just this reason). This is also the point which matters most, as fluctuation would accumulate with each loop iteration. In the overhead of a function call: Likely little to none, unless LTO is used. The actual setting or clearing of the GPIO pin before the delay loop is implemented in C. But that is currently implemented for all platforms I'm aware of by generating a bitmask and binary-or (in case of setting) or binary-and (in case of clearing) it into a memory location marked as
This could be used when a high speed hardware timer with the required resolution is available. On the hardware I have in mind (BluePill, STM32F103) the timer could actually run with the CPU frequency (72MHz, ~14ns per tick), which would work quite well. But I think it would be a bad idea to access the hardware directly (bypassing The approach of counting CPU cycles is very portable (even though the implementation is not), as no specific hardware requirements such as high resolution timers are required. This is why I decided to go for that. But keep in mind: The user facing API of this PR is not bound in any way to one implementation. This PR adds some code useful for implementations that do count CPU cycles, but this is an internal implementation detail. I see no reason to not implement GPIO ABC using high resolution timers, if that implementation takes care to not break |
Thanks for your detailed response, that does answer most of my questions (also I remembered that platforms like ATmega don't have a CPU cycle counting timer). What I still don't understand is the open one:
In other words, how is the time the code between the the abc calls compensated for? Is there an implicit assumption that it will always be a few clock cycles (get value from RAM, shift and bitmask, decide on set/clear or duration) that's adjusted for in the overhead cycles, or is an estimate provided by the user? |
Exactly. The assumption is that no computation requiring more than 150ns of CPU cycles (~10 on a 72MHz BluePill) is done in advance, so that between calls to
I think that this would be painful to use, as end users of that API would have trouble to figure this out. Also, this would depend on a lot of factors. Being asked to do as little as possible between calls to I'll check the API doc if this assumption is clearly communicated and extend, if not. |
I'm doing a rebase (no squashing, no changes expect fixing merge conflicts) to resolve the merge conflict with Makefile.dep. |
I updated the API and documentation to be more useful for implementations not doing CPU cycles counting:
I also updated the test to automatically detect the parameters with the shell command The STM32F3 implementation has been rebased on top of this. |
The submodule periph_gpio_abc extends the GPIO driver by Advanced Bitbanging Capabilities (ABC). It provides means to control GPIOs with a precision of 0.15µs or better.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
Not stale |
# Enable periph_gpio when periph_gpio_irq is enabled | ||
ifneq (,$(filter periph_gpio_irq,$(USEMODULE))) | ||
# Add periph_gpio when one or more of its submodules is used | ||
ifneq (,$(filter periph_gpio_%,$(USEMODULE))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, the module periph_gpio_ext
would then also activate periph_gpio
, which should not necessarily be the case. Theoretically it would be possible to extend a board with an I2C-GPIO-Extender which does not have the periph_gpio
feature but has the periph_i2c
feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the support for external GPIOs is merged, I will rebase and update.
I'm not sure if the API will be implemented by external GPIOs as well, as strict timing requirements do not match well with a serially attached GPIO extender. (Especially with an I2C extender sharing the bus with a notorious clock stretcher.)
Maybe this API should be periph only. Slow bit banging can also be done quite well with xtimer/ztimer and the regular GPIO API.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions. |
I just gave this a try with nrf5x. A few merge conflicts have accumulated, but even with crude resolution it worked. The test application was useful for getting the values right which I just took from STM for a first iteration; after filling in what the ones the app recommends, my WS2812 (compatible; SK6812 RGBAW) showed neat colors, and I didn't need to involve either good understanding of the involved clocks nor an oscilloscope for a working port. On the high level view I'm not convinced that GPIO ABC as it is here is the best abstraction level, but it is useful, and if a better pattern emerges (I'm hoping for "describe the pulse pattern, and let the tools figure out whether a periph can do that w/o CPU involvement or a high-prio thread is needed), it can still fall back to this. Could you resolve Gunar's comment by avoiding build-system interactions with #14602 (I agree this shoild not be expected to work with non-native GPIO), and rebase to master for a more low-altitide round of review? (And poke me if I fail to deliver that in a timely fashion)? [edit: 14602 reference had a typo] |
Having been the only one to comment to keep this from staling, I'm giving this back its "stale" label, in hopes that the work in #14602 (or the successor announced in the latest comments) would make the need for this concrete solution go away. |
Contribution description
The Problem
The platform independent GPIO API and the xtimer API are very convenient, but does not provide the features required for implementing bitbanging protocols with tight timing requirements (less than 0.5µs). In fact, for this resolution hardware timers cannot be used (at least on the platforms I'm working with), but one has to resort to counting CPU cycles. This often involves writing delay loops in assembly, as changes in compiler optimization would mess up the timing. As a result, most bitbanging protocols with timing requirements below 1 µs are often implemented platform dependent. Writing and maintaining dozens of platform depended implementation is clearly not the best approach for RIOT.
The first use case of this API is a driver of the NeoPixel smart RGB LED, which I will create a separate PR for.
The Proposed Solution
This PR provides a platform independent API that combines the GPIO access with a delay loop in form of two functions and a helper function
void gpio_set_for(gpiot_t pin, int delay)
void gpio_clear_for(gpio_t pin, int delay)
int gpio_abc_delay(uin16_t ns)
The helper function
gpio_abc_delay()
is used to pre-calculate the number of delay loop iterations required to generate the required duration ingpio_set_for()
andgpio_clear_for()
, depending on the function call overhead and the CPU cycles per delay loop for the specific platform. Thegpio_set_for()
andgpio_clear_for()
combine the GPIO access with the timing for two reasons: Firstly, this API is very convenient for writing bitbanging code. Secondly (and more important), this allows to measure the overhead of a call togpio_set_for()
/gpio_clear_for()
and take this overhead into account ingpio_abc_delay()
, as otherwise the required accuracy for timing is impossible. And lastly, a providing the delay function separately of the GPIO access would add the overhead of a second function call, which adds about roughly 0,3µs of overhead for STM32F1, STM32F3, and SAM3 according to a short test.Testing procedure
Issues/PRs references