Skip to content

drivers/periph: Added gpio_abc (GPIO Advanced Bitbanging Capabilities) #12015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

maribu
Copy link
Member

@maribu maribu commented Aug 15, 2019

Contribution description

The Problem

The platform independent GPIO API and the xtimer API are very convenient, but does not provide the features required for implementing bitbanging protocols with tight timing requirements (less than 0.5µs). In fact, for this resolution hardware timers cannot be used (at least on the platforms I'm working with), but one has to resort to counting CPU cycles. This often involves writing delay loops in assembly, as changes in compiler optimization would mess up the timing. As a result, most bitbanging protocols with timing requirements below 1 µs are often implemented platform dependent. Writing and maintaining dozens of platform depended implementation is clearly not the best approach for RIOT.

The first use case of this API is a driver of the NeoPixel smart RGB LED, which I will create a separate PR for.

The Proposed Solution

This PR provides a platform independent API that combines the GPIO access with a delay loop in form of two functions and a helper function

  1. void gpio_set_for(gpiot_t pin, int delay)
  2. void gpio_clear_for(gpio_t pin, int delay)
  3. int gpio_abc_delay(uin16_t ns)

The helper function gpio_abc_delay() is used to pre-calculate the number of delay loop iterations required to generate the required duration in gpio_set_for() and gpio_clear_for(), depending on the function call overhead and the CPU cycles per delay loop for the specific platform. The gpio_set_for() and gpio_clear_for() combine the GPIO access with the timing for two reasons: Firstly, this API is very convenient for writing bitbanging code. Secondly (and more important), this allows to measure the overhead of a call to gpio_set_for() / gpio_clear_for() and take this overhead into account in gpio_abc_delay(), as otherwise the required accuracy for timing is impossible. And lastly, a providing the delay function separately of the GPIO access would add the overhead of a second function call, which adds about roughly 0,3µs of overhead for STM32F1, STM32F3, and SAM3 according to a short test.

Testing procedure

  1. Check the documentation
  2. Check out a fellow up PR containing an actual implementation of this API and run the test application. (You'll need a scope or a logic analyzer with a sample rate of at least 20 MHz for any meaningful testing.)

Issues/PRs references

@maribu
Copy link
Member Author

maribu commented Aug 15, 2019

Hmm, out of habit I seem to have put the OVGU in the copyright here and there, but this is actually a spare time project. I'll check and fix this on the weekend

@maribu maribu changed the title drivers/periph: Added gpio_abc (GPIO Advances Bitbanging Capabilities) drivers/periph: Added gpio_abc (GPIO Advanced Bitbanging Capabilities) Aug 15, 2019
@kaspar030
Copy link
Contributor

In fact, for this resolution hardware timers cannot be used (at least on the platforms I'm working with),

I did this with the usual API once, by setting one timer to do PWM on some unconected output pin with a fixed frequency. Then I used that frequency pin to get delays:

{
  while (FREQ_PIN); 
  gpio_set(DATA_PIN);
  while(!FREQ_PIN);
  gpio_clear(DATA_PIN)
static void delay(ticks) {
  while(ticks) {
    while (FREQ_PIN);
    while(!FREQ_PIN);
  }
}

This worked well for bitbanging at ~2MHz IIRC, on an STM32F401 running at 80MHz.

@maribu
Copy link
Member Author

maribu commented Aug 16, 2019

In fact, for this resolution hardware timers cannot be used (at least on the platforms I'm working with),

I did this with the usual API once, by setting one timer to do PWM on some unconected output pin with a fixed frequency. Then I used that frequency pin to get delays:

{
  while (FREQ_PIN); 
  gpio_set(DATA_PIN);
  while(!FREQ_PIN);
  gpio_clear(DATA_PIN)
static void delay(ticks) {
  while(ticks) {
    while (FREQ_PIN);
    while(!FREQ_PIN);
  }
}

This worked well for bitbanging at ~2MHz IIRC, on an STM32F401 running at 80MHz.

This does not provide the required accuracy for my use case (driving NeoPixel LEDs), as the overhead of a call to gpio_set() and gpio_clear() is already bigger than the tolerance for the timing for every board I tested (BluePill, Nucleo-F303RE, Nucleo-F446RE, Arduino Due). The tolerance of the NeoPixel is +-150ns when only supporting one hardware type, but +-100ns when supporting both with the same code. (The required pulse length of both types are slightly different, but close enough to take the average and remain within the tolerance of both.)

By implementing the API proposed in this PR on a STM32F103C8 (@ 76MHz) I get a worst case accuracy of +/- 55ns - so well within the tolerance. So I assume this approach to work provide the required timing for driving NeoPixels on most ARM platforms.

@maribu
Copy link
Member Author

maribu commented Aug 28, 2019

@PeterKietzmann and @gschorcht: I saw that you have provided a bit-banging SPI and I2C driver. I think the API of this PR could be easily extended to be useful for protocols with a clock and a data line (e.g. I2C) by adding this:

/**
 * @brief Read the GPIO after waiting for the specified delay
 * @return Returns `0` if pin is low, and `>0` if pin is high
 */
int gpio_read_at(gpio_t pin, int delay);

(The clock and data signals are never changed at the same time, as the data signal must already be in place when the edge of the clock comes. Therefore, no function to handle two pins simultaneously is needed here.)

And the following extension should be enough for a full-duplex bit-banging solution (e.g. SPI):

/**
 * @brief Sets the pin in @p out_pin, waits for @p delay and then reads the pin @p in_pin
 * @return Returns `0` if pin is low, and `>0` if pin is high
 */
int gpio_set_and_read_at(gpio_t out_pin, gpio_t in_pin, int delay);


/**
 * @brief Clears the pin in @p out_pin, waits for @p delay and then reads the pin @p in_pin
 * @return Returns `0` if pin is low, and `>0` if pin is high
 */
int gpio_clear_and_read_at(gpio_t out_pin, gpio_t in_pin, int delay);

For SPI the MOSI pin should be set/clear prior to the clock edge, then the clock edge, and after the clock edge the MISO pin should be sampled. The above functions could be used to provide the clock edge followed by the sampling. (During testing I experienced that the function call adds the biggest overhead in time. For that reason using gpio_set() followed by gpio_read_at() would not work, as the overhead of the gpio_set() would noticeably stretch the clock signal.)

Would you be interested in having the API extended to be usable for those cases? If so, I'd happily do this.

@kaspar030
Copy link
Contributor

This does not provide the required accuracy for my use case (driving NeoPixel LEDs), as the overhead of a call to gpio_set() and gpio_clear() is already bigger than the tolerance for the timing for every board I tested (BluePill, Nucleo-F303RE, Nucleo-F446RE, Arduino Due).

Did you try "LTO=1"?

@kaspar030
Copy link
Contributor

By implementing the API proposed in this PR on a STM32F103C8 (@ 76MHz) I get a worst case accuracy of +/- 55ns - so well within the tolerance. So I assume this approach to work provide the required timing for driving NeoPixels on most ARM platforms.

That's pretty good. I'll try to measure how the PWM pin accuracy ended up being.
When I implemented that, I thought that there's a way to provide a timing base that can be implemented per-platform (PWM works on stm32, maybe a fast timer plus masking works for other platforms), and that all the gpio_* could be re-used. To me, using hard-coded delays seemed prone to skipping clock flanks.

@maribu
Copy link
Member Author

maribu commented Aug 28, 2019

This does not provide the required accuracy for my use case (driving NeoPixel LEDs), as the overhead of a call to gpio_set() and gpio_clear() is already bigger than the tolerance for the timing for every board I tested (BluePill, Nucleo-F303RE, Nucleo-F446RE, Arduino Due).

Did you try "LTO=1"?

00000000 <gpio_set>:
   0:	2301      	movs	r3, #1
   2:	f020 020f 	bic.w	r2, r0, #15
   6:	f000 000f 	and.w	r0, r0, #15
   a:	fa03 f000 	lsl.w	r0, r3, r0
   e:	6110      	str	r0, [r2, #16]
  10:	4770      	bx	lr

I didn't. If the compiler inlines that function, I'd expect the overhead to go down from e.g. 333ns (if I remember correctly) on the BluePill to something like 70ns (5 CPU cycles, see disassembly above). But we also need to take the gpio_read() into account, which likely adds about another 5 cycles. My wild guess would be that the resolution would be about 140ns if LTO is enabled and the compiler does indeed inline the function. (That would be an accuracy of +-70ns, so good enough for my use case, but not as good as counting CPU cycles. On SAM3 CPUs the accuracy btw. is +-18ms, as the CPU only takes 3 cycles per delay loop. That accuracy will be hard to reach with PWM.)

But having LTO on is no guarantee that a compiler indeed inlines specific functions. Maybe choosing not to inline results in a smaller binary and the compiler's policy is to prefer that over speed.

@maribu
Copy link
Member Author

maribu commented Aug 28, 2019

I just noticed that the calculation of the accuracy was incorrect. I updated it. The correct accuracy for the BluePill is +-49ns.

@chrysn
Copy link
Member

chrysn commented Oct 1, 2019

Can the ABC be ported to a new GPIO peripheral implementation without the help of an oscilloscope, for example by running long loops and comparing them to an established timer?

How much fluctuation do you expect from compiler version changes?

How does the time needed to decide the next bit (eg. if (next_byte & (1 << 6)) gpio_set_for(outpin, short) else gpio_set_for(outpin, long) vs if (next_bit()) …) factor in?

Did you consider busy-waiting for a high-speed timer like SystemCoreClk on fixed times? (I'm thinking along the lines of gpio_set_at(pin, time); if (next_bit()) time += short; else time += long; gpio_clear_at(pin, time);).

@maribu
Copy link
Member Author

maribu commented Oct 1, 2019

Can the ABC be ported to a new GPIO peripheral implementation without the help of an oscilloscope, for example by running long loops and comparing them to an established timer?

Likely yes. Most likely this approach could be used to automatically obtain the two parameters using in the ABC implementations. I can provide an implementation for that.

On the other hand cheap logic analyzers (10€ to 20€) already have a sample rate of 24MHz, which is enough to measure the parameters. If I had to add another GPIO ABC implementation, I would likely still use the logical analyzer to verify the automatically obtained values from the long loop timings.

How much fluctuation do you expect from compiler version changes?

In the delay loop: None, as there inline assembly is used (for just this reason). This is also the point which matters most, as fluctuation would accumulate with each loop iteration.

In the overhead of a function call: Likely little to none, unless LTO is used. The actual setting or clearing of the GPIO pin before the delay loop is implemented in C. But that is currently implemented for all platforms I'm aware of by generating a bitmask and binary-or (in case of setting) or binary-and (in case of clearing) it into a memory location marked as volatile. And the C code maps pretty well to the native instructions of various platforms, as only low level bit manipulation (left shift, binary and, binary or, binary negation) is used in the code. So I would be very surprised to see more than two CPU cycles difference in the generated code, which would map to an error of 40ns on a 50MHz CPU.

Did you consider busy-waiting for a high-speed timer like SystemCoreClk on fixed times?

This could be used when a high speed hardware timer with the required resolution is available. On the hardware I have in mind (BluePill, STM32F103) the timer could actually run with the CPU frequency (72MHz, ~14ns per tick), which would work quite well. But I think it would be a bad idea to access the hardware directly (bypassing xtimer), as I can image all sorts of mischief to happen. And xtimer always has an internal resolution of 1µs, which is not good enough.

The approach of counting CPU cycles is very portable (even though the implementation is not), as no specific hardware requirements such as high resolution timers are required. This is why I decided to go for that.

But keep in mind: The user facing API of this PR is not bound in any way to one implementation. This PR adds some code useful for implementations that do count CPU cycles, but this is an internal implementation detail. I see no reason to not implement GPIO ABC using high resolution timers, if that implementation takes care to not break xtimer (or ztimer).

@chrysn
Copy link
Member

chrysn commented Oct 2, 2019

Thanks for your detailed response, that does answer most of my questions (also I remembered that platforms like ATmega don't have a CPU cycle counting timer). What I still don't understand is the open one:

How does the time needed to decide the next bit (eg. if (next_byte & (1 << 6)) gpio_set_for(outpin, short) else gpio_set_for(outpin, long) vs if (next_bit()) …) factor in?

In other words, how is the time the code between the the abc calls compensated for? Is there an implicit assumption that it will always be a few clock cycles (get value from RAM, shift and bitmask, decide on set/clear or duration) that's adjusted for in the overhead cycles, or is an estimate provided by the user?

@maribu
Copy link
Member Author

maribu commented Oct 2, 2019

Is there an implicit assumption that it will always be a few clock cycles (get value from RAM, shift and bitmask, decide on set/clear or duration) that's adjusted for in the overhead cycles

Exactly. The assumption is that no computation requiring more than 150ns of CPU cycles (~10 on a 72MHz BluePill) is done in advance, so that between calls to gpio_set_for() and gpio_clear_for() belonging to the same transaction almost no CPU time is required.

or is an estimate provided by the user?

I think that this would be painful to use, as end users of that API would have trouble to figure this out. Also, this would depend on a lot of factors. Being asked to do as little as possible between calls to gpio_set_for() and gpio_clear_for() is something I expect most users can work and live with.

I'll check the API doc if this assumption is clearly communicated and extend, if not.

@maribu
Copy link
Member Author

maribu commented Oct 4, 2019

I'm doing a rebase (no squashing, no changes expect fixing merge conflicts) to resolve the merge conflict with Makefile.dep.

@maribu
Copy link
Member Author

maribu commented Oct 7, 2019

@chrysn:

I updated the API and documentation to be more useful for implementations not doing CPU cycles counting:

  1. A GPIO_ABC_CLOCK can be provided, if it is not identical with the CPU clock
  2. The names and documentation have been more generalized
  3. The functions gpio_abc_begin() and gpio_abc_end() have been added (by default empty inline functions), so that some setup and tear down logic can be supplied if needed.

I also updated the test to automatically detect the parameters with the shell command detect. Provided that the implementations of gpio_set_for() and gpio_clear_for() are correct, this works like a charm. Porting GPIO ABC to new hardware should be a walk in the park with that, at least on ARM boards (in which the _delay() inline function of the STM32F3 implementation could be just copy-pasted).

The STM32F3 implementation has been rebased on top of this.

The submodule periph_gpio_abc extends the GPIO driver by Advanced Bitbanging
Capabilities (ABC). It provides means to control GPIOs with a precision of
0.15µs or better.
@stale
Copy link

stale bot commented Jun 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jun 7, 2020
@maribu
Copy link
Member Author

maribu commented Jul 3, 2020

Not stale

@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jul 3, 2020
# Enable periph_gpio when periph_gpio_irq is enabled
ifneq (,$(filter periph_gpio_irq,$(USEMODULE)))
# Add periph_gpio when one or more of its submodules is used
ifneq (,$(filter periph_gpio_%,$(USEMODULE)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the module periph_gpio_ext would then also activate periph_gpio, which should not necessarily be the case. Theoretically it would be possible to extend a board with an I2C-GPIO-Extender which does not have the periph_gpio feature but has the periph_i2c feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the support for external GPIOs is merged, I will rebase and update.

I'm not sure if the API will be implemented by external GPIOs as well, as strict timing requirements do not match well with a serially attached GPIO extender. (Especially with an I2C extender sharing the bus with a notorious clock stretcher.)

Maybe this API should be periph only. Slow bit banging can also be done quite well with xtimer/ztimer and the regular GPIO API.

@stale
Copy link

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Mar 19, 2021
@chrysn
Copy link
Member

chrysn commented Mar 20, 2021

I just gave this a try with nrf5x. A few merge conflicts have accumulated, but even with crude resolution it worked. The test application was useful for getting the values right which I just took from STM for a first iteration; after filling in what the ones the app recommends, my WS2812 (compatible; SK6812 RGBAW) showed neat colors, and I didn't need to involve either good understanding of the involved clocks nor an oscilloscope for a working port.

On the high level view I'm not convinced that GPIO ABC as it is here is the best abstraction level, but it is useful, and if a better pattern emerges (I'm hoping for "describe the pulse pattern, and let the tools figure out whether a periph can do that w/o CPU involvement or a high-prio thread is needed), it can still fall back to this.

Could you resolve Gunar's comment by avoiding build-system interactions with #14602 (I agree this shoild not be expected to work with non-native GPIO), and rebase to master for a more low-altitide round of review? (And poke me if I fail to deliver that in a timely fashion)?

[edit: 14602 reference had a typo]

@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Mar 20, 2021
@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
@chrysn
Copy link
Member

chrysn commented Aug 25, 2021

Having been the only one to comment to keep this from staling, I'm giving this back its "stale" label, in hopes that the work in #14602 (or the successor announced in the latest comments) would make the need for this concrete solution go away.

@chrysn chrysn added the State: stale State: The issue / PR has no activity for >185 days label Aug 25, 2021
@stale stale bot closed this Oct 2, 2021
@maribu maribu deleted the gpio_abc branch January 21, 2023 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports Area: drivers Area: Device drivers State: stale State: The issue / PR has no activity for >185 days Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants