Skip to content

Conversation

dylandreimerink
Copy link
Member

@dylandreimerink dylandreimerink commented Apr 13, 2025

As it stands, we only had proper handling of atomic add operations. lock *(u32 *)(r1 + 0x1) += w2 was the only instruction that was properly handled. Atomic operations have the same opcode, but and are differentiated by the imm value. So far we have not been looking at the imm value, so all atomic operations look like atomic adds to us.

This commit adds decoding for all current atomic operations. We handle them similarly to how we handle ISAv4 instructions which use the offset to further specify the instruction.

In #1193 we expanded the opcode from a u8 to u16, which is bigger than the actual size. This allowed us to still represent functionally different opcodes in Go, even tough the kernel uses other bits of the instruction. So our opcode in Go is not identical to the opcode in the kernel. We translate during marshalling and unmarshaling.

So far, we have only needed a few additional bits, but the atomic ops need 9 bits of imm to fully encode all possibilities. Since 9 + 8 > 16 we have to grow the opcode to 32 bits.

During unmarshaling, we simply take the lower 9 bits of the imm shift it left by 16 bits and or it with the opcode. During marshaling this process is reversed.

Fixes: #1750

@dylandreimerink dylandreimerink marked this pull request as ready for review April 14, 2025 08:53
@dylandreimerink dylandreimerink requested a review from a team as a code owner April 14, 2025 08:53
@dylandreimerink dylandreimerink force-pushed the feature/load-aquire-store-release branch from acfcf12 to c253f9c Compare April 16, 2025 10:12
@lmb
Copy link
Collaborator

lmb commented Apr 16, 2025

So for signed division we extended the opcode space to be 16 bits. Does it make sense to use some of that space to encode LoadAcquire / StoreRelease as actual opcodes as well? I'm a bit torn because of that whole scheme you pointed out. We might end up with 2^5 additional opcodes in the worst case? I'm guessing that not all combinations are valid though.

If we stick to the current approach, can we unexport AtomicFlags for now? That way we can punt on how they are encoded / whether we need further flags.

@dylandreimerink
Copy link
Member Author

So for signed division we extended the opcode space to be 16 bits. Does it make sense to use some of that space to encode LoadAcquire / StoreRelease as actual opcodes as well?

Yes, I had forgotten we did that. I guess that makes more sense so we can avoid the special caseing in the string generation, which was the reason for doing this in the first place. I will give this a shot.

I'm a bit torn because of that whole scheme you pointed out. We might end up with 2^5 additional opcodes in the worst case?

In theory. It seems that in practice there are 6 memory orderings commonly defined in C/CPP, so with a bit of luck we will never come close to 2^5 additional ops.

What personally bothers me is that using odd schemes like offset and imm as part of the opcode is becoming more common. And while instructions are mostly still constant width, the encoding is moving away from the original and becoming less consistent.

@dylandreimerink dylandreimerink force-pushed the feature/load-aquire-store-release branch 2 times, most recently from 8e6b628 to 6cb882d Compare April 17, 2025 14:11
@dylandreimerink dylandreimerink changed the title asm: Add Load-Acquire and Store-Release instructions asm: Add handling for atomic operations Apr 17, 2025
@dylandreimerink dylandreimerink requested a review from lmb April 17, 2025 14:13
@dylandreimerink dylandreimerink force-pushed the feature/load-aquire-store-release branch from 6cb882d to 81d04eb Compare April 22, 2025 09:33
@dylandreimerink dylandreimerink force-pushed the feature/load-aquire-store-release branch 2 times, most recently from e213316 to 4da3570 Compare April 24, 2025 12:30
@lmb
Copy link
Collaborator

lmb commented Apr 24, 2025

@dylandreimerink I pushed a commit which cuts down the repetition a bit. IIRC I added the free functions for Load, etc. because there wasn't that much variation. That isn't the case with the atomic ops so using a method on the type seems nicer?

asm.AddAtomic.Mem()
asm.AddAtomic.FetchMem()

read okay to me, but I'm not 100% sure they are really correct. LoadAcquire and StoreRelease are also annoying.

@dylandreimerink
Copy link
Member Author

I pushed a commit which cuts down the repetition a bit. IIRC I added the free functions for Load, etc. because there wasn't that much variation. That isn't the case with the atomic ops so using a method on the type seems nicer?

Ah, I think this is better. I had all of the different ops spelled out for the string generation. But that is something we can fix by checking for the Fetch bit manually.

read okay to me, but I'm not 100% sure they are really correct. LoadAcquire and StoreRelease are also annoying.

This works for the copy-modify-write style of atomics. We need to add a guard for XCHG and CMPXCHG since they only have Fetch variants.

The LoadAcquire and StoreRelease are essentially a different class of atomic op. So we could given them dedicated functions, or only allow them to be used with Mem since they never have the Fetch bit set.

It technically works, its just that from a UX perspective they don't fit neatly in the *(size *)(dst + offset) (op) src description, they really are:

  • Load - dst = lock-acquire *(size *)(src + offset)
  • Store - lock-release *(size *)(dst + offset) = src

As it stands, we only had proper handling of atomic add operations.
`lock *(u32 *)(r1 + 0x1) += w2` was the only instruction that was
properly handled. Atomic operations have the same opcode, but
and are differentiated by the imm value. So far we have not been looking
at the imm value, so all atomic operations look like atomic adds to us.

This commit adds decoding for all current atomic operations. We handle
them similarly to how we handle ISAv4 instructions which use the
offset to further specify the instruction.

In #1193 we expanded the opcode from a u8 to u16, which is bigger than
the actual size. This allowed us to still represent functionally
different opcodes in Go, even tough the kernel uses other bits of the
instruction. So our opcode in Go is not identical to the opcode in the
kernel. We translate during marshalling and unmarshaling.

So far, we have only needed a few additional bits, but the atomic ops
need 9 bits of imm to fully encode all possibilities. Since 9 + 8 > 16
we have to grow the opcode to 32 bits.

During unmarshaling, we simply take the lower 9 bits of the imm shift
it left by 16 bits and or it with the opcode. During marshaling this
process is reversed.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the feature/load-aquire-store-release branch from 283ef55 to 8669dd4 Compare April 29, 2025 12:47
@dylandreimerink dylandreimerink requested a review from lmb April 29, 2025 13:28
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@dylandreimerink
Copy link
Member Author

I am happy this this if you are. I say squash+merge 🚀

@lmb lmb merged commit f086705 into main Apr 30, 2025
17 checks passed
@lmb lmb deleted the feature/load-aquire-store-release branch April 30, 2025 12:20
@lmb
Copy link
Collaborator

lmb commented Apr 30, 2025

Thanks for iterating with me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update asm package with load-aquire and store-release instructions
3 participants