
Conversation

@dylandreimerink dylandreimerink commented Nov 1, 2023

The proposal for a new version of the eBPF instruction set has been accepted and merged into both LLVM 18 and Linux kernel v6.6, so we might be seeing these new instructions in the near future.

This PR adds support for these new instructions. Before this PR we could still parse the instructions but did not recognize the variations added:

myfunc:
         0: SwapLE dst: r1 imm: 16
         1: SwapLE dst: r2 imm: 32
         2: SwapLE dst: r3 imm: 64
         3: LdXMode(128)B 
         4: LdXMode(128)H 
         5: LdXMode(128)W 
         6: LdXMode(128)B 
         7: LdXMode(128)H 
         8: MovReg dst: r1 src: r4
         9: MovReg dst: r2 src: r5
        10: MovReg dst: r3 src: r6
        11: Mov32Reg dst: r1 src: r3
        12: Mov32Reg dst: r2 src: r4
        13: InvalidJumpOp32Imm dst: r0 off: 0 imm: 4
        14: DivReg dst: r1 src: r3
        15: ModReg dst: r2 src: r4
        16: Div32Reg dst: r1 src: r3
        17: Mod32Reg dst: r2 src: r4

After this PR we correctly understand the new instructions:

myfunc:
        0: BSwap16 dst: r1 
        1: BSwap32 dst: r2 
        2: BSwap64 dst: r3 
        3: LdXMemSXB dst: r1 src: r4 off: 0 imm: 0
        4: LdXMemSXH dst: r2 src: r5 off: 4 imm: 0
        5: LdXMemSXW dst: r3 src: r6 off: 8 imm: 0
        6: LdXMemSXB dst: r1 src: r4 off: 0 imm: 0
        7: LdXMemSXH dst: r2 src: r5 off: 4 imm: 0
        8: SMov8Reg dst: r1 src: r4
        9: SMov16Reg dst: r2 src: r5
        10: SMov32Reg dst: r3 src: r6
        11: SMov8Reg32 dst: r1 src: r3
        12: SMov16Reg32 dst: r2 src: r4
        13: Ja32 imm: 4
        14: SDivReg dst: r1 src: r3
        15: SModReg dst: r2 src: r4
        16: SDivReg32 dst: r1 src: r3
        17: SModReg32 dst: r2 src: r4

Fixes: #1188

@dylandreimerink dylandreimerink requested a review from lmb November 1, 2023 08:01
@dylandreimerink dylandreimerink force-pushed the feature/isa-v4 branch 2 times, most recently from 658f785 to 4944213 on November 1, 2023 08:11

@lmb lmb left a comment

Thanks for putting this together. I think that my main concern with this approach is that it introduces so many special cases. A lot of the getters on OpCode don't actually work correctly for extended opcodes, something we've not tackled with this at all (documentation- or code-wise).

So far OpCode is a 1:1 representation of the BPF wire format. This made sense since not all of the space was allocated, which is important for InvalidOpCode to function correctly. Now that pretty much all of the space is gone, there is another option we can try: divorce the representation of OpCode from the wire format. I'm thinking something along these lines:

diff --git a/asm/opcode.go b/asm/opcode.go
index 845c5521..04d1d6cf 100644
--- a/asm/opcode.go
+++ b/asm/opcode.go
@@ -66,18 +66,33 @@ func (cls Class) isJumpOrALU() bool {
 	return cls.IsJump() || cls.IsALU()
 }
 
-// OpCode is a packed eBPF opcode.
+// OpCode represents a single operation.
 //
-// Its encoding is defined by a Class value:
+// It is not a 1:1 mapping to real eBPF opcodes.
 //
-//	msb      lsb
-//	+----+-+---+
-//	| ???? |CLS|
-//	+----+-+---+
-type OpCode uint8
+// The encoding varies based on a 3-bit Class:
+//
+//	7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+//	           ???           | CLS
+//
+// For ALUClass and ALU64Class:
+//
+//	7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+//	          OPC          |S| CLS
+//
+// For LdClass, LdXClass, StClass and StXClass:
+//
+//	7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+//	       0       | MDE |SIZ| CLS
+//
+// For JumpClass, Jump32Class:
+//
+//	7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+//	       0       |  OPC  |S| CLS
+type OpCode uint16
 
 // InvalidOpCode is returned by setters on OpCode
-const InvalidOpCode OpCode = 0xff
+const InvalidOpCode OpCode = 0xffff
 
 // rawInstructions returns the number of BPF instructions required
 // to encode this opcode.

Basically, make OpCode wider so that we can represent all the logical opcodes. For simplicity we'd also change the various *Op to be uint16. We'd then handle converting to and from OpCode in Instruction.Marshal and Unmarshal, similar to what we do for dword loads. This way ExtendedALUOp is not required anymore and SMod / SDiv can be ALUOp. Finally we can also guarantee that InvalidOpCode really is invalid.
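
To make the translation concrete, here is a minimal self-contained sketch of the idea; the `signedBit` layout and helper names are illustrative assumptions, not the final implementation:

```go
package main

import "fmt"

// OpCode is a logical opcode, wider than the 8-bit wire encoding.
// Sketch only: bit 8 marks the signed div/mod variants, which the
// kernel encodes via offset == 1 rather than a distinct opcode.
type OpCode uint16

const (
	wireMask  OpCode = 0x00ff // bits shared with the wire format
	signedBit OpCode = 1 << 8 // logical-only bit, never hits the wire
)

// unmarshalOpCode folds the wire opcode and the offset-based variant
// selector into one logical OpCode. Real code would also verify that
// the opcode is actually a div/mod before consuming the offset.
func unmarshalOpCode(wire byte, offset int16) OpCode {
	op := OpCode(wire)
	if offset == 1 { // signed div/mod use offset == 1 on the wire
		op |= signedBit
	}
	return op
}

// marshalOpCode reverses the translation back to the wire format.
func marshalOpCode(op OpCode) (wire byte, offset int16) {
	if op&signedBit != 0 {
		offset = 1
	}
	return byte(op & wireMask), offset
}

func main() {
	// 0x3f is BPF_ALU64|BPF_X|BPF_DIV; offset 1 turns it into SDIV.
	op := unmarshalOpCode(0x3f, 1)
	wire, off := marshalOpCode(op)
	fmt.Printf("logical %#x -> wire %#x offset %d\n", op, wire, off)
}
```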

This is basically free at runtime since we don't increase the size of Instruction. I even think that we can arrange it so that existing users won't break (besides having to throw in a uint16 cast maybe).

What do you think?

@dylandreimerink dylandreimerink (Member, Author) commented:

> Basically, make OpCode wider so that we can represent all the logical opcodes. For simplicity we'd also change the various *Op to be uint16. We'd then handle converting to and from OpCode in Instruction.Marshal and Unmarshal, similar to what we do for dword loads. This way ExtendedALUOp is not required anymore and SMod / SDiv can be ALUOp. Finally we can also guarantee that InvalidOpCode really is invalid.
>
> This is basically free at runtime since we don't increase the size of Instruction. I even think that we can arrange it so that existing users won't break (besides having to throw in a uint16 cast maybe).
>
> What do you think?

Yes, that feels less hacky than what I have so far 👍

@dylandreimerink dylandreimerink force-pushed the feature/isa-v4 branch 3 times, most recently from f42bdd7 to a25a9bb on November 6, 2023 10:09
@dylandreimerink dylandreimerink requested a review from lmb November 6, 2023 10:16
ISAv4 added signed division and modulo instructions. These instructions
have the same opcode as the unsigned division and modulo instructions,
but with the offset set to 1. This messes with the way we have been
defining instructions so far.

To work around this we add a new `ExtendedALUOp` type with the SDiv and
SMod instructions. This type has the same methods as the normal `ALUOp`,
but they generate the signed instructions.

The printing logic has been updated to take the offset into account, so
the output appears symmetric in most ways. The exception is that signed
operations still result in `ins.op.ALUOp() == Div/Mod`, while you would
use `SDiv/SMod` to generate the correct instruction.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
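
Per the review discussion above, after the redesign SDiv/SMod can live as plain `ALUOp` values. Assuming that shape (and the existing `ALUOp.Reg` convention), emitting the signed operations might look like this:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/asm"
)

func main() {
	// Assumes asm.SDiv/asm.SMod exist as ALUOp values after the
	// redesign discussed above; not verified against the final API.
	insns := asm.Instructions{
		asm.SDiv.Reg(asm.R1, asm.R3), // r1 = (s64)r1 / (s64)r3
		asm.SMod.Reg(asm.R2, asm.R4), // r2 = (s64)r2 % (s64)r4
		asm.Return(),
	}
	fmt.Println(insns)
}
```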
This commit adds the BSwap instruction; it can be emitted with the
`BSwap` function, similar to `ToHost`. BSwap is generated by using
the ALU64 class instead of the plain ALU class.

The opcode and instruction string generation have been updated to
recognize this and output the correct instruction when printing.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
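
A usage sketch, assuming `BSwap` takes a destination register and a size as the commit message suggests:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/asm"
)

func main() {
	// BSwap(dst, size) per the commit message above: an unconditional
	// byte swap, encoded with the ALU64 class instead of plain ALU.
	insns := asm.Instructions{
		asm.BSwap(asm.R1, asm.Half),  // swap the low 16 bits of r1
		asm.BSwap(asm.R2, asm.Word),  // swap the low 32 bits of r2
		asm.BSwap(asm.R3, asm.DWord), // swap all 64 bits of r3
		asm.Return(),
	}
	fmt.Println(insns)
}
```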
ISAv4 adds sign-extended move instructions, in both 64-bit and 32-bit
variants. This commit adds support for these instructions. These
instructions take two registers and a size, so they don't fit the
existing `Reg/Imm` methods. Additionally, they have the same opcode
as their normal variants, just a non-zero offset.

So two new functions were added to generate these instructions. The
instruction string generation has also been updated to mark
sign-extended mov instructions (e.g. `SMov8Reg`).

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
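
The commit doesn't name the two helpers, so `MovSX` and `MovSX32` below are hypothetical placeholders for the 64-bit and 32-bit sign-extending moves:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/asm"
)

func main() {
	// MovSX/MovSX32 are hypothetical names; per the commit message the
	// helpers take two registers plus the source width to extend from.
	insns := asm.Instructions{
		asm.MovSX(asm.R1, asm.R4, asm.Byte),   // r1 = (s64)(s8)r4
		asm.MovSX32(asm.R1, asm.R3, asm.Half), // w1 = (s32)(s16)w3
		asm.Return(),
	}
	fmt.Println(insns)
}
```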
ISAv4 adds a new long jump instruction, which can jump to offsets
between [-2^31, 2^31 - 1] instead of the normal [-32768, 32767].
This new instruction is BPF_JMP32 + BPF_JA, which was previously
not allowed. It uses the IMM/Constant field to store the jump offset,
like calls do.

This commit adds a function to emit the instruction, updates the string
generation to recognize the new instruction, and changes the
reference-based offset patching logic to patch this instruction
correctly.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
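
A hedged sketch of the raw encoding: the JMP32 class with the JA op, and the jump distance carried in the Constant (imm) field instead of Offset. The emit helper this commit adds may be named differently:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/asm"
)

func main() {
	// Long jump: BPF_JMP32|BPF_JA with the distance in imm, which
	// extends the range from +-32k instructions to roughly +-2^31.
	// Constructed by hand here for illustration.
	longJump := asm.Instruction{
		OpCode:   asm.OpCode(asm.Jump32Class).SetJumpOp(asm.Ja),
		Constant: 4, // jump four instructions forward
	}
	fmt.Println(longJump)
}
```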

@lmb lmb left a comment

Two questions, otherwise LGTM.

@dylandreimerink dylandreimerink force-pushed the feature/isa-v4 branch 2 times, most recently from cc972c1 to f625a97 on November 8, 2023 09:50
ISAv4 adds a new instruction that loads a value from memory and sign
extends it, using a new mode. This commit adds a new function to emit
this instruction and the logic to print it nicely.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
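
A usage sketch, assuming the new helper mirrors the existing `LoadMem` signature; the exact name is not confirmed here:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/asm"
)

func main() {
	// Assumed helper, mirroring asm.LoadMem(dst, base, offset, size):
	// load one byte from r4+0 and sign-extend it into r1.
	ins := asm.LoadMemSX(asm.R1, asm.R4, 0, asm.Byte)
	fmt.Println(ins)
}
```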

@lmb lmb left a comment

Renamed LoadSXMemOp to LoadMemSXOp

@lmb lmb merged commit e7260f4 into cilium:main Nov 8, 2023

lmb commented Nov 8, 2023

Thanks!


lmb commented Nov 8, 2023

I managed to force-push an old version, I think x( Fixes incoming.

dylandreimerink added a commit that referenced this pull request Apr 17, 2025
As it stands, we only had proper handling of atomic add operations:
`lock *(u32 *)(r1 + 0x1) += w2` was the only instruction that was
properly handled. Atomic operations all share the same opcode and are
differentiated by the imm value. So far we have not been looking at
the imm value, so all atomic operations look like atomic adds to us.

This commit adds decoding for all current atomic operations. We handle
them similarly to how we handle ISAv4 instructions, which use the
offset to further specify the instruction.

In #1193 we expanded the opcode from a u8 to a u16, which is bigger
than the actual wire size. This allowed us to represent functionally
different opcodes in Go, even though the kernel distinguishes them via
other bits of the instruction. So our opcode in Go is not identical to
the opcode in the kernel; we translate during marshaling and
unmarshaling.

So far, we have only needed a few additional bits, but the atomic ops
need 9 bits of imm to fully encode all possibilities. Since 9 + 8 > 16,
we have to grow the opcode to 32 bits.

During unmarshaling, we simply take the lower 9 bits of the imm, shift
them left by 16 bits, and OR them into the opcode. During marshaling
this process is reversed.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
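
A self-contained sketch of the packing described above; the shift and mask follow the description, while the type and function names are illustrative:

```go
package main

import "fmt"

// OpCode grew to 32 bits: the low 16 bits hold the logical opcode
// introduced in #1193, and 9 more bits hold the imm bits that select
// the atomic operation.
type OpCode uint32

const atomicMask int32 = 0x1ff // lower 9 imm bits select the atomic op

// unmarshalAtomic folds the atomic selector from imm into the logical
// opcode: take the lower 9 bits of imm, shift left by 16, OR them in.
func unmarshalAtomic(logical uint16, imm int32) OpCode {
	return OpCode(logical) | OpCode(imm&atomicMask)<<16
}

// marshalAtomic reverses the process for the wire format.
func marshalAtomic(op OpCode) (logical uint16, imm int32) {
	return uint16(op), int32(op>>16) & atomicMask
}

func main() {
	// 0xc3 is BPF_STX|BPF_ATOMIC|BPF_W; imm 0xe1 selects BPF_XCHG.
	op := unmarshalAtomic(0xc3, 0xe1)
	logical, imm := marshalAtomic(op)
	fmt.Printf("opcode %#x -> wire %#x imm %#x\n", op, logical, imm)
}
```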
dylandreimerink added a commit that referenced this pull request Apr 17, 2025
dylandreimerink added a commit that referenced this pull request Apr 22, 2025
dylandreimerink added a commit that referenced this pull request Apr 24, 2025
dylandreimerink added a commit that referenced this pull request Apr 24, 2025
dylandreimerink added a commit that referenced this pull request Apr 29, 2025
lmb added a commit that referenced this pull request Apr 30, 2025