-
Notifications
You must be signed in to change notification settings - Fork 24
Add support for defining builtin host functions at compile-time #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for writing this up -- I think this will be a really powerful ability once we have it, and will be extremely important for certain kinds of applications! When I was originally mulling over this design space for the zero-copy buffer use-case, I had been imagining something like a raw CLIF interface, but I agree that that's got a lot of downsides and is pretty much fully subsumed by the other options. I'm happy to see we're moving toward the "just define the logic in Wasm" idea (and also not the "special sublanguage" idea, though I liked that when we talked about it too) -- this cleans up a lot of duplication. I think my input here comes in two major lines:
Preliminarily: yes 100% to
and also to the more subjective "let's not encourage people to use a nonstandard extension" (and to clarify to any readers: yes that concern is still in scope even if we are literally standards-compliant by presenting function imports, because a sufficiently tempting set of function imports can become a de-facto standard). However I find myself wondering whether we shouldn't do something a little more general and provide
I'll call this the "privileged adapter module" approach. On the "negative motivation" side first (against special Wasm subset for "self-hosted Wasm"): defining restrictions on which subset of Wasm modules is acceptable for a compile-time-builtin module could be seen as restricting/subsetting the standard somewhat. I can absolutely see the logic for it (as the RFC says, discouraging general use, but also in particular: it's simpler if we don't have a VMContext for the special module that is separate from the module that is using it) but isn't it also defining a "Wasm-prime" in the other direction? On the "positive side" now (for fully general Wasm for "self-hosted Wasm"): the spirit of virtualization and precedent elsewhere (e.g., WASI virtualization, and also places that we talk about adapter modules, such as the debug RFC) suggests to me at least that there isn't too much danger in defining a more privileged interface (here: "load/store host memory") and then allowing any general Wasm module to be an adapter module that provides a higher-level "safe" service on top of it. (In fact we know of folks doing this in production with components.) It is still standard Wasm -- it just has a particular API available to it, and one that the embedder must enable for a specific module. If someone wanted to implement this "peek/poke API" today as ordinary host functions, they could; we are saying that we recognize the need for it because we want to move more logic into Wasm for inlinability. Philosophically, I think the notion that we can never have privileged interfaces imported into a Wasm module (because someone might write Wasm that always requires the privileged interfaces and misuse the specialized environment as a general environment) sits less well with me: it says that Wasm is somehow not universal, and can't be used to implement some parts of the system. Said another way: "root privilege" (arbitrary load/store) then virtualization is more or less directly aligned, I think, with how capability systems are supposed to work. The idea is that the danger is in plumbing the wrong capabilities to the wrong modules, and safety requires us to put the right access-filtering or -subsetting modules inline with powerful capabilities. But this is already true, and we already trust our embedders to "wire things up right" because one can write arbitrary hostcall implementations or grant the wrong pre-opens or whatnot. Right now in Wasmtime I think we haven't seen this situation much because we have host-native filtering of most of the privileges we grant (e.g. WASI APIs) but I think there's nothing fundamental about that. And finally, if we restrict ourselves to (i) these load/store intrinsics, provided when configured to a privileged adapter module, and (ii) early binding of this adapter module in a way that enables inlining, then all of the open design questions are addressed, as far as I can tell:
Anyway -- all strong opinions, relatively weakly held -- happy to discuss further! |
I don't have a strong opinion on whether we need two-level inlining (intrinsics -> self-hosted -> component) or just a single level (intrinsics -> component) — @cfallin is already saying some of the things I was thinking. But I do want to point out how much inlining we're doing and make a plug for making that easier. In either case, we want to inline some CLIF instructions for each intrinsic, right? I'm not sure I caught where the intrinsics were to be specified (are they proposed additions to the component model?) but, in any case, I was imagining there would be some compiler code that converted a call to these special imports into some CLIF, like we currently do for CM built-ins and trampolines (?). And this is what I was hoping could become easier: I know you were kind of discarding the first idea, "exposing CLIF", but it seems helpful for this kind of problem: we tell the compiler "here are the CLIF instructions for calls to this import". I understood from your RFC the danger of misuse, so perhaps it should not be a public, embedder-accessible API, but just having an easy way to inline intrinsics could make it easier to pursue the self-hosted functions? |
Yes, there will be two kinds of inlining:
|
This allows you to map some functions, described by the given `InstructionMapper`, over each of the entitities in an instruction, producing a new `InstructionData`. I intend to use this as part of an inliner API for Cranelift that I am developing as part of prototyping [Wasmtime's compile-time builtins](bytecodealliance/rfcs#43).
* Cranelift: Generate an `InstructionData::map` method This allows you to map some functions, described by the given `InstructionMapper`, over each of the entitities in an instruction, producing a new `InstructionData`. I intend to use this as part of an inliner API for Cranelift that I am developing as part of prototyping [Wasmtime's compile-time builtins](bytecodealliance/rfcs#43). * cargo fmt * fix clippy
Heads up: I've updated this RFC with a more-concrete proposal after the discussion in this issue and at various Cranelift and Wasmtime meetings. I've also implemented general function inlining for Wasmtime in bytecodealliance/wasmtime#11283. Right now, that is useful for inlining calls between components (including their generated adapters) but with compile-time builtins can also be reused to inline the definitions of compile-time builtins into their callers. And we shouldn't really have to do anything special to get that inlining, it should just happen For Free when inlining is enabled. (I do expect we will need to tweak the inlining heuristics as time goes on, but that is a separate discussion.) I think there is just one last open question we need to resolve before moving forward: exactly how to implement the Please take a look at the updated RFC and let me know what you think and any ideas you have for that last open question! |
array indirection: index into the array of resource tables, then index into | ||
the array of table elements? | ||
|
||
Anyone have any other ideas? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use LLVM intrinsics as inspiration and have the import's name tell you which type and resources to use, e.g. have resource.address.t1
for a resource in table 1
or something.
Okay so I actually had flawed assumptions with the way that resources work in the component model, and this nicely nullifies that last open question. The resource-definer gets to use an arbitrary So given all that, the Will update the RFC shortly. |
RFC updated accordingly. I think that resolves all open questions for this RFC. I will give it a little bit of time for some more discussion before officially starting the motion to merge, just to give people an extra chance to read the updated RFC first and provide feedback. |
invalid resources and out-of-bounds resource table accesses. | ||
The final intrinsic, `store.data_address`, gives a `*mut T` pointer containing | ||
the address of the `T` host data in the `wasmtime::Store<T>`. As long as | ||
embedders ensure that their `T` type is `#[repr(C)]`, then their compile-time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also have to watch out for primitive type alignment changing offsets, e.g. in #[repr(C)] struct S(u8, f64)
the f64
field is at offset 4
on i686-unknown-linux-gnu
but at offset 8
on x86_64-unknown-linux-gnu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true and I expect we will need to add some wording to our documentation for this feature around what exactly the safety conditions will need to be, such that the compile-time builtins can portably access the host data. My plan right now is to document these things in detail as I prototype and dog-food the feature.
Okay, I think this is ready and we've had a bit of time for folks to look it over, so let's officially start the process. Motion to FinalizeDisposition: Merge As always, details on the RFC process can be found here: https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close |
I second! (don't think I can re-approve) |
As there has been sign-off from representatives of two different BA stakeholder organizations, this RFC is now entering its 10-day Final Comment Periodand the last day to raise concerns before this RFC merges is 2025-08-16. Thanks everyone! |
Rendered