syntastica:0.1.0 #143
The hugeness in combination with the slowness is certainly not ideal. I assume there is no way to make this smaller somehow? Also, do you know which part is causing the slowdown? Is it just that
I haven't investigated exactly what makes it this big, but I strongly assume it's mostly the accumulation of all the tree-sitter parsers and queries, and I don't think there is much that can be done about that. It might be possible to have one package per supported language, plus one core package with the main logic, but that's much less ergonomic for users, and I can't think of a way to implement it unless tree-sitter/tree-sitter#1864 finally gets done.

As for the slow speed, by far the longest time is spent on compiling the queries, which is an issue with tree-sitter itself and one that I already investigated outside of this Typst plugin. I am mainly waiting for tree-sitter to support some kind of pre-compilation of queries, which is in the works and will drastically improve the startup time (see tree-sitter/tree-sitter#2374 (comment) and tree-sitter/tree-sitter#2594). Although, even without that, what takes 2 seconds running natively on my machine takes 2+ minutes running through

Note that the README also mentions the poor performance and advises to only enable

If you want to compare yourself, you can compile the
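To put the quoted figures in perspective, the following is a minimal sketch of how such a comparison could be measured and expressed as a slowdown factor. The `timed` and `slowdown` helpers are hypothetical names introduced only for this illustration, and the workload inside `timed` is a stand-in, not the actual highlighting call.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` once and returns its result together with
/// the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Hypothetical helper: how many times slower `slow` is than `fast`.
fn slowdown(fast: Duration, slow: Duration) -> f64 {
    slow.as_secs_f64() / fast.as_secs_f64()
}

fn main() {
    // Figures quoted in the comment above: about 2 s natively versus
    // 2+ minutes when run as a Wasm plugin (so this is a lower bound).
    let native = Duration::from_secs(2);
    let in_wasm = Duration::from_secs(2 * 60);
    println!("slowdown: at least {:.0}x", slowdown(native, in_wasm));

    // `timed` accepts any closure; this sum is only a placeholder workload.
    let (sum, elapsed) = timed(|| (0..1_000_000u64).sum::<u64>());
    println!("sum = {sum}, took {elapsed:?}");
}
```

With the numbers from the comment, the ratio works out to a factor of at least 60.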
I've thought a bit about this and also discussed it with a few people on Discord. We've come to the conclusion that, at least for the time being, this package is too big to merge into the official package repository. It's not a good user experience and will bloat the repository (especially if more versions with slightly different binaries are added over time). Some limit needs to exist, and at its current size the package would also be too big for crates.io or the VS Code marketplace (for comparison). We'll have to see down the road how the plugin situation evolves. For now, I would suggest adding the package to Awesome Typst so that people can download it and try it locally.
I agree. I didn't really expect this to be merged, but thanks for considering it. As you suggested, I opened a PR on awesome-typst: qjcg/awesome-typst#104.
And I did a quick speed comparison between
I assume the original reason

The full Rust code I used (not optimized)

```rust
use std::{error::Error, time::Instant};

const WASM: &[u8] = include_bytes!("../syntastica_typst.wasm");
const FUNC_NAME: &str = "highlight";
const ARGS: &[&[u8]] = &[b"fn main() {}", b"rust", b"one::dark"];

fn main() -> Result<(), Box<dyn Error>> {
    let start = Instant::now();
    run_wasmi::run()?;
    println!("wasmi TOTAL: {:?}", start.elapsed());

    let start = Instant::now();
    run_wasmtime::run()?;
    println!("wasmtime TOTAL: {:?}", start.elapsed());

    Ok(())
}

mod run_wasmi {
    use std::error::Error;

    use wasmi::{AsContextMut, Caller, Engine, Linker, Module, Store};

    use crate::{ARGS, FUNC_NAME, WASM};

    #[derive(Default)]
    struct StoreData {
        args: Vec<Vec<u8>>,
        output: Vec<u8>,
    }

    pub fn run() -> Result<(), Box<dyn Error>> {
        let start = std::time::Instant::now();
        let engine = Engine::default();
        let module = Module::new(&engine, WASM)?;

        let mut linker = Linker::new(&engine);
        linker.func_wrap(
            "typst_env",
            "wasm_minimal_protocol_send_result_to_host",
            wasm_minimal_protocol_send_result_to_host,
        )?;
        linker.func_wrap(
            "typst_env",
            "wasm_minimal_protocol_write_args_to_buffer",
            wasm_minimal_protocol_write_args_to_buffer,
        )?;
        println!("wasmi load module: {:?}", start.elapsed());

        let start = std::time::Instant::now();
        let mut store = Store::new(&engine, StoreData::default());
        let instance = linker
            .instantiate(&mut store, &module)
            .and_then(|pre_instance| pre_instance.start(&mut store))
            .map_err(|e| format!("{e}"))?;
        println!("wasmi instantiate: {:?}", start.elapsed());

        // Ensure that the plugin exports its memory.
        if !matches!(
            instance.get_export(&store, "memory"),
            Some(wasmi::Extern::Memory(_))
        ) {
            Err("plugin does not export its memory")?;
        }

        // Find the function with the given name.
        let func = instance
            .get_export(&store, FUNC_NAME)
            .and_then(|e| e.into_func())
            .ok_or_else(|| format!("plugin does not contain a function called {FUNC_NAME}"))?;

        // Collect the lengths of the argument buffers.
        let lengths = ARGS
            .iter()
            .map(|a| wasmi::Value::I32(a.len() as i32))
            .collect::<Vec<_>>();

        // Store the input data.
        store.data_mut().args = ARGS.iter().map(|arg| arg.to_vec()).collect();

        // Call the function.
        let start = std::time::Instant::now();
        let mut code = wasmi::Value::I32(-1);
        func.call(
            store.as_context_mut(),
            &lengths,
            std::slice::from_mut(&mut code),
        )
        .map_err(|err| format!("plugin panicked: {err}"))?;
        println!("wasmi func call: {:?}", start.elapsed());

        let start = std::time::Instant::now();
        // Extract the returned data.
        let output = std::mem::take(&mut store.data_mut().output);

        // Parse the functions return value.
        match code {
            wasmi::Value::I32(0) => {}
            wasmi::Value::I32(1) => match std::str::from_utf8(&output) {
                Ok(message) => Err(format!("plugin errored with: {message}"))?,
                Err(_) => Err("plugin errored, but did not return a valid error message")?,
            },
            _ => Err("plugin did not respect the protocol")?,
        };
        println!("wasmi get output: {:?}", start.elapsed());

        Ok(())
    }

    /// Write the arguments to the plugin function into the plugin's memory.
    fn wasm_minimal_protocol_write_args_to_buffer(mut caller: Caller<StoreData>, ptr: u32) {
        let memory = caller.get_export("memory").unwrap().into_memory().unwrap();
        let arguments = std::mem::take(&mut caller.data_mut().args);
        let mut offset = ptr as usize;
        for arg in arguments {
            memory.write(&mut caller, offset, arg.as_slice()).unwrap();
            offset += arg.len();
        }
    }

    /// Extracts the output of the plugin function from the plugin's memory.
    fn wasm_minimal_protocol_send_result_to_host(
        mut caller: Caller<StoreData>,
        ptr: u32,
        len: u32,
    ) {
        let memory = caller.get_export("memory").unwrap().into_memory().unwrap();
        let mut buffer = std::mem::take(&mut caller.data_mut().output);
        buffer.resize(len as usize, 0);
        memory.read(&caller, ptr as _, &mut buffer).unwrap();
        caller.data_mut().output = buffer;
    }
}

mod run_wasmtime {
    use std::error::Error;

    use wasmtime::{Caller, Engine, Linker, Module, Store};

    use crate::{ARGS, FUNC_NAME, WASM};

    #[derive(Default)]
    struct StoreData {
        args: Vec<Vec<u8>>,
        output: Vec<u8>,
    }

    pub fn run() -> Result<(), Box<dyn Error>> {
        let start = std::time::Instant::now();
        let engine = Engine::default();
        let module = Module::new(&engine, WASM)?;

        let mut linker = Linker::new(&engine);
        linker.func_wrap(
            "typst_env",
            "wasm_minimal_protocol_send_result_to_host",
            wasm_minimal_protocol_send_result_to_host,
        )?;
        linker.func_wrap(
            "typst_env",
            "wasm_minimal_protocol_write_args_to_buffer",
            wasm_minimal_protocol_write_args_to_buffer,
        )?;
        println!("wasmtime load module: {:?}", start.elapsed());

        let start = std::time::Instant::now();
        let mut store = Store::new(&engine, StoreData::default());
        let instance = linker
            .instantiate(&mut store, &module)
            .map_err(|e| format!("{e}"))?;
        println!("wasmtime instantiate: {:?}", start.elapsed());

        // Ensure that the plugin exports its memory.
        if !matches!(
            instance.get_export(&mut store, "memory"),
            Some(wasmtime::Extern::Memory(_))
        ) {
            Err("plugin does not export its memory")?;
        }

        // Find the function with the given name.
        let func = instance
            .get_export(&mut store, FUNC_NAME)
            .and_then(|e| e.into_func())
            .ok_or_else(|| format!("plugin does not contain a function called {FUNC_NAME}"))?;

        // Collect the lengths of the argument buffers.
        let lengths = ARGS
            .iter()
            .map(|a| wasmtime::Val::I32(a.len() as i32))
            .collect::<Vec<_>>();

        // Store the input data.
        store.data_mut().args = ARGS.iter().map(|arg| arg.to_vec()).collect();

        // Call the function.
        let start = std::time::Instant::now();
        let mut code = wasmtime::Val::I32(-1);
        func.call(&mut store, &lengths, std::slice::from_mut(&mut code))
            .map_err(|err| format!("plugin panicked: {err}"))?;
        println!("wasmtime func call: {:?}", start.elapsed());

        let start = std::time::Instant::now();
        // Extract the returned data.
        let output = std::mem::take(&mut store.data_mut().output);

        // Parse the functions return value.
        match code {
            wasmtime::Val::I32(0) => {}
            wasmtime::Val::I32(1) => match std::str::from_utf8(&output) {
                Ok(message) => Err(format!("plugin errored with: {message}"))?,
                Err(_) => Err("plugin errored, but did not return a valid error message")?,
            },
            _ => Err("plugin did not respect the protocol")?,
        };
        println!("wasmtime get output: {:?}", start.elapsed());

        Ok(())
    }

    /// Write the arguments to the plugin function into the plugin's memory.
    fn wasm_minimal_protocol_write_args_to_buffer(mut caller: Caller<StoreData>, ptr: u32) {
        let memory = caller.get_export("memory").unwrap().into_memory().unwrap();
        let arguments = std::mem::take(&mut caller.data_mut().args);
        let mut offset = ptr as usize;
        for arg in arguments {
            memory.write(&mut caller, offset, arg.as_slice()).unwrap();
            offset += arg.len();
        }
    }

    /// Extracts the output of the plugin function from the plugin's memory.
    fn wasm_minimal_protocol_send_result_to_host(
        mut caller: Caller<StoreData>,
        ptr: u32,
        len: u32,
    ) {
        let memory = caller.get_export("memory").unwrap().into_memory().unwrap();
        let mut buffer = std::mem::take(&mut caller.data_mut().output);
        buffer.resize(len as usize, 0);
        memory.read(&caller, ptr as _, &mut buffer).unwrap();
        caller.data_mut().output = buffer;
    }
}
```
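The argument passing used by both runners above can be illustrated without any Wasm runtime: the host passes one `i32` length per argument and, when asked, writes all arguments back to back into the plugin's buffer. The sketch below mirrors that host-side behavior with plain slices. The `split_args` function is a hypothetical stand-in for what a plugin might do on its side; it is an assumption for illustration and not taken from the benchmark code.

```rust
/// Host side: one `i32` length per argument, as passed to the plugin function.
fn arg_lengths(args: &[&[u8]]) -> Vec<i32> {
    args.iter().map(|a| a.len() as i32).collect()
}

/// Host side: arguments are written consecutively, with no separators,
/// just like the `offset += arg.len()` loop in the benchmark.
fn pack_args(args: &[&[u8]]) -> Vec<u8> {
    args.concat()
}

/// Plugin side (hypothetical): recover the arguments from the packed buffer
/// using the lengths. Returns `None` if the lengths do not match the buffer.
fn split_args<'a>(buffer: &'a [u8], lengths: &[i32]) -> Option<Vec<&'a [u8]>> {
    let mut rest = buffer;
    let mut out = Vec::with_capacity(lengths.len());
    for &len in lengths {
        let len = usize::try_from(len).ok()?;
        if len > rest.len() {
            return None;
        }
        let (head, tail) = rest.split_at(len);
        out.push(head);
        rest = tail;
    }
    // Leftover bytes mean the lengths did not describe the whole buffer.
    rest.is_empty().then_some(out)
}

fn main() {
    // Same arguments as `ARGS` in the benchmark.
    let args: [&[u8]; 3] = [b"fn main() {}", b"rust", b"one::dark"];

    let lengths = arg_lengths(&args);
    let buffer = pack_args(&args);
    let recovered = split_args(&buffer, &lengths).expect("lengths match buffer");

    assert_eq!(recovered, args);
    println!("lengths = {lengths:?}, packed = {} bytes", buffer.len());
}
```

Because the buffer carries no separators, the lengths are the only way to find the argument boundaries, which is why they travel as the function's parameters.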
It was chosen for binary size, dependency count, and simplicity. Maybe we do need to switch to wasmtime after all. (And to native WebAssembly modules in the web app. Right now it also uses wasmi, but that's really only because we didn't have time to implement it properly.)
I personally also had a good experience with using wasmer on the web by simply enabling its "js" feature, which uses the browser's native Wasm support.
Just in case someone is interested, I also inspected the binary size a bit further. Most of the 30 MB does indeed come from the various tree-sitter parsers, with the following approximate distribution:
The hexdump parser was used as a baseline to remove common code from the sizes, which is why it shows as 0 bytes here. I will probably remove support for verilog, which alone will reduce the final Wasm binary size to 13 MB. It's difficult to measure precisely, but roughly 2 MB also seems to come from just the tree-sitter core package, which is a bit surprising given that the official web-tree-sitter Wasm release binary is only 182 KB 🤔 (and yes I did use both
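The baseline subtraction described above amounts to simple arithmetic, sketched here using only the figures that appear in this thread (about 30 MB in total, 13 MB without verilog, roughly 2 MB for the tree-sitter core, 182 KB for web-tree-sitter). The helper name `attributed` and the use of decimal units are assumptions made for this illustration, and all results are as approximate as the inputs.

```rust
/// Hypothetical helper: size attributed to one component, computed as the
/// size of a build that includes it minus the size of a baseline build.
fn attributed(build_bytes: u64, baseline_bytes: u64) -> u64 {
    build_bytes.saturating_sub(baseline_bytes)
}

fn main() {
    // Decimal units are assumed here; the thread does not say which were used.
    const KB: u64 = 1_000;
    const MB: u64 = 1_000_000;

    // Approximate figures quoted in the thread.
    let total = 30 * MB;
    let without_verilog = 13 * MB;

    // Dropping verilog takes the binary from ~30 MB to ~13 MB.
    let verilog = attributed(total, without_verilog);
    let share = 100.0 * verilog as f64 / total as f64;
    println!("verilog: about {} MB ({share:.0}% of the total)", verilog / MB);

    // The baseline itself always comes out as 0 bytes, which matches the
    // hexdump entry described above.
    assert_eq!(attributed(total, total), 0);

    // tree-sitter core (~2 MB) compared with web-tree-sitter (182 KB).
    let ratio = (2 * MB) as f64 / (182 * KB) as f64;
    println!("core is roughly {ratio:.0}x the web-tree-sitter binary");
}
```

By this estimate, verilog alone accounts for more than half of the binary.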
I am submitting

I have read and followed the submission guidelines and, in particular, I
- typst.toml file with all required keys
- README.md with documentation for my package
- LICENSE file or linked one in my README.md

name:version of the submitted package

Description:
Syntax highlighting of code blocks using tree-sitter. The package makes use of the syntastica Rust project and the new Wasm plugins. It generally provides better results and supports more/other languages than the built-in syntect highlighting. Tree-sitter based highlighting was already requested by others (typst/typst#967), but declined for good reasons:

Warning
This package is both slow and big. The included Wasm binary is currently 30+ MB in size, and compilation time goes up into LaTeX territory (having it run in Wasm doesn't help). I would understand if that causes this package to not be accepted here.