- I don't think there is much of a speed difference between D and Nim for general code, when GC'ed types are not involved.
- One performance gotcha: by default, Nim `seq`s and `string`s have value semantics, so assignment will deep copy.
I think performance mostly comes down to data structures + programmer familiarity with the language + time spent.
That said, there are domain-specific considerations that can make a huge difference, and most Nim library authors publish extensive benchmarks of their solutions against their mainstream alternatives.
A generic need in the wild: parsing files
- There is the "Faster Command Line Tools in <insert language>" benchmark that was started by the D community; Nim also replicated it. TL;DR: D and Nim had the same speed and the same compilation time. To be honest, the fastest CSV parser I have used (to parse GBs of machine learning datasets) is xsv, in Rust.
Domain specific
HTTP server:
- Mofuw by @2vg is faster than tokio-minihttp, the current #1 on the TechEmpower benchmark.
Functional programming
- zero_functional is currently number 1 or 2 against 9 other languages, with the other top spot going to Rust. zero_functional fuses loops at compile time when chaining zip.map.filter.reduce functional constructs.
Numerical/scientific computing
This is my domain so I know much more about it.
- D has the advantage of access to the register size and the L1/L2 cache sizes at compile time when using LDC; this is important for truly generic code.
- D does not have access to `restrict` and `__builtin_assume_aligned`, which are necessary to reach Fortran speed when operating on arrays and tensors.
- D cannot (?) disable the GC at specific points.
Open questions
- Does D have an alternative to closures that can inline a proc passed to higher-order functions like `map`?
- Can D arrays be parametrized with compile-time procs? For example, for an efficient parallel reduction you need to create an intermediate array of N elements (N being your number of cores), padded so that the elements do not sit in the same cache line (64 bytes on most current CPUs) to avoid false sharing/cache invalidation. For a type T I need something like this:
```nim
var results{.align64, noInit.}: array[min(T.sizeof, OPENMP_NB_THREADS * maxItemsPerCacheLine), T]
```