
runtime performance #11

@mratsim

Description

  • I don't think there is much of a speed difference between D and Nim for general code when GC'ed types are not involved.

  • One perf gotcha is that by default Nim's seq and string have value semantics, so assignment will deep copy.

I think performance all comes down to data structures + programmer familiarity with the language + time spent.

Now there are domain-specific considerations that can make a huge difference, and most Nim library authors publish extensive benchmarks of their solutions compared to their mainstream alternatives.

A generic need in the wild: parsing files

  • There is the "Faster Command Line Tools in <insert language>" benchmark that was started by the D community; Nim replicated it as well. TL;DR: D and Nim had the same speed and the same compilation time. To be honest, the fastest CSV parser I have used (to parse GBs of machine learning datasets) is xsv, in Rust.

Domain specific

HTTP server:

Functional programming

  • Zero_functional currently ranks first or second against 9 other languages, trading places with Rust. Zero_functional fuses loops at compile time when chaining zip.map.filter.reduce functional constructs.

Numerical/scientific computing

This is my domain, so I know much more about it.

  • D has the advantage of compile-time access to the register size and the L1/L2 cache sizes when using LDC; this is important for truly generic code.

  • D does not have access to `restrict` and `__builtin_assume_aligned`, which are necessary to reach Fortran speed when operating on arrays and tensors.

  • D cannot disable (?) the GC at specific points.

Open questions

  • Does D have an alternative to closures that can inline a proc passed to higher-order functions like map?

  • Can D arrays be parametrized with a compile-time proc? For example, for an efficient parallel reduction you need to create an intermediate array of N elements (N being your number of cores), padded so that the elements do not sit in the same cache line (64 bytes on common CPUs), to avoid false sharing/cache invalidation. For a type T, I need something like `var results{.align64, noInit.}: array[min(T.sizeof, OPENMP_NB_THREADS * maxItemsPerCacheLine), T]`.
