
Build Vecs without repeated re-allocation and copies #35

@overlookmotel


The problem with Vecs

Bumpalo packs data tightly and grows downwards. This is generally a good thing, but it's a disadvantage when building Vecs: when a Vec needs to grow beyond its capacity, the arena cannot extend the allocation in place, so its contents must be reallocated and copied every time.

For example, while parsing a program, the top-level statements Vec will initially have capacity 4 (I think that's the default). When a 5th statement is found:

  • Vec<Statement> grows to capacity 8.
  • Space for [Statement; 8] is allocated in the arena.
  • The first 4 elements are copied to the new location.
  • The 5th element is written after them.

When more than 8 statements are found, the process repeats.

This causes 3 different problems:

  1. Repeated data copying (slow).
  2. Unused space is left behind in the arena, making data less tightly packed and less cache-friendly.
  3. Where the Vec's backing storage ends up is unpredictable - it could be before the node data, after it, or in the middle.

This problem is most acute for the top-level Vec<Statement>, as programs can contain a great many statements.
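The growth pattern can be observed with a plain std Vec (a std-only sketch: Rust's exact growth factor is an implementation detail, but it has doubled capacity for years, and bumpalo's arena Vec grows the same way - except each grow also leaves the old block stranded in the arena):

```rust
// Record each capacity a Vec passes through while pushing `pushes`
// elements, starting from an explicit capacity of 4.
fn capacity_progression(pushes: usize) -> Vec<usize> {
    let mut v: Vec<u64> = Vec::with_capacity(4);
    let mut caps = vec![v.capacity()];
    for i in 0..pushes as u64 {
        v.push(i);
        // Each capacity change means a reallocation: every element
        // pushed so far was copied to the new block.
        if v.capacity() != *caps.last().unwrap() {
            caps.push(v.capacity());
        }
    }
    caps
}

fn main() {
    // The 5th push doubles to 8; the 9th doubles to 16.
    assert_eq!(capacity_progression(9), vec![4, 8, 16]);
}
```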

Possible solutions

1. Estimate number of statements from source text size

For top-level Vec<Statement>, make a guess at eventual size based on source text size and allocate a Vec with (hopefully) sufficient capacity.

If we guess correctly, this will reduce copying a lot. However, it's hard to guess accurately. A module which is entirely wrapped in an IIFE could be massive in terms of source text size, but is only a single statement. Webpack-bundled files will also have a small number of statements.

Over-allocation is probably not so bad. It will leave large chunks of memory unused in the arena, but that empty memory is never accessed, so it shouldn't affect caching. If it's a large stretch of memory, the OS may not even commit memory pages for it.

It's also hard to guess the number of statements in functions ahead of time. We could probably take good guesses at the size of Vecs for other things though, e.g. most functions don't have more than 4 arguments.
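Such a heuristic might look like this (a hypothetical sketch: the bytes-per-statement ratio is invented purely for illustration, and as noted above it fails badly on IIFE-wrapped or bundled sources):

```rust
// Guess an initial Vec<Statement> capacity from source text length.
// ASSUMED_BYTES_PER_STATEMENT is a made-up figure; a real heuristic
// would be tuned against a corpus of real-world sources.
fn estimate_statement_capacity(source_len: usize) -> usize {
    const ASSUMED_BYTES_PER_STATEMENT: usize = 30;
    // Never go below the default capacity of 4.
    (source_len / ASSUMED_BYTES_PER_STATEMENT).max(4)
}

fn main() {
    // A tiny file keeps the default capacity.
    assert_eq!(estimate_statement_capacity(50), 4);
    // A 3 KB file would reserve room for ~100 statements up front.
    assert_eq!(estimate_statement_capacity(3000), 100);
    // An IIFE-wrapped bundle defeats the guess: huge source, 1 statement.
}
```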

2. Small vec optimization

Use a "small vec" type for Vecs which often have only 1 member. There are a few good candidates for this:

  • Vec<Argument> in CallExpression (foo(bar))
  • Vec<Directive> in FunctionBody (function foo() { 'use strict'; })
  • Vec<AssignmentTargetProperty> in ObjectAssignmentTarget (const {foo} = bar;)
  • Vec<ImportDeclarationSpecifier> in ImportDeclaration (import foo from 'bar';, import {foo} from 'bar';)
  • Vec<Expression> in ImportExpression (import('./foo.js'))

If we could reduce the size of enums to 8 bytes with pointer compression/pointer tagging, we could fit 2 elements inline, which would make this suitable for a few more types. But regardless, many types wouldn't be suitable, and it still doesn't help with the biggest problem: Vec<Statement>.
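A minimal sketch of the idea (the real smallvec crate stores elements inline in the struct body and is far more sophisticated; the hypothetical SmallVec1 here just shows the spill-on-second-push behaviour):

```rust
// One element lives inline; pushing a second spills to a heap Vec.
// The common case (e.g. foo(bar) with a single argument) never
// touches the allocator's grow path.
enum SmallVec1<T> {
    Inline(Option<T>),
    Spilled(Vec<T>),
}

impl<T> SmallVec1<T> {
    fn new() -> Self {
        SmallVec1::Inline(None)
    }

    fn push(&mut self, value: T) {
        match self {
            SmallVec1::Spilled(v) => v.push(value),
            SmallVec1::Inline(slot) => match slot.take() {
                // First push: fill the inline slot, no allocation.
                None => *slot = Some(value),
                // Second push: move the inline element to the heap.
                Some(first) => *self = SmallVec1::Spilled(vec![first, value]),
            },
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallVec1::Inline(None) => 0,
            SmallVec1::Inline(Some(_)) => 1,
            SmallVec1::Spilled(v) => v.len(),
        }
    }

    fn is_spilled(&self) -> bool {
        matches!(self, SmallVec1::Spilled(_))
    }
}

fn main() {
    let mut args: SmallVec1<&str> = SmallVec1::new();
    args.push("bar"); // the common single-argument case stays inline
    assert!(!args.is_spilled());
    assert_eq!(args.len(), 1);
    args.push("baz"); // rare case: spill
    assert!(args.is_spilled());
    assert_eq!(args.len(), 2);
}
```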

3. Vec<Box<T>>

For Vecs where the content is not already an enum with boxed variants:

Box the Vec's elements, so when the Vec grows, you're just copying Box<T>s rather than Ts. In some cases T is quite large, e.g. FormalParameter is 80 bytes, whereas Box<FormalParameter> is 8 bytes.

Trade-off of read speed vs write speed:

  • Faster to push/insert/delete from Vecs.
  • Slower to iterate over their contents due to greater indirection.
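The size difference is easy to check (std-only sketch: BigNode is a stand-in for a large node like the 80-byte FormalParameter, and the 8-byte Box figure assumes a 64-bit target):

```rust
use std::mem::size_of;

// Stand-in for a large AST node; the 80-byte size mirrors the
// FormalParameter figure quoted above.
struct BigNode {
    _payload: [u8; 80],
}

fn main() {
    assert_eq!(size_of::<BigNode>(), 80);
    // On a 64-bit target, a Box of a sized type is a single pointer.
    assert_eq!(size_of::<Box<BigNode>>(), 8);
    // So growing Vec<Box<BigNode>> copies 8 bytes per element instead
    // of 80 - at the cost of a pointer chase on every read.
}
```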

4. Replace Vec with chunked linked list


As the list grows, chunks are added to the end of a linked list, with sizes increasing in powers of 2 (the same growth strategy as Vec). This keeps the memory overhead of forward pointers low compared to a standard linked list.

Advantages:

  • No memory copies at all involved in building a Vec = faster in parser.
  • No wasted space/gaps in arena.
  • Better cache locality than Vec for large lists of enum types, as the memory for a list chunk is close to the memory containing the Box<T> content.
  • Faster to insert/delete elements, as you only need to shuffle entries up/down within one chunk rather than the whole list (see also transformer: Sync AST nodes and scopes oxc#3503).

Disadvantages:

  • Iterating over a Vec becomes slightly more complicated, so would have a small cost. But I expect that cost to be very minimal (1 branch which is mostly not taken).
  • You can't index into the list, only iterate over it.

Indexing: I don't think we often need to index into Vecs. If we want to provide a Path API (#30 (comment)), that would require indexing. We could design a type which does allow indexing, using the index's bit length (index.ilog2()) to calculate which chunk it falls in. This type would be large, so only suitable for top-level statements.
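A std-only sketch of such an indexable chunked list (heap Vecs stand in for arena-allocated chunks; because chunk boundaries fall on powers of 2, the chunk for an index can be computed in O(1) from the index's bit length, i.e. floor(log2)):

```rust
// Chunk capacities follow the 4, 4, 8, 16, 32, … schedule, so the
// cumulative boundaries are 4, 8, 16, 32, … - the same doubling as
// Vec, but nothing already stored ever moves.
struct ChunkedList<T> {
    chunks: Vec<Vec<T>>,
    len: usize,
}

impl<T> ChunkedList<T> {
    fn new() -> Self {
        ChunkedList { chunks: Vec::new(), len: 0 }
    }

    // Chunk 0 holds 4 elements; chunk k (k >= 1) holds 1 << (k + 1).
    fn chunk_capacity(k: usize) -> usize {
        if k == 0 { 4 } else { 1 << (k + 1) }
    }

    fn push(&mut self, value: T) {
        let full = match self.chunks.last() {
            None => true,
            Some(c) => c.len() == Self::chunk_capacity(self.chunks.len() - 1),
        };
        if full {
            // Add a new chunk instead of reallocating: no copies.
            let cap = Self::chunk_capacity(self.chunks.len());
            self.chunks.push(Vec::with_capacity(cap));
        }
        self.chunks.last_mut().unwrap().push(value);
        self.len += 1;
    }

    // O(1) indexing via the index's bit length.
    fn get(&self, i: usize) -> Option<&T> {
        if i >= self.len {
            return None;
        }
        let (chunk, offset) = if i < 4 {
            (0, i)
        } else {
            let bit = usize::BITS - 1 - i.leading_zeros(); // floor(log2(i))
            ((bit - 1) as usize, i - (1usize << bit))
        };
        Some(&self.chunks[chunk][offset])
    }

    // Iteration is a flat walk over the chunks: one extra branch at
    // each chunk boundary, mostly not taken.
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.chunks.iter().flat_map(|c| c.iter())
    }
}

fn main() {
    let mut list = ChunkedList::new();
    for i in 0..100usize {
        list.push(i);
    }
    // 100 elements occupy chunks of 4 + 4 + 8 + 16 + 32 + 64 = 128 slots.
    assert_eq!(list.chunks.len(), 6);
    assert_eq!(list.get(0), Some(&0));
    assert_eq!(list.get(99), Some(&99));
    assert_eq!(list.get(100), None);
    assert_eq!(list.iter().copied().sum::<usize>(), 4950);
}
```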

5. Build Vecs in scratch space

Add a scratch space arena to the allocator. In the parser, build Vecs initially in the scratch space, then copy the contents into the main arena once the Vec is complete.

In the parser, you're only building 1 Vec at a time, so the scratch space can be treated as a stack, growing upwards.

Advantages:

  • We end up with a standard Vec type.
  • No wasted space/gaps in arena.
  • No problem indexing into Vec.
  • Fewer memory copies than the current method.

Disadvantages vs chunked linked list:

  • More memory copies than chunked linked list.
  • Worse cache locality.
  • Special API required for using vec builder.

Number of copies: currently, building a Vec with between 129 and 255 elements copies 252 elements in total (4 + 8 + 16 + 32 + 64 + 128). The scratch space method copies each element only once (129 to 255 copied elements), so copying is reduced in almost all cases. The copy also happens in a single contiguous pass, which may be faster (the compiler can use SIMD ops and loop unrolling).
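The arithmetic above can be sketched directly (function names are hypothetical; this models the claim that the scratch Vec, sitting at the top of an upward-growing stack, can grow in place and is copied into the arena only once at the end):

```rust
// Total elements copied when growing in the arena: every time the
// doubling Vec (initial capacity 4) outgrows its block, everything
// pushed so far is copied to the new block.
fn copies_growing_in_arena(n: usize) -> usize {
    let mut cap = 4;
    let mut copied = 0;
    while cap < n {
        copied += cap; // copy all `cap` existing elements
        cap *= 2;
    }
    copied
}

// Scratch space method: one bulk copy of the finished Vec into the
// arena, i.e. each of the n elements is copied exactly once.
fn copies_via_scratch(n: usize) -> usize {
    n
}

fn main() {
    // 129..=255 elements: in-arena growth copies 4+8+16+32+64+128 = 252.
    assert_eq!(copies_growing_in_arena(129), 252);
    assert_eq!(copies_growing_in_arena(255), 252);
    assert_eq!(copies_via_scratch(129), 129);
    assert_eq!(copies_via_scratch(255), 255);
}
```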

Conclusion

All the above could be used in combination.

If we don't need indexing, a chunked linked list is probably the best solution. It would need benchmarking, though.
