Skip to content

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jul 21, 2025

Summary

This replaces the SmallVec on PlaceExpr with a ThinVec. The main motivation is that many places (all symbols) have no sub segments and paying the extra cost of an empty vec (32 bytes) for each of those adds quiet a bit to overall memory usage

This reduces the place_table size by roughly 25%

@MichaReiser MichaReiser added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jul 21, 2025
@MichaReiser MichaReiser added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Jul 21, 2025
@MichaReiser MichaReiser self-assigned this Jul 21, 2025
@MichaReiser MichaReiser marked this pull request as draft July 21, 2025 17:51
Comment on lines 48 to 54
fn sub_segments_size(segments: &thin_vec::ThinVec<PlaceExprSubSegment>) -> usize {
segments.capacity() * std::mem::size_of::<PlaceExprSubSegment>()
+ segments
.iter()
.map(|segment| segment.get_heap_size())
.sum::<usize>()
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already PRd (and it's merged but not yet released) adding thin_vec support to the getsize crate

Copy link
Contributor

github-actions bot commented Jul 21, 2025

mypy_primer results

No ecosystem changes detected ✅

Memory usage changes were detected when running on open source projects
sphinx (https://github.com/sphinx-doc/sphinx)
-     memo fields = ~236MB
+     memo fields = ~224MB

prefect (https://github.com/PrefectHQ/prefect)
- TOTAL MEMORY USAGE: ~626MB
+ TOTAL MEMORY USAGE: ~596MB

Copy link
Contributor

github-actions bot commented Jul 21, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/mcp/databricks_mcp_cookbook.ipynb:12:1:8: Simple statements must be separated by newlines or semicolons

@MichaReiser MichaReiser marked this pull request as ready for review July 21, 2025 18:13
@@ -775,7 +786,7 @@ impl std::fmt::Debug for PlaceTable {
pub(super) struct PlaceTableBuilder {
table: PlaceTable,

associated_place_ids: IndexVec<ScopedPlaceId, Vec<ScopedPlaceId>>,
associated_place_ids: IndexVec<ScopedPlaceId, SmallVec<[ScopedPlaceId; 4]>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SmallVec here is "free" as 4 ScopedPlaceIds + the tag fit into 24 bytes

@AlexWaygood
Copy link
Member

Nice, looks like this yields some small speedups too! https://codspeed.io/astral-sh/ruff/branches/micha%2Fplace-table-thin-vec?runnerMode=Instrumentation

@AlexWaygood
Copy link
Member

I think PlaceExprSubSegment::StringSubscript could also be relatively easily changed so that it wraps a Box<str> rather than a String -- not sure if that's worth doing or not

@MichaReiser
Copy link
Member Author

I think PlaceExprSubSegment::StringSubscript could also be relatively easily changed so that it wraps a Box rather than a String -- not sure if that's worth doing or not

This, unfortunately, doesn't help because Name is as big as a String.

An alternative to what I've done here is to split PlaceTable and internally track:

  • symbol_map: HashTable<ScopedSymbolId>,
  • symbols: IndexVec<ScopedSymbolId, Symbol>
  • member_map: HashTable<ScopedMemberId>,
  • members: IndexVec<ScopedMemberId, Member>, Can use a SmallVec because each element has at least one item.
  • ScopedPlaceId becomes an enum over ScopedSymbolId and ScopedMemberId.

I suspect that this might be slightly larger refactor but it might also help to untangle some of the complexity in PlaceTable

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@carljm
Copy link
Contributor

carljm commented Jul 21, 2025

An alternative to what I've done here is to split PlaceTable

Yes, I thought about this when reviewing the PR that added PlaceTable; I think it might be a nice refactor.

@MichaReiser
Copy link
Member Author

Yes, I thought about this when reviewing the PR that added PlaceTable; I think it might be a nice refactor.

I'll try to find time to look into this because it might allow us to reduce memory usage even more by changing the Member definition from Name, Segments to ScopedSymbolId, Segments or just a Segments portion

@MichaReiser
Copy link
Member Author

Yes, I thought about this when reviewing the PR that added PlaceTable; I think it might be a nice refactor.

I think this would be a very nice refactor but it's also not trivial. I started and I've currently 188 errors... and many of them aren't very trivial to fix.

I do think that it would help clarify many things and having separate types for Symbol and Member even allows stricter typing in many cases. But, as it often is with these kind of refactors, introducing those abstractions later is sort of a pain

@MichaReiser MichaReiser merged commit dc10ab8 into main Jul 22, 2025
37 checks passed
@MichaReiser MichaReiser deleted the micha/place-table-thin-vec branch July 22, 2025 18:39
@MichaReiser
Copy link
Member Author

The main blocker here is that UseDefMap uses a bunch of IndexVec<ScopedPlaceId>. That prevents us from splitting the places IndexVec into two.

We could split the hash map but there's little value in that because the main motivation is to reduce the size of PlaceExpr which is stored in the IndexVec.

dcreager added a commit that referenced this pull request Jul 22, 2025
* main:
  [ty] Use `ThinVec` for sub segments in `PlaceExpr` (#19470)
  [ty] Splat variadic arguments into parameter list (#18996)
  [`flake8-pyi`] Skip fix if all `Union` members are `None` (`PYI016`)  (#19416)
  Skip notebook with errors in ecosystem check (#19491)
  [ty] Consistent use of American english (in rules) (#19488)
  [ty] Support iterating over enums (#19486)
  Fix panic for illegal `Literal[…]` annotations with inner subscript expressions (#19489)
  Move fix suggestion to subdiagnostic (#19464)
  [ty] Implement non-stdlib stub mapping for classes and functions (#19471)
  [ty] Disallow illegal uses of `ClassVar` (#19483)
  [ty] Disallow `Final` in function parameter/return-type annotations (#19480)
  [ty] Extend `Final` test suite (#19476)
  [ty] Minor change to diagnostic message for invalid Literal uses (#19482)
  [ty] Detect illegal non-enum attribute accesses in Literal annotation (#19477)
  [ty] Reduce size of `TypeInference` (#19435)
  Run MD tests for Markdown-only changes (#19479)
  Revert "[ty] Detect illegal non-enum attribute accesses in Literal annotation"
  [ty] Detect illegal non-enum attribute accesses in Literal annotation
  [ty] Added semantic token support for more identifiers (#19473)
  [ty] Make tuple subclass constructors sound (#19469)
UnboundVariable pushed a commit to UnboundVariable/ruff that referenced this pull request Jul 23, 2025
* main: (28 commits)
  [ty] highlight the argument in `static_assert` error messages (astral-sh#19426)
  [ty] Infer single-valuedness for enums based on `int`/`str` (astral-sh#19510)
  [ty] Restructure submodule query around `File` dependency
  [ty] Make `Module` a Salsa ingredient
  [ty] Reachability analysis for `isinstance(…)` branches (astral-sh#19503)
  [ty] Normalize single-member enums to their instance type (astral-sh#19502)
  [ty] Invert `ty_ide` and `ty_project` dependency (astral-sh#19501)
  [ty] Implement mock language server for testing (astral-sh#19391)
  [ty] Detect enums if metaclass is a subtype of EnumType/EnumMeta (astral-sh#19481)
  [ty] perform type narrowing for places marked `global` too (astral-sh#19381)
  [ty] Use `ThinVec` for sub segments in `PlaceExpr` (astral-sh#19470)
  [ty] Splat variadic arguments into parameter list (astral-sh#18996)
  [`flake8-pyi`] Skip fix if all `Union` members are `None` (`PYI016`)  (astral-sh#19416)
  Skip notebook with errors in ecosystem check (astral-sh#19491)
  [ty] Consistent use of American english (in rules) (astral-sh#19488)
  [ty] Support iterating over enums (astral-sh#19486)
  Fix panic for illegal `Literal[…]` annotations with inner subscript expressions (astral-sh#19489)
  Move fix suggestion to subdiagnostic (astral-sh#19464)
  [ty] Implement non-stdlib stub mapping for classes and functions (astral-sh#19471)
  [ty] Disallow illegal uses of `ClassVar` (astral-sh#19483)
  ...

# Conflicts:
#	crates/ty_ide/src/goto.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement ty Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants