Skip to content

go/token: add FileSet.RemoveFile method #53200

@adonovan

Description

@adonovan

Background: A token.FileSet holds a collection of non-overlapping token.Files and provides a mapping from token.Pos integers to position information (file/line/column/offset). This is analogous to the way an address space containing several non-overlapping file mappings defines the meaning of a numeric pointer value. Applications that parse Go files are expected to use a single FileSet to provide meaning to all the token.Pos values in all the ASTs; in this way, the AST nodes themselves needn't include a pointer to the mapping information, as it can be implicitly supplied by the contextual FileSet.

The problem: this design leads applications to create a single FileSet that is effectively a ubiquitous global variable. A long-running application may encounter and parse an unbounded stream of files, calling AddFile for each one, causing its set of Files to grow without bound. The FileSet retains each File indefinitely, even after the application no longer cares about it. The memory footprint of a File is approximately one int per line of the file. Go source files in the standard library have an average of 371 lines, leading to 3KB of wasted space per file. Our application (gopls) parses a file at least once per keystroke of a Go developer's editor.

I can think of a few ways the memory usage of FileSet could be reduced.

The first approach is to move away from the concept of a single central FileSet in an application. Instead one would create a new FileSet for every parse, so that FileSets and Files are typically in 1:1 correspondence. An application could, as needed, create transient FileSets containing only the small subset of existing Files necessary for a given operation, such as invoking the type checker. However, this would still require all those Files to be non-overlapping, which requires central management of the "next free address" variable (like sbrk(2) in the mmap analogy) that should be provided in each call to FileSet.AddFile(base). It would also require that every call to Parse go through this central management, which might be practical within an application but doesn't seem feasible if libraries with published APIs are considered.

A second approach would be for the parser to save the token.File in every node of the AST so that there is never any need to maintain or consult an external FileSet data structure: each AST node's (*File, Pos) information would be complete. This approach however requires that every place that today uses a token.Pos to refer to a file position (relative to some implied FileSet) would need to be changed to a (*File, Pos) pair. For example, in errors from the type checker, and in diagnostics from the analysis framework. This seems like a major incompatible change to existing libraries.

The third, and simplest, approach, is to add a (*FileSet).RemoveFile(*File) method that removes a File from the set so that subsequent queries such as Pos, File, and Iterate return a negative result---like munmap in the mmap analogy. The liberated portion of the address space would not be re-used, but the File's memory could be reclaimed. As with munmap, an application would be responsible for remembering to call RemoveFile at an appropriate time, and not too soon. I propose we take this approach. The change to FileSet is very simple and low-risk.

@findleyr @griesemer

Metadata

Metadata

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions