Skip to content

os: add ReadDir method for lightweight directory reading #41467

@rsc

Description

@rsc

os.File provides two ways to read a directory: Readdirnames returns a list of the names of the directory entries, and Readdir returns the names along with stat information.

On Plan 9 and Windows, Readdir can be implemented with only a directory read - the directory read operation provides the full stat information.

But many Go users use Unix systems.

On most Unix systems, the directory read does not provide full stat information. So the implementation of Readdir reads the names from the directory and then calls Lstat for each file. This is fairly expensive.

Much of the time, such as in the implementation of file system walking, the only information the caller of Readdir really needs is the name and whether the name denotes a directory. On most Unix systems, that single bit of information—is this name a directory?—is available from the plain directory read, without an additional stat. If the caller is only using that bit, the extra Lstat calls are unnecessary and slow. (Goimports, for example, has its own directory walker to avoid this cost.) In fact, a survey of existing Go code found that only about 10% of uses of ReadDir actually need more than names and is-directory bits.

It appears that a third way to read directories should be added, to let all this code be written more efficiently. Expanding on a suggestion by @mpx, I propose to add:

// ReadDir reads the contents of the directory associated with the file f
// and returns a slice of DirEntry values in directory order.
// Subsequent calls on the same file will yield later DirEntry records in the directory.
//
// If n > 0, ReadDir returns at most n DirEntry records.
// In this case, if ReadDir returns an empty slice, it will return an error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry records remaining in the directory.
// When it succeeds, it returns a nil error (not io.EOF).
func (f *File) ReadDir(n int) ([]DirEntry, error) 

// A DirEntry is an entry read from a directory (using the ReadDir method).
type DirEntry interface {
	// Name returns the name of the file (or subdirectory) described by the entry.
	// This name is only the final element of the path, not the entire path.
	// For example, Name would return "hello.go" not "/home/gopher/hello.go".
	Name() string
	
	// IsDir reports whether the entry describes a subdirectory.
	IsDir() bool
	
	// Info returns the FileInfo for the file or subdirectory described by the entry.
	// The returned FileInfo may be from the time of the original directory read
	// or from the time of the call to Info. If the file has been removed or renamed
	// since the directory read, Info may return an error satisfying errors.Is(err, ErrNotExist).
	// If the entry denotes a symbolic link, Info reports the information about the link itself,
	// not the link's target.
	Info() (FileInfo, error)
}

The FS proposal would then adopt this ReadDir and ignore Readdir entirely.

In #41188 I wrote:

Various people have proposed adding a third directory reading option of one form or another, to get names and IsDir bits. This would certainly address the slow directory walk issue on Unix systems, but it seems like overfitting to Unix.

I still believe that, but the survey convinced me that nearly all existing Readdir uses fall into this category, so it's not quite so bad to provide an optimized path for Unix systems. The DirEntry.Info method specification above allows both the eager info loading of Plan 9/Windows and the lazy loading needed on Unix. In contrast to #41188, the laziness is explicitly allowed from the beginning, and failures of the lazy loading can be reported in the error result.

Thoughts?

Update: A few clarifications to common questions:

  • Info returns the result of Lstat on systems where symbolic links happen.
  • On file systems that do not return even IsDir bits in the directory listing, ReadDir will have to do the full lstat loop that Readdir does today. That seems OK: slow file systems are slow.
  • IsDir returns false for symlinks to directories, since Lstat's FileInfo's IsDir does too.
  • ReadDir vs Readdir corrects the spelling of the name but does not differ only in case. It also differs in signature, so mistakes will not silently misbehave; they will loudly miscompile. In any event, the motivation here is to set a good, short name for the FS proposal, and for a method many people will need to implement, ReadDir is better than the longer ReadDirEntries.

Update 2: A few changes were made along the way to acceptnce. See #41467 (comment) for the final version.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions