Description
Created after this discussion - #13413
OSes, python and pytest versions
OS: macOS 15.4.1, Ubuntu 22.04
Python 3.12.8
Pytest 8.3.4
Problem description
I need to run a large number of non-Python tests that are stored in deeply nested folders, and I found that pytest struggles during collection.
Some code context:
from collections.abc import Iterable

import pytest


@pytest.hookimpl(wrapper=True)
def pytest_collection(session):
    # resolve_suites is a project-specific helper (not shown here).
    resolved_paths = resolve_suites(session)
    session.config.args.extend(resolved_paths)
    return (yield)


def pytest_collect_file(parent, file_path):
    if file_path.suffix == ".yaml":
        return YamlFile.from_parent(parent, path=file_path)


class YamlFile(pytest.File):
    def collect(self) -> Iterable[pytest.Item | pytest.Collector]:
        # YamlTestResolver is a leftover from the previous runner, but it resolves the needed stuff.
        test_cases = YamlTestResolver().from_file(f"{self.path}")
        for tc in test_cases:
            yield YamlTest.from_parent(self, name=tc.name, tc_spec=tc)


class YamlTest(pytest.Item):
    def __init__(self, tc_spec, **kwargs) -> None:
        super().__init__(**kwargs)
        self.tc_spec = tc_spec
Consider this folder structure:
root_working_folder
├── framework_repo
│   └── framework_internal_folder
└── repo_with_tests
    └── tests
        ├── test_folder_1
        │   └── inner_folder
        └── test_folder_2
            └── inner_folder
                └── even_more_depth
The real repo_with_tests contains many more subfolders than shown here.
The pytest call is the following: pytest --collect-only ${list with 1k non-python tests} (1 test per file).
When I execute the above from framework_internal_folder, the execution time is 56 minutes with cProfile and 23 minutes without. When I make the same call from root_working_folder or repo_with_tests, the execution time is ~2 minutes with cProfile and 38 seconds without.
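For anyone who wants to reproduce this kind of measurement, the following is a minimal sketch that uses only the standard library's cProfile and pstats modules. The helper name profile_report is hypothetical, and the pytest invocation in the trailing comment is an assumption about how the run could be wrapped; it is not the exact command used for the numbers above.

```python
import cProfile
import io
import pstats


def profile_report(func, pattern=None, limit=10):
    """Run func under cProfile and return a text report sorted by cumulative time.

    pattern is an optional regex that restricts the report to matching
    "filename:lineno(function)" entries; limit caps the number of rows.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative")
    if pattern is None:
        stats.print_stats(limit)
    else:
        stats.print_stats(pattern, limit)
    return buffer.getvalue()


# Assumed usage for a collection-only run (requires pytest to be installed):
#
#   import pytest
#   print(profile_report(
#       lambda: pytest.main(["--collect-only", "-q", "path/to/tests"]),
#       pattern="_check_initialpaths_for_relpath",
#   ))
```

Running the same wrapper from two different working directories gives directly comparable rows for the function of interest.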
The most significant difference between the two runs is the cumulative time of one function, nodes.py:546(_check_initialpaths_for_relpath):
# from framework_internal_folder
ncalls tottime percall cumtime percall filename:lineno(function)
237033 85.508 0.000 3176.004 0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)
# from root_working_folder
ncalls tottime percall cumtime percall filename:lineno(function)
135 0.051 0.000 1.772 0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)
According to the stats, when executing from framework_internal_folder, the function that accounts for most of that time is commonpath:
ncalls tottime percall cumtime percall filename:lineno(function)
206063304 262.471 0.000 1580.672 0.000 ../_pytest/pathlib.py:990(commonpath)
# and stats for callers of that function:
Function was called by...
ncalls tottime cumtime
pathlib.py:990(commonpath) <- 205937164/3722648 262.308 28.844 nodes.py:546(_check_initialpaths_for_relpath)
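One way to read these numbers is to divide the commonpath call count by the number of calls to _check_initialpaths_for_relpath. The sketch below only does arithmetic on the figures quoted above. The interpretation in the comments (a linear scan over roughly 1k initial paths) is an assumption that is consistent with the data, not something the profile proves by itself.

```python
# Figures copied from the profile taken in framework_internal_folder.
check_calls = 237_033          # ncalls of _check_initialpaths_for_relpath
commonpath_calls = 206_063_304  # ncalls of commonpath
check_cumtime_s = 3176.004      # cumtime of _check_initialpaths_for_relpath

# Average number of commonpath calls per lookup. A value close to the
# number of paths passed on the command line (about 1k) would suggest that
# most lookups scan nearly the whole set of initial paths.
avg_commonpath_per_check = commonpath_calls / check_calls

# Average cumulative time per lookup, in milliseconds.
avg_ms_per_check = check_cumtime_s / check_calls * 1000

print(f"commonpath calls per lookup: {avg_commonpath_per_check:.1f}")
print(f"time per lookup: {avg_ms_per_check:.1f} ms")
```

So the slow run is not doing more expensive comparisons; it is doing far more lookups (237033 versus 135), each of which does several hundred comparisons.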
Possible solutions
Cache for _check_initialpaths_for_relpath
I experimented with adding lru_cache to _check_initialpaths_for_relpath:
from functools import lru_cache


@lru_cache(maxsize=1000)
def _check_initialpaths_for_relpath(initialpaths: frozenset[Path], path: Path) -> str | None:
    for initial_path in initialpaths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None
That change decreased the overall collection time to 4 minutes.
Stats are also impressive:
ncalls tottime percall cumtime percall filename:lineno(function)
5798 2.109 0.000 79.265 0.014 nodes.py:545(_check_initialpaths_for_relpath)
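Note that with lru_cache in place the profiler should only see calls that miss the cache, so the ncalls figure above is likely the number of misses rather than the number of lookups. To judge whether maxsize=1000 is large enough, functools exposes cache_info() on the decorated function. The sketch below illustrates this with a hypothetical stand-in function (relpath_cached) and made-up paths; it is not the pytest implementation.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1000)
def relpath_cached(initial_paths: frozenset, path: Path):
    # Hypothetical stand-in for the cached lookup. Both arguments are
    # hashable (frozenset and Path), which lru_cache requires.
    for initial_path in initial_paths:
        if path == initial_path or initial_path in path.parents:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


initial_paths = frozenset({Path("/work/tests/a"), Path("/work/tests/b")})
paths = [Path("/work/tests/a/case.yaml"), Path("/work/tests/b/case.yaml")]

# Look up each path three times; only the first lookup per path is a miss.
for _ in range(3):
    for p in paths:
        relpath_cached(initial_paths, p)

info = relpath_cached.cache_info()
print(info)
```

If misses keep growing while currsize stays pinned at maxsize, the cache is evicting entries that are still needed and a larger maxsize may help.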
I'm not sure if commonpath needs caching as well.
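An alternative that might avoid calling commonpath in this lookup at all is to walk the ancestors of the path and test each one for membership in the frozenset, so the cost depends on path depth rather than on the number of initial paths. The sketch below is an assumption-laden illustration: commonpath here is a stand-in built on os.path.commonpath, and relpath_linear and relpath_via_parents are hypothetical names. The two versions can differ when several initial paths are ancestors of the same path, because the linear scan returns whichever match the frozenset yields first while the ancestor walk returns the nearest one.

```python
from __future__ import annotations

import os
from pathlib import Path


def commonpath(path1: Path, path2: Path) -> Path | None:
    # Stand-in helper for this sketch only, built on os.path.commonpath.
    try:
        return Path(os.path.commonpath((str(path1), str(path2))))
    except ValueError:
        return None


def relpath_linear(initial_paths: frozenset[Path], path: Path) -> str | None:
    # Same logic as the snippet in this issue: one commonpath call per
    # initial path until a match is found.
    for initial_path in initial_paths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


def relpath_via_parents(initial_paths: frozenset[Path], path: Path) -> str | None:
    # Hash lookups only: one per ancestor of path, independent of how many
    # initial paths were passed on the command line.
    if path in initial_paths:
        return ""
    for parent in path.parents:
        if parent in initial_paths:
            return str(path.relative_to(parent))
    return None


initial = frozenset({Path("/work/repo/tests/a"), Path("/work/repo/tests/b/case.yaml")})
samples = [
    Path("/work/repo/tests/a/inner/case.yaml"),
    Path("/work/repo/tests/b/case.yaml"),
    Path("/work/repo/tests/a"),
    Path("/elsewhere/x.yaml"),
]
for sample in samples:
    print(sample, "->", repr(relpath_via_parents(initial, sample)))
```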
Anything else on the collection mechanism?
Other optimizations in directory/file collections