Cache always updates clusters even if not needed anymore #38

@shiosai

During fast_match, Drain iterates over every candidate cluster and updates its access time in the cache, whether or not that cluster ends up being chosen. This leads to two problems:

  • The updates slow down matching
  • Clusters that will never match again are never evicted from the cache, because every scan refreshes them

Expected behavior:

Clusters should only be updated/touched in the cache after they have actually been used/chosen. There is already a comment to this effect in the source code:

Try to retrieve cluster from cache with bypassing eviction algorithm as we are only testing candidates for a match.
https://github.com/IBM/Drain3/blob/15470e391caed9a9ef5038cdd1dbd373bd2386a8/drain3/drain.py#L217
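
To illustrate the expected behavior, here is a minimal, self-contained sketch. It does not use Drain3's actual classes: `PeekableLRU`, `peek`, and `fast_match_sketch` are hypothetical names invented for this example, and the cache is a plain `OrderedDict`-based LRU. The idea is that candidates are tested with a read that leaves recency untouched, and only the cluster that is finally chosen gets refreshed.

```python
from collections import OrderedDict


class PeekableLRU:
    """Tiny LRU cache with a side-effect-free peek().

    Hypothetical helper for illustration only, not Drain3's cache class.
    """

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._items = OrderedDict()

    def peek(self, key):
        # Read without changing recency (bypasses the eviction bookkeeping).
        return self._items.get(key)

    def get(self, key):
        # Read and mark the entry as most recently used.
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.maxsize:
            self._items.popitem(last=False)  # evict the least recently used

    def keys(self):
        return list(self._items)


def fast_match_sketch(cache, candidate_ids, is_match):
    """Test candidates with peek(); touch only the chosen cluster."""
    for cluster_id in candidate_ids:
        cluster = cache.peek(cluster_id)
        if cluster is not None and is_match(cluster):
            return cache.get(cluster_id)  # only the winner is refreshed
    return None


# Usage: cluster 1 is tested but does not match, so it stays the oldest
# entry and is the one evicted when a new cluster is added.
cache = PeekableLRU(maxsize=2)
cache.put(1, "template a")
cache.put(2, "template b")
hit = fast_match_sketch(cache, [1, 2], lambda c: c == "template b")
cache.put(3, "template c")
remaining = cache.keys()
```

With this split, a cluster that is merely scanned as a candidate ages normally and can be evicted, while the scan itself avoids a recency update per candidate.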

Labels: bug (Something isn't working), performance (A performance issue)
