
aarch64 linux: torch performance is regressed for openblas backend from torch 2.0 to 2.1+ #119374

@snadampal

Description

🐛 Describe the bug

On the aarch64 Linux platform, PyTorch inference latency increases on torch 2.1 and 2.2 compared to torch 2.0 when the OpenBLAS backend is used in a multi-threaded configuration. The regression grows with the thread count.
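As a quick sanity check (my addition, not part of the original report), the following confirms that the installed wheel is actually built against OpenBLAS and shows the intra-op thread count:

import torch

# The BLAS backend (e.g. OpenBLAS) appears in the build configuration
# printed for CPU builds.
print(torch.__config__.show())

# Intra-op thread count; should report 16 when OMP_NUM_THREADS=16 is set.
print(torch.get_num_threads())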

On AWS Graviton3 (c7g.4xlarge) with 16 threads, the inference latency with torch 2.0 is
Time elapsed: 2.777902126312256 seconds

whereas with torch 2.1 and later it is
Time elapsed: 4.907686471939087 seconds

i.e. roughly a 1.8x slowdown.

Reproducer (to compare versions, pin the install, e.g. pip3 install torch==2.0.1 for the baseline vs. torch==2.1.0 for the regressed build):

pip3 install torch
pip3 install transformers
OMP_NUM_THREADS=16 python3 gpt2-large.py

gpt2-large.py

import time

from transformers import pipeline, set_seed

# Build the text-generation pipeline; model download and weight loading
# happen here, outside the timed region.
generator = pipeline('text-generation', model='gpt2-large')
set_seed(42)

start_time = time.time()

# Time a single generation of up to 40 tokens.
generator("Hello, I'm a language model", max_length=40, num_return_sequences=1)

end_time = time.time()

elapsed_time = end_time - start_time
print(f"Time elapsed: {elapsed_time} seconds")

Versions

aarch64 Linux, Ubuntu 22.04, torch 2.1 or later
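Full environment details can be gathered with PyTorch's standard environment script (the usual practice for PyTorch issue reports; output not included in the original):

python3 -m torch.utils.collect_env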
