Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the issue lacks the corresponding environment info and a minimal reproducible demo, it will be difficult for us to reproduce and resolve it, which reduces the likelihood of a response.
- 4. If what you are raising is a question rather than a bug, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English; otherwise the issue will be closed.
Describe the bug
test_cublas_grouped_gemm.py
test_cublas_grouped_gemm.py ....................................................................................................................................................... [ 60%]
....................................................FFFFFFF........................................ [100%]
======================================================================================== FAILURES =========================================================================================
____________________________________________________________________ test_grouped_gemm_accuracy[8192-2-16-out_dtype1] _____________________________________________________________________
out_dtype = torch.bfloat16, M = 16, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 3 / 32 (9.4%)
E       Greatest absolute difference: 2.375 at index (7, 1) (up to 1e-05 allowed)
E       Greatest relative difference: 0.11865234375 at index (7, 1) (up to 0.016 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
____________________________________________________________________ test_grouped_gemm_accuracy[8192-2-32-out_dtype0] _____________________________________________________________________
out_dtype = torch.float16, M = 32, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 10 / 64 (15.6%)
E       Greatest absolute difference: 2.0 at index (1, 1) (up to 1e-05 allowed)
E       Greatest relative difference: 0.0033245086669921875 at index (29, 0) (up to 0.001 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
____________________________________________________________________ test_grouped_gemm_accuracy[8192-2-32-out_dtype1] _____________________________________________________________________
out_dtype = torch.bfloat16, M = 32, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 3 / 64 (4.7%)
E       Greatest absolute difference: 4.5 at index (11, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 0.1259765625 at index (27, 0) (up to 0.016 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
____________________________________________________________________ test_grouped_gemm_accuracy[8192-2-256-out_dtype0] ____________________________________________________________________
out_dtype = torch.float16, M = 256, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 63 / 512 (12.3%)
E       Greatest absolute difference: 1.5 at index (5, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 0.09844970703125 at index (31, 0) (up to 0.001 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
____________________________________________________________________ test_grouped_gemm_accuracy[8192-2-256-out_dtype1] ____________________________________________________________________
out_dtype = torch.bfloat16, M = 256, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 31 / 512 (6.1%)
E       Greatest absolute difference: 40.0 at index (148, 1) (up to 1e-05 allowed)
E       Greatest relative difference: 0.392578125 at index (48, 0) (up to 0.016 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
___________________________________________________________________ test_grouped_gemm_accuracy[8192-2-1024-out_dtype0] ____________________________________________________________________
out_dtype = torch.float16, M = 1024, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 280 / 2048 (13.7%)
E       Greatest absolute difference: 4.0 at index (143, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 0.377197265625 at index (65, 0) (up to 0.001 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
___________________________________________________________________ test_grouped_gemm_accuracy[8192-2-1024-out_dtype1] ____________________________________________________________________
out_dtype = torch.bfloat16, M = 1024, N = 2, K = 8192

    @pytest.mark.skipif(
        skip_condition, reason="CUDA not available or CUDA version lower than 12.5"
    )
    @pytest.mark.parametrize("out_dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("M", [1, 16, 32, 256, 1024])
    @pytest.mark.parametrize("N", [2, 16, 128, 256, 4096])
    @pytest.mark.parametrize("K", [3, 16, 32, 512, 8192])
    def test_grouped_gemm_accuracy(out_dtype, M, N, K):
        a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
        b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
        expected = torch.matmul(a, b.t()).to(out_dtype)
        a_array = [a]
        b_array = [b]
        c_array = [torch.empty((M, N), device="cuda", dtype=out_dtype)]
        result_torch = torch_grouped_gemm(a_array, b_array, out_dtype)[0]
        cublas_grouped_gemm(a_array, b_array, c_array, out_dtype)
        torch.testing.assert_close(result_torch, expected)
>       torch.testing.assert_close(c_array[0], expected)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 118 / 2048 (5.8%)
E       Greatest absolute difference: 30.0 at index (893, 0) (up to 1e-05 allowed)
E       Greatest relative difference: 5.0625 at index (347, 1) (up to 0.016 allowed)

test_cublas_grouped_gemm.py:36: AssertionError
================================================================================= short test summary info =================================================================================
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-16-out_dtype1] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-32-out_dtype0] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-32-out_dtype1] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-256-out_dtype0] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-256-out_dtype1] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-1024-out_dtype0] - AssertionError: Tensor-likes are not close!
FAILED test_cublas_grouped_gemm.py::test_grouped_gemm_accuracy[8192-2-1024-out_dtype1] - AssertionError: Tensor-likes are not close!
============================================================================== 7 failed, 243 passed in 8.62s ==============================================================================
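For reference, the failing K = 8192, N = 2 cases can be reduced to a short standalone script based on the test body shown above. This is only a sketch: the `from sgl_kernel import cublas_grouped_gemm` import path is an assumption and may need to be adjusted to match the local build.

```python
# Standalone sketch of one failing case from the log above
# (out_dtype=torch.bfloat16, M=16, N=2, K=8192).
import torch
from sgl_kernel import cublas_grouped_gemm  # assumed import location

out_dtype, M, N, K = torch.bfloat16, 16, 2, 8192
a = torch.randn((M, K), device="cuda", dtype=out_dtype) * 5
b = torch.randn((N, K), device="cuda", dtype=out_dtype) * 5
expected = torch.matmul(a, b.t()).to(out_dtype)

c = torch.empty((M, N), device="cuda", dtype=out_dtype)
cublas_grouped_gemm([a], [b], [c], out_dtype)

# Fails on the affected setup with mismatches similar to those reported above.
torch.testing.assert_close(c, expected)
```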
test_int8_gemm.py
All tests in this file failed.
test_per_token_group_quant_8bit.py
============================================================================ 900 failed, 300 passed in 16.52s =============================================================================
Reproduction
N/A
Environment
N/A