TMA: Pause-loop is not classified at all levels #339

@aayasin

A rather simple pause-loop kernel is classified properly as Core Bound at levels 1 & 2, and again at levels 5 and 6, but not at the mid-levels 3 and 4 (Ports_Utilization and Ports_Utilized_0 report only 9.9% and 8.5% in the output below).
I am documenting this here and will address it in the TMA 4.2 release.
Here is a reproducer with perf-tools.

P.S. @andikleen: This is a CFL machine (8th gen Core). Can that be reflected instead of [skl] in the first line of the toplev output?

$ ./kernels/gen-kernel.py -i pause > ./kernels/pause3x.c
$ gcc -g -O2 -o ./kernels/pause3x ./kernels/pause3x.c

$ ./pmu-tools/toplev.py --no-desc --no-perf --nodes '+CoreIPC,+UPI,+Time,+MUX' -vl6 -- ./kernels/pause3x 10000000  2>&1 | egrep -v ' [10]\.. '
# 4.11-full-perf on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [skl]
BE             Backend_Bound                                                                                 % Slots                      98.9
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                      98.8   <==
FE             Frontend_Bound.Fetch_Latency.MS_Switches                                                      % Clocks                      2.6 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization                                                    % Clocks                      9.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0                                   % Clocks                      8.5 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation             % Clocks                     98.9 <
BE/Core        Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause  % Clocks                     91.3 <
Info.Thread    UPI                                                                                             Metric                      2.9
RET            Retiring.Light_Operations.Other                                                               % Uops                      100.0 <
MUX                                                                                                          %                             2.2
Using level 6.
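
To make the gap between levels easier to see, here is a minimal Python sketch that reads the Backend_Bound lines from the output above and lists each node's tree depth next to its value. It is not part of toplev or perf-tools: the helper names (`parse_nodes`, `suspicious_levels`), the regular expression, and the 50% cutoff are all assumptions chosen for illustration, and the comparison is rough because the nodes mix `% Slots` and `% Clocks` units.

```python
import re

# Backend_Bound lines copied from the toplev output above
# (column padding condensed for readability).
TOPLEV_OUTPUT = """\
BE        Backend_Bound    % Slots   98.9
BE/Core   Backend_Bound.Core_Bound    % Slots   98.8   <==
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization    % Clocks   9.9 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0    % Clocks   8.5 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation    % Clocks   98.9 <
BE/Core   Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_0.Serializing_Operation.Slow_Pause    % Clocks   91.3 <
"""

# Assumed line layout: <area> <dotted node name> % <unit> <value> [flags]
LINE_RE = re.compile(
    r"^\S+\s+(?P<node>[A-Za-z0-9_.]+)\s+%\s+(?P<unit>\w+)\s+(?P<value>\d+(?:\.\d+)?)"
)


def parse_nodes(text):
    """Return (level, node_name, value) tuples; level is the depth in the TMA tree."""
    nodes = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name = match.group("node")
            level = name.count(".") + 1  # Backend_Bound is level 1
            nodes.append((level, name, float(match.group("value"))))
    return nodes


def suspicious_levels(nodes, threshold=50.0):
    """Levels whose value falls below an arbitrary, illustrative threshold."""
    return [level for level, _name, value in nodes if value < threshold]


if __name__ == "__main__":
    parsed = parse_nodes(TOPLEV_OUTPUT)
    for level, name, value in parsed:
        print(f"L{level}: {value:5.1f}  {name.rsplit('.', 1)[-1]}")
    print("Levels below threshold:", suspicious_levels(parsed))
```

With the six lines above, the sketch flags levels 3 and 4, which are the two mid-level nodes that fail to reflect the pause loop.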

$ lscpu
Architecture:         x86_64
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
CPU(s):               12
On-line CPU(s) list:  0-5
Off-line CPU(s) list: 6-11
Thread(s) per core:   1
Core(s) per socket:   6
Socket(s):            1
NUMA node(s):         1
Vendor ID:            GenuineIntel
CPU family:           6
Model:                158
Model name:           Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Stepping:             10
CPU MHz:              3701.500
CPU max MHz:          3700.0000
CPU min MHz:          800.0000
BogoMIPS:             7399.70
Virtualization:       VT-x
L1d cache:            32K
L1i cache:            32K
L2 cache:             256K
L3 cache:             12288K
