[inductor][cpu] Perf regression

<p>Performance regression found compared with the 2023_08_22 nightly.</p>
<p>Repro</p>

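Run [inductor_single_test.sh](https://github.com/chuanqi129/inductor-tools/blob/yudong/aws_auto/scripts/modelbench/inductor_single_run.sh) with bash and the following arguments, replacing `suite` and `model` with the benchmark suite and the model name:
`multiple inference performance suite model float32 first dynamic cpp 0`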



<p>New performance regressions</p>
<table border="1" class="dataframe table">
  <thead>
    <tr style="text-align: right;">
      <th>name</th>
      <th>batch_size_new</th>
      <th>speed_up_new</th>
      <th>inductor_new</th>
      <th>eager_new</th>
      <th>compilation_latency_new</th>
      <th>batch_size_old</th>
      <th>speed_up_old</th>
      <th>inductor_old</th>
      <th>eager_old</th>
      <th>compilation_latency_old</th>
      <th>Ratio Speedup(New/old)</th>
      <th>Eager Ratio(old/new)</th>
      <th>Inductor Ratio(old/new)</th>
      <th>Compilation_latency_Ratio(old/new)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>doctr_det_predictor</td>
      <td>1</td>
      <td>1.069458</td>
      <td>0.148391054</td>
      <td>0.158697999828732</td>
      <td>33.396333</td>
      <td>1</td>
      <td>1.503562</td>
      <td>0.106578389</td>
      <td>0.160247215721618</td>
      <td>37.416409</td>
      <td>0.71</td>
      <td>1.01</td>
      <td>0.72</td>
      <td>1.12</td>
    </tr>
    <tr>
      <td>pytorch_unet</td>
      <td>1</td>
      <td>0.862569</td>
      <td>0.310560677</td>
      <td>0.267880012599213</td>
      <td>18.169774</td>
      <td>1</td>
      <td>1.057315</td>
      <td>0.24839536899999998</td>
      <td>0.262632149574235</td>
      <td>27.68669</td>
      <td>0.82</td>
      <td>0.98</td>
      <td>0.8</td>
      <td>1.52</td>
    </tr>
    <tr>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
    </tr>
    <tr>
      <td>doctr_det_predictor</td>
      <td>1</td>
      <td>0.652977</td>
      <td>3.336484332</td>
      <td>2.178647529656364</td>
      <td>38.657023</td>
      <td>1</td>
      <td>1.20895</td>
      <td>1.828299074</td>
      <td>2.2103221655123</td>
      <td>36.253952</td>
      <td>0.54</td>
      <td>1.01</td>
      <td>0.55</td>
      <td>0.94</td>
    </tr>
    <tr>
      <td>pytorch_unet</td>
      <td>1</td>
      <td>0.915661</td>
      <td>5.48157092</td>
      <td>5.01926071017812</td>
      <td>20.518048</td>
      <td>1</td>
      <td>0.998196</td>
      <td>4.898655984</td>
      <td>4.889818808604864</td>
      <td>29.142998</td>
      <td>0.92</td>
      <td>0.97</td>
      <td>0.89</td>
      <td>1.42</td>
    </tr>
    <tr>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
    </tr>
  </tbody>
</table>
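The derived columns follow from the raw measurements: `speed_up` is eager latency divided by Inductor latency, the speedup ratio is new/old, and the three latency ratios are old/new, so a value below 1 means the new build is slower. The sketch below recomputes them for the first `doctr_det_predictor` row. The helper name `derived_columns` is hypothetical and is not part of the inductor-tools scripts.

```python
def derived_columns(inductor_new, eager_new, comp_new,
                    inductor_old, eager_old, comp_old):
    """Hypothetical helper: recompute the derived columns of the regression table."""
    # Speedup of Inductor over eager for each build (eager latency / Inductor latency).
    speed_up_new = eager_new / inductor_new
    speed_up_old = eager_old / inductor_old
    return {
        "speed_up_new": round(speed_up_new, 6),
        "speed_up_old": round(speed_up_old, 6),
        # new/old: below 1.0 means the speedup shrank.
        "Ratio Speedup(New/old)": round(speed_up_new / speed_up_old, 2),
        # old/new: below 1.0 means the new latency is higher.
        "Eager Ratio(old/new)": round(eager_old / eager_new, 2),
        "Inductor Ratio(old/new)": round(inductor_old / inductor_new, 2),
        "Compilation_latency_Ratio(old/new)": round(comp_old / comp_new, 2),
    }


# Values copied from the first doctr_det_predictor row above.
row = derived_columns(
    inductor_new=0.148391054, eager_new=0.158697999828732, comp_new=33.396333,
    inductor_old=0.106578389, eager_old=0.160247215721618, comp_old=37.416409,
)
print(row)
```

The same formulas reproduce the other rows. For example, the first `pytorch_unet` row gives 0.82 / 0.98 / 0.80 / 1.52.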

Repro for the llama regression: run [inductor_single_test.sh](https://github.com/chuanqi129/inductor-tools/blob/yudong/aws_auto/scripts/modelbench/inductor_single_run.sh) with bash and the following arguments:
`multiple inference performance torchbench llama float32 first static default 0`
<p>New performance regressions</p>
<table border="1" class="dataframe table">
  <thead>
    <tr style="text-align: right;">
      <th>name</th>
      <th>batch_size_new</th>
      <th>speed_up_new</th>
      <th>inductor_new</th>
      <th>eager_new</th>
      <th>compilation_latency_new</th>
      <th>batch_size_old</th>
      <th>speed_up_old</th>
      <th>inductor_old</th>
      <th>eager_old</th>
      <th>compilation_latency_old</th>
      <th>Ratio Speedup(New/old)</th>
      <th>Eager Ratio(old/new)</th>
      <th>Inductor Ratio(old/new)</th>
      <th>Compilation_latency_Ratio(old/new)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>llama</td>
      <td>32</td>
      <td>0.578757</td>
      <td>0.053369041000000006</td>
      <td>0.030887706062037</td>
      <td>35.195648</td>
      <td>32</td>
      <td>1.143321</td>
      <td>0.027263855</td>
      <td>0.031171337962455</td>
      <td>40.613965</td>
      <td>0.51</td>
      <td>1.01</td>
      <td>0.51</td>
      <td>1.15</td>
    </tr>
    <tr>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
      <td>*</td>
    </tr>
  </tbody>
</table>
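In all five regressed rows, the eager ratio stays within a few percent of 1.0 while the Inductor ratio drops. This points to the Inductor-generated code rather than the eager baseline. The sketch below makes that triage explicit. The `classify` helper and the 5% tolerance are hypothetical choices for illustration, not the criteria the reporting tool uses to flag regressions.

```python
# (name, Eager Ratio(old/new), Inductor Ratio(old/new)) copied from the tables above.
ROWS = [
    ("doctr_det_predictor", 1.01, 0.72),
    ("pytorch_unet", 0.98, 0.80),
    ("doctr_det_predictor", 1.01, 0.55),
    ("pytorch_unet", 0.97, 0.89),
    ("llama", 1.01, 0.51),
]

EAGER_TOLERANCE = 0.05  # hypothetical: eager within +/-5% of 1.0 counts as unchanged


def classify(eager_ratio, inductor_ratio, tol=EAGER_TOLERANCE):
    """Hypothetical triage: ratios are old/new latency, so below 1.0 means slower."""
    eager_stable = abs(eager_ratio - 1.0) <= tol
    if not eager_stable:
        # The baseline itself moved, so the speedup ratio alone is not conclusive.
        return "baseline moved"
    if inductor_ratio < 1.0 - tol:
        # Eager is unchanged but Inductor got slower.
        return "inductor"
    return "within noise"


for name, eager_ratio, inductor_ratio in ROWS:
    print(f"{name:22s} {classify(eager_ratio, inductor_ratio)}")
```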


<p>SW info</p>
<table border="1" class="dataframe table">
  <thead>
    <tr style="text-align: right;">
      <th>SW</th>
      <th>Nightly commit</th>
      <th>Main commit</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Pytorch</td>
      <td>f54acf0</td>
      <td>bad3f2d</td>
    </tr>
    <tr>
      <td>Torchbench</td>
      <td>/</td>
      <td>770d5cf7</td>
    </tr>
    <tr>
      <td>torchaudio</td>
      <td>dc83b38</td>
      <td>66f661d</td>
    </tr>
    <tr>
      <td>torchtext</td>
      <td>c11d758</td>
      <td>60bea66</td>
    </tr>
    <tr>
      <td>torchvision</td>
      <td>58366ab</td>
      <td>a6dea86</td>
    </tr>
    <tr>
      <td>torchdata</td>
      <td>1d231d1</td>
      <td>757c032</td>
    </tr>
    <tr>
      <td>dynamo_benchmarks</td>
      <td>f228c8b</td>
      <td>/</td>
    </tr>
  </tbody>
</table>

cc @ezyang @msaroufim @wconstab @bdhirsh @zou3519

[inductor][cpu] Perf regression #108324
