Releases · mcthouacbb/Sirius

Release notes

After months of work, issues with buggy OpenBench workers, and over 100 commits, Sirius is back with its next major release. Many improvements to the search and evaluation were made, along with some QoL improvements, including support for the Stockfish WDL model. Major changes include 3 SPSA tunes at LTC, a much improved king safety evaluation, many tweaks and additions to passed pawns and pawn structure evaluation, generating quiet check evasion moves in quiescence search, tweaks and additions to correction histories, and more. Sirius v9.0 is nearly 130 elo stronger than Sirius v8.0 on a balanced book.

Additionally, Sirius made its TCEC (Top Chess Engine Championship) debut these past few months. Sirius performed well against other HCE (handcrafted evaluation) engines, but was not very competitive against most NNUE engines due to their much stronger evaluation.

Changelog

Lots of refactors and cleanups
- Added clang-format for more consistent code formatting
- Fixed promotions printing uppercase letters
Support for FRC and DFRC (Chess960 and Double Chess960)
Some stability and speed improvements for multithreading and TT clearing
- Tested up to TCEC conditions (512 threads and 256GB hash)
Stockfish WDL Model Support
- Normalized Eval
- Estimated win, draw, and loss probabilities
Elo estimates for all major search features
Speedups
- Staged move generation for killer moves
Many evaluation changes
- Fixed the evaluation to be fully vertically symmetric
- King safety
  - A quadratic adjustment formula applied after summing all safety terms
  - Penalty for weak squares around the king
  - Penalty for an attack without a queen
  - Bonus/penalty for king flank attacks and defenses
  - Improved safe checks calculation
- Pawn structure and passed pawn evaluation
  - Bonus for candidate passed pawns
  - Penalty for backwards and doubled pawns
  - Tweaked definition of isolated pawns
  - Better king-passer proximity calculation
- Exclude the king and blocked pawns from mobility
- Bonus for restricting the enemy's mobility
- Bonus for being able to move a piece to attack the enemy queen
- Bonus for a bishop on a long center diagonal
- Specialized evaluation and scaling functions for certain endgames
Many search tweaks
- Quiescence search quiet check evasion generation and pruning
- Correction history
  - Added continuation correction history
  - Tuned correction history weights
  - Improved correction history updates
  - LMR adjustments based on correction history
- Move loop pruning
  - Added noisy futility pruning
  - Increased max pruning depth
  - Adjust futility margin by history
- Better capture move ordering
- Improved internal iterative reductions implementation
- More accurate handling of pinned pieces in static exchange evaluation
- Added probcut
- Improved singular extensions and negative extensions
- Many other changes
3 LTC SPSA tunes of the entire search
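
As a rough illustration of the "Stockfish WDL Model Support" entry above, here is a minimal sketch of the logistic form the Stockfish WDL model uses. The parameter values `a` and `b` below are hypothetical placeholders, not the values fitted for Sirius; in the real model they are fitted from game data and typically vary with the position.

```python
import math

def wdl(internal_eval, a, b):
    """Return (win, draw, loss) probabilities for a given internal eval.

    a: eval at which the win probability is 50% (hypothetical value here)
    b: spread of the logistic curve (hypothetical value here)
    """
    win = 1.0 / (1.0 + math.exp((a - internal_eval) / b))
    loss = 1.0 / (1.0 + math.exp((a + internal_eval) / b))
    draw = 1.0 - win - loss
    return win, draw, loss

def normalized_eval(internal_eval, a):
    """Rescale the eval so that 100 centipawns corresponds to a 50% win chance."""
    return round(100.0 * internal_eval / a)

# Hypothetical parameters, chosen only to make the example concrete.
A, B = 200.0, 80.0
win, draw, loss = wdl(200.0, A, B)
print(f"W={win:.3f} D={draw:.3f} L={loss:.3f}, normalized={normalized_eval(200.0, A)}")
```

With this normalization, a reported score of +100 means the side to move is expected to win about half the time, which is what makes the "Normalized Eval" comparable across engines that use the same model.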

Self-play against Sirius v8.0

Elo   | 127.02 +- 6.45 (95%)
Conf  | 8.0+0.08s Threads=1 Hash=16MB
Games | N: 5004 W: 2087 L: 335 D: 2582
Penta | [12, 151, 814, 1123, 402]


Elo   | 128.92 +- 7.78 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=64MB
Games | N: 2502 W: 965 L: 77 D: 1460
Penta | [1, 28, 451, 624, 147]
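
For readers unfamiliar with these result blocks, the Elo figure can be recovered from the W/L/D counts with the standard logistic Elo formula. The sketch below does that for the first block, and also estimates an error bar from the pentanomial (game-pair) counts. The error-bar method shown is one common approximation and is an assumption on my part; it may not match the testing framework's exact computation.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_from_wld(wins, losses, draws):
    """Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    return elo_from_score((wins + 0.5 * draws) / games)

def elo_error_from_penta(penta, z=1.96):
    """Approximate 95% half-width from pentanomial counts.

    penta[i] is the number of game pairs scoring i/2 points out of 2,
    i.e. an average per-game score of 0, 0.25, 0.5, 0.75, or 1.
    """
    pairs = sum(penta)
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    mean = sum(n * s for n, s in zip(penta, scores)) / pairs
    var = sum(n * (s - mean) ** 2 for n, s in zip(penta, scores)) / pairs
    stderr = math.sqrt(var / pairs)
    low = elo_from_score(mean - z * stderr)
    high = elo_from_score(mean + z * stderr)
    return (high - low) / 2.0

# First block above: N: 5004 W: 2087 L: 335 D: 2582
stc_elo = elo_from_wld(2087, 335, 2582)
stc_err = elo_error_from_penta([12, 151, 814, 1123, 402])
print(f"Elo {stc_elo:.2f} +- {stc_err:.2f}")
```

The Elo value agrees with the reported 127.02 to within rounding; the error estimate lands close to, but not exactly on, the reported figure.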

Elo Estimates

Results of sirius-dev-a2c68f9 vs komodo-14.1 (40+0.4, 1t, 32MB, UHO_Lichess_4852_v1.epd):
Elo: 72.08 +/- 11.80, nElo: 117.91 +/- 18.74
LOS: 100.00 %, DrawRatio: 38.33 %, PairsRatio: 3.33
Games: 1320, Wins: 498, Losses: 228, Draws: 594, Points: 795.0 (60.23 %)
Ptnml(0-2): [5, 89, 253, 257, 56], WL/DD Ratio: 1.04

Results of sirius-dev-a2c68f9 vs weiss-dev-95b0951 (20+0.2, 1t, 32MB, UHO_Lichess_4852_v1.epd):
Elo: 32.16 +/- 8.23, nElo: 52.51 +/- 13.35
LOS: 100.00 %, DrawRatio: 44.62 %, PairsRatio: 1.62
Games: 2600, Wins: 868, Losses: 628, Draws: 1104, Points: 1420.0 (54.62 %)
Ptnml(0-2): [14, 261, 580, 361, 84], WL/DD Ratio: 1.41

At medium-long time controls, Sirius is significantly stronger than the last HCE release of Komodo and quite a bit stronger than Weiss dev. This should cement Sirius as the current 2nd strongest HCE engine. It is still quite far from Stockfish classical, which is the current strongest HCE engine.

Results of sirius-dev-a2c68f9 vs sf-classical (40+0.4, 1t, 32MB, UHO_Lichess_4852_v1.epd):
Elo: -40.18 +/- 10.26, nElo: -69.01 +/- 17.47
LOS: 0.00 %, DrawRatio: 42.50 %, PairsRatio: 0.46
Games: 1520, Wins: 326, Losses: 501, Draws: 693, Points: 672.5 (44.24 %)
Ptnml(0-2): [25, 275, 323, 124, 13], WL/DD Ratio: 1.20

Estimated 3530 rating CCRL Blitz
Estimated 3700 rating CCRL FRC

Credits

Thanks to members in the engine dev community who continue to support and discuss topics in computer chess with me.
Additionally, much credit goes to Stockfish, Ethereal, Weiss, Stash, and Perseus, along with their developers.
Many ideas and features in Sirius were inspired by or taken from these engines, and their knowledge has been invaluable to the development of Sirius.

Selecting a binary

v1 is significantly slower than v2, v3, and v4, and should only be used when absolutely necessary. In general, higher levels are faster.
If you download a binary and it crashes (closes immediately or doesn't respond to commands), it is likely incompatible with your machine, and you should download a different binary.

Linux binaries courtesy of @JonathanHallstrom

Full Changelog: v8.0...v9.0

Release notes

Sirius v8.0 is much stronger than the previous version. Many improvements to the search and evaluation were made, along with some QoL improvements, including pretty printing. Major changes include singular extensions, a much better correction history implementation, a rework of threats eval, and staged move generation. Sirius v8.0 is 200+ elo stronger than Sirius v7.0 on a balanced book.

Changelog

Lots of refactors
UCI changes
- Pretty printing
- perft and eval commands
Speedups
- Staged move generation
- Incrementally updated evaluation terms
A few new evaluation features
- Horizontally mirrored piece-square tables
- Threats
  - Complete rework
  - Pawn push and king threats
  - Defended/weak threats
- King safety
  - Pawn shield/storm rework
- Passed pawns
  - Score based on whether push square is blocked/enemy attacked
- Minors behind pawns
- Bishop same color pawns
Many search tweaks
- Best move stability time management
- Correction history
  - Qsearch correction
  - Many new tables besides pawn structure: minor/major piece, non pawn, material, and threats
- Singular extensions
  - Multicut, double extensions, and negative extensions
- Razoring
- Many other search changes

Self-play against Sirius v7.0

Elo   | 212.25 +- 7.84 (95%)
Conf  | 8.0+0.08s Threads=1 Hash=16MB
Games | N: 5004 W: 2946 L: 220 D: 1838
Penta | [2, 66, 489, 1094, 851]


Elo   | 207.46 +- 16.39 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=64MB
Games | N: 1000 W: 564 L: 29 D: 407
Penta | [1, 5, 103, 240, 151]

Estimated 3450 rating

Credits

Thanks to members in the engine dev community who continue to support and discuss topics in computer chess with me.

Selecting a binary

v1 is significantly slower than v2, v3, and v4 (about 15x), and should only be used when absolutely necessary. In general, higher levels are faster.
If you download a binary and it crashes (closes immediately or doesn't respond to commands), it is likely incompatible with your machine, and you should download a different binary.

Full Changelog: v7.0...v8.0

Release notes

This version of Sirius is, unfortunately, no longer using just piece-square tables for the evaluation function. It is, however, significantly stronger than Sirius 6.0 in self-play. Major new features include a more sophisticated handcrafted evaluation function tuned with my tuner, and a few major search features.

Changelog

A few minor refactors
Many new evaluation features
- Mobility
- Threats
- Pawn structure
  - Passed pawns, isolated pawns, etc.
- King safety
  - Pawn shield/storm
  - Safe checks from enemy pieces
  - Attacks to squares near kings
- Rooks on open/semi-open files
- Knight outposts
- Bishop pair
- Tempo bonus
A few search changes
- Static evaluation correction history
- Capture history
- Quiescence search futility pruning
- Threat history
- Some other tweaks
Complete LTC SPSA tune of the entire search

Self-play against Sirius v6.0

Elo   | 269.79 +- 7.38 (95%)
Conf  | 8.0+0.08s Threads=1 Hash=16MB
Games | N: 10000 W: 7189 L: 682 D: 2129
Penta | [29, 163, 708, 1472, 2628]

Elo   | 306.84 +- 10.27 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=64MB
Games | N: 5000 W: 3715 L: 175 D: 1110
Penta | [0, 41, 251, 835, 1373]

Estimated 3200 rating

Credits

Thanks to members in the engine dev community who continue to support me.
Special thanks to Andy Grant for creating OpenBench.

Selecting a binary

v1 is significantly slower than v2, v3, and v4 (about 15x), and should only be used when absolutely necessary. In general, higher levels are faster.
If you download a binary and it crashes (closes immediately or doesn't respond to commands), it is likely incompatible with your machine, and you should download a different binary.

Full Changelog: v6.0...v7.0

Thanks to dio from CEGT for helping find the bug. Sirius had a bug where it output two spaces between "score" and "cp" in its UCI output; Arena cannot parse this and displays a score of 0. This release contains fixed binaries without the double space. It is functionally equivalent to the normal 6.0 release in every other way.
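
To illustrate why a stray space can matter, here is a small hypothetical sketch of a strict token-based parser next to a whitespace-tolerant one. This is not Arena's actual parsing code, just an assumed example of how splitting on single spaces can turn a doubled space into an empty token and make the score fall back to 0.

```python
def parse_score_strict(line):
    """Hypothetical strict parser: splits on single spaces only."""
    tokens = line.split(" ")
    i = tokens.index("score")
    # With a double space, tokens[i + 1] is "" instead of "cp".
    if tokens[i + 1] != "cp":
        return 0
    return int(tokens[i + 2])

def parse_score_tolerant(line):
    """Tolerant parser: split() with no argument collapses runs of whitespace."""
    tokens = line.split()
    i = tokens.index("score")
    if tokens[i + 1] != "cp":
        return 0
    return int(tokens[i + 2])

buggy = "info depth 10 score  cp 35 nodes 1000"   # two spaces after "score"
fixed = "info depth 10 score cp 35 nodes 1000"

print(parse_score_strict(buggy), parse_score_tolerant(buggy), parse_score_strict(fixed))
```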

Changelog

Lots of refactors
Lots of fixes
Much better time management
- soft time management
- node count time management
A new set of piece square tables
Improvements to various pruning techniques
The improving heuristic
- reverse futility pruning
- late move pruning
- late move reductions
Better history implementation
- continuation history
3 fold pvs/lmr
Lots of performance improvements
- pseudo legal movegen
- Better tt lookup
Added history pruning
Added internal iterative reductions
Adjust static evaluation using the tt score
Lazy SMP
Other tweaks and improvements

Full changelog: v5.0...v6.0

Release Notes

This is the final version where Sirius is piece-square table only. This release brings many improvements to the search, a completely new set of piece-square tables, and tons of bug fixes and performance improvements. Sirius-v6.0 is around 370 elo stronger than the previous version in self-play, with an estimated rating of about 3050 elo.

Self-play against sirius-v5.0

Score of sirius-6.0 vs sirius-5.0: 3293 - 101 - 641  [0.896] 4035
...      sirius-6.0 playing White: 1693 - 39 - 286  [0.910] 2018
...      sirius-6.0 playing Black: 1600 - 62 - 355  [0.881] 2017
...      White vs Black: 1755 - 1639 - 641  [0.514] 4035
Elo difference: 373.3 +/- 13.3, LOS: 100.0 %, DrawRatio: 15.9 %
tc: 8+0.08

Score of sirius-6.0 vs sirius-5.0: 1397 - 28 - 325  [0.891] 1750
...      sirius-6.0 playing White: 746 - 8 - 121  [0.922] 875
...      sirius-6.0 playing Black: 651 - 20 - 204  [0.861] 875
...      White vs Black: 766 - 659 - 325  [0.531] 1750
Elo difference: 365.2 +/- 18.9, LOS: 100.0 %, DrawRatio: 18.6 %
tc: 60+0.6s

Against PeSTO

Comparison against PeSTO, the current (previous?) strongest PST-only engine

Score of sirius-6.0 vs PeSTO: 784 - 629 - 1347  [0.528] 2760
...      sirius-6.0 playing White: 410 - 290 - 680  [0.543] 1380
...      sirius-6.0 playing Black: 374 - 339 - 667  [0.513] 1380
...      White vs Black: 749 - 664 - 1347  [0.515] 2760
Elo difference: 19.5 +/- 9.3, LOS: 100.0 %, DrawRatio: 48.8 %
tc: 20+0.2

Rating estimates

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   Score    Draw   White   Black
   0 sirius-6.0                     -9       9    4500    1600    1720    1180   2190.0   48.7%   26.2%   50.4%   46.9%
   1 altair-5.0.0                  176      28     500     300      66     134    367.0   73.4%   26.8%   74.0%   72.8%
   2 stash-27.0                     15      25     500     183     162     155    260.5   52.1%   31.0%   53.2%   51.0%
   3 4ku-4.0                        12      25     500     183     166     151    258.5   51.7%   30.2%   52.4%   51.0%
   4 midnight-6                     11      26     500     188     172     140    258.0   51.6%   28.0%   52.0%   51.2%
   5 polaris-1.8.1                   0      26     500     176     176     148    250.0   50.0%   29.6%   51.2%   48.8%
   6 akimbo-0.5.0                   -7      27     500     188     198     114    245.0   49.0%   22.8%   53.2%   44.8%
   7 peacekeeper-1.60              -12      26     500     179     196     125    241.5   48.3%   25.0%   49.8%   46.8%
   8 pedantic-0.6.0                -38      26     500     159     214     127    222.5   44.5%   25.4%   50.2%   38.8%
   9 princhess-0.15.1              -60      28     500     164     250      86    207.0   41.4%   17.2%   41.8%   41.0%
tc: 8+0.08

All tests done on 8moves_v3.epd

Credits

Huge thanks to Ciekce, JW, Alex2262, and Cj5716 along with many others in the engine dev community, who have continued to provide valuable support and insight on the development of Sirius

Selecting a binary

v1 is significantly slower than v2, v3, and v4 (about 3x), and should only be used when absolutely necessary. In general, higher levels are faster.
If you download a binary and it crashes (closes immediately or doesn't respond to commands), it is likely incompatible with your machine, and you should download a different binary.

Release notes

This release brings a large strength improvement from various search changes, bringing Sirius to around 2700 elo. In self-play, Sirius v5.0 is over 400 elo stronger than Sirius v4.0 with a balanced book. This release also fixes many bugs and crashes related to the UCI implementation in the previous version. The evaluation has not changed significantly, but has been tuned more effectively.

Self Play against v4.0

8+0.08 STC with 8moves_v3.epd

Score of Sirius-5.0 vs Sirius-4.0: 642 - 32 - 77  [0.906] 751
...      Sirius-5.0 playing White: 328 - 16 - 32  [0.915] 376
...      Sirius-5.0 playing Black: 314 - 16 - 45  [0.897] 375
...      White vs Black: 344 - 330 - 77  [0.509] 751
Elo difference: 393.9 +/- 35.9, LOS: 100.0 %, DrawRatio: 10.3 %

60+0.6 LTC with 8moves_v3.epd

Score of Sirius-5.0 vs Sirius-4.0: 618 - 14 - 68  [0.931] 700
...      Sirius-5.0 playing White: 309 - 8 - 33  [0.930] 350
...      Sirius-5.0 playing Black: 309 - 6 - 35  [0.933] 350
...      White vs Black: 315 - 317 - 68  [0.499] 700
Elo difference: 453.2 +/- 40.6, LOS: 100.0 %, DrawRatio: 9.7 %

Changelog

Tons of bug fixes, improvements, and tweaks
Completely reworked UCI implementation
Large refactor
Switched to asymmetric piece square tables
Switched to fail-soft alpha beta pruning
Fixed PVS/LMR implementation
Improved futility pruning implementation
Added reverse futility pruning
Improved late move reductions
Added history malus
Added bench
Added SEE pruning
Added history gravity
Added SEE move ordering
Improved tt replacement scheme
Added tt cutoffs in quiescence search
Added late move pruning
Added packed evaluation
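
For context on the "history malus" and "history gravity" entries above, here is a minimal sketch of the commonly used gravity update, in which each bonus or malus is scaled down as the entry approaches a cap so the value stays bounded. The cap `MAX_HISTORY` and the bonus sizes are hypothetical and not taken from Sirius.

```python
MAX_HISTORY = 16384  # hypothetical cap, not Sirius's actual value

def update_history(value, bonus):
    """Apply a bonus (positive) or malus (negative) with gravity.

    The correction term pulls the entry back in proportion to its current
    size, so the result always stays within [-MAX_HISTORY, MAX_HISTORY].
    """
    bonus = max(-MAX_HISTORY, min(MAX_HISTORY, bonus))
    # int() truncates toward zero, mirroring integer division in C++.
    return value + bonus - int(value * abs(bonus) / MAX_HISTORY)

entry = 0
for _ in range(1000):
    entry = update_history(entry, 2000)   # repeated bonus saturates at the cap
after_malus = update_history(entry, -2000)  # a malus pulls the entry back down
print(entry, after_malus)
```

A move that keeps causing cutoffs saturates near the cap instead of growing without limit, while a malus on a high-valued entry has a larger effect than the same malus on an entry near zero.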

Full changelog: v4.0...v5.0

Credits

Huge thanks to Ciekce, JW, and Alex2262, along with many others in the engine dev community. Their insight and experience with engine dev has been invaluable for the development of Sirius

Selecting a binary

v1 is significantly slower than v2, v3, and v4 (about 3x), so it should only be used on very old CPUs. In general, higher levels are faster.
If you download a binary and it crashes (closes immediately or doesn't respond to commands), it is likely incompatible with your machine, and you should download a different binary.
