Implement Metal Backend for macOS #1039
Conversation
Force-pushed from 2a9f3f0 to 26a488e.
Force-pushed from 774bb60 to 152123a.
The Metal backend is extracted from the CoreML backend. This removes all of the CoreML processing and leaves only the Metal backend, maximizing compatibility with KataGo distributed training.
- Updated project definition to conditionally include languages (CXX, Swift) based on the USE_BACKEND variable.
- Updated the `processScoreValues` function to use `modelVersion` instead of a generic `version`.
- Refactored score value data extraction with proper assertions based on the model version.
- Removed unused variables related to score values and cleaned up memory management for better efficiency and clarity.
Added `-Wno-cast-qual` and `-Wno-c11-extensions` to the CMAKE_CXX_FLAGS for AppleClang builds. This suppresses specific warnings related to cast qualifiers and C11 extensions, giving a cleaner build output while keeping the existing warnings for other compilers.
This commit introduces a new section in the Compiling.md file that provides detailed instructions for compiling KataGo on macOS. It includes prerequisites, installation commands, and specific recommendations for backend configuration, ensuring users have a clear guide to successfully compile and run KataGo on Mac systems.
- Removed redundant model dimensions (modelXLen and modelYLen), as they were replaced with nnXLen and nnYLen.
- Consolidated policy result element variables, simplifying their calculations and reducing memory footprint.
- Removed unused modelPolicyResultChannels from the InputBuffers structure.
Force-pushed from 152123a to 0ceb90f.
I've looked over the code now. Thanks for writing this and getting this all tested! I left some questions as individual inline comments, and also have a few further overall questions:
- Would it be possible to move the cpp/macos folder to cpp/external/macos and update any relevant references and paths to work properly with it there? That way, externally licensed code is better contained in one place, the cpp/external folder.
- Since I've looked at the code now, from here forward, would it be possible to avoid rebase/force-pushing when updating the branch? That way, it's easier to review because I can better follow incremental diffs, without the commits of the entire branch seeming to change from Git's perspective.
- I see that one of the commits pushed was to dehardcode the FP32 data type in places. However, am I correct that this does not go the full way to supporting FP16? Given that the testing for Metal so far does not cover FP16 and the amount of error it introduces for this backend, I would actually prefer that users cannot activate FP16 in the backend ("auto" should choose FP32 and explicitly setting FP16 should be rejected), and that the generalization of data types cannot result in Metal deciding on its own to use FP16 depending on the device. I wanted to confirm - is this the case right now?
Thanks!
/// This function applies the Mish activation function on the input tensor `x`. The Mish function is defined as
/// x * tanh(Softplus(x)), where Softplus(x) is defined as log(1 + exp(min(x, 10.39))) if x < 10.39 and x otherwise.
/// The threshold of softplus is modified to 10.39, which is different from the original 20. This is because
/// exp(10.39) = 32532.666936 < 32767.0 < 65504.0, so the result of exp(10.39) can be represented by float16. If the threshold
/// of softplus is 20, the result of exp(20) is 485165195.40979004, which is out of range of float16.
Where in the code is this threshold of 10.39 or 20 being applied?
The current logic seems to just unconditionally apply exp and log. How does it avoid floating point overflow?
A threshold value of 10.39 or 20 was used in a previous revision, during which FP16 was supported in the Metal backend. In the current implementation, however, the logic unconditionally applies exp and log operations for FP32 data. Because the KataGo model normalizes inputs toward one, removing the threshold from the mish activation does not result in significant error. On the contrary, this change has led to a slight performance gain in the Metal backend, while maintaining very low numerical error.
The code comments regarding the threshold are retained above the mish function, as I may revisit FP16 support in the future to evaluate its impact on throughput and numerical accuracy in the Metal backend.
I have reintroduced the threshold in eda158f.
cpp/neuralnet/metalbackend.h (outdated)
 * @brief The x length of the CoreML model.
 */
int modelXLen = COMPILE_MAX_BOARD_LEN;

/**
 * @brief The y length of the CoreML model.
 */
int modelYLen = COMPILE_MAX_BOARD_LEN;
Are these unused now?
Thanks for finding these unused variables. I have removed them from the source code in 86ab38f.
 * @param W Width.
 * @param inputsUseNHWC Flag indicating if the input data is currently in NHWC format.
 */
void MetalProcess::convertNCHW(
I think I prefer the previous version of the code, where there was:
if(inputsUseNHWC != false) {
throw StringError("Metal backend: inputsUseNHWC = false required, other configurations not supported");
}
It means that we don't have the additional complexity of conversion here and it reduces the number of configurations to test. The KataGo engine above this is perfectly capable of specifying the input directly in NCHW format so that conversion is not needed, and there is no meaningful end-user functionality that is enabled by allowing KataGo's engine to supply a less efficient format and then to have the backend run an additional routine to convert it.
To make the tests still easy to run for the ones that try to set NHWC, the only additional thing that needs to be added, I think, is to tests/testcommon.cpp, in TestCommon::overrideForBackends, an additional case:
#elif defined(USE_METAL_BACKEND)
if(inputsNHWC != false) {
cout << "Backend is Metal, ignoring args and forcing inputsNHWC=false" << endl;
inputsNHWC = false;
}
if(useNHWC != false) {
cout << "Backend is Metal, ignoring args and forcing useNHWC=false" << endl;
useNHWC = false;
}
I'm cautious about adding a condition for the Metal backend in tests/testcommon.cpp. Although it would be logically valid, doing so introduces Metal-specific logic into a shared test file that I don't primarily maintain. My main responsibility lies in metalbackend.{h,cpp,swift}, and I aim to keep changes outside of those files minimal. This approach helps keep the Metal backend modular and easier for me to maintain.
Regarding test coverage, I can run your test cases to ensure both NCHW and NHWC formats are exercised. From my perspective, this should be sufficient and does not raise any concerns.
Additionally, since the Metal API natively supports the NHWC layout, the Metal backend has the potential to deliver better performance in future optimizations.
Okay cool. So let's leave this aspect alone for now.
There are actually an uncomfortable number of places in the current KataGo code outside of the backend code where inputsNHWC and possibly one or two other settings are hardcoded based on backend-specific knowledge as to the preferred orientation. So, I propose doing the following in a future version (not the coming release).
- I add a direct way in nninterface.h for all backends to report:
  - their default/preferred setting for this and any other similar parameters,
  - which settings are supported, even if not preferred.
- Metal can then use this interface to report to the backend-agnostic part of KataGo that NCHW is the preferred input format (skipping the conversion of the format). Optionally, you could then simplify the code by removing the conversion code, because the same interface can be used to inform KataGo that NHWC is not supported, in a backend-general way, and it will be a reliable part of the contract that KataGo will never pass that format down to the backend.
- If later you implement optimized support for NHWC as the faster format (which presumably will handle it directly rather than use the conversion logic), it will be easy to just change what the backend returns in that interface to report that NHWC is now supported and is the preferred format instead.
- In exactly the same way, OpenCL and TensorRT will also use this interface, so everywhere we can remove the backend-specific hacks about input format.
Does that sound reasonable?
It sounds great!
Even though I dehardcoded the FP32 data type in places, this does not add FP16 support at this moment. In other words, FP16 cannot be enabled in this Metal backend right now.
This Metal backend does not support FP16 at this moment, so FP16 has been removed from the configuration files for clarity.
This PR introduces a GPU backend for macOS using Metal, enabling full support for distributed training in KataGo. This addition benefits macOS users who wish to contribute to KataGo training and self-play.
Verification
The Metal backend has been tested using the `testgpuerror` command, with results demonstrating minimal floating-point error compared to the reference implementation: