-
Notifications
You must be signed in to change notification settings - Fork 4
PyPI package for Khoca #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
bd0fb31
to
bc56529
Compare
233d851
to
d5fc92d
Compare
0823478
to
2b01541
Compare
f5be7c9
to
22ce636
Compare
As of May 2025I've made some progress over the past three months:
Unlike the now-fixed segfault issue, this problem also occurs in binary distributions. Therefore, it's likely not due to the shared Pari stack. Any help in fixing it is welcome! When building binary Windows wheels natively, the workflow hangs at the linker:
I think the missing symbols must be contained in
and what's specified in the linker options. So I don't understand the problem here. Any help with this is also welcome! |
How can I reproduce the
Also, in what platform did you see the |
Unless you set
Yes! I saw it under Linux Mint 21.1 on a ThinkPad with an i5 and under Ubuntu 20.04 running under WSL2 on a ThinkPad with an i7. |
Sorry for the late reply. As can be seen in the stack trace, the SystemError was triggered because of the following statement in
where When I checked where
As you can see, there seem to be race conditions when multiple threads involving at least one
So it seems that it is unsafe to run things concurrently in Sage that involves GMP integers, even though they are operating on different integers because they might mess up the global counters like I am not an expert in this area, but it seems to be an unfortunate side effect of cysignals design. A non-ideal workaround here is to protect all of your access to GMP from running concurrently. On sage side, perhaps we should protect |
I am not at all an expert on cysignals, so my opinion here might not make any sense at all - but I am wondering if it even remotely reasonable to consider adding some cysignals methods that are more "local" (creating a local counters instead of using global counters) on cysignals side (I am throwing this thought because I added @dimpase @jdemeyer here)... It just seems scary that a really different part of sage can affect others due to such global state. |
No problem, there's no rush. Thank you for investigating the issue!
I probably haven't figured out everything you've discovered so far: Does this mean that the |
Now I think about it again, it is also possible that the problem happened before khoca, I will try to do a deeper test to see if it is actually started in khoca in this case. But there is also a possibility that it was corrupted in khoca (which I meant above that I described in more detail below). As you correctly mentioned, In the code above, it was configured so that the GMP integers allocation will be done through And the problem is not necessarily that And for cypari2: note that this problem occured not necessarily because cypari2 is running concurrently with khoca threads. It just happened that when we are reaching cypari2 code (which might run way after those concurrent threads), the And note that, (1) gives a different behavior compare with if you run khoca outside sage. So it is a sage-specific problem. But as I mentioned in the beginning, it is also entirely possible that the race condition actually happened before khoca and then |
@ptrrsn, thank you for the very clear explanation. I now fully understand your arguments and agree with your conclusion that these Cysignals globals like For now I'm thinking about protecting Khoca from the swapped GMP memory allocation functions. I've already included code to protect Cypari2's Pari stack from the Pari stack used by Khoca (see, for example, pari_backup and pari_rollback). This is only required if Khoca was built from source in the Sage environment. Otherwise, Khoca uses the Pari stack from its own shared library, which amazingly also includes
Do you think it makes sense to use a similar construction for the GMP memory allocation functions, i.e. to write the function pointers to a backup when entering Khoca and restore them when leaving? |
Hmm, that's interesting. But I am still a bit hesitant to make an actual suggestion before finding the exact reason for the |
Continuing the story, it appears that the root cause was really the race condition during the Gaussian Thread execution, as I elaborated above. The I will make a PR to fix that in cysignals so you don't need to change your code. That will also fix potential race conditions in other parts of sage. |
Just to double check, can you post the content of: |
I had a PR in cysignals that will fix this problem: sagemath/cysignals#230 |
Great! Looks like very good work! So the issue was caused by a regression that occurred eight months ago when switching
For the sake of completeness, I confirm your assumption:
|
Thank you for providing the header content. And I agree that the problem seems to be introduced by regression from the meson migration. |
@LLewark, I suggest merging this PR so that the PyPI package aligns with the code in your repository's master branch. According to recent comments above, there is no longer a serious issue here. Furthermore, there are successful CI runs that ensure there are no regressions regarding existing functionality, i.e., local builds from source via the Makefile (see the Run Tests CI runs) and the creation of Docker images on DockerHub (see the Publish Docker CI runs). Furthermore, I think you should also merge PR #2, as it fixes a mathematical error. |
sagemathgh-40081: Integration of a new optional package for Khovanov homology <!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> Sage has a native implementation of Khovanov homology. Unfortunately, it's not very performant. For knots and links with more than six crossings, the calculation takes more than a second on a ThinkPad with an i7 (i7-1165G7 @ 2.80GHz) processor (running Ubuntu under WSL2). For a knot with nine crossings, it takes more than a minute, and for 10 crossings, the calculation breaks (after about an hour) with a `Pari stack overflow`: ``` sage: K7 = Knots().from_table(7,1) ....: K8 = Knots().from_table(8,1) ....: K9 = Knots().from_table(9,1) ....: K10 = Knots().from_table(10,1) sage: %time K7.khovanov_homology() CPU times: user 679 ms, sys: 0 ns, total: 679 ms Wall time: 679 ms {-21: {-7: Z}, -19: {-7: 0, -6: C2}, -17: {-7: 0, -6: Z, -5: Z}, -15: {-7: 0, -6: 0, -5: 0, -4: C2}, -13: {-7: 0, -6: 0, -5: 0, -4: Z, -3: Z}, -11: {-7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: C2}, -9: {-7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: Z, -1: 0, 0: 0}, -7: {-7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: 0, -1: 0, 0: Z}, -5: {0: Z}} sage: %time K8.khovanov_homology() CPU times: user 22.9 s, sys: 74.1 ms, total: 23 s Wall time: 23 s {-13: {-6: Z}, -11: {-6: 0, -5: C2, -4: 0, -3: 0}, -9: {-6: 0, -5: Z, -4: Z, -3: 0, -2: 0, -1: 0}, -7: {-6: 0, -5: 0, -4: 0, -3: Z x C2, -2: 0, -1: 0, 0: 0, 1: 0}, -5: {-6: 0, -5: 0, -4: 0, -3: Z, -2: Z x C2, -1: 0, 0: 0, 1: 0, 2: 0}, -3: {-6: 0, -5: 0, -4: 0, -3: 0, -2: Z, -1: Z x C2, 0: 0, 1: 0, 2: 0, 3: 0}, -1: {-4: 0, -3: 0, -2: 0, -1: Z, 0: Z x C2, 1: 0, 2: 0, 3: 0, 4: 0}, 1: {-3: 0, -2: 0, -1: 0, 0: Z x Z, 1: Z, 2: 0, 3: 0, 4: 0}, 3: {-1: 0, 0: 0, 1: 0, 2: C2, 3: 0, 4: 0}, 5: {1: 0, 2: Z, 3: 0, 4: 0}} sage: %time K9.khovanov_homology() CPU times: user 1min 23s, sys: 260 ms, total: 1min 23s Wall time: 1min 23s {-27: {-9: Z}, -25: {-9: 0, -8: C2}, -23: {-9: 0, -8: Z, -7: Z}, -21: {-9: 0, -8: 0, -7: 0, -6: C2}, -19: {-9: 0, -8: 0, -7: 0, -6: Z, -5: Z}, -17: {-9: 0, -8: 0, -7: 0, -6: 0, -5: 0, -4: C2}, -15: {-9: 0, -8: 0, -7: 0, -6: 0, -5: 0, -4: Z, -3: Z}, -13: {-9: 0, -8: 0, -7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: C2}, -11: {-9: 0, -8: 0, -7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: Z, -1: 0, 0: 0}, -9: {-9: 0, -8: 0, -7: 0, -6: 0, -5: 0, -4: 0, -3: 0, -2: 0, -1: 0, 0: Z}, -7: {0: Z}} sage: %time K10.khovanov_homology() Traceback (most recent call last): ... PariError: the PARI stack overflows (current size: 1073741824; maximum size: 1073741824) You can use pari.allocatemem() to change the stack size and try again ``` If we do the same with the optional package `Khoca` implemented in this PR, we get the following: The knot with 9 crossings takes less than half a second and the example with 10 crossings terminates after less than two seconds: ``` sage: %time K9.khovanov_homology(implementation='Khoca') CPU times: user 268 ms, sys: 44.7 ms, total: 313 ms Wall time: 293 ms {-27: {-9: Z}, -25: {-9: 0, -8: C2}, -23: {-9: 0, -8: Z, -7: Z}, -21: {-8: 0, -7: 0, -6: C2}, -19: {-7: 0, -6: Z, -5: Z}, -17: {-6: 0, -5: 0, -4: C2}, -15: {-5: 0, -4: Z, -3: Z}, -13: {-4: 0, -3: 0, -2: C2}, -11: {-3: 0, -2: Z}, -9: {0: Z}, -7: {0: Z}} sage: %time K10.khovanov_homology(implementation='Khoca') CPU times: user 1.55 s, sys: 123 ms, total: 1.68 s Wall time: 1.65 s {-17: {-8: Z}, -15: {-8: 0, -7: C2}, -13: {-8: 0, -7: Z, -6: Z}, -11: {-7: 0, -6: 0, -5: Z x C2}, -9: {-6: 0, -5: Z, -4: Z x C2}, -7: {-5: 0, -4: Z, -3: Z x C2}, -5: {-4: 0, -3: Z, -2: Z x C2}, -3: {-3: 0, -2: Z, -1: Z x C2}, -1: {-2: 0, -1: Z, 0: Z x C2}, 1: {-1: 0, 0: Z x Z, 1: Z}, 3: {0: 0, 1: 0, 2: C2}, 5: {1: 0, 2: Z}} ``` Even knots with 13 crossing only need a few seconds: ``` sage: K13 = KnotInfo.K13a_1.link() sage: %time K13.khovanov_homology(implementation='Khoca') CPU times: user 2.28 s, sys: 578 ms, total: 2.86 s Wall time: 2.6 s {-17: {-8: Z}, -15: {-8: 0, -7: Z x Z x Z x C2}, -13: {-8: 0, -7: Z, -6: Z^7 x C2 x C2 x C2}, -11: {-7: 0, -6: Z x Z x Z, -5: Z^13 x C2^7}, -9: {-6: 0, -5: Z^7, -4: Z^19 x C2^13}, -7: {-5: 0, -4: Z^13, -3: Z^24 x C2^19}, -5: {-4: 0, -3: Z^19, -2: Z^27 x C2^24}, -3: {-3: 0, -2: Z^24, -1: Z^25 x C2^27}, -1: {-2: 0, -1: Z^27, 0: Z^22 x C2^25}, 1: {-1: 0, 0: Z^26, 1: Z^16 x C2^21}, 3: {0: 0, 1: Z^21, 2: Z^9 x C2^16}, 5: {1: 0, 2: Z^16, 3: Z x Z x Z x Z x C2^9}, 7: {2: 0, 3: Z^9, 4: Z x C2 x C2 x C2 x C2}, 9: {3: 0, 4: Z x Z x Z x Z, 5: C2}, 11: {4: 0, 5: Z}} ``` The software behind the optional package is [Khoca](http://lewark.de/lukas/khoca.html). It was written in C++ by [Lukas Lewark](https://people.math.ethz.ch/~llewark/index.php). I packaged it on [PyPI](https://pypi.org/project/khoca/) for integration into Sage (see LLewark/khoca#3). Currently, there is still an issue with the PyPI package regarding integration with Sage, as both use the Pari library (see LLewark/khoca#3 (comment)). This occurs sporadically and triggers a system error that occurs at `cypari2/stack.pyx`. The error disappears when the last action is re- executed. I'm leaving this PR as a draft until the issue is resolved. ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [x] I have created tests covering the changes. - [x] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#40081 Reported by: Sebastian Oehms Reviewer(s): Sebastian Oehms, Travis Scrimshaw
Purpose
The goal of this PR is to provide a stable PyPI package for the Khoca code. The underlying branch from November 2021 already resulted in a prerelease PyPI package that can be installed by
This goal is a first step towards integrating Khoca into the SageMath khovanov_homology method.
Open tasks
Unfortunately, this version has a memory management issues that need to be fixed before a stable release can be announced. In addition, the current PyPI package is installed from source, which takes more than 10 minutes. This should be upgraded to a manylinux binary distribution.
Memory management issues
Since I'm not really familiar with C++ templates, I need help regarding the memory issues. As far as I understand the problem, it is related to using global variables. Specifically, there must be a gap in the
ComplexStack
class destructor. As a naive attempt to fix it I add this line to the destructor, but to no avail. This bug shows up in the following lines because the Frobenius algebra doesn't remain resistant when the instantiation ofInteractiveCalculator
is repeated:Another issue occurs when using
InteractiveCalculator
in Sage (i.e. aftersage -pip install khoca
). This appears to be an interference with Pari functionality, but I'm not sure if it has the same origin:The segmentation error does not occur if you call the methods in reverse:
The intreference shows up in the inline function
zeromatcopy
of the Pari interface, explicitly here: