Skip to content

libgap: Fix GC crashes on aarch64 #34701

@collares

Description

@collares

A refactor in #27946 introduced "unprotected" (not surrounded by GAP_Enter/GAP_Leave) GAP_ValueGlobalVariable calls. I believe this might be a GC hazard, because after updating to GAP 4.12.1 I started seeing aarch64 crashes on NixOS infrastructure such as:

#0  0x0000fffff79740e8 in wait4 ()
#1  0x0000fffff5dc6b78 in print_enhanced_backtrace ()
#2  0x0000fffff5dc8190 in sigdie ()
#3  0x0000fffff5dcb1c0 in cysigs_signal_handler ()
#4  0x0000fffff7ffb7cc in __kernel_rt_sigreturn ()
#5  0x0000ffff99a0bf28 in ConvString ()
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000ffff99989930 in Pr ()
#10 0x0000ffff9998aa18 in CloseOutput ()
#11 0x0000ffff99884828 in capture_stdout () at /build/sage-src-9.7/pkgs/sagemath-standard/sage/libs/gap/element.pyx:154
...

I also see cases where capture_stdout throws errors such as sage.libs.gap.util.GAPError: Error, Length: <list> must be a list (not the integer 255) and then crashes. Both types of errors are fixed by this ticket.

Note that I am nesting GAP_Enter/GAP_Leave calls because I didn't remove the preexisting calls inside capture_stdout. That's because I feared removing the innermost calls might create a new footgun (and I believe nested GAP_Enter/GAP_Leave calls are explicitly supported), but removing them should cause no problem. Removing them might even be preferable for performance reasons, I don't know.

This ticket's branch is based on #34391.

(Edit: I previously thought that holding onto a char* past the GAP_Enter/GAP_Leave blocks, as is done in GapElement's __str__ and _repr_ methods, could also cause GC hazards. It doesn't seem to be a problem in practice, though, and I am not qualified to tell if it's a problem in theory.)

Depends on #34391

CC: @embray @videlec @dimpase

Component: interfaces

Author: Mauricio Collares

Issue created by migration from https://trac.sagemath.org/ticket/34701

Edit: Removed branch. The code changes are now at #35114.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions