Skip to content

Conversation

ojwb
Copy link
Member

@ojwb ojwb commented Mar 3, 2022

Previously code in the SWIG tool didn't handle allocation failures
well. Most places didn't check for NULL return from
malloc()/realloc()/calloc() at all, typically resulting in undefined
behaviour, and some places used assert() to check for a NULL return
(which is a misuse of assert() and such checks disappear if built with
NDEBUG defined leaving us back with undefined behaviour).

All C allocations are now done via wrapper functions (Malloc(),
Realloc() and Calloc()) which emit and error and exit with non-zero
status on failure, so a non-NULL return can be relied upon.

Fixes #1901.

Previously code in the SWIG tool didn't handle allocation failures
well.  Most places didn't check for NULL return from
malloc()/realloc()/calloc() at all, typically resulting in undefined
behaviour, and some places used assert() to check for a NULL return
(which is a misuse of assert() and such checks disappear if built with
NDEBUG defined leaving us back with undefined behaviour).

All C allocations are now done via wrapper functions (Malloc(),
Realloc() and Calloc()) which emit and error and exit with non-zero
status on failure, so a non-NULL return can be relied upon.

Fixes swig#1901.
@ojwb
Copy link
Member Author

ojwb commented Mar 3, 2022

Possibly we should try to call SWIG_exit() instead of exit() to at least try to delete any files that were being created.

I haven't tried that for now, mostly because that doesn't currently happen in any of the cases I've changed so my patch doesn't make that aspect worse at all.

It's a bit awkward to do if we carry on with what seems to be the current assumption that DOH is a layer below SWIG which might get used elsewhere - I think we'd need to add a way to register a function with DOH which it would call on out of memory conditions.

Perhaps a better alternative would be to register a cleanup function via atexit(). The awkward part there is that the cleanup function doesn't get told what the exit status is, but we could keep SWIG_exit() but now have it set a "clean exit" flag (which defaults false in case exit isn't via SWIG_exit()) and the cleanup function can then check that to decide whether to remove files.

@wsfulton
Copy link
Member

wsfulton commented Mar 3, 2022

Nice tidyup @ojwb.

How about adding an exit function in DOH, so near the startup, SWIG registers SWIG_exit as the exit function for DOH to call in the event of an error. If the register function has not been called, then default to exit().

@ojwb
Copy link
Member Author

ojwb commented Mar 3, 2022

I can take a look at that, but I'll get this merged first as it's better or the same in all ways than the current code and touches quite a lot of places and so liable to conflict if left unapplied for long.

I also think most if not all of the calls to abort(); should change to SWIG_exit(EXIT_FAILURE); (with an error message output first in cases where there isn't one already) so that generated files get removed.

The developer docs should really mention "use Malloc(x) instead of malloc(x)", use SWIG_exit(x) instead of exit(x), etc, though I suspect people often don't look at them so I'm also going to look at using #pragma GCC poison on identifiers like malloc so that patches which try to add new uses will fail CI.

@ojwb ojwb merged commit e38847f into swig:master Mar 3, 2022
@ojwb ojwb deleted the fail-cleanly-on-alloc-failure branch March 3, 2022 22:47
@ojwb
Copy link
Member Author

ojwb commented Mar 5, 2022

55377bd adds DOH Exit() and SetExitHandler() functions (and uses them everywhere).

@ojwb
Copy link
Member Author

ojwb commented Mar 6, 2022

735732d replaces calls to abort() with Exit(EXIT_FAILURE) so that output files get removed.

@ojwb
Copy link
Member Author

ojwb commented Mar 6, 2022

747a51f uses #pragma GCC poison to prevent direct uses of malloc(), realloc(), calloc(), free() and exit() being introduced. There's an awkward issue with Bison's template which check defined malloc which triggers GCC's poisoned id checks, so in parser.y malloc and free are not poisoned. That's probably not a likely place to see them introduced (there are currently no explicit calls to Malloc() or Free() in that file), but we could post-process the generated parser.c if we wanted full identifier poisoning everywhere.

I think that completes the follow-on work here. The only other thing I wondered is whether it was worth having our own Assert(x) macro which was like the standard assert(x) but called Exit() on failure so that generated files get removed if an assertion fails, but if that's really a concern then I'd argue we're misusing assert().

ojwb added a commit that referenced this pull request Mar 7, 2022
@wsfulton
Copy link
Member

wsfulton commented Mar 8, 2022

In 55377bd, the error count has been replaced by 1. I'm intrigued how it is possible that an int on most platforms will return 0 if the count is 256? How about keeping the error count with a ceiling of whatever the max value that will work if this is a problem?

@wsfulton
Copy link
Member

wsfulton commented Mar 8, 2022

Sorry, I only just saw your comment @ojwb:

Also some error statuses have special meanings e.g.
those defined by <sysexits.h>.

I wasn't aware of anything following a convention other than exiting with zero or non-zero. So new behaviour seems reasonable.

@ojwb
Copy link
Member Author

ojwb commented Mar 8, 2022

The full background is that if you pass an int to exit() (or main's return) but at least on POSIX platforms:

The value of status may be 0, EXIT_SUCCESS, EXIT_FAILURE, [CX] [Option Start] or any other value, though only the least significant 8 bits (that is, status & 0377) shall be available from wait() and waitpid(); the full value shall be available from waitid() and in the siginfo_t passed to a signal handler for SIGCHLD.

The shell exit status at least uses the traditional wait() value:

$ cat exit.c
#include <stdlib.h>
int main() { exit(256); }
$ gcc exit.c
$ ./a.out
$ echo $?
0

Seems Microsoft copied this feature as https://docs.microsoft.com/en-us/previous-versions/6wdz5232(v=vs.140) says:

the low-order byte of status is made available to the host environment or waiting calling process, if one exists, after the process exits

There's actually an additional complication as a process which is killed by a signal traditionally exits with status 128+signal_number:

$ cat
^C
$ echo $?
130

POSIX says "Implementations are encouraged to choose exit values greater than 256 to indicate programs that terminate by a signal so that the exit status cannot be confused with an exit status generated by a normal termination" but it seems that's not caught on - I guess because it risks breaking assumptions in existing working code.

So in practice values from 0 to 127 seem to be pretty portably but all C++ actually guarantees about exit() is:

If exit_code is 0 or EXIT_SUCCESS, an implementation-defined status indicating successful termination is returned. If exit_code is EXIT_FAILURE, an implementation-defined status indicating unsuccessful termination is returned. In other cases implementation-defined status value is returned.

@wsfulton
Copy link
Member

Hi @ojwb, I had some problems with the gcc poison macros when building SWIG and so have commented this bit out. See 7785377. I think it needs a rethink making sure the poison macros only apply to SWIG code and not external/system/library headers. Sorry, I've just got too much to do to have a go at fixing myself.

@ojwb
Copy link
Member Author

ojwb commented Mar 13, 2022

Sorry, I didn't really consider this possibility but I can see the issue now. Unfortunately I don't think there's a way to say "poison but not in system headers" so we'd have to very carefully restructure the code such that the poisoning always happens after any non-SWIG headers have been included just in case a system/external header tries to reference one of these symbols. I think that's just going to end up brittle in the face of future changes though.

So probably the way to go to help prevent direct calls to these functions being added in future changes is to have a CI build with these pragmas on, since it seems fine with modern glibc. If that proves problematic too we can rethink or give up the idea, but I think some sort of automatic enforcement of this would be better than relying on reviewers spotting attempts to introduce direct calls to these functions.

Disabling for now definitely makes sense. I'll take a look.

ojwb added a commit that referenced this pull request Mar 14, 2022
It seems too brittle to enable by default as we'd have to avoid
including any system headers after doh.h, which is hard to enforce,
but just having it enabled for one CI job should avoid uses of the
poisoned symbols from being accidentally introduced.

See #2223
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

swig 4.0.2 assertion failed on OpenABM-Covid19 project
2 participants