
Fix ndarray dimension signedness, fix ndarray length overflow, close #3519 #3521


Closed
wants to merge 7 commits into nlohmann:develop from NeuroJSON:issue3519

Conversation

fangq
Contributor

@fangq fangq commented Jun 6, 2022

Hi @nlohmann, sorry for the new bug from my code.

I debugged this and found two issues:

  1. the dimension vector elements were not written with the correct signedness
  2. when the total length overflowed, it wrapped around to 0 and no error was raised.

With this patch both are fixed, and an additional test was added to verify them; the fuzzer data now raises an "excessive ndarray size caused overflow" error.

One thing I want to verify with you is the error code; I am currently using 113, so let me know if there is a better choice. If the overflowed length were non-zero and passed on to the SAX functions, it would raise a 408 "excessive array size" error, but I am catching the overflow before that point.
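For illustration only (this is not the code in the patch, and the helper name is made up): one way to catch the wrap-around is to divide back after each multiplication of the per-dimension sizes, so the overflow is detected before a bogus total length ever reaches the SAX layer.

#include <cstddef>
#include <vector>

// Hypothetical helper, illustration only: multiply the per-dimension sizes and
// report failure if the running product wraps around std::size_t.
bool total_ndarray_size(const std::vector<std::size_t>& dim, std::size_t& result)
{
    result = 1;
    for (const std::size_t d : dim)
    {
        const std::size_t next = result * d;
        if (d != 0 && next / d != result)   // the multiplication wrapped around
        {
            return false;                   // caller reports the overflow error
        }
        result = next;
    }
    return true;
}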

@fangq fangq requested a review from nlohmann as a code owner June 6, 2022 17:42
@coveralls

coveralls commented Jun 6, 2022

Coverage Status

Coverage decreased (-0.07%) to 99.933% when pulling a6946ca on NeuroJSON:issue3519 into b6d00d1 on nlohmann:develop.

@nlohmann
Owner

nlohmann commented Jun 7, 2022

There is still an issue with MSVC 2019.

@falbrechtskirchinger
Contributor

There is still an issue with MSVC 2019.

More accurately, 32-bit builds, which should be a clue.

if (JSON_HEDLEY_UNLIKELY(!sax->number_integer(static_cast<number_integer_t>(i))))
if (result == 0) // because dim elements shall not have zeros, result = 0 means overflow happened
{
return sax->parse_error(chars_read, get_token_string(), parse_error::create(113, chars_read, exception_message(input_format, "excessive ndarray size caused overflow", "size"), nullptr));

@fangq (Contributor Author)

I will change it to 408

@fangq
Contributor Author

fangq commented Jun 7, 2022

More accurately, 32-bit builds, which should be a clue.

I think this is what happened: when reading the uint64 number in the dimension vector on 32-bit Windows, the overflow happened earlier, i.e. when reading the uint64 M... record into a 4-byte size_t dim element it is stored as 0, and then the code below returned an empty array

for (auto i : dim) // test if any dimension in an ndarray is 0, if so, return a 1D empty container
{
if ( i == 0 )
{
result = 0;
return true;
}
}

I am wondering if there is a way to tell when an overflow happens while reading a large number on win32?

@falbrechtskirchinger
Contributor

Because the size of std::size_t is dependent on architecture/platform, when assigning a value to std::size_t, you'll have to manually check if a given value fits.

Something like this should suffice:

#include <limits>
if(value > std::numeric_limits<std::size_t>::max()) throw ...;

@fangq
Contributor Author

fangq commented Jun 7, 2022

Because the size of std::size_t is dependent on architecture/platform, when assigning a value to std::size_t, you'll have to manually check if a given value fits.

Now I am wondering why this was not caught in other binary formats, for example UBJSON?

Upon reading this part of the test, is it true that the test simply compares the overflowed value and allows the overflow to happen on 32-bit systems?

https://github.com/nlohmann/json/blob/develop/tests/src/unit-ubjson.cpp#L510-L558

@falbrechtskirchinger
Contributor

Because the size of std::size_t is dependent on architecture/platform, when assigning a value to std::size_t, you'll have to manually check if a given value fits.

Now I am wondering why this was not caught in other binary formats, for example UBJSON?

Upon reading this part of the test, is it true that the test simply compares the overflowed value and allows the overflow to happen on 32-bit systems?

https://github.com/nlohmann/json/blob/develop/tests/src/unit-ubjson.cpp#L510-L558

The test uses number_integer_t and std::uint64_t, both of which are 64-bit types. The problem only arises when std::size_t is used, because that is most likely going to be an unsigned 32-bit integer on 32-bit systems.

Whenever a 64-bit value is read and used as the size of a container, that's where things can go wrong.

I'm curious if the UBJSON code accounts for this or why it doesn't need to.
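To make the failure mode concrete, here is a small standalone illustration (not taken from the library) of what narrowing a 64-bit length to std::size_t does on a 32-bit target:

#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    const std::uint64_t declared_size = 0x100000000ULL;   // 2^32, a perfectly valid value in the byte stream
    const std::size_t container_size = static_cast<std::size_t>(declared_size);

    // On a 64-bit build this prints 4294967296; on a 32-bit build std::size_t
    // is 32 bits wide, the high bits are silently dropped, and it prints 0,
    // which is exactly the value that later looks like an "empty" dimension.
    std::cout << container_size << '\n';
}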

@fangq
Contributor Author

fangq commented Jun 7, 2022

I added the below if (number > std::numeric_limits<std::size_t>::max()){...} block to the 'L' and 'M' marker handling in get_ubjson_size_value(). I also added an if(sizeof(size_t)==4) {...} else {...} branch in the test, because there will be two different error messages. The test runs fine on my 64-bit machine, but I am not entirely sure how this change will affect UBJSON's tests on win32. Should I commit and test?

            case 'L':
            {
                std::int64_t number{};
                if (JSON_HEDLEY_UNLIKELY(!get_number(input_format, number)))
                {
                    return false;
                }
                if (number < 0)
                {
                    return sax->parse_error(chars_read, get_token_string(), parse_error::create(113, chars_read,
                                            exception_message(input_format, "count in an optimized container must be positive", "size"), nullptr));
                }
                if (number > std::numeric_limits<std::size_t>::max())
                {
                    return sax->parse_error(chars_read, get_token_string(), parse_error::create(408, chars_read,
                                            exception_message(input_format, "integer value overflow", "size"), nullptr));
                }
                result = static_cast<std::size_t>(number);
                return true;
            }
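For the sizeof(size_t) branch mentioned above, here is a minimal sketch of the idea (the UBJSON bytes and the caught exception type are only illustrative; the exact messages expected by the real test differ between the two branches):

#include <nlohmann/json.hpp>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // UBJSON array with an optimized count stored as an int64 ('L') equal to 2^32
    const std::vector<std::uint8_t> vec = {'[', '#', 'L',
                                           0x00, 0x00, 0x00, 0x01,
                                           0x00, 0x00, 0x00, 0x00};
    try
    {
        const auto j = nlohmann::json::from_ubjson(vec);
        std::cout << j.dump() << '\n';
    }
    catch (const nlohmann::json::exception& e)
    {
        // With the range check in place, a 32-bit build rejects the count because
        // it does not fit into std::size_t; a 64-bit build accepts the count and
        // fails later for a different reason, hence two different expected error
        // messages and the sizeof(size_t)==4 branch in the unit test.
        std::cout << (sizeof(std::size_t) == 4 ? "32-bit: " : "64-bit: ") << e.what() << '\n';
    }
}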

@falbrechtskirchinger
Contributor

falbrechtskirchinger commented Jun 7, 2022

Depending on your compiler, you may be able to build 32bit executables by configuring the CMake project with -DCMAKE_CXX_FLAGS="-m32".

(Edit: On my system that does result in sizeof(std::size_t) == 4.)

Otherwise, just commit and run it through CI.

@fangq
Contributor Author

fangq commented Jun 7, 2022

@falbrechtskirchinger, thanks, after installing the cross-compilation packages I was able to test it with -m32. Just committed the change, but realized that coverage will be a problem ...

@falbrechtskirchinger
Contributor

/__w/json/json/include/nlohmann/detail/input/binary_reader.hpp:2082:28: error: result of comparison 'std::int64_t' (aka 'long') > 18446744073709551615 is always false [-Werror,-Wtautological-type-limit-compare]
                if (number > std::numeric_limits<std::size_t>::max())
                    ~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I figured that would be a problem but couldn't think of the proper warning flag when I tested. Let me see if I can come up with a better solution ...

@fangq fangq marked this pull request as draft June 7, 2022 19:00
@falbrechtskirchinger
Contributor

I have a proof of concept. This might be overengineered.

@fangq I'll sleep on it and submit a PR against your branch tomorrow (for reference, it's past 9 pm here) if I can't come up with a less involved solution.

template<typename T, typename U,
         bool = (std::numeric_limits<U>::max() < std::numeric_limits<T>::max()),
         typename = std::enable_if_t<std::is_unsigned_v<T> && std::is_unsigned_v<U>>>
struct value_in_range_of_impl;

template<typename T, typename U>
struct value_in_range_of_impl<T, U, false>
{
    static constexpr bool test(U val) { return val <= std::numeric_limits<T>::max(); }
};

template<typename T, typename U>
struct value_in_range_of_impl<T, U, true>
{
    static constexpr bool test(U /*val*/) { return true; }
};

template<typename T>
constexpr bool value_in_range_of_size_t(T val)
{
    return value_in_range_of_impl<std::size_t, T>::test(val);
}
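For context, this is how the trait might slot into the 'L'-marker snippet from earlier, replacing the raw comparison that triggers the tautological-comparison warning (sketch only; the negative-value and overflow checks are collapsed into one branch for brevity, and the error id/message are carried over from the snippet above rather than final):

std::int64_t number{};
if (JSON_HEDLEY_UNLIKELY(!get_number(input_format, number)))
{
    return false;
}
if (number < 0 || !value_in_range_of_size_t(static_cast<std::uint64_t>(number)))
{
    return sax->parse_error(chars_read, get_token_string(), parse_error::create(408, chars_read,
                            exception_message(input_format, "integer value overflow", "size"), nullptr));
}
result = static_cast<std::size_t>(number);
return true;

Because the bool non-type parameter picks the specialization at compile time, the always-true case never emits the comparison at all, so -Wtautological-type-limit-compare stays quiet on 64-bit targets.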

@falbrechtskirchinger
Contributor

falbrechtskirchinger commented Jun 8, 2022

I tried to open a PR and NeuroJSON/json does not appear in the list (I've opened a GH support ticket about that).

Meanwhile, you can git remote add my repository and git cherry-pick the commits from my branch issue3519-trait (https://github.com/falbrechtskirchinger/json/commits/issue3519-trait). Don't know about your git knowledge - happy to help if you need it.
(Edit: @fangq I've opened a new PR instead: #3523)

There's still a problem with the UBJSON unit test. I assume that's because the range check changes the exception type.

@falbrechtskirchinger
Contributor

falbrechtskirchinger commented Jun 8, 2022

@nlohmann How would you like to address coverage?

if (number > std::numeric_limits<std::size_t>::max())
{
    return sax->parse_error(chars_read, get_token_string(), parse_error::create(408, chars_read,
        exception_message(input_format, "integer value overflow", "size"), nullptr));
}

(Edit: This code was copied from the old coverage report.)
Are you okay with excluding this line, as it's only reachable on 32bit targets?

If that's acceptable, I might as well open another PR instead of having @fangq go through the trouble of cherry-picking from my branch.
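For reference, lcov's built-in exclusion markers would look like this (assuming the coverage setup honours lcov's default LCOV_EXCL markers; whether exclusion is acceptable at all is exactly the question above):

// LCOV_EXCL_START: only reachable on 32-bit targets
if (number > std::numeric_limits<std::size_t>::max())
{
    return sax->parse_error(chars_read, get_token_string(), parse_error::create(408, chars_read,
        exception_message(input_format, "integer value overflow", "size"), nullptr));
}
// LCOV_EXCL_STOP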

@nlohmann
Owner

nlohmann commented Jun 8, 2022

@falbrechtskirchinger I would rather have a test suite compiled for 32 bits that tests exactly this issue.

@falbrechtskirchinger
Contributor

How would that work with coverage/coveralls? I'm not super familiar with lcov/gcov. Is it possible to collect coverage information from multiple builds?

@nlohmann
Owner

nlohmann commented Jun 8, 2022

How would that work with coverage/coveralls? I'm not super familiar with lcov/gcov. Is it possible to collect coverage information from multiple builds?

We collect coverage reports for anything executed by CTest. If we have another test binary that contains tests relevant for 32 bits and add it to the test suite, the results should appear in the coverage report automatically.

@falbrechtskirchinger
Contributor

OK, a separate test shouldn't be too difficult. I just need to figure out when -m32 is available (probably via cmake_try_compile or whatever it's called).
