scheduler test causing pthread_cond_timedwait: Invalid argument error  #18227

@amitiuttarwar

Problem

In #18037 I introduced code to mock the scheduler from functional tests. I added a scheduler#MockForward method & associated scheduler_tests#mockforward unit test. The unit test caused sporadic failures on one of the Travis machines & failures on Bitcoin builds.

After a couple of attempts at small fixes [1] [2], we decided to disable the test while we investigate the underlying issue.

Failures

Some examples of failing builds

Travis: [1] [2]
Note that it is always the travis machine with x86_64 Linux [GOAL: install] [bionic] [no wallet]

Bitcoin Builds: [1] [2] [3]

The test always fails in the same way:

terminate called after throwing an instance of 'boost::wrapexcept<boost::condition_error>'
  what():  boost::condition_variable::do_wait_until failed in pthread_cond_timedwait: Invalid argument
unknown location(0): fatal error: in "scheduler_tests/mockforward": signal: SIGABRT (application abort requested)

Reproducing the issue

What makes this really difficult to debug is that the failure is hard to reproduce. I have not been able to reproduce it myself. In this section, I'm compiling information from others who were able to reproduce it.

  • @jonasschnelli found that enabling ccache meant consistent failure, and clearing ccache meant consistent success. [1] [2]

  • @MarcoFalke was able to reproduce on a single CPU instance, but was unable to extract much debugging information. [1]

Relevant debugging information

boost::condition_variable throws a boost::condition_error, which goes uncaught and ends in the SIGABRT, because pthread_cond_timedwait returns EINVAL ("Invalid argument"). According to the docs [1] & [2], this indicates that either "The value specified by cond, mutex, or abstime is invalid" or "Different mutexes were supplied for concurrent pthread_cond_timedwait() or pthread_cond_wait() operations on the same condition variable".
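
To make the first of those conditions concrete, here is a minimal sketch of one way pthread_cond_timedwait can return EINVAL: an abstime whose tv_nsec field is outside the range [0, 1000000000). This only illustrates the documented error condition and is not a claim about the root cause of the mockforward failure. The helper name timedwait_with_nsec is hypothetical and does not exist in Bitcoin Core or Boost.

```cpp
#include <cerrno>
#include <ctime>
#include <pthread.h>

// Hypothetical helper for illustration only (not part of Bitcoin Core or Boost).
// Waits on a fresh condition variable with a deadline about one second in the
// past whose tv_nsec field is forced to `nsec`, and returns the error code
// reported by pthread_cond_timedwait.
int timedwait_with_nsec(long nsec)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    timespec deadline{};
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec -= 1;    // already expired, so a valid deadline never blocks
    deadline.tv_nsec = nsec; // valid values are 0 .. 999999999

    // The mutex must be held by the caller when waiting.
    pthread_mutex_lock(&mutex);
    int rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

On Linux with glibc, I would expect timedwait_with_nsec(0) to return ETIMEDOUT (a valid deadline that has already passed), while timedwait_with_nsec(1000000000L) and timedwait_with_nsec(-1) return EINVAL. If the time point handed to the condition variable in the test ever converted to a timespec like this, it would produce the same "Invalid argument" message, but that is only one of the candidate explanations listed above.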

Next steps

This bug doesn't seem urgent, but should be investigated and resolved. I'm opening this issue to track any relevant comments or findings.

Thank you to everyone who has jumped in to take a look & thanks in advance to anybody willing to help continue the investigation 🙏🏽
