low-batt
Contributor

This commit will:

  • Add the package Swift Atomics
  • Add AtomicBool and AtomicBoolSemaphore property wrappers
  • Add isStopped and isShutdown properties to PlayerCore
  • Add playbackStopped and mpvHasShutdown methods to PlayerCore
  • Add waitForStopToFinish and waitForTermination methods to PlayerCore
  • Change savePlaybackPosition to not send a command to write the watch later file if playback has been stopped
  • Move code in MainWindowController method windowWillClose related to stopping playback to the PlayerCore stop method
  • Change MPVController to inform PlayerCore when the stop command finishes and when mpv shuts down
  • Change MPVController to remove IINA preference observers before sending a quit command to mpv
  • Add an isDisabled property to MenuController
  • Add disableAllMenus and disableAllMenuItems methods to MenuController
  • Change menuWillOpen to not update menus if menus are disabled
  • Add a disableAllCommands method to RemoteCommandController
  • Add isTerminated and terminateQueue properties to AppDelegate
  • Add a removeAllMenuItems method to AppDelegate
  • Change AppDelegate method applicationShouldHandleReopen to not permit reopening during termination
  • Change AppDelegate method applicationShouldTerminate to return terminateLater to Cocoa to delay termination until mpv has finished shutting down
  • Change applicationShouldTerminate to shutdown further user input and coordinate application shutdown using a background thread to wait for asynchronous mpv commands to complete

From a high level perspective these changes:

  • Stop further input from the user once application termination starts
  • Disallow reopening during termination
  • Wait for the asynchronous mpv stop command to complete before sending the mpv quit command
  • Wait for the asynchronous quit command to complete before allowing Cocoa to proceed with termination

These changes are needed because mpv may decide to write to the watch later file and then fail to store the playback position if the quit command is sent to mpv while a stop command is still executing. This breaks the "Resume last playback position" IINA feature. To correct this IINA must move to a controlled clean shutdown.


Description:
Reviewers will need to decide if this approach is good enough for now, or if extensive changes should be made to provide a more deterministic shutdown.

@low-batt
Contributor Author

Currently IINA tells Cocoa to proceed with termination even though mpv is still potentially processing asynchronous commands. Frequently this breaks "Resume last playback position", making that feature unreliable. IINA must implement a controlled shutdown.

Cocoa usually terminates IINA before one of the many potential crashes caused by accessing mpv while it is quitting can occur. Changing how IINA shuts down risks making these race conditions more likely, so changes in this area require a detailed review.

Ideally IINA's termination sequence would be fully deterministic. Reviewers should be aware that the changes in this PR do not guarantee a clean shutdown; many more changes would be needed for that. For example, shutdown can be initiated by mpv when the user types "q". When that happens IINA has no knowledge that mpv is shutting down until the shutdown is complete and IINA processes the shutdown event from mpv. That leaves a window where IINA can access mpv while it is shutting down. I've not seen such a crash, but it may be possible to trigger one. Ensuring IINA does not access mpv once the quit command has been sent probably requires using a lock to single-thread all access and to always check whether termination has started. Doing that probably requires a good bit of refactoring, since mpv is accessed in so many different places.
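To illustrate the lock-based approach mentioned above, here is a minimal sketch of funneling all mpv access through one lock with a termination check. All names here are hypothetical, not actual IINA API:

```swift
import Foundation

/// Hypothetical sketch: every mpv call goes through one lock and is refused
/// once termination has begun. Not the implementation in this PR.
final class MPVGate {
  private let lock = NSLock()
  private var terminated = false

  /// Runs `body` only if the quit command has not been sent yet;
  /// returns nil once the core is considered terminated.
  func withLiveCore<T>(_ body: () throws -> T) rethrows -> T? {
    lock.lock()
    defer { lock.unlock() }
    guard !terminated else { return nil }
    return try body()
  }

  /// Marks the core as terminated; later calls return nil instead of touching mpv.
  func markTerminated() {
    lock.lock()
    defer { lock.unlock() }
    terminated = true
  }
}
```

Retrofitting this would mean making the results of "get" operations optional everywhere, which is the scope problem described above.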

The proposed fix tries to avoid all the changes required for a fully deterministic solution.
It tries to shut down IINA activity before initiating termination. However, tasks may have already been queued and could end up executing after mpv has been instructed to quit. Additional checks may need to be added to catch this.

The proposed changes are thread-based, introducing a background thread used by AppDelegate to coordinate termination. Another approach would be to avoid introducing a thread and have AppDelegate listen for events, using a state machine to track termination progress. Reviewers, please let me know if you would prefer an event-driven implementation over the proposed thread-based one.

It is desirable to review and merge this PR soon (assuming the approach is not rejected) because the new AtomicBool property wrapper can be used to solve some of the problems reported by the thread sanitizer involving boolean flags.

This is what shutdown of IINA with a single window open looks like in the log with these changes:

22:03:58.281 [iina][d] App should terminate
22:03:58.281 [iina][d] Disabling all menus
22:03:58.281 [iina][d] Disabling remote commands
22:03:58.282 [iina][d] Closing all windows
22:03:58.282 [player0][d] Write watch later config
22:03:58.283 [player0][d] Stopping playback
22:03:58.286 [player0][d] Waiting for playback to stop
22:03:58.320 [player0][d] Playback has stopped
22:03:58.518 [player0][d] Waiting for player termination
22:03:58.521 [player0][d] Player has terminated
22:03:58.521 [iina][d] Proceeding with application termination
22:03:58.521 [iina][d] App will terminate

@low-batt
Contributor Author

I forgot to ask, is there anything that needs to be done for termination in regards to the plugin system?

I didn't see a way to shutdown the VM. Plugins running while mpv is shutting down could also cause problems.

@@ -4022,6 +4025,14 @@
/* End XCConfigurationList section */

/* Begin XCRemoteSwiftPackageReference section */
51D34D8A28DD49C1006C2C87 /* XCRemoteSwiftPackageReference "swift-atomics" */ = {
isa = XCRemoteSwiftPackageReference;
repositoryURL = "https://github.com/apple/swift-atomics.git";
Member

I don't think this deserves bringing in a new dependency. A Bool behind a Lock should be fine.
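A minimal sketch of such a lock-backed Bool, with no external dependency (the wrapper name is hypothetical; IINA's actual Lock class may differ):

```swift
import Foundation

/// Sketch of a Bool behind a lock: the same interface as an atomic flag,
/// built on NSLock alone. Illustrative only.
@propertyWrapper
final class LockedBool {
  private let lock = NSLock()
  private var value: Bool

  init(wrappedValue: Bool) { value = wrappedValue }

  var wrappedValue: Bool {
    get { lock.lock(); defer { lock.unlock() }; return value }
    set { lock.lock(); defer { lock.unlock() }; value = newValue }
  }
}
```

Usage would look like `@LockedBool var isStopped = false`, matching the property wrapper style this PR proposes.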

Contributor Author

Surprisingly the Swift Atomics README does not explain why you would want to use that package instead of just using mutexes. The reason is that using OS locks is several orders of magnitude slower than the atomic operations provided by Swift Atomics. That is due to this:

all atomic operations map directly to dedicated CPU instructions where available -- to the extent supported by llvm & Clang.

This post from the ArangoDB site shows the high cost of OS locks vs. atomic operations: Comparison: Lockless programming with atomics in C++ 11 vs. mutex and RW-locks

For this issue there is no need for optimal performance. But I was expecting atomics to be used to solve some of the various data races being reported by the ThreadSanitizer. Some of those involve operations that occur frequently. I've not dug into those issues yet. It may be that those could be solved by consistently queuing to a particular thread, rather than sharing between threads.

I can pull the use of Swift Atomics and change AtomicBool to use a lock. If we find a case where simple typed data is frequently accessed by two threads, we could then decide whether to add Swift Atomics.

Does that sound good to you?

Member

I worked as an iOS performance engineer in the past, so I'm familiar with atomics and how they are used :) In this case I just don't think it is worth bringing in a dependency. When Swift adds these natively we can switch over to them, and until then we can always drop down to C if necessary to access these.

Contributor Author

Understood. Right now I'm thinking that with the event-driven approach and main-thread dispatching, access to the new flags can be restricted to the main thread, eliminating the need for this class in this case.

If it proves to be needed, I'll propose a version that uses the new Lock class and not Swift Atomics.

It would be good to get PR #3938 merged (logger improvements) as that one adds the Lock class.

///
/// This class is expected to be used as follows:
/// - This boolean is set to `false`, recreating the semaphore
/// - A single thread waits for this boolean to be set to `true`
Member

Blocking a thread on a semaphore is typically not very polite on Darwin. Can we try rethinking this API and having work be queued instead?

Contributor Author

I too have been questioning my own decision to implement it this way. I don't think queuing would work. The correct way to avoid this is to toss the current thread based code in AppDelegate.applicationShouldTerminate in the trash and implement an event driven state machine to control shutdown. This is the offending code:

Thread Based Code:
    terminateQueue.async {
      // When a window is closed it stops playback. That involves sending a stop command to mpv.
      // That command executes asynchronously. MUST wait for that command to complete. If the quit
      // command is sent to mpv while the stop command is still executing the commands can
      // interfere and cause the playback position to not be saved in the watch later file. See
      // issue #3939 for details.
      for pc in PlayerCore.playerCores {
        pc.waitForStopToFinish(timeout: timeout)
      }
      // Now that playback has been stopped terminate the players.
      for pc in PlayerCore.playerCores {
        pc.terminateMPV()
      }
      // Player termination involves sending a quit command to mpv. That command executes
      // asynchronously. Wait for all the players to finish terminating before proceeding.
      for pc in PlayerCore.playerCores {
        pc.waitForTermination(timeout: timeout)
      }
      // Tell Cocoa to proceed with termination. This has to be done on the main thread.
      // The main dispatch queue MUST NOT be used to avoid a deadlock when quitting is
      // initiated by mpv in response to the user pressing "q". When that happens
      // MPVController submits a task to the main queue that calls terminate. That call
      // blocks the main dispatch queue.
      if #available(macOS 10.12, *) {
        RunLoop.main.perform(inModes: [.common, .eventTracking, .modalPanel]) {
          self.proceedWithTermination()
        }
      } else {
        RunLoop.main.perform(#selector(self.proceedWithTermination), target: self,
                             argument: nil, order: Int.min, modes: [.common])
      }
    }

The advantage of this thread-style solution is that the shutdown sequence is right there in the code, making it very clear what the shutdown process is doing. An event-driven solution is definitely better, but it is frequently not immediately obvious how such code works.

For an event driven system I envision something like the following...

PlayerCore would post notifications when the stop and quit commands finish. Early on, AppDelegate.applicationShouldTerminate would add observers for those notifications. When an event indicates stop has completed, the handler in AppDelegate would have PlayerCore send a quit command. When an event indicates quit has completed, the handler in AppDelegate would poll all the cores to see if they have all finished terminating and, if so, proceed with termination. Something like that.

Unless you have second thoughts I'm thinking I should go ahead and change the implementation to use such an event driven approach.
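To make the envisioned shape concrete, here is a sketch of the observer wiring. The notification names and coordinator type are hypothetical, not the actual replacement implementation:

```swift
import Foundation

// Hypothetical notification names that PlayerCore would post.
extension Notification.Name {
  static let playerStopped = Notification.Name("IINAPlayerStopped")
  static let playerHasShutdown = Notification.Name("IINAPlayerHasShutdown")
}

/// Sketch of the event-driven coordinator applicationShouldTerminate would
/// set up before returning .terminateLater.
final class TerminationCoordinator {
  private var observers: [NSObjectProtocol] = []

  func begin(onAllShutdown: @escaping () -> Void) {
    let center = NotificationCenter.default
    observers.append(center.addObserver(forName: .playerStopped, object: nil,
                                        queue: .main) { note in
      // A player's stop command finished; it is now safe to send quit
      // to that core, e.g. (note.object as? PlayerCore)?.terminateMPV()
    })
    observers.append(center.addObserver(forName: .playerHasShutdown, object: nil,
                                        queue: .main) { _ in
      // Poll all cores; if every one has shut down, tell Cocoa to proceed.
      onAllShutdown()
    })
  }
}
```

Because both handlers run on the main queue, no additional locking of the flags would be needed.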

Member

I take back my comments about waiting on a semaphore, the application is going to quit anyways and I don't really think it matters that this doesn't perform all that well. That said, I don't really like exposing this primitive as API, as we did with Lock (which I consider to be generally useful and desirable). Can we set up an ad-hoc dispatch group here and use it to control the forward progress of this thread?

Contributor Author

Originally I was thinking an event-driven solution would require more changes and be harder to understand. After thinking more about how it could be done, I'm convinced event-driven is the proper approach. That, plus smarter use of dispatching to the main thread, should be less complex than the current approach. The event-driven approach eliminates the need for AtomicBoolSemaphore. By dispatching to the main thread I should be able to eliminate the need for AtomicBool.

Busy today, but should be able to post such a change later this week.

Contributor

Re: AppDelegate.applicationShouldTerminate():
If you haven't yet invested time in refactoring it again, consider this dispatch group example. You might want to create a DispatchQueue with attributes: .concurrent for it, because DispatchQueues are serial by default. It shouldn't be too hard to implement, and you can also call wait() with a timeout.

But you must guarantee that exactly one enter() is called for every leave() - an error will be thrown when leave() is called if { numEnters - numLeaves } is negative. (I have taken to using defer {} blocks around the leave() in case there's any possibility of exceptions occurring).

So you can probably do something like this:

// do the same for stop above
// ...
let terminateQueue = DispatchQueue(label: "terminate", attributes: .concurrent)
let terminateGroup = DispatchGroup()

for pc in PlayerCore.playerCores {
  terminateGroup.enter()
  pc.terminateMPV()
  terminateQueue.async {
    defer { terminateGroup.leave() }
    pc.waitForTermination()
  }
}
terminateGroup.wait(timeout: .now() + 10)

Note that in this design, you'll still need the semaphores so that you can wait on them.

@low-batt
Contributor Author

low-batt commented Oct 2, 2022

Closing this PR in favor of a fix that is event driven and uses main queue dispatching as suggested in review comments. This eliminates the need for waiting on a DispatchSemaphore and coordinating thread access to flags.

I will be opening a replacement PR shortly.

Contributor

@svobs svobs left a comment

Whoops - didn't realize I had to submit it!

guard !isMpvTerminated else { return }
isMpvTerminated = true
savePlaybackPosition()
invalidateTimer()
Contributor

A little worried about savePlaybackPosition() getting called here (or at all - I thought the mpv core was already handling the save?) But there are some holes here:

  • It's a good start that you are setting isMpvTerminated = true right after guard !isMpvTerminated else { return }. But what you really want is both of those in one atomic operation (e.g., getAndSet()), because having them apart leaves open the possibility of two threads operating at nearly the same time, where ThreadA passes the guard (false), then ThreadB passes the guard (false), then both threads set the bool to true and continue executing the rest of the method in duplicate. This works in some languages but is generally considered an anti-pattern: see ("The Double-Checked Locking is Broken" Declaration)
  • You've got two independent flags, isStopped and isMpvTerminated, but you're mixing the logic between them. Although terminateMPV() first checks isMpvTerminated and then savePlaybackPosition() checks isStopped before saving the file, the latter flag doesn't get set until playbackStopped() is called. So a call to terminateMPV() which is quickly followed by a call to stop() can result in two calls to mpv.command(.writeWatchLaterConfig) in quick succession.
  • Also keep in mind that isStopped can change back from true to false, if fileStarted() is called. So you must consider the possibility that this flag may change values at any time and in any direction, which further complicates things.
  • And unlike isMpvTerminated, there is a long delay between when the stop is started and isStopped becomes true, which increases the chances for race conditions.
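The check-then-set gap in the first bullet can be closed with a single read-modify-write. A sketch using ManagedAtomic from the Swift Atomics package this PR adds (the surrounding method body is illustrative):

```swift
import Atomics  // from the swift-atomics package

// exchange(_:ordering:) atomically stores the new value and returns the old
// one, so exactly one caller observes false and performs the termination work.
let isMpvTerminated = ManagedAtomic<Bool>(false)

func terminateMPV() {
  guard !isMpvTerminated.exchange(true, ordering: .sequentiallyConsistent) else {
    return  // another thread already began termination
  }
  // ... the rest of the method now runs at most once ...
}
```

A lock-based getAndSet would give the same guarantee without the dependency.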

Rambling suggestions:

  1. Try to decouple the stop logic from the shutdown logic. If savePlaybackPosition() must be called, make it private and only called by stop(). Have terminateMPV() call stop() before it does its own work - where stop() either does nothing (if isStopped is true) or does a regular stop (if false).
  2. The call to stop() should do nothing if either isStopped or isMpvTerminated is true.
  3. The call to terminateMPV() should be a composite operation. It should (a) do nothing if isMpvTerminated is true, or (b) if it is false, call stop(), which either (c) does a stop if isStopped is false, or (d) does nothing if it is true.
  4. The method mpvHasShutdown() should not set isStopped - that is outside its domain.
  5. Ideally try to just put a lock (or DQ task block) which encompasses both the "get" for a flag in the beginning and the "set" for (possibly) each at the end. Generally, any time you go between locked blocks, you need to check all the state flags all over again.

Actually, I realized you have two more tasks, because you need to wait for both stop and termination... If you're doing a state machine, it would be a lot easier... Now this is getting really rambly.

Example state flow which includes a stop:
NOT_STARTED -> fileStarted() -> STARTED -> stop() -> WAITING_FOR_STOP -> playbackStopped() -> STOPPED -> fileStarted() -> STARTED

Example state flow which includes a termination:
NOT_STARTED -> fileStarted() -> STARTED -> terminateMPV() {isTerminating = true; stop()}-> WAITING_FOR_STOP -> playbackStopped() { state = (isTerminating ? TERMINATION_REQUESTED : STOPPED) } -> TERMINATION_REQUESTED -> mpvHasShutdown() -> TERMINATED

For a call from an invalid state (for example, if fileStarted() was called when already in the STARTED state), I suppose the easiest thing for now would be to spit out an error message but otherwise ignore it. There seems to be a danger of massive scope creep.
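The flows above could be sketched as a small state machine like the following (all names hypothetical; none of these types exist in IINA):

```swift
// Lifecycle states implied by the example flows above.
enum PlayerLifecycleState {
  case notStarted, started, waitingForStop, stopped, terminationRequested, terminated
}

final class PlayerLifecycle {
  private(set) var state: PlayerLifecycleState = .notStarted
  private var isTerminating = false

  func fileStarted() {
    guard state == .notStarted || state == .stopped else {
      // Invalid transition: log and ignore, per the suggestion above.
      print("fileStarted() called in unexpected state \(state), ignoring")
      return
    }
    state = .started
  }

  func stop() {
    guard state == .started else { return }  // already stopped or stopping
    state = .waitingForStop
  }

  func terminateMPV() {
    isTerminating = true
    stop()
  }

  func playbackStopped() {
    state = isTerminating ? .terminationRequested : .stopped
  }

  func mpvHasShutdown() {
    state = .terminated
  }
}
```

Confining all transitions to one queue (or guarding them with one lock) would make the state checks race-free.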

Contributor Author

But what you really want is both of those in one atomic operation (e.g., getAndSet()

YES! You are pointing out one of the reasons why I mentioned I'm worried the "fix" is not deterministic. Making this fully deterministic requires locking similar to the proposed logger fix in PR #3938. I started down that path, but was concerned about all the changes required. In particular, how do you handle "get" operations that occur after mpv has been told to shut down? If designing from scratch, you would probably make all results optional and return nothing. A lot of work to retrofit that.

A different approach can be seen in the replacement PR. That one takes the attitude that this method must only be called by the main thread. So the thread in MPVController processing mpv events has to queue calls to PlayerCore to the main dispatch queue. Probably some internal checks should be added to ensure use of the main thread.
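The internal check alluded to above could be as small as a dispatch precondition. A sketch (method names mirror the PR, but the bodies are illustrative):

```swift
import Dispatch

// dispatchPrecondition traps in debug builds if this method is invoked off
// the main queue, catching any caller that forgot to hop threads.
func playbackStopped() {
  dispatchPrecondition(condition: .onQueue(.main))
  // ... set isStopped and post the stopped notification ...
}

// The mpv event thread in MPVController would hop to the main queue first:
// DispatchQueue.main.async { playerCore.playbackStopped() }
```

This keeps all flag access on one thread, which is what makes the atomic wrappers unnecessary in the replacement approach.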

I too am used to making a class "iron clad". Written from scratch a call to stop after quit would be an internal error. Always painful to try and add such enforcement after the fact. In this case AppDelegate is supposed to ensure that does not happen. I am concerned about the flow if a file is being opened right as IINA starts to shutdown.

Initially I was thinking an event driven state machine and ended up with the thread based termination sequence seen in this PR. The new PR is event driven, but does not have a state machine.

At the moment there is no coordination, with a high chance that a race condition causes a crash. I'm trying to fix that enough that failures are unlikely, without all the changes required to fully refactor the code. Basically I'm trying to avoid the massive increase in scope needed to really nail this down.

For a fully "safe" system we'd have to figure out what to do about mpv deciding on its own to shutdown, as in when the user sends a quit command directly to mpv. That opens a window where mpv is shutting down and IINA doesn't know that until mpv sends the shutdown event. IINA might access mpv in that window triggering the same kind of crash we are trying to fix.


@low-batt
Contributor Author

low-batt commented Oct 4, 2022

Have a look at the replacement proposed fix in PR #3961. The terminateQueue is gone. Instead applicationShouldTerminate establishes observers for new notifications posted by PlayerCore. An event driven termination instead of thread based. The semaphores are gone.

@uiryuu uiryuu deleted the fix-3939 branch October 17, 2022 02:40