-
Notifications
You must be signed in to change notification settings - Fork 722
[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978
Conversation
Right now, the signal handlers outlive the process and the mount handler which causes object lifetime related undefined behavior to happen. This caused a crash when I was trying to unmount a sshfs mount in Windows. This patch fixes that by ensuring all connected signals are disconnected in destructor. Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3978 +/- ##
==========================================
+ Coverage 89.14% 89.30% +0.16%
==========================================
Files 257 259 +2
Lines 14640 14649 +9
==========================================
+ Hits 13051 13083 +32
+ Misses 1589 1566 -23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix @xmkg, it makes sense to me, although I agree we could improve the logic further.
I would like Andrei's feedback as well and I have a question for him too inline. Other than that, just small suggestions.
if (process->terminate(); !process->wait_for_finished(5000)) | ||
{ | ||
auto err = fmt::format("Failed to terminate SSHFS mount process: {}", process->read_all_standard_error()); | ||
if (force) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we process->kill()
in this case? Question for @andrei-toterman too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I have no clue. I reckon it would depend on whether destroying the process object also implies that it's going to be forcefully killed or not. If not, it might be a good idea to kill it. I think @andrei-toterman can better reason than I can here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should manually kill here. Yes, destroying the object implies killing the process, but in our case, we store the process object inside a qt_delete_later_unique_ptr
, so the process won't be killed when we request it to, but only when the control goes back to the Qt event loop. If we were using a regular unique pointer, the .reset()
below would kill the process immediately.
"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. " | ||
"Disconnected it to prevent errors."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making this a little shorter, while avoiding signal/slot technicalities?
"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. " | |
"Disconnected it to prevent errors."); | |
"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could work too. I'm okay with whatever allows us to trace back to here when there's trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed message does not imply there's an abnormality though. Should we also phrase it like it's not a normal situation, or message's level being "warn" would be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. OK, feel free to rework it or leave it as you prefer.
- wording and typing changes Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
* | ||
* FIXME: Find a way to associate slot lifetime with the process itself. | ||
*/ | ||
if (process) | ||
{ | ||
constexpr std::string_view warn_msg_fmtstr = | ||
"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction"; | ||
// The disconnect() is a no-op when nothing is connected. | ||
if (QObject::disconnect(process.get(), &Process::finished, nullptr, nullptr)) | ||
{ | ||
mpl::warn(category, warn_msg_fmtstr); | ||
} | ||
if (QObject::disconnect(process.get(), &Process::error_occurred, nullptr, nullptr)) | ||
{ | ||
mpl::warn(category, warn_msg_fmtstr); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already the case, technically. The connection is broken automatically when the process is destroyed. But that is not aligned with the destruction of the mount handler, since the process is behind a qt_delete_later_unique_ptr
. So the mount handler is destroyed, it calls Qt's deleteLater
on the process object, and after control goes back to the Qt event loop, the process object is destroyed, which kills the process, which emits the finished
signal, whose slot references this mount handler, and that is what leads to the crash, IIUC. So what we need to do is tie the lifetime of those connections to the lifetime of this mount handler, since the slots reference the mount handlers. And this is basically what you are doing here with the manual disconnects. But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine). One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater
. That way, the process object would be destroyed (so disconnected and killed too) together with the mount handler. Unfortunately I do not remember why we're using the deleteLater
option in this case, so aother option would be to see if the issue still occurs if the slots do not reference the mount handler (i.e. no [this]
in the slot lambdas by pre-computing the part of the log message that access the mount handler data), and that way we won't need to manually disconnect those things, and the slots should be able to run when Qt actually deletes the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything makes more sense now @andrei-toterman, thanks for the explanation!
One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater
Yeah, that seems a good option, but I wonder about the same question you've asked: "why we opted to use the qt_delete_later_unique_ptr
in the first place?"
But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine)
The log messages are just here to indicate that an abnormal thing has just happened. This kind of detachment shouldn't happen normally.
EDIT: I realized that you're referring to the signal handler log messages. Good point.
After reading both of your comments I think we're better off using std::unique_ptr<>
here because it would also solve the kill()
problem:
If we were using a regular unique pointer, the .reset() below would kill the process immediately.
I'll dig into what's the reason behind using the delete_later thingy. It might be a legacy QT GUI requirement, and if that's the case we can ditch it for good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems like the QT objects' destruction should happen in the event loop to ensure signal-slot safety for the QT side of things. We can use shared_from_this(), or a weak_ptr to track the liveliness of the mount handler, but that would mean the mount handler object has to be a shared_ptr. That might make things more complicated than they need to be.
Back to the drawing board. A good part of the problem comes from the "is_active()" implementation. The base class uses the is_active function to check if the mount handler has called the activate_impl, and succeeded. But SSHFSMountHandler's is_active implementation does something else entirely:
a-) checks whether the process object is alive
b-) checks whether the process is still running
c-) checks whether the mount is present in the guest or not
The current implementation looks more like a "health check" function than an is_active function. One of them being not true is enough to prevent deactivate_impl from running, which contradicts the design of the MountHandler. In reality, the handler can be "active" while being in an error state, or being disfunctional. As far as my understanding goes, every activate() must be eventually finalized by a deactivate(). Hence, I think we're better off without the SSHFSMountHandler::is_active() override. We can introduce an is_healthy() function if that's something needed, but I didn't see any use of is_active() other than the base class activate() and deactivate() functions.
Let me know what you think @ricab @andrei-toterman.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that makes much more sense to me too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I've simplified things quite a lot:
- Moved the disconnect() to the QtDeleteLater custom deleter itself:
Rationale: In all cases, the owner scope of the qt_delete_later_unique_ptr is not a part of the QT object hierarchy, and the signal handlers are also part of the owner scope, which means signal handlers' lifetime is bound to the owner scope. Therefore it's better to ensure that all signals are disconnected before deferring deletion to the QT event loop. - Added process->kill() to the deactivate_impl's force{} block.
- Added a comment explaining a potential
gotcha
for the one and only other user of the qt_delete_later_unique_ptr, NewReleaseMonitor.
@ricab @andrei-toterman let me know what you think of the new approach.
- Moved the disconnect() to the QtDeleteLater custom deleter. - Removed SSHFSMountHandler::is_active override - Removed a redundant process.reset() - SSHFSMountHandler::deactivate_impl will now kill() when force==true Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
EXPECT_CALL(mock_vm, ssh_port()).Times(3); | ||
EXPECT_CALL(mock_vm, ssh_hostname()).Times(3); | ||
EXPECT_CALL(mock_vm, ssh_username()).Times(3); | ||
EXPECT_CALL(mock_vm, ssh_port()).Times(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because is_active no longer creates an SSHSession.
@Sploder12 I saw the issue you mentioned. So it seems like two problems are happening at once here: a.) Signals are not being disconnected before the handlers are destroyed The a-) is addressed in this PR, so the daemon won't crash regardless of whether sshfs_server ended gracefully or not. But, b-) is not addressed here, and I think we should fix it too, since killing a filesystem process non-gracefully is never a good idea. We were discussing the b-) with @ricab and he mentioned that there was a relevant patch: https://github.com/canonical/multipass-private/pull/672/files#diff-ca7f8e14e6bf98510802d0c2c17845c94ca23025f9d0cc8302a45ce743c496cb . The patch touches the sshfs_server indirectly by changing the make_quit_watchdog implementation, and as far as my understanding goes, this patch ensures that the receiving side (sshfs_server) does a graceful shutdown when the Ctrl-C is received. But the reality is the daemon never sends a Ctrl-C in the first place -- I'll explain that part below. The code calls for a graceful termination, but right now the Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C. The other avenue is to give a proper name to the event here, so another process (in this case it's the multipassd) can signal it directly, without having to rely on Ctrl-C. But that means it will be only applicable to the child processes that we own the source code of, not so practical. What do you think? |
Hmm, this looks promising: https://doc.qt.io/qt-6/qprocess-createprocessarguments.html |
Giving the event a name would fix that specific problem and probably be a pretty quick fix. But if the crash is mitigated from this PR then I think a proper fix would be more appropriate. Would you be able to hijack that call without modifying Qt? Or perhaps a more recent version of Qt fixes this or provides a better alternative? Either way, I think fixing that Ctrl-C bug would be the best solution. |
I agree.
So they indeed made it possible to pass additional flags to the CreateProcess in Windows, so it's possible to do: process.setCreateProcessArgumentsModifier(
[](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; }); I'll try this to see if it works. |
After debugging the thing for 3 hours straight, I confirm that it works. By creating the child process with: process.setCreateProcessArgumentsModifier(
[](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; }); ... we can directly signal it from the parent, without worrying about affecting others (parent included): // Child being in its own dedicated console group means only the child will get the signal.
// Hence, there's no need to modify the parent's signal handling
BOOL result = GenerateConsoleCtrlEvent(CTRL_C_EVENT, child->process_id()); And finally, in the child: // This is crucial. Otherwise, the Ctrl-C will be swallowed by the default handler
// and it won't reach to the registered handler.
// The CREATE_NEW_PROCESS_GROUP documentation slightly mentions this and
// it doesn't tell how to disable it:
//"If this flag is specified, CTRL+C signals will be disabled for all processes within
// the new process group."
SetConsoleCtrlHandler(nullptr, false);
// Now register the real handler.
SetConsoleCtrlHandler(signal_handler, TRUE); |
Great work! Windows documentation can be pretty rough! |
b.) sshfs_server is not being gracefully shut down Implemented in a separate PR in private side: https://github.com/canonical/multipass-private/pull/733 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I didn't crash when unmounting! There is a very minor typo.
Co-authored-by: trevor-shoe <trevorschupbach@gmail.com> Signed-off-by: Mustafa Kemal Gılor <mustafagilor@gmail.com>
The test for part b-) in the private side is pretty similar too. The child executable simply should not crash. This can be validated by attaching a debugger to the sshfs_server after mounting and setting a few breakpoints in the graceful shutdown path. |
Hi @andrei-toterman, this one is good to go. Could you review the latest changes and let me know if it looks OK? |
@georgeliao @ricab this one just needs a secondary review to go. I'd appreciate it if one of you could review it. |
@andrei-toterman @Sploder12 thanks for the reviews! |
Currently, the signal handlers outlast the process and the mount handler, which leads to undefined behavior.
This resulted in a crash when I attempted to unmount an sshfs mount in Windows.
This patch resolves the issue by ensuring that all connected signals are disconnected in the destructor.
Also modernized the mpl::log calls while at it.
MULTI-1873