Skip to content

[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 8, 2025

Conversation

xmkg
Copy link
Member

@xmkg xmkg commented Mar 12, 2025

Currently, the signal handlers outlast the process and the mount handler, which leads to undefined behavior.

This resulted in a crash when I attempted to unmount an sshfs mount in Windows.

This patch resolves the issue by ensuring that all connected signals are disconnected in the destructor.

Also modernized the mpl::log calls while at it.

MULTI-1873

xmkg added 2 commits March 12, 2025 19:37
Right now, the signal handlers outlive the process and the mount handler
which causes object lifetime related undefined behavior to happen.

This caused a crash when I was trying to unmount a sshfs mount in Windows.

This patch fixes that by ensuring all connected signals are disconnected
in destructor.

Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
@xmkg xmkg requested review from georgeliao, ricab and Sploder12 March 12, 2025 16:48
@xmkg xmkg changed the title Bugfix/sshfs mount handler ensure signal disconnect [sshfs-mount-handler] ensure signals are disconnected upon destruction Mar 12, 2025
Copy link

codecov bot commented Mar 12, 2025

Codecov Report

Attention: Patch coverage is 91.42857% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.30%. Comparing base (d729b67) to head (c051cb1).
Report is 3580 commits behind head on main.

Files with missing lines Patch % Lines
src/sshfs_mount/sshfs_mount_handler.cpp 91.17% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3978      +/-   ##
==========================================
+ Coverage   89.14%   89.30%   +0.16%     
==========================================
  Files         257      259       +2     
  Lines       14640    14649       +9     
==========================================
+ Hits        13051    13083      +32     
+ Misses       1589     1566      -23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@ricab ricab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix @xmkg, it makes sense to me, although I agree we could improve the logic further.

I would like Andrei's feedback as well and I have a question for him too inline. Other than that, just small suggestions.

if (process->terminate(); !process->wait_for_finished(5000))
{
auto err = fmt::format("Failed to terminate SSHFS mount process: {}", process->read_all_standard_error());
if (force)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we process->kill() in this case? Question for @andrei-toterman too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I have no clue. I reckon it would depend on whether destroying the process object also implies that it's going to be forcefully killed or not. If not, it might be a good idea to kill it. I think @andrei-toterman can better reason than I can here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should manually kill here. Yes, destroying the object implies killing the process, but in our case, we store the process object inside a qt_delete_later_unique_ptr, so the process won't be killed when we request it to, but only when the control goes back to the Qt event loop. If we were using a regular unique pointer, the .reset() below would kill the process immediately.

Comment on lines 273 to 274
"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "
"Disconnected it to prevent errors.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making this a little shorter, while avoiding signal/slot technicalities?

Suggested change
"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "
"Disconnected it to prevent errors.");
"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work too. I'm okay with whatever allows us to trace back to here when there's trouble.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed message does not imply there's an abnormality though. Should we also phrase it like it's not a normal situation, or message's level being "warn" would be sufficient?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. OK, feel free to rework it or leave it as you prefer.

- wording and typing changes

Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
Comment on lines 262 to 278
*
* FIXME: Find a way to associate slot lifetime with the process itself.
*/
if (process)
{
constexpr std::string_view warn_msg_fmtstr =
"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction";
// The disconnect() is a no-op when nothing is connected.
if (QObject::disconnect(process.get(), &Process::finished, nullptr, nullptr))
{
mpl::warn(category, warn_msg_fmtstr);
}
if (QObject::disconnect(process.get(), &Process::error_occurred, nullptr, nullptr))
{
mpl::warn(category, warn_msg_fmtstr);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is already the case, technically. The connection is broken automatically when the process is destroyed. But that is not aligned with the destruction of the mount handler, since the process is behind a qt_delete_later_unique_ptr. So the mount handler is destroyed, it calls Qt's deleteLater on the process object, and after control goes back to the Qt event loop, the process object is destroyed, which kills the process, which emits the finished signal, whose slot references this mount handler, and that is what leads to the crash, IIUC. So what we need to do is tie the lifetime of those connections to the lifetime of this mount handler, since the slots reference the mount handlers. And this is basically what you are doing here with the manual disconnects. But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine). One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater. That way, the process object would be destroyed (so disconnected and killed too) together with the mount handler. Unfortunately I do not remember why we're using the deleteLater option in this case, so aother option would be to see if the issue still occurs if the slots do not reference the mount handler (i.e. no [this] in the slot lambdas by pre-computing the part of the log message that access the mount handler data), and that way we won't need to manually disconnect those things, and the slots should be able to run when Qt actually deletes the process.

Copy link
Member Author

@xmkg xmkg Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything makes more sense now @andrei-toterman, thanks for the explanation!

One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater

Yeah, that seems a good option, but I wonder about the same question you've asked: "why we opted to use the qt_delete_later_unique_ptr in the first place?"

But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine)

The log messages are just here to indicate that an abnormal thing has just happened. This kind of detachment shouldn't happen normally.
EDIT: I realized that you're referring to the signal handler log messages. Good point.

After reading both of your comments I think we're better off using std::unique_ptr<> here because it would also solve the kill() problem:

If we were using a regular unique pointer, the .reset() below would kill the process immediately.

I'll dig into what's the reason behind using the delete_later thingy. It might be a legacy QT GUI requirement, and if that's the case we can ditch it for good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like the QT objects' destruction should happen in the event loop to ensure signal-slot safety for the QT side of things. We can use shared_from_this(), or a weak_ptr to track the liveliness of the mount handler, but that would mean the mount handler object has to be a shared_ptr. That might make things more complicated than they need to be.

Back to the drawing board. A good part of the problem comes from the "is_active()" implementation. The base class uses the is_active function to check if the mount handler has called the activate_impl, and succeeded. But SSHFSMountHandler's is_active implementation does something else entirely:

a-) checks whether the process object is alive
b-) checks whether the process is still running
c-) checks whether the mount is present in the guest or not

The current implementation looks more like a "health check" function than an is_active function. One of them being not true is enough to prevent deactivate_impl from running, which contradicts the design of the MountHandler. In reality, the handler can be "active" while being in an error state, or being disfunctional. As far as my understanding goes, every activate() must be eventually finalized by a deactivate(). Hence, I think we're better off without the SSHFSMountHandler::is_active() override. We can introduce an is_healthy() function if that's something needed, but I didn't see any use of is_active() other than the base class activate() and deactivate() functions.

Let me know what you think @ricab @andrei-toterman.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes much more sense to me too!

Copy link
Member Author

@xmkg xmkg Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I've simplified things quite a lot:

  • Moved the disconnect() to the QtDeleteLater custom deleter itself:
    Rationale: In all cases, the owner scope of the qt_delete_later_unique_ptr is not a part of the QT object hierarchy, and the signal handlers are also part of the owner scope, which means signal handlers' lifetime is bound to the owner scope. Therefore it's better to ensure that all signals are disconnected before deferring deletion to the QT event loop.
  • Added process->kill() to the deactivate_impl's force{} block.
  • Added a comment explaining a potential gotcha for the one and only other user of the qt_delete_later_unique_ptr, NewReleaseMonitor.

@ricab @andrei-toterman let me know what you think of the new approach.

- Moved the disconnect() to the QtDeleteLater custom deleter.
- Removed SSHFSMountHandler::is_active override
- Removed a redundant process.reset()
- SSHFSMountHandler::deactivate_impl will now kill() when force==true

Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>
EXPECT_CALL(mock_vm, ssh_port()).Times(3);
EXPECT_CALL(mock_vm, ssh_hostname()).Times(3);
EXPECT_CALL(mock_vm, ssh_username()).Times(3);
EXPECT_CALL(mock_vm, ssh_port()).Times(2);
Copy link
Member Author

@xmkg xmkg Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because is_active no longer creates an SSHSession.

@xmkg
Copy link
Member Author

xmkg commented Mar 21, 2025

@Sploder12 I saw the issue you mentioned. So it seems like two problems are happening at once here:

a.) Signals are not being disconnected before the handlers are destroyed
b.) sshfs_server is not being gracefully shut down

The a-) is addressed in this PR, so the daemon won't crash regardless of whether sshfs_server ended gracefully or not. But, b-) is not addressed here, and I think we should fix it too, since killing a filesystem process non-gracefully is never a good idea.

We were discussing the b-) with @ricab and he mentioned that there was a relevant patch: https://github.com/canonical/multipass-private/pull/672/files#diff-ca7f8e14e6bf98510802d0c2c17845c94ca23025f9d0cc8302a45ce743c496cb . The patch touches the sshfs_server indirectly by changing the make_quit_watchdog implementation, and as far as my understanding goes, this patch ensures that the receiving side (sshfs_server) does a graceful shutdown when the Ctrl-C is received. But the reality is the daemon never sends a Ctrl-C in the first place -- I'll explain that part below.

The code calls for a graceful termination, but right now the terminate() implementation in Windows is just a kill(), and it's not sending the Ctrl-C signal that sshfs_server expects for a graceful shutdown. The reason why is the implementation that originally designed to send the Ctrl-C has a bug, because the child and the parent share the same console group.

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C. The other avenue is to give a proper name to the event here, so another process (in this case it's the multipassd) can signal it directly, without having to rely on Ctrl-C. But that means it will be only applicable to the child processes that we own the source code of, not so practical.

What do you think?

@xmkg
Copy link
Member Author

xmkg commented Mar 21, 2025

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C

Hmm, this looks promising: https://doc.qt.io/qt-6/qprocess-createprocessarguments.html

@Sploder12
Copy link
Contributor

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C. The other avenue is to give a proper name to the event here, so another process (in this case it's the multipassd) can signal it directly, without having to rely on Ctrl-C. But that means it will be only applicable to the child processes that we own the source code of, not so practical.

What do you think?

Giving the event a name would fix that specific problem and probably be a pretty quick fix. But if the crash is mitigated from this PR then I think a proper fix would be more appropriate. Would you be able to hijack that call without modifying Qt? Or perhaps a more recent version of Qt fixes this or provides a better alternative? Either way, I think fixing that Ctrl-C bug would be the best solution.

@xmkg
Copy link
Member Author

xmkg commented Mar 21, 2025

But if the crash is mitigated from this PR then I think a proper fix would be more appropriate.

I agree.

Would you be able to hijack that call without modifying Qt? Or perhaps a more recent version of Qt fixes this or provides a better alternative?

So they indeed made it possible to pass additional flags to the CreateProcess in Windows, so it's possible to do:

 process.setCreateProcessArgumentsModifier(
        [](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; });

I'll try this to see if it works.

@xmkg
Copy link
Member Author

xmkg commented Mar 21, 2025

I'll try this to see if it works.

After debugging the thing for 3 hours straight, I confirm that it works. By creating the child process with:

 process.setCreateProcessArgumentsModifier(
        [](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; });

... we can directly signal it from the parent, without worrying about affecting others (parent included):

// Child being in its own dedicated console group means only the child will get the signal.
// Hence, there's no need to modify the parent's signal handling
BOOL result = GenerateConsoleCtrlEvent(CTRL_C_EVENT, child->process_id());

And finally, in the child:

// This is crucial. Otherwise, the Ctrl-C will be swallowed by the default handler
// and it won't reach to the registered handler.
// The CREATE_NEW_PROCESS_GROUP documentation slightly mentions this and 
// it doesn't tell how to disable it:
//"If this flag is specified, CTRL+C signals will be disabled for all processes within
// the new process group."
SetConsoleCtrlHandler(nullptr, false);
// Now register the real handler.
SetConsoleCtrlHandler(signal_handler, TRUE);

@Sploder12
Copy link
Contributor

After debugging the thing for 3 hours straight, I confirm that it works.

Great work! Windows documentation can be pretty rough!

@xmkg
Copy link
Member Author

xmkg commented Mar 21, 2025

b.) sshfs_server is not being gracefully shut down

Implemented in a separate PR in private side: https://github.com/canonical/multipass-private/pull/733

Sploder12
Sploder12 previously approved these changes Mar 26, 2025
Copy link
Contributor

@Sploder12 Sploder12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I didn't crash when unmounting! There is a very minor typo.

Co-authored-by: trevor-shoe <trevorschupbach@gmail.com>
Signed-off-by: Mustafa Kemal Gılor <mustafagilor@gmail.com>
@xmkg
Copy link
Member Author

xmkg commented Mar 26, 2025

LGTM, I didn't crash when unmounting! There is a very minor typo.

The test for part b-) in the private side is pretty similar too. The child executable simply should not crash. This can be validated by attaching a debugger to the sshfs_server after mounting and setting a few breakpoints in the graceful shutdown path.

@xmkg
Copy link
Member Author

xmkg commented Mar 26, 2025

Hi @andrei-toterman, this one is good to go. Could you review the latest changes and let me know if it looks OK?

@xmkg
Copy link
Member Author

xmkg commented Apr 2, 2025

@georgeliao @ricab this one just needs a secondary review to go. I'd appreciate it if one of you could review it.

@ricab ricab requested review from andrei-toterman and removed request for georgeliao April 4, 2025 15:07
@xmkg xmkg added this pull request to the merge queue Apr 8, 2025
@xmkg
Copy link
Member Author

xmkg commented Apr 8, 2025

@andrei-toterman @Sploder12 thanks for the reviews!

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2025
@xmkg xmkg added this pull request to the merge queue Apr 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2025
@xmkg xmkg added this pull request to the merge queue Apr 8, 2025
Merged via the queue into main with commit e932d5f Apr 8, 2025
16 checks passed
@xmkg xmkg deleted the bugfix/sshfs-mount-handler-ensure-signal-disconnect branch April 8, 2025 22:49
@ricab ricab added this to the 1.16.0 milestone Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants