[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978

xmkg · 2025-03-12T16:48:43Z

Currently, the signal handlers outlast the process and the mount handler, which leads to undefined behavior.

This resulted in a crash when I attempted to unmount an sshfs mount in Windows.

This patch resolves the issue by ensuring that all connected signals are disconnected in the destructor.

Also modernized the mpl::log calls while at it.

MULTI-1873

Right now, the signal handlers outlive the process and the mount handler which causes object lifetime related undefined behavior to happen. This caused a crash when I was trying to unmount a sshfs mount in Windows. This patch fixes that by ensuring all connected signals are disconnected in destructor. Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

codecov · 2025-03-12T17:10:25Z

Codecov Report

Attention: Patch coverage is 91.42857% with 3 lines in your changes missing coverage. Please review.

Project coverage is 89.30%. Comparing base (d729b67) to head (c051cb1).
Report is 3580 commits behind head on main.

Files with missing lines	Patch %	Lines
src/sshfs_mount/sshfs_mount_handler.cpp	91.17%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3978      +/-   ##
==========================================
+ Coverage   89.14%   89.30%   +0.16%     
==========================================
  Files         257      259       +2     
  Lines       14640    14649       +9     
==========================================
+ Hits        13051    13083      +32     
+ Misses       1589     1566      -23

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ricab

Thanks for this fix @xmkg, it makes sense to me, although I agree we could improve the logic further.

I would like Andrei's feedback as well and I have a question for him too inline. Other than that, just small suggestions.

src/sshfs_mount/sshfs_mount_handler.cpp

ricab · 2025-03-17T19:29:28Z

src/sshfs_mount/sshfs_mount_handler.cpp

+        if (process->terminate(); !process->wait_for_finished(5000))
+        {
+            auto err = fmt::format("Failed to terminate SSHFS mount process: {}", process->read_all_standard_error());
+            if (force)


Shouldn't we process->kill() in this case? Question for @andrei-toterman too.

Honestly, I have no clue. I reckon it would depend on whether destroying the process object also implies that it's going to be forcefully killed or not. If not, it might be a good idea to kill it. I think @andrei-toterman can better reason than I can here.

Yes, I think we should manually kill here. Yes, destroying the object implies killing the process, but in our case, we store the process object inside a qt_delete_later_unique_ptr, so the process won't be killed when we request it to, but only when the control goes back to the Qt event loop. If we were using a regular unique pointer, the .reset() below would kill the process immediately.

src/sshfs_mount/sshfs_mount_handler.cpp

ricab · 2025-03-17T20:43:13Z

src/sshfs_mount/sshfs_mount_handler.cpp

+                      "SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "
+                      "Disconnected it to prevent errors.");


How about making this a little shorter, while avoiding signal/slot technicalities?

Suggested change

"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "

"Disconnected it to prevent errors.");

"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction");

That could work too. I'm okay with whatever allows us to trace back to here when there's trouble.

The proposed message does not imply there's an abnormality though. Should we also phrase it like it's not a normal situation, or message's level being "warn" would be sufficient?

Good point. OK, feel free to rework it or leave it as you prefer.

- wording and typing changes Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

andrei-toterman · 2025-03-19T17:28:47Z

src/sshfs_mount/sshfs_mount_handler.cpp

+     *
+     * FIXME: Find a way to associate slot lifetime with the process itself.
+     */
+    if (process)
+    {
+        constexpr std::string_view warn_msg_fmtstr =
+            "Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction";
+        // The disconnect() is a no-op when nothing is connected.
+        if (QObject::disconnect(process.get(), &Process::finished, nullptr, nullptr))
+        {
+            mpl::warn(category, warn_msg_fmtstr);
+        }
+        if (QObject::disconnect(process.get(), &Process::error_occurred, nullptr, nullptr))
+        {
+            mpl::warn(category, warn_msg_fmtstr);
+        }
+    }


That is already the case, technically. The connection is broken automatically when the process is destroyed. But that is not aligned with the destruction of the mount handler, since the process is behind a qt_delete_later_unique_ptr. So the mount handler is destroyed, it calls Qt's deleteLater on the process object, and after control goes back to the Qt event loop, the process object is destroyed, which kills the process, which emits the finished signal, whose slot references this mount handler, and that is what leads to the crash, IIUC. So what we need to do is tie the lifetime of those connections to the lifetime of this mount handler, since the slots reference the mount handlers. And this is basically what you are doing here with the manual disconnects. But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine). One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater. That way, the process object would be destroyed (so disconnected and killed too) together with the mount handler. Unfortunately I do not remember why we're using the deleteLater option in this case, so aother option would be to see if the issue still occurs if the slots do not reference the mount handler (i.e. no [this] in the slot lambdas by pre-computing the part of the log message that access the mount handler data), and that way we won't need to manually disconnect those things, and the slots should be able to run when Qt actually deletes the process.

Everything makes more sense now @andrei-toterman, thanks for the explanation!

One other avenue worth exploring is to move to a regular unique pointer instead of using deleteLater

Yeah, that seems a good option, but I wonder about the same question you've asked: "why we opted to use the qt_delete_later_unique_ptr in the first place?"

But that makes it so when the process is actually killed later, we won't get those log messages (maybe we don't even need them in this case, so that's fine)

The log messages are just here to indicate that an abnormal thing has just happened. This kind of detachment shouldn't happen normally.
EDIT: I realized that you're referring to the signal handler log messages. Good point.

After reading both of your comments I think we're better off using std::unique_ptr<> here because it would also solve the kill() problem:

If we were using a regular unique pointer, the .reset() below would kill the process immediately.

I'll dig into what's the reason behind using the delete_later thingy. It might be a legacy QT GUI requirement, and if that's the case we can ditch it for good.

So it seems like the QT objects' destruction should happen in the event loop to ensure signal-slot safety for the QT side of things. We can use shared_from_this(), or a weak_ptr to track the liveliness of the mount handler, but that would mean the mount handler object has to be a shared_ptr. That might make things more complicated than they need to be.

Back to the drawing board. A good part of the problem comes from the "is_active()" implementation. The base class uses the is_active function to check if the mount handler has called the activate_impl, and succeeded. But SSHFSMountHandler's is_active implementation does something else entirely:

a-) checks whether the process object is alive
b-) checks whether the process is still running
c-) checks whether the mount is present in the guest or not

The current implementation looks more like a "health check" function than an is_active function. One of them being not true is enough to prevent deactivate_impl from running, which contradicts the design of the MountHandler. In reality, the handler can be "active" while being in an error state, or being disfunctional. As far as my understanding goes, every activate() must be eventually finalized by a deactivate(). Hence, I think we're better off without the SSHFSMountHandler::is_active() override. We can introduce an is_healthy() function if that's something needed, but I didn't see any use of is_active() other than the base class activate() and deactivate() functions.

Let me know what you think @ricab @andrei-toterman.

Yep, that makes much more sense to me too!

So, I've simplified things quite a lot:

Moved the disconnect() to the QtDeleteLater custom deleter itself:
Rationale: In all cases, the owner scope of the qt_delete_later_unique_ptr is not a part of the QT object hierarchy, and the signal handlers are also part of the owner scope, which means signal handlers' lifetime is bound to the owner scope. Therefore it's better to ensure that all signals are disconnected before deferring deletion to the QT event loop.

Added process->kill() to the deactivate_impl's force{} block.

Added a comment explaining a potential gotcha for the one and only other user of the qt_delete_later_unique_ptr, NewReleaseMonitor.

@ricab @andrei-toterman let me know what you think of the new approach.

- Moved the disconnect() to the QtDeleteLater custom deleter. - Removed SSHFSMountHandler::is_active override - Removed a redundant process.reset() - SSHFSMountHandler::deactivate_impl will now kill() when force==true Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

xmkg · 2025-03-21T13:10:27Z

tests/test_sshfs_mount_handler.cpp

-    EXPECT_CALL(mock_vm, ssh_port()).Times(3);
-    EXPECT_CALL(mock_vm, ssh_hostname()).Times(3);
-    EXPECT_CALL(mock_vm, ssh_username()).Times(3);
+    EXPECT_CALL(mock_vm, ssh_port()).Times(2);


Because is_active no longer creates an SSHSession.

xmkg · 2025-03-21T14:08:49Z

@Sploder12 I saw the issue you mentioned. So it seems like two problems are happening at once here:

a.) Signals are not being disconnected before the handlers are destroyed
b.) sshfs_server is not being gracefully shut down

The a-) is addressed in this PR, so the daemon won't crash regardless of whether sshfs_server ended gracefully or not. But, b-) is not addressed here, and I think we should fix it too, since killing a filesystem process non-gracefully is never a good idea.

We were discussing the b-) with @ricab and he mentioned that there was a relevant patch: https://github.com/canonical/multipass-private/pull/672/files#diff-ca7f8e14e6bf98510802d0c2c17845c94ca23025f9d0cc8302a45ce743c496cb . The patch touches the sshfs_server indirectly by changing the make_quit_watchdog implementation, and as far as my understanding goes, this patch ensures that the receiving side (sshfs_server) does a graceful shutdown when the Ctrl-C is received. But the reality is the daemon never sends a Ctrl-C in the first place -- I'll explain that part below.

The code calls for a graceful termination, but right now the terminate() implementation in Windows is just a kill(), and it's not sending the Ctrl-C signal that sshfs_server expects for a graceful shutdown. The reason why is the implementation that originally designed to send the Ctrl-C has a bug, because the child and the parent share the same console group.

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C. The other avenue is to give a proper name to the event here, so another process (in this case it's the multipassd) can signal it directly, without having to rely on Ctrl-C. But that means it will be only applicable to the child processes that we own the source code of, not so practical.

What do you think?

xmkg · 2025-03-21T14:26:39Z

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C

Hmm, this looks promising: https://doc.qt.io/qt-6/qprocess-createprocessarguments.html

Sploder12 · 2025-03-21T14:35:05Z

Fixing that bug requires us to hijack the QProcess's CreateProcess Win32 API call, and then create a dedicated console group for the process so we can reliably send a Ctrl-C. The other avenue is to give a proper name to the event here, so another process (in this case it's the multipassd) can signal it directly, without having to rely on Ctrl-C. But that means it will be only applicable to the child processes that we own the source code of, not so practical.

What do you think?

Giving the event a name would fix that specific problem and probably be a pretty quick fix. But if the crash is mitigated from this PR then I think a proper fix would be more appropriate. Would you be able to hijack that call without modifying Qt? Or perhaps a more recent version of Qt fixes this or provides a better alternative? Either way, I think fixing that Ctrl-C bug would be the best solution.

xmkg · 2025-03-21T14:42:29Z

But if the crash is mitigated from this PR then I think a proper fix would be more appropriate.

I agree.

Would you be able to hijack that call without modifying Qt? Or perhaps a more recent version of Qt fixes this or provides a better alternative?

So they indeed made it possible to pass additional flags to the CreateProcess in Windows, so it's possible to do:

 process.setCreateProcessArgumentsModifier(
        [](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; });

I'll try this to see if it works.

xmkg · 2025-03-21T16:40:12Z

I'll try this to see if it works.

After debugging the thing for 3 hours straight, I confirm that it works. By creating the child process with:

 process.setCreateProcessArgumentsModifier(
        [](QProcess::CreateProcessArguments* args) { args->flags |= CREATE_NEW_PROCESS_GROUP; });

... we can directly signal it from the parent, without worrying about affecting others (parent included):

// Child being in its own dedicated console group means only the child will get the signal.
// Hence, there's no need to modify the parent's signal handling
BOOL result = GenerateConsoleCtrlEvent(CTRL_C_EVENT, child->process_id());

And finally, in the child:

// This is crucial. Otherwise, the Ctrl-C will be swallowed by the default handler
// and it won't reach to the registered handler.
// The CREATE_NEW_PROCESS_GROUP documentation slightly mentions this and 
// it doesn't tell how to disable it:
//"If this flag is specified, CTRL+C signals will be disabled for all processes within
// the new process group."
SetConsoleCtrlHandler(nullptr, false);
// Now register the real handler.
SetConsoleCtrlHandler(signal_handler, TRUE);

Sploder12 · 2025-03-21T16:44:58Z

After debugging the thing for 3 hours straight, I confirm that it works.

Great work! Windows documentation can be pretty rough!

xmkg · 2025-03-21T20:15:50Z

b.) sshfs_server is not being gracefully shut down

Implemented in a separate PR in private side: https://github.com/canonical/multipass-private/pull/733

Sploder12

LGTM, I didn't crash when unmounting! There is a very minor typo.

include/multipass/qt_delete_later_unique_ptr.h

Co-authored-by: trevor-shoe <trevorschupbach@gmail.com> Signed-off-by: Mustafa Kemal Gılor <mustafagilor@gmail.com>

xmkg · 2025-03-26T15:13:53Z

LGTM, I didn't crash when unmounting! There is a very minor typo.

The test for part b-) in the private side is pretty similar too. The child executable simply should not crash. This can be validated by attaching a debugger to the sshfs_server after mounting and setting a few breakpoints in the graceful shutdown path.

xmkg · 2025-03-26T15:40:37Z

Hi @andrei-toterman, this one is good to go. Could you review the latest changes and let me know if it looks OK?

xmkg · 2025-04-02T08:33:33Z

@georgeliao @ricab this one just needs a secondary review to go. I'd appreciate it if one of you could review it.

xmkg · 2025-04-08T14:39:49Z

@andrei-toterman @Sploder12 thanks for the reviews!

xmkg added 2 commits March 12, 2025 19:37

[sshfs-mount-handler] chore: modernize log calls

a651bca

Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

xmkg requested review from georgeliao, ricab and Sploder12 March 12, 2025 16:48

xmkg changed the title ~~Bugfix/sshfs mount handler ensure signal disconnect~~ [sshfs-mount-handler] ensure signals are disconnected upon destruction Mar 12, 2025

Sploder12 mentioned this pull request Mar 17, 2025

SSHFS mount failing to terminate crashes daemon on instance delete #3799

Open

ricab reviewed Mar 17, 2025

View reviewed changes

[sshfs-mount-handler] review fixes

69b228e

- wording and typing changes Signed-off-by: Mustafa Kemal Gilor <mustafa.gilor@canonical.com>

andrei-toterman reviewed Mar 19, 2025

View reviewed changes

xmkg commented Mar 21, 2025

View reviewed changes

Sploder12 previously approved these changes Mar 26, 2025

View reviewed changes

include/multipass/qt_delete_later_unique_ptr.h Outdated Show resolved Hide resolved

Update include/multipass/qt_delete_later_unique_ptr.h

c051cb1

Co-authored-by: trevor-shoe <trevorschupbach@gmail.com> Signed-off-by: Mustafa Kemal Gılor <mustafagilor@gmail.com>

xmkg dismissed Sploder12’s stale review via c051cb1 March 26, 2025 15:03

Sploder12 approved these changes Mar 26, 2025

View reviewed changes

xmkg mentioned this pull request Mar 28, 2025

Multipass daemon is being shutdown non-gracefully on snap stop/snap restart #4015

Closed

ricab requested review from andrei-toterman and removed request for georgeliao April 4, 2025 15:07

andrei-toterman approved these changes Apr 8, 2025

View reviewed changes

xmkg added this pull request to the merge queue Apr 8, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2025

xmkg added this pull request to the merge queue Apr 8, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 8, 2025

xmkg added this pull request to the merge queue Apr 8, 2025

Merged via the queue into main with commit e932d5f Apr 8, 2025
16 checks passed

xmkg deleted the bugfix/sshfs-mount-handler-ensure-signal-disconnect branch April 8, 2025 22:49

ricab added this to the 1.16.0 milestone Jun 13, 2025

		"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "
		"Disconnected it to prevent errors.");

	"SSHFSMountHandler is going to be destroyed, but Process still has `finished` signal connected. "
	"Disconnected it to prevent errors.");
	"Stopped listening to sshfs_server process only upon SSHFSMountHandler destruction");

[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978

[sshfs-mount-handler] ensure signals are disconnected upon destruction #3978

Uh oh!

Conversation

xmkg commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ricab left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xmkg Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xmkg Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xmkg Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xmkg commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xmkg commented Mar 21, 2025

Uh oh!

Sploder12 commented Mar 21, 2025

Uh oh!

xmkg commented Mar 21, 2025

Uh oh!

xmkg commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Sploder12 commented Mar 21, 2025

Uh oh!

xmkg commented Mar 21, 2025

Uh oh!

Sploder12 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

xmkg commented Mar 26, 2025

Uh oh!

xmkg commented Mar 26, 2025

Uh oh!

xmkg commented Apr 2, 2025

Uh oh!

xmkg commented Apr 8, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xmkg commented Mar 12, 2025 •

edited

Loading

codecov bot commented Mar 12, 2025 •

edited

Loading

xmkg Mar 19, 2025 •

edited

Loading

xmkg Mar 21, 2025 •

edited

Loading

xmkg Mar 21, 2025 •

edited

Loading

xmkg commented Mar 21, 2025 •

edited

Loading

xmkg commented Mar 21, 2025 •

edited

Loading