Feature description
Task backtrace
The task backtrace shows:
- the task id
- all futures (starting from the root async fn) with their names, the await point each one is suspended at, the future each one is waiting for, and its serial number in the futures stack
New async commands
Create the `async backtrace`, `async backtrace all`, and `async task {id}` commands.
- `async backtrace` - prints info about `block_on` threads and async workers (with the current task backtrace)
- `async backtrace all` - same as `async backtrace`, but prints backtraces for all tasks in the system
- `async task {id}` - prints the task with the selected id (or the current task if no id is set)

Example (`async backtrace all`):
```
Thread #1 (pid: 28406) block on:
#0 async fn tokio_tcp::main suspended at await point 1
#1 async fn tokio::net::tcp::listener::accept suspended at await point 0
Async worker #2 (pid: 28796, local queue length: 3)
Async worker #3 (pid: 28797, local queue length: 4)
Task: 4
#0 async fn tokio_tcp::main::{async_block#0} suspended at await point 1
#1 sleep future, sleeping already happened 18 seconds ago
Task: 5
#0 async fn tokio_tcp::main::{async_block#0} suspended at await point 1
#1 sleep future, sleeping already happened 20 seconds ago
```
Implementation notes
For now we will focus on the `tokio` v1.41 multi-threaded async runtime.
Explore the block_on threads
A thread is a `block_on` thread if its stack trace contains the `CachedParkThread::block_on` function. To find the task it is currently executing, `bs` explores the `f` variable. That task is then explored using the task backtrace algorithm (described below).
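The detection step above can be sketched as a simple scan over resolved frame symbol names (the helper function and the frame names are illustrative assumptions, not BugStalker's real API):

```rust
/// Illustrative sketch: classify a thread as a `block_on` thread by scanning
/// its resolved stack-frame symbol names (helper name is an assumption).
fn is_block_on_thread(frame_names: &[&str]) -> bool {
    frame_names
        .iter()
        .any(|name| name.contains("CachedParkThread::block_on"))
}

fn main() {
    let frames = [
        "tokio::runtime::park::CachedParkThread::block_on",
        "tokio_tcp::main",
    ];
    assert!(is_block_on_thread(&frames));
    assert!(!is_block_on_thread(&["main", "std::io::stdin"]));
    println!("ok");
}
```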
Explore the worker threads
Each worker thread in tokio contains a `CONTEXT` thread-local value. So if a thread is a worker, it must:
- contain a non-null `CONTEXT` thread-local variable, and
- contain a `run_task` function in its stack trace (meaning the worker is in a running state), or
- contain a `Context::park` function in its stack trace (meaning the worker is in a parked state)
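A minimal sketch of this classification, assuming we already know whether the thread's `CONTEXT` thread-local is set and have its frame names (all names here are assumptions):

```rust
/// Illustrative worker-state classification (names are assumptions).
#[derive(Debug, PartialEq)]
enum WorkerState {
    Running,
    Parked,
    Unknown,
}

/// A thread without the CONTEXT thread-local is not a worker at all,
/// so we return None for it.
fn worker_state(has_context: bool, frame_names: &[&str]) -> Option<WorkerState> {
    if !has_context {
        return None;
    }
    let state = if frame_names.iter().any(|f| f.contains("run_task")) {
        WorkerState::Running
    } else if frame_names.iter().any(|f| f.contains("Context::park")) {
        WorkerState::Parked
    } else {
        WorkerState::Unknown
    };
    Some(state)
}

fn main() {
    assert_eq!(worker_state(true, &["worker::run_task"]), Some(WorkerState::Running));
    assert_eq!(worker_state(true, &["Context::park"]), Some(WorkerState::Parked));
    assert_eq!(worker_state(false, &["anything"]), None);
    println!("ok");
}
```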
Information about the worker queue is extracted by reading this value from `CONTEXT` (using a `bs` function). The path to this data structure is `CONTEXT.scheduler.worker.core.run_queue` (as a `bs` DQE: `var (*(*(*CONTEXT.scheduler.inner).__0.core.value.__0).run_queue.inner).data`).
Information about the current worker's tasks is obtained from the task backtrace algorithm (described below).
Explore tasks
Tokio has something like a "task registry": owned tasks. This structure can be reached from the `CONTEXT` variable via a special `Shared` structure. The path to this data is `CONTEXT.current.handle.shared.owned` (as a `bs` DQE: `(*CONTEXT.current.handle.value.__0.__0).data.shared.owned.list`). After extraction, the tasks are explored by the task backtrace algorithm (described below).
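Conceptually, enumerating the owned tasks is a walk over a linked list of task headers. The sketch below models that with an ordinary singly linked list (tokio's real `OwnedTasks` uses an intrusive list, and all names here are assumptions):

```rust
/// Simplified stand-in for a task header in the owned-tasks list.
struct TaskHeader {
    id: u64,
    next: Option<Box<TaskHeader>>,
}

/// Walk the list and collect every task id, as a debugger would before
/// building a backtrace for each task.
fn collect_task_ids(head: Option<&TaskHeader>) -> Vec<u64> {
    let mut ids = Vec::new();
    let mut cur = head;
    while let Some(task) = cur {
        ids.push(task.id);
        cur = task.next.as_deref();
    }
    ids
}

fn main() {
    let list = TaskHeader {
        id: 4,
        next: Some(Box::new(TaskHeader { id: 5, next: None })),
    };
    assert_eq!(collect_task_ids(Some(&list)), vec![4, 5]);
    assert!(collect_task_ids(None).is_empty());
    println!("ok");
}
```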
Create a task backtrace
After obtaining a task one way or another, `BugStalker` builds a future backtrace. `bs` extracts the task id from the task header and then tries to find the `poll` function in the task vtable. This `poll` function is used to determine the root future type.
Futures for async functions are always Rust enums (from the DWARF point of view) generated by the compiler. These enums have states like "Unresumed", "Suspend", "Panicked", and so on. It is important to mention that the future types generated for async functions may have an `__awaitee` field that reflects what the async function is waiting for at a particular await point. If a future is not an async fn, it may have a different DWARF representation.
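To illustrate, the enum below hand-writes the shape the compiler conceptually generates for an async fn with one await point. The variant and `__awaitee` names mirror what DWARF shows, but this is an illustration, not the real compiler layout:

```rust
/// Stand-in for the state machine of an awaited inner async fn.
enum InnerFuture {
    Unresumed,
}

/// Hand-written sketch of the compiler-generated state machine for an
/// async fn with a single await point (illustrative, not the real layout).
enum OuterFuture {
    Unresumed,                           // created, never polled
    Suspend0 { __awaitee: InnerFuture }, // suspended at await point 0
    Returned,                            // completed
    Panicked,                            // poll panicked
}

/// Render a state the way an async backtrace might describe it.
fn describe(f: &OuterFuture) -> &'static str {
    match f {
        OuterFuture::Unresumed => "not yet polled",
        OuterFuture::Suspend0 { .. } => "suspended at await point 0",
        OuterFuture::Returned => "completed",
        OuterFuture::Panicked => "panicked",
    }
}

fn main() {
    let f = OuterFuture::Suspend0 { __awaitee: InnerFuture::Unresumed };
    assert_eq!(describe(&f), "suspended at await point 0");
    println!("ok");
}
```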
So, the root future of any task is the async function for which the task was created (with `tokio::spawn`, for example). If the root future is waiting for another future to complete, `BugStalker` adds that future to the backtrace, and so on.
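The backtrace-building loop can be sketched as a walk down the `__awaitee` chain. The node type below is an assumption standing in for whatever BugStalker decodes from DWARF:

```rust
/// Illustrative decoded future node (an assumption, not BugStalker's model).
struct FutureNode {
    name: String,
    await_point: u32,
    awaitee: Option<Box<FutureNode>>,
}

/// Walk from the root future down the `__awaitee` chain, emitting one
/// backtrace frame per future.
fn build_backtrace(root: &FutureNode) -> Vec<String> {
    let mut frames = Vec::new();
    let mut current = Some(root);
    while let Some(node) = current {
        let idx = frames.len();
        frames.push(format!(
            "#{idx} async fn {} suspended at await point {}",
            node.name, node.await_point
        ));
        current = node.awaitee.as_deref();
    }
    frames
}

fn main() {
    let root = FutureNode {
        name: "tokio_tcp::main".into(),
        await_point: 1,
        awaitee: Some(Box::new(FutureNode {
            name: "tokio::net::tcp::listener::accept".into(),
            await_point: 0,
            awaitee: None,
        })),
    };
    let bt = build_backtrace(&root);
    assert_eq!(bt[0], "#0 async fn tokio_tcp::main suspended at await point 1");
    assert_eq!(bt.len(), 2);
    println!("ok");
}
```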
old (and sometimes wrong) notes
The "root" of our backtrace is a `tokio` task. Any suspended task must be waiting on some future, that future waits on another, and so on. The place where we can find root futures (tasks) is the local queue (TODO: maybe not only local queues, needs research) of each worker (a worker is a system thread). The place where we can find the current task is the frame where the `run_task` function is executed.
Unfortunately tokio does not provide any explicit metadata exposed for debuggers, so there is no easy way to get task-queue addresses in memory. Let's describe possible solutions:
1. Set breakpoints in task constructors/destructors and store task information at these points
This solution is good for a tokio oracle because it gives us "real-time" information, and its visualization can be used in the TUI interface. But it introduces a big overhead: we stop the whole program whenever a task is created or dropped.
2. Set breakpoints in the `tokio` runtime initialization process and save pointers to the task queues
This solution is better than the previous one because we set the breakpoint only once, so the overhead is minimal. But what if the debugee program does not use the tokio runtime? What should the reaction be to an error when setting the breakpoint? It looks like we need to check some symbols before starting the program to answer the question "is the `tokio` runtime used?".
3. Move down the call stack on every `async backtrace` command and find pointers to the task queues
When the user enters the `async backtrace` command, we can move down the call stack until the worker loop acting as the scheduler is reached (currently this is the `worker::Context::run` function). Then we can observe the local queue of that worker, and do the same for all threads. Pointers to local queues can be cached.
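That search can be sketched as a scan for the scheduler loop frame, assuming we have the frame function names from the top of the stack downward (the scheduler frame name is taken from the text above; everything else is an assumption):

```rust
/// Illustrative search down a call stack for the scheduler loop frame.
/// Returns the index of the first frame that is the worker loop.
fn find_scheduler_frame(frame_names: &[&str]) -> Option<usize> {
    frame_names
        .iter()
        .position(|name| name.contains("worker::Context::run"))
}

fn main() {
    let stack = [
        "my_future::poll",
        "tokio::runtime::task::harness::poll",
        "tokio::runtime::scheduler::multi_thread::worker::Context::run",
        "std::thread::spawn",
    ];
    // The worker loop is found at index 2; frames below it belong to the
    // scheduler, frames above it belong to the currently running task.
    assert_eq!(find_scheduler_frame(&stack), Some(2));
    assert_eq!(find_scheduler_frame(&["main"]), None);
    println!("ok");
}
```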
It looks like we should stick with solution number 3; it performs computations only in response to an `async backtrace` command and does not add any other overhead.