async backtrace - like backtrace command but shows futures await-stack #27

@godzie44

Feature description

Task backtrace

A task backtrace shows:

  • the task id
  • all futures (starting from the root async fn), each with its name, the await point it is suspended at, the future it is waiting for, and its serial number in the futures stack
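
To make the shape of this data concrete, here is a minimal sketch of how a task backtrace could be modelled and printed. The names `TaskBacktrace` and `FutureFrame` are hypothetical and are not taken from the BugStalker code base; the output format only mirrors the example given later in this issue.

```rust
use std::fmt;

/// One entry of a task's futures stack (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum FutureFrame {
    /// A compiler-generated future of an `async fn` or async block.
    AsyncFn { name: String, await_point: u32 },
    /// Any other future, described by free-form text.
    Other { description: String },
}

/// Task id plus its futures, root future first.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskBacktrace {
    pub task_id: u64,
    pub frames: Vec<FutureFrame>,
}

impl fmt::Display for TaskBacktrace {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Task: {}", self.task_id)?;
        // The position in `frames` is the serial number in the futures stack.
        for (n, frame) in self.frames.iter().enumerate() {
            match frame {
                FutureFrame::AsyncFn { name, await_point } => {
                    writeln!(f, "#{n} async fn {name} suspended at await point {await_point}")?
                }
                FutureFrame::Other { description } => writeln!(f, "#{n} {description}")?,
            }
        }
        Ok(())
    }
}
```

Keeping the frames in a plain `Vec` ordered from the root means the serial number never has to be stored separately.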

New async functions

Create the async backtrace, async backtrace all and async task {id} commands.

  • async backtrace - prints information about block_on threads and async workers (with the backtrace of the current task)
  • async backtrace all - same as async backtrace, but prints a backtrace for every task in the system
  • async task {id} - prints the task with the selected id (or the current task if no id is given)
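
The three commands above could be recognised roughly as follows. This is only a sketch: `AsyncCommand` and `parse_async_command` are hypothetical names, and the real BugStalker command parser may be structured quite differently.

```rust
/// The three proposed commands (hypothetical representation).
#[derive(Debug, PartialEq, Eq)]
pub enum AsyncCommand {
    /// `async backtrace`
    Backtrace,
    /// `async backtrace all`
    BacktraceAll,
    /// `async task {id}`; `None` means "the current task".
    Task(Option<u64>),
}

/// Parse a whitespace-separated command line into an `AsyncCommand`.
pub fn parse_async_command(input: &str) -> Result<AsyncCommand, String> {
    let words: Vec<&str> = input.split_whitespace().collect();
    match words.as_slice() {
        ["async", "backtrace"] => Ok(AsyncCommand::Backtrace),
        ["async", "backtrace", "all"] => Ok(AsyncCommand::BacktraceAll),
        ["async", "task"] => Ok(AsyncCommand::Task(None)),
        ["async", "task", id] => id
            .parse::<u64>()
            .map(|id| AsyncCommand::Task(Some(id)))
            .map_err(|e| format!("invalid task id `{id}`: {e}")),
        _ => Err(format!("unknown command: `{input}`")),
    }
}
```

For example, `parse_async_command("async task 4")` yields `Ok(AsyncCommand::Task(Some(4)))`, while `async task` without an id maps to the current task.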

Example (async backtrace all):

Thread #1 (pid: 28406) block on:
#0 async fn tokio_tcp::main suspended at await point 1
#1 async fn tokio::net::tcp::listener::accept suspended at await point 0

Async worker #2 (pid: 28796, local queue length: 3)
Async worker #3 (pid: 28797, local queue length: 4)

Task: 4
#0 async fn tokio_tcp::main::{async_block#0} suspended at await point 1
#1 sleep future, sleeping already happened 18 seconds ago 
Task: 5
#0 async fn tokio_tcp::main::{async_block#0} suspended at await point 1
#1 sleep future, sleeping already happened 20 seconds ago 

Implementation notes

For now we focus on the multi-thread runtime of tokio v1.41.

Explore the block_on threads

A thread is a block_on thread if its stack trace contains the CachedParkThread::block_on function. To find the task that is currently being executed, bs explores the f variable. This task is then explored using the task backtrace algorithm (described below).

Explore the worker threads

Each worker thread in tokio holds a CONTEXT thread-local value. So, for a thread to be a worker, it must:

  • have a non-null CONTEXT thread-local variable (required)
  • have a run_task function in its stack trace (meaning the worker is in a running state), or
  • have a Context::park function in its stack trace (meaning the worker is in a parked state)
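
The rules for block_on threads and worker threads can be sketched as a single classification function. This is an illustration under stated assumptions: the stack trace is given as a list of demangled function names, matching is done by substring, and the block_on check is applied first. `ThreadKind` and `classify_thread` are hypothetical names.

```rust
/// Result of inspecting one thread (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ThreadKind {
    BlockOn,
    RunningWorker,
    ParkedWorker,
    Other,
}

/// `frames` holds the demangled function names of the thread's stack trace.
/// `context_is_set` tells whether the CONTEXT thread-local is non-null.
pub fn classify_thread(frames: &[&str], context_is_set: bool) -> ThreadKind {
    // Assumption: a substring match on the function name is good enough.
    let has = |needle: &str| frames.iter().any(|f| f.contains(needle));

    if has("CachedParkThread::block_on") {
        ThreadKind::BlockOn
    } else if !context_is_set {
        // A worker must have a non-null CONTEXT.
        ThreadKind::Other
    } else if has("run_task") {
        ThreadKind::RunningWorker
    } else if has("Context::park") {
        ThreadKind::ParkedWorker
    } else {
        ThreadKind::Other
    }
}
```

A thread that has neither marker function on its stack is reported as `Other`, even if CONTEXT is set.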

Information about the worker queue is extracted from CONTEXT (using bs). The path to this data structure is CONTEXT.scheduler.worker.core.run_queue (as a bs DQE: var (*(*(*CONTEXT.scheduler.inner).__0.core.value.__0).run_queue.inner).data)

Information about the tasks of the current worker is obtained using the task backtrace algorithm (described below).

Explore tasks

Tokio has something like a "task registry": the owned tasks. This structure can be reached from the CONTEXT variable through a special Shared structure. The path to this data is CONTEXT.current.handle.shared.owned (as a bs DQE: (*CONTEXT.current.handle.value.__0.__0).data.shared.owned.list). After the tasks are extracted, they are explored using the task backtrace algorithm (described below).

Create a task backtrace

After obtaining a task one way or another, BugStalker builds a future backtrace. bs extracts the task id from the task header and then tries to find the poll function in the task vtable. This poll function is used to determine the type of the root future.

Futures for async functions are always Rust enums (from the DWARF point of view) generated by the compiler. These enums have states like "Unresumed", "Suspend", "Panicked" and so on. It is important to mention that the future types generated for async functions may have an __awaitee field, which reflects the future that the async function is waiting on at a particular await point. If a future is not an async fn, then it may have a different DWARF representation.

So, the root future of any task is the async function for which the task was created (using tokio::spawn, for example). If the root future is waiting for another future to complete, BugStalker adds that future to the backtrace, and so on.
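
The walk described above can be sketched as a loop that follows the awaited future until it reaches a future that is not a suspended async fn. The types below are a simplified, hypothetical model of what a debugger might decode from DWARF; they are not BugStalker's real data structures, and the wording of the non-suspended lines is made up for illustration.

```rust
/// Simplified view of a decoded future value (hypothetical model).
pub enum FutureValue {
    /// Compiler-generated enum for an `async fn` or async block.
    AsyncFn { name: String, state: AsyncFnState },
    /// Any other future; only its type name is known here.
    Other { type_name: String },
}

/// States of a compiler-generated async fn future.
pub enum AsyncFnState {
    Unresumed,
    Returned,
    Panicked,
    /// Suspended at an await point; `awaitee` models the `__awaitee` field.
    Suspended { await_point: u32, awaitee: Option<Box<FutureValue>> },
}

/// Build backtrace lines, starting from the root future of a task.
pub fn future_backtrace(root: &FutureValue) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = Some(root);
    while let Some(fut) = current {
        let n = lines.len(); // serial number in the futures stack
        current = match fut {
            FutureValue::AsyncFn { name, state } => match state {
                AsyncFnState::Suspended { await_point, awaitee } => {
                    lines.push(format!(
                        "#{n} async fn {name} suspended at await point {await_point}"
                    ));
                    // Continue with the awaited future, if there is one.
                    awaitee.as_deref()
                }
                AsyncFnState::Unresumed => {
                    lines.push(format!("#{n} async fn {name} not started yet"));
                    None
                }
                AsyncFnState::Returned => {
                    lines.push(format!("#{n} async fn {name} already returned"));
                    None
                }
                AsyncFnState::Panicked => {
                    lines.push(format!("#{n} async fn {name} panicked"));
                    None
                }
            },
            FutureValue::Other { type_name } => {
                lines.push(format!("#{n} future {type_name}"));
                None
            }
        };
    }
    lines
}
```

The loop stops at the first future that has no awaited child, so a future that is not an async fn always ends the backtrace in this sketch.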

Old (and sometimes wrong) notes

The "root" of our backtrace is a tokio task. Any suspended task must be waiting on some future, that future waits on another one, and so on. The place where we can find root futures (tasks) is the local queue of each worker (a worker is a system thread). TODO: maybe not only local queues, this needs R&D. The place where we can find the current task is the frame where the run_task function is executing.

Unfortunately, tokio does not expose any explicit metadata for debuggers, so there is no easy way to get the addresses of the task queues in memory. Let's describe the possible solutions:

1. Set breakpoints in task constructors/destructors and store task information at these points

This solution is good for the tokio oracle, because it gives us "real-time" information. We can visualize this information in the TUI, which is pretty good. But this solution has a big overhead: we stop the whole program every time a task is created or dropped.

2. Set breakpoints in the tokio runtime initialization process and save pointers to the task queues

This solution is better than the previous one because we set a breakpoint only once, so the overhead is minimal. But what if the debuggee program does not use a tokio runtime? How should we react to an error when setting the breakpoint? It looks like we need to check some characteristics of the program before starting it, to answer the question "is the tokio runtime used?".

3. Move down the call stack at every async backtrace command and find pointers into task queues

When the user enters the async backtrace command, we can move down the call stack until we reach the worker loop acting as the scheduler (currently this is the worker::Context::run function). Then we can inspect the local queue of the worker. Then we do the same for all threads. Pointers to the local queues can be cached.

It looks like we should stick with solution number 3: it lets us perform computations only in response to an async backtrace command and adds no other overhead.

Labels: 0.3, feature (New feature or request)
