Easy-access getter functions #184

@grst

With the implementation of the new data structure (#327), it becomes rather tricky to get information
from the AIRR rearrangement schema (e.g. what is the "c_call" of the primary "VJ" chain?).

Previously this was possible simply with

adata.obs["IR_VJ_1_c_call"]

With the new data structure,

adata[:, "c_call"].X

just yields an awkward array with a variable number of chains per cell. The information about which
chain is VJ/VDJ, and whether it is primary or secondary, is hidden away in adata.obsm.

This motivates implementing the long-proposed easy-access getter/setter functions, at the very least to

  • retrieve AIRR rearrangement variables for a certain chain.

But possibly also with convenience functions, e.g.

The latter is of less importance, but the interface needs to be designed jointly, therefore this is also a topic in this issue.


To get AIRR data, we need something like

ir.get(adata, "locus", "VJ_1") -> pd.Series
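To make the intended behaviour concrete, here is a minimal sketch of such a getter in plain Python. It does not use the real data structure: the per-cell chain lists, the per-cell index mapping (standing in for what is stored in adata.obsm) and the function name get_chain_field are all hypothetical and chosen for illustration only.

```python
from typing import Any, Dict, List, Optional, Sequence


def get_chain_field(
    chains_per_cell: Sequence[Sequence[Dict[str, Any]]],
    chain_index_per_cell: Sequence[Dict[str, int]],
    field: str,
    chain: str = "VJ_1",
) -> List[Optional[Any]]:
    """Return one value per cell for `field` of the requested chain.

    Assumed (hypothetical) layout:
    - chains_per_cell[i] is the variable-length list of chain records of cell i
    - chain_index_per_cell[i] maps labels such as "VJ_1" or "VDJ_1" to a
      position in chains_per_cell[i]; a missing label means the cell has no
      such chain
    """
    values: List[Optional[Any]] = []
    for chains, index in zip(chains_per_cell, chain_index_per_cell):
        position = index.get(chain)
        # Cells without the requested chain get None instead of raising.
        values.append(chains[position][field] if position is not None else None)
    return values


# Toy data: three cells with two, one and zero chains.
toy_chains = [
    [{"locus": "TRA", "c_call": "TRAC"}, {"locus": "TRB", "c_call": "TRBC1"}],
    [{"locus": "TRB", "c_call": "TRBC2"}],
    [],
]
toy_index = [{"VJ_1": 0, "VDJ_1": 1}, {"VDJ_1": 0}, {}]

primary_vj_c_call = get_chain_field(toy_chains, toy_index, "c_call", "VJ_1")
primary_vdj_locus = get_chain_field(toy_chains, toy_index, "locus", "VDJ_1")
```

The result has exactly one entry per cell, so it could be wrapped in a pd.Series indexed by adata.obs_names to match the proposed return type.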

We regularly need to get the top n values of a category, e.g. V gene or clonotype.

This can be achieved with a pandas one-liner, if one knows how to do it...

# This is probably hacky and we might think about a better way, but we need the most abundant clonotypes

top_clonotypes = adata.obs.clonotype.value_counts()[:8].index.tolist() # A better way might be needed, especially to take normalization into account
top_vgenes = adata.obs.TRB_1_v_gene.value_counts()[:8].index.tolist()

It would be more user-friendly to have a convenience function for this,
for instance:

ir.tl.top_n(col="clonotype", n=10)

Use it for plotting:

sc.pl.umap(adata, color="clonotype", groups=ir.tl.top_n("clonotype", 10))
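A rough sketch of what such a helper could do internally is shown below. It is not the proposed ir.tl.top_n itself: it is a standalone function that takes the column values directly (for example adata.obs["clonotype"]) rather than a column name, uses only the standard library, and does not address the normalization question raised above.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def top_n(values: Iterable[Hashable], n: int = 10) -> List[Hashable]:
    """Return the n most frequent labels, most abundant first.

    Missing entries are skipped: None explicitly, and NaN via the
    `v == v` check (NaN is the only value that is not equal to itself).
    """
    counts = Counter(v for v in values if v is not None and v == v)
    return [label for label, _ in counts.most_common(n)]


# Toy clonotype column with one missing value.
clonotypes = ["ct1", "ct2", "ct1", "ct3", "ct1", "ct2", None]
most_abundant = top_n(clonotypes, n=2)
```

The returned list can be passed directly as the groups argument in the plotting call above.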

To discuss

  • retrieve multiple columns at once / vectorization?
  • what about "extra" chains?
  • plotting
