Straightforward way for getting most abundant categories #51

In GitLab by @grst on Mar 31, 2020, 16:52

We regularly need to get the top n values of a category, e.g. v-gene or clonotype.

This can be achieved with a pandas one-liner, if one knows how to do it...

# This is probably hacky, but we need the most abundant clonotypes.
# A better way might be needed, especially to take normalization into account.
top_clonotypes = adata.obs["clonotype"].value_counts().head(8).index.tolist()
top_vgenes = adata.obs["TRB_1_v_gene"].value_counts().head(8).index.tolist()

It would be more user-friendly to have a convenience function to do this,
for instance:

ir.tl.top_clonotypes(n=10)
ir.tl.top_genes(gene="V", chain="TRA_1")

or, maybe even more flexible:

ir.tl.top_n(col="clonotype", n=10)

Use it for plotting:

sc.pl.umap(adata, color="clonotype", groups=ir.tl.top_n("clonotype", 10))
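
To make the proposal concrete, here is a minimal sketch of what such a `top_n` helper could look like. It is only an illustration: the function is hypothetical (not an existing scirpy or scanpy function), it takes the `obs` data frame explicitly instead of an `adata` object, and it ignores normalization entirely.

```python
import pandas as pd


def top_n(obs: pd.DataFrame, col: str, n: int = 10) -> list:
    """Return the n most frequent values of obs[col], most abundant first.

    Hypothetical helper for illustration only.
    """
    # value_counts() sorts by descending frequency and drops NaN by default.
    counts = obs[col].value_counts()
    # Categorical columns also report unobserved categories with count 0.
    counts = counts[counts > 0]
    return counts.head(n).index.tolist()


# Toy stand-in for adata.obs
obs = pd.DataFrame(
    {"clonotype": ["ct_1", "ct_2", "ct_1", "ct_3", "ct_1", "ct_2", None]}
)
print(top_n(obs, "clonotype", n=2))  # ['ct_1', 'ct_2']
```

With an AnnData object, this would be called as `top_n(adata.obs, "clonotype", 10)` and the result passed to the `groups` argument of `sc.pl.umap`.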

@szabogtamas, what do you think of this approach?

This could actually become a scanpy feature, as it's not scirpy-specific.