Straightforward way for getting most abundant categories

In GitLab by @grst on Mar 31, 2020, 16:52

We regularly need to get the top n entries of a category, e.g. v-gene or clonotype.

This can be achieved with a pandas one-liner, if one knows how to do it...

```python
# This is probably hacky and we might think about a better way,
# especially to take normalization into account, but we need the most abundant clonotypes.

# value_counts() sorts by frequency (descending), so the first 8 index entries are the top 8.
top_clonotypes = adata.obs.clonotype.value_counts()[:8].index.tolist()
top_vgenes = adata.obs.TRB_1_v_gene.value_counts()[:8].index.tolist()
```

It would be more user-friendly to have a convenience function for this,
for instance:

```python
ir.tl.top_clonotypes(n=10)
ir.tl.top_genes(gene="V", chain="TRA_1")
```

or maybe even more flexible:

```python
ir.tl.top_n(col="clonotype", n=10)
```
Use it for plotting: 
```python
sc.pl.umap(adata, color="clonotype", groups=ir.tl.top_n("clonotype", 10))
```
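
To make the proposal concrete, here is a minimal sketch of what such a helper could look like. It is only an illustration, not an existing `scirpy` or `scanpy` function: the name `top_n`, its signature, and the choice to pass the `obs` data frame explicitly are assumptions. It does not address normalization.

```python
import pandas as pd


def top_n(obs: pd.DataFrame, col: str, n: int = 10) -> list:
    """Hypothetical helper: return the n most abundant values of obs[col].

    Values are ordered from most to least frequent. Missing values are
    ignored, which is the default behavior of value_counts().
    """
    counts = obs[col].value_counts()
    # For categorical columns, value_counts() also lists unused categories
    # with a count of 0; drop them so they can never end up in the result.
    counts = counts[counts > 0]
    return counts.head(n).index.tolist()


# Toy data standing in for adata.obs
obs = pd.DataFrame({"clonotype": ["c1", "c2", "c1", "c3", "c1", "c2"]})
top = top_n(obs, "clonotype", n=2)
```

With an `AnnData` object, the call would presumably be `top_n(adata.obs, "clonotype", 10)`, and the returned list could be passed to the `groups` argument as in the plotting example above.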

@szabogtamas, what do you think of this approach? 

This could actually become a `scanpy` feature as it's not `scirpy` specific.
