Easy-access getter functions

With the implementation of the new data structure (#327), it becomes rather tricky to get information 
from the AIRR rearrangement schema (e.g. what is the "c_call" of the primary "VJ" chain?). 

Previously this was possible simply with
```python
adata.obs["IR_VJ_1_c_call"]
```

With the new data structure
```python
adata[:, "c_call"].X
```
just yields an awkward array with a variable number of chains per cell. The information about which 
chain is VJ/VDJ and primary or secondary is hidden away in `adata.obsm`.

This motivates the implementation of long-proposed easy-access getter/setter functions. At the very least to

 * retrieve AIRR rearrangement variables for a certain chain.

But possibly also with convenience functions, e.g.
 * to retrieve the most abundant categories (previously discussed in #51). 

The latter is less important, but the interface needs to be designed jointly, so it is also a topic of this issue. 

---

To get AIRR data, we need something like

```python
ir.get(adata, "locus", "VJ_1")  # -> pd.Series
```
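
To make the intended behaviour concrete, here is a minimal sketch of such a getter. It is not the scirpy API: `get_chain`, the ragged `c_call` lists, and the per-cell `chain_indices` dictionaries are hypothetical stand-ins for the awkward array and for the chain information kept in `adata.obsm`, and plain Python lists are used so that the example runs without any dependencies.

```python
# Hypothetical stand-ins for the real data: one ragged list of values per cell
# (like adata[:, "c_call"].X) and one chain-index mapping per cell
# (like the chain information kept in adata.obsm).
c_call = [
    ["TRAC", "TRBC1"],          # cell 0: one VJ chain, one VDJ chain
    ["TRBC2"],                  # cell 1: only a VDJ chain
    ["TRAC", "TRAC", "TRBC1"],  # cell 2: two VJ chains, one VDJ chain
]
chain_indices = [
    {"VJ": [0], "VDJ": [1]},
    {"VJ": [], "VDJ": [0]},
    {"VJ": [0, 1], "VDJ": [2]},
]


def get_chain(values, chain_indices, chain):
    """Return one value per cell for a chain such as "VJ_1" or "VDJ_2".

    Cells that do not have the requested chain yield None.
    """
    receptor_arm, rank = chain.rsplit("_", 1)
    pos = int(rank) - 1  # "VJ_1" is the primary chain, i.e. position 0
    result = []
    for cell_values, cell_idx in zip(values, chain_indices):
        idx = cell_idx.get(receptor_arm, [])
        result.append(cell_values[idx[pos]] if pos < len(idx) else None)
    return result


primary_vj = get_chain(c_call, chain_indices, "VJ_1")
```

A real implementation would presumably return a `pd.Series` indexed by `adata.obs_names`, as in the signature above, and use missing values instead of `None`.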
---

We regularly need the top n of a category, e.g. v-gene or clonotype. 

This can be achieved with a pandas one-liner, if one knows how to do it...

```python
# This is probably hacky, we might think about a better way, but we need the most abundant clonotypes

top_clonotypes = adata.obs.clonotype.value_counts()[:8].index.tolist() # A better way might be needed, especially to take normalization into account
top_vgenes = adata.obs.TRB_1_v_gene.value_counts()[:8].index.tolist()
```

It would be more user-friendly to have a convenience function for this, 
for instance: 

```python
ir.tl.top_n(col="clonotype", n=10)
```
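
As a rough sketch of what such a helper could do internally (an assumption, not an existing scirpy function): the standalone `top_n` below takes any iterable of labels, for example `adata.obs["clonotype"]`, rather than the `col` argument proposed above. It counts with `collections.Counter` instead of pandas so that it runs on its own, and it skips missing values, as `value_counts()` does by default. Normalization is not handled.

```python
import math
from collections import Counter


def top_n(labels, n=10):
    """Return the n most abundant labels, most frequent first.

    Hypothetical helper: `labels` can be any iterable, e.g. a pandas Series.
    None and NaN entries are ignored.
    """
    counts = Counter(
        label
        for label in labels
        if label is not None
        and not (isinstance(label, float) and math.isnan(label))
    )
    return [label for label, _ in counts.most_common(n)]


# Toy clonotype assignments, including cells without a clonotype
clonotypes = ["ct1", "ct2", "ct1", None, "ct3", "ct2", "ct1", float("nan")]
top_two = top_n(clonotypes, n=2)
```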

Use it for plotting: 
```python
sc.pl.umap(adata, color="clonotype", groups=ir.tl.top_n("clonotype", 10))
```

---

To discuss

 - [ ] retrieve multiple columns at once / vectorization? 
 - [ ] what about "extra" chains? 
 - [x] plotting 
