MuData API considerations

### Description of feature

In the course of implementing the new data structure (https://github.com/scverse/scirpy/issues/327), I plan to make [MuData](https://muon.readthedocs.io/en/latest/notebooks/quickstart_mudata.html) the default way 
of interacting with paired single-cell gene expression/AIRR data. 

I'm thinking about how the API should be adapted for this. 

## Data structure recap
We are talking about a MuData object that looks like this: 
```
MuData object with n_obs × n_vars = 3000 × 30727
  2 modalities
    gex:	3000 x 30727
      obs:	'cluster_orig', 'patient', 'sample', 'source'
      uns:	'cluster_orig_colors'
      obsm:	'X_umap_orig'
    airr:	3000 x 0
      obs:	'high_confidence', 'is_cell', 'clonotype_orig'
      obsm:	'airr', 'chain_indices'
```

The `gex` modality contains the gene expression data, the `airr` modality the 
receptor data. The `airr` modality has no `.X`; the relevant data are stored in `.obsm`. 

 * Most scirpy functions only operate on the `airr` modality. 
 * Some functions use both `airr` and `gex` data. 
 * For visualization, it is useful to plot `airr.obs` on top of `gex` embeddings, or use columns from both `gex.obs` and `airr.obs` in a single plot.

Since the `airr` modality only has `obs` and `obsm`, it would be conceivable to
(additionally) support the use of a single `AnnData` object with gene expression data in `.X` and receptor data in `.obsm`. 



## API considerations for unimodal data
*(i.e. scirpy functions that only use the `airr` modality)*

**1. For a function that only operates on the AIRR data, what is the preferred option to interact with mudata?**

   ```python
   ir.tl.chain_qc(mdata, airr_key="airr", **kwargs)
   ```
   or
   ```python
   ir.tl.chain_qc(mdata['airr'], **kwargs)
   ```

**2. Should a function that only operates on the AIRR data add columns to `mdata` or `adata`?**

   ```python
   import numpy as np

   def chain_qc(mdata, airr_key="airr", **kwargs):
       adata = mdata[airr_key]
       # write the result to the AIRR modality
       adata.obs["new_col"] = np.zeros((adata.n_obs, ))
       # should this be called by the function automatically? 
       mdata.update_obs()
   ```

**3. Use muon for plotting or scanpy?**

   Is it preferable to call

   ```python
   mu.pl.umap(mdata, color="gex:cluster")
   ```
   or 
   ```python
   sc.pl.umap(mdata['gex'], color="cluster")
   ```

   If the former, is there a recommended way to transfer `.obsm` from the GEX AnnData to MuData (similar to `update_obs` for `.obs`)? 


## API considerations for multimodal data
*(i.e. functions that consume both the `airr` and `gex` modalities)*

I have a function that depends on a gene expression neighborhood graph and `.obs` annotations based on AIRR data. 

API options:
 1. pass both modalities (probably not)
    ```python
    ir.tl.clonotype_modularity(mdata['gex'], mdata['airr'], airr_col="clone_id")
    ```
 2. pass mdata and mod_keys
    ```python
    ir.tl.clonotype_modularity(mdata, gex_mod="gex", airr_col="airr:clone_id")
    ```
 3. Store the gene expression neighborhood graph in mudata
    ```python
    # is there something like mdata.update_obsm() ? 
    mdata.obsp["connectivities"] = mdata["gex"].obsp["connectivities"]
    ir.tl.clonotype_modularity(mdata, airr_col="airr:clone_id")
    ```
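
For options 2 and 3, the function has to resolve a prefixed column name such as `airr:clone_id` into a modality and a column. A minimal sketch of such a resolver, assuming the `modality:column` convention used above; `split_mod_key` is a hypothetical helper for illustration, not part of scirpy or muon:

```python
from typing import Optional, Tuple


def split_mod_key(key: str, default_mod: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split "airr:clone_id" into ("airr", "clone_id").

    Keys without a prefix are returned with `default_mod` as the modality.
    Hypothetical helper, for illustration only.
    """
    mod, sep, col = key.partition(":")
    if not sep:
        # No colon present: the whole key is the column name.
        return default_mod, key
    return mod, col


print(split_mod_key("airr:clone_id"))
print(split_mod_key("clone_id", default_mod="airr"))
```

Only the first colon is treated as the separator, so a column name that itself contains a colon would still be resolved to the leading modality.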
    
---

## Possible solution

I'm leaning towards having all functions operate on `MuData` directly, 
i.e. 
```python
ir.tl.something(mdata, airr_key="airr", col="airr:xxx")
```

with the option to also pass an `AnnData` object for backwards compatibility (in that case, `airr_key` will be ignored). 
```python
ir.tl.something(adata, col="xxx")
```
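
Internally, the dispatch between the two input types could be quite small. A minimal sketch, assuming duck typing on the `.mod` mapping that `MuData` exposes; `get_airr_adata` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real `MuData`/`AnnData` objects so that the snippet runs without either library:

```python
from types import SimpleNamespace


def get_airr_adata(data, airr_key="airr"):
    """Return the object holding the AIRR data.

    If `data` has a `.mod` mapping (as MuData does), return the modality
    stored under `airr_key`. Otherwise assume `data` is already an
    AnnData-like object and ignore `airr_key`.
    """
    mod = getattr(data, "mod", None)
    if mod is None:
        return data
    try:
        return mod[airr_key]
    except KeyError:
        raise KeyError(f"No modality {airr_key!r}; available: {list(mod)}") from None


# Stand-ins for illustration only (not real AnnData/MuData objects).
adata = SimpleNamespace(obs={"clone_id": ["c1", "c2"]})
mdata = SimpleNamespace(mod={"gex": SimpleNamespace(obs={}), "airr": adata})

assert get_airr_adata(mdata) is adata  # MuData-like: modality is looked up
assert get_airr_adata(adata) is adata  # AnnData-like: passed through unchanged
```

In real code an explicit `isinstance(data, MuData)` check would probably be clearer than duck typing; the sketch avoids it only to stay dependency-free.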








