Additional modes (surface proteins, methylation, …)

In cell hashing and CITE-seq datasets, an additional count matrix is produced that stores how many ADT/HTO barcodes each cell has (see https://genomebiology.biomedcentral.com/track/pdf/10.1186/s13059-018-1603-1 for details). Depending on the experiment type, these barcodes are used either to demultiplex cells into their original "sources" (HTO case) or to quantify protein expression (ADT case). Here is the PBMC HTO file from the Seurat tutorial (rows are sample names, i.e. HTO barcode IDs; columns are cells):

![image](https://user-images.githubusercontent.com/1140359/66957841-f885e180-f034-11e9-8831-bd1b13f925a5.png)

Link to file: https://www.dropbox.com/sh/c5gcjm35nglmvcv/AABGz9VO6gX9bVr5R2qahTZha?dl=0&preview=pbmc_hto_mtx.rds
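To make the orientation and the HTO use case concrete, here is a minimal sketch with a toy matrix. The barcode names, cell names and counts are made up, and the "most abundant HTO" rule is only a naive stand-in for real demultiplexing, which also has to call doublets and negatives.

```python
import pandas as pd

# Toy HTO count matrix in the orientation described above:
# rows are HTO barcode IDs (samples), columns are cells.
# All names and counts are invented for illustration.
hto_counts = pd.DataFrame(
    [[120, 3, 5],
     [4, 98, 2],
     [1, 6, 210]],
    index=["HTO_A", "HTO_B", "HTO_C"],
    columns=["cell_1", "cell_2", "cell_3"],
)

# Transpose to cells x HTOs, the orientation used for per-cell annotations.
per_cell = hto_counts.T

# Naive demultiplexing: assign each cell to its most abundant HTO.
assignment = per_cell.idxmax(axis=1)
print(assignment.to_dict())
```

After the transpose, each row describes one cell, which is the shape a per-observation annotation would need.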

Right now, we can store these counts in `adata.obsm`, as @wflynny suggested. However, `adata.obsm` has no place to store the barcode strings (or protein names or sample names, depending on what the barcodes represent). One hack would be to store them in `adata.uns`, but that would be very ugly. Alternatively, one could store everything in `adata.obs`, but that would pollute `obs` and ignore the multivariate nature of the barcodes.
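A rough sketch of the two workarounds, to show where each one hurts. To keep it self-contained, `obsm`, `uns` and `obs` below are plain Python stand-ins (two dicts and a DataFrame), not the real AnnData objects, and the names and counts are invented.

```python
import numpy as np
import pandas as pd

cells = ["cell_1", "cell_2", "cell_3"]
hto_names = ["HTO_A", "HTO_B"]
counts = np.array([[120, 4], [3, 98], [5, 2]])  # cells x HTOs

# Stand-ins for the AnnData slots discussed above (not the real API).
obsm = {"hto": counts}          # plain array: no column labels
uns = {"hto_names": hto_names}  # labels kept on the side: the "hack"
obs = pd.DataFrame(index=cells)

# Workaround 1: every lookup by name goes through the side list,
# and nothing keeps the list in sync with the array columns.
col = uns["hto_names"].index("HTO_B")
hto_b = obsm["hto"][:, col]

# Workaround 2: one obs column per barcode, which mixes the counts
# with unrelated per-cell annotations.
for j, name in enumerate(hto_names):
    obs[f"hto_{name}"] = counts[:, j]
```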

What would be a good solution here? Would it make sense to add "column names" to the entries of `adata.obsm`? Since they are currently stored as numpy arrays, this seems infeasible, but how much effort would it be to store them as dataframes (like `obs`) instead of matrices? Alternatively, `sc.get` might allow us to access a group of columns in `adata.obs` for convenience.
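Roughly what I have in mind for the two options, sketched with plain pandas. The helper `get_obs_group` is hypothetical and only illustrates the kind of grouped access meant here; it is not an existing `sc.get` function, and the `hto_` prefix convention is an assumption.

```python
import pandas as pd

cells = ["cell_1", "cell_2", "cell_3"]

# Option 1: a labeled cells x barcodes table that shares its index with obs.
hto = pd.DataFrame(
    {"HTO_A": [120, 3, 5], "HTO_B": [4, 98, 2]},
    index=cells,
)
hto_b = hto["HTO_B"]  # access by barcode name, no side list needed

# Option 2: keep prefixed columns in obs and fetch them as a group.
obs = pd.DataFrame({"n_genes": [900, 1200, 750]}, index=cells)
obs = obs.join(hto.add_prefix("hto_"))


def get_obs_group(obs_df, prefix):
    """Hypothetical helper: return the obs columns that share a prefix."""
    cols = [c for c in obs_df.columns if c.startswith(prefix)]
    return obs_df[cols]


hto_from_obs = get_obs_group(obs, "hto_")
```

Option 1 keeps the barcodes together as one multivariate object, while option 2 still leaves the columns in `obs` and only makes them easier to pull out.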

Let me know what you think. (Btw, related discussion: https://github.com/theislab/scanpy/issues/351)

