Description
`irlba` seems to behave inconsistently depending on whether column-centring is performed explicitly or via the `center` argument. To demonstrate, I've mocked up some single-cell RNA-seq data:
set.seed(1000)
ncells <- 100
ngenes <- 10000
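# Poisson counts stored as doubles: genes in rows, cells in columns
counts <- matrix(as.double(rpois(ncells*ngenes, lambda=100)), ncol=ncells)
# Per-gene means, i.e. the column means of t(counts)
centers <- rowMeans(counts)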
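Both routes should be mathematically identical. The following minimal base-R sketch uses small, hypothetical toy dimensions and does not need irlba. It checks two things. First, subtracting `centers` before transposing matches column-centring the transposed matrix. Second, products with the centred matrix B = A - 1 c^T can be formed implicitly as A x - 1 (c^T x) and A^T y - c (1^T y), without ever building the dense centred matrix. This is the generic implicit-centring identity; it is not necessarily how irlba implements `center` internally.

```r
set.seed(1)
# Hypothetical toy dimensions; genes in rows and cells in columns, as above
n_genes <- 50
n_cells <- 8
toy <- matrix(as.double(rpois(n_genes * n_cells, lambda = 100)), ncol = n_cells)
ctr <- rowMeans(toy)

A <- t(toy)                                          # cells x genes
explicit <- t(toy - ctr)                             # centre first, then transpose
via_scale <- scale(A, center = ctr, scale = FALSE)   # column-centre the transpose
attr(via_scale, "scaled:center") <- NULL             # drop attribute added by scale()
stopifnot(isTRUE(all.equal(explicit, via_scale)))

# Implicit products with B = A - 1 c^T, never forming B itself
x <- rnorm(n_genes)
y <- rnorm(n_cells)
Bx_implicit  <- A %*% x - sum(ctr * x)           # A x - 1 (c^T x)
Bty_implicit <- crossprod(A, y) - ctr * sum(y)   # A^T y - c (1^T y)

stopifnot(isTRUE(all.equal(Bx_implicit, explicit %*% x)))
stopifnot(isTRUE(all.equal(Bty_implicit, crossprod(explicit, y))))
```

If these identities hold, the explicit and `center` runs below should differ only by iteration or rounding noise.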
If I apply `irlba` to the transposed matrix (i.e., genes are now columns and cells are rows), centring either explicitly outside the function or via `center`, I get substantially different results:
library(irlba)
# Explicit centring: subtract the per-gene means, then transpose
set.seed(100)
out <- irlba(t(counts - centers), nu=10, nv=10)
head(out$d)
## [1] 1105.339 1091.932 1086.880 1085.875 1083.415 1080.327
# Implicit centring: pass the per-gene means via center
set.seed(100)
out2 <- irlba(t(counts), center=centers, nu=10, nv=10)
head(out2$d)
## [1] 3961.623 2629.205 1221.687 1190.174 1170.183 1165.110
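There are only 100 cells, so the exact SVD of the explicitly centred matrix is cheap to compute with base R's `svd()`. That gives a reference for judging which of the two `irlba()` runs is closer to the truth. The sketch below re-creates the simulation so it runs on its own. It only calls irlba if the package is installed.

```r
# Re-create the simulated data from above (genes in rows, cells in columns)
set.seed(1000)
ncells <- 100
ngenes <- 10000
counts <- matrix(as.double(rpois(ncells*ngenes, lambda=100)), ncol=ncells)
centers <- rowMeans(counts)

# Exact singular values of the explicitly centred cells x genes matrix;
# nu = 0 and nv = 0 skip the singular vectors, which are not needed here
ref <- svd(t(counts - centers), nu = 0, nv = 0)$d[1:10]
print(head(ref))

# If irlba is installed, put both truncated estimates next to the exact values
if (requireNamespace("irlba", quietly = TRUE)) {
  set.seed(100)
  explicit <- irlba::irlba(t(counts - centers), nu = 10, nv = 10)$d
  set.seed(100)
  implicit <- irlba::irlba(t(counts), center = centers, nu = 10, nv = 10)$d
  print(cbind(exact = ref, explicit = explicit, implicit = implicit))
}
```

If the `exact` and `explicit` columns agree closely, that would point to the `center` code path as the source of the discrepancy.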
I might have expected some small differences due to the vagaries of random initialization or numerical precision, but these differences are large: the largest singular value is 1105.339 with explicit centring but 3961.623 with `center`.

On a related note, running the following code in a fresh R session results in a segfault ("memory not mapped"):
set.seed(1000)
ncells <- 100
ngenes <- 10000
# Same simulation as before, but without as.double(), so counts is an integer matrix
counts <- matrix(rpois(ncells*ngenes, lambda=100), ncol=ncells)
centers <- rowMeans(counts)
out <- irlba::irlba(t(counts), center=centers, nu=10, nv=10)
Presumably it has something to do with the integer nature of `counts`, as coercion to double precision avoids the problem. Anyway, here's my session information:
R version 3.4.0 Patched (2017-04-24 r72627)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS
Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/R-3-4-branch_devel/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/R-3-4-branch_devel/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] irlba_2.2.1 Matrix_1.2-11
loaded via a namespace (and not attached):
[1] compiler_3.4.0 grid_3.4.0 lattice_0.20-35
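Regarding the segfault with integer input above: the only difference from the first snippet is the missing `as.double()`. `rpois()` returns integers, so `counts` (and `t(counts)`) has integer storage, while `rowMeans()` always returns doubles. The observation that coercion avoids the crash suggests a possible workaround: coerce the storage mode in place before calling `irlba()`, which keeps the dimensions intact. The name `counts_dbl` is just illustrative. This only sidesteps the crash; it does not address the discrepancy in the singular values reported above.

```r
set.seed(1000)
ncells <- 100
ngenes <- 10000
counts <- matrix(rpois(ncells*ngenes, lambda=100), ncol=ncells)
typeof(counts)            # "integer": rpois() returns integer values
typeof(rowMeans(counts))  # "double": centers is double either way

# Coerce in place: same values and dimensions, double-precision storage
counts_dbl <- counts
storage.mode(counts_dbl) <- "double"
centers <- rowMeans(counts_dbl)

# Only call irlba if it is installed
if (requireNamespace("irlba", quietly = TRUE)) {
  out <- irlba::irlba(t(counts_dbl), center = centers, nu = 10, nv = 10)
}
```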