Parameter settings for ECCITE-seq like approaches #531

@rfarouni

Hi,
I would like to quantify guide RNAs (based on 5'-tagged scRNA-seq 10X feature barcoding) using Alevin. Read 1 is 26 bp long (16 bp CB + 10 bp UMI) and Read 2 is 58 bp long (19 bp constant region + 21 bp guide sequence). Now, when I use the following settings:

salmon alevin -l ISR --barcodeLength 16 --umiLength 10 --end 5 --featureStart 19 --featureLength 21

I get this error:

Transcript to Gene Map File not provided.
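The error above concerns a missing transcript-to-gene map. As a minimal sketch, assuming each guide should be counted as its own "gene" and that the map is a two-column, tab-separated file (index target name, then gene name) of the kind passed with --tgMap, one could generate it from the guide FASTA. The guide names and sequences here are hypothetical placeholders, not taken from the actual library.

```python
import io


def read_fasta_names(handle):
    """Return record names (first word after '>') from a FASTA handle."""
    names = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' header with no name
                names.append(fields[0])
    return names


def identity_tgmap(names):
    """Build tab-separated lines that map every name to itself."""
    return "".join(f"{name}\t{name}\n" for name in names)


# Hypothetical two-guide reference; real names come from the FASTA
# that was used to build the salmon index.
example_fasta = io.StringIO(
    ">guide_A\nACGTACGTACGTACGTACGTA\n"
    ">guide_B\nTTGCATTGCATTGCATTGCAT\n"
)
tgmap_text = identity_tgmap(read_fasta_names(example_fasta))
print(tgmap_text, end="")
```

Writing `tgmap_text` to a file such as guide_to_gene.tsv would give one line per guide. If several guides target the same gene, the second column would carry the gene name instead.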

However, when I use the following instead:

salmon alevin -l ISR --citeseq --featureStart 19 --featureLength 21

It runs, but because --citeseq assumes --umiLength=12, I get the following output:

`[2020-06-03 13:53:30.298] [alevinLog] [info] set CITE-seq minScoreFraction parameter to : 0.797619

[2020-06-03 13:53:30.298] [alevinLog] [info] Found 64 transcripts(+0 decoys, +0 short and +0 duplicate names in the index)
[2020-06-03 13:53:30.298] [alevinLog] [info] Filled with 64 txp to gene entries
[2020-06-03 13:53:30.298] [alevinLog] [info] Found all transcripts to gene mappings
[2020-06-03 13:53:30.304] [alevinLog] [info] Processing barcodes files (if Present)

processed 52 Million barcodes

[2020-06-03 13:54:43.733] [alevinLog] [info] Done barcode density calculation.
[2020-06-03 13:54:43.733] [alevinLog] [info] # Barcodes Used: 52200250 / 52200250.
[2020-06-03 13:54:43.826] [alevinLog] [info] Forcing to use 100000 cells
[2020-06-03 13:54:43.964] [alevinLog] [info] Throwing 49909 barcodes with < 10 reads
[2020-06-03 13:54:43.984] [alevinLog] [info] Total 50092(has 201 low confidence) barcodes
[2020-06-03 13:54:44.191] [alevinLog] [info] Done True Barcode Sampling
[2020-06-03 13:54:44.285] [alevinLog] [info] Total 1.70493% reads will be thrown away because of noisy Cellular barcodes.
[2020-06-03 13:54:45.790] [alevinLog] [info] Done populating Z matrix
[2020-06-03 13:54:45.790] [alevinLog] [info] Total 0 CB got sequence corrected
[2020-06-03 13:54:45.790] [alevinLog] [info] Done indexing Barcodes
[2020-06-03 13:54:45.790] [alevinLog] [info] Total Unique barcodes found: 604589
[2020-06-03 13:54:45.790] [alevinLog] [info] Used Barcodes except Whitelist: 0
[2020-06-03 13:54:46.493] [jointLog] [info] There is 1 library.


[2020-06-03 13:54:46.551] [jointLog] [info] Loading pufferfish index
[2020-06-03 13:54:46.551] [jointLog] [info] Loading dense pufferfish index.
[2020-06-03 13:54:46.552] [jointLog] [info] done
[2020-06-03 13:54:46.552] [jointLog] [info] Index contained 64 targets
[2020-06-03 13:54:46.552] [jointLog] [info] Number of decoys : 0

[2020-06-03 13:54:46.493] [alevinLog] [info] Done with Barcode Processing; Moving to Quantify

processed 52 Million fragmentsvinLog] [info] parsing read library format
hits: 0, hits per frag: 0

[2020-06-03 13:55:42.905] [alevinLog] [info] Starting optimizer

[2020-06-03 13:55:42.931] [alevinLog] [warning] mrna file not provided; using is 1 less feature for whitelisting
[2020-06-03 13:55:42.931] [alevinLog] [warning] rrna file not provided; using is 1 less feature for whitelisting
[2020-06-03 13:55:42.933] [alevinLog] [info] Total 0.00 UMI after deduplicating.
[2020-06-03 13:55:42.933] [alevinLog] [info] Total 0 BiDirected Edges.
[2020-06-03 13:55:42.933] [alevinLog] [info] Total 0 UniDirected Edges.
[2020-06-03 13:55:42.933] [alevinLog] [warning] Skipped 50091 barcodes due to No mapped read
[2020-06-03 13:55:42.934] [alevinLog] [info] Clearing EqMap; Might take some time.
[2020-06-03 13:55:42.940] [alevinLog] [warning] Num Low confidence barcodes too less 1 < 200.Can't performing whitelisting; Skipping
[2020-06-03 13:55:42.940] [alevinLog] [info] Finished optimizer
`
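The UMI-length mismatch mentioned above can be checked with simple arithmetic: a 16-base barcode plus a 12-base UMI needs 28 bases, while Read 1 here is only 26 bases long. The sketch below splits a synthetic Read 1 under both geometries. The function name is hypothetical, the 16-base barcode for the 12-base-UMI case is an assumption, and the code says nothing about how alevin itself handles a read that is too short.

```python
def split_read1(seq, barcode_len, umi_len):
    """Split Read 1 into (cell barcode, UMI); fail if the read is too short."""
    needed = barcode_len + umi_len
    if len(seq) < needed:
        raise ValueError(
            f"read is {len(seq)} bases but barcode + UMI need {needed}"
        )
    return seq[:barcode_len], seq[barcode_len:needed]


# Synthetic 26-base Read 1: 16-base barcode followed by a 10-base UMI.
read1 = "A" * 16 + "C" * 10

# Geometry described in the post (16 + 10 = 26): fits exactly.
cb, umi = split_read1(read1, 16, 10)
print(cb, umi)

# Geometry with a 12-base UMI (16 + 12 = 28): does not fit in 26 bases.
try:
    split_read1(read1, 16, 12)
except ValueError as err:
    print("12-base UMI does not fit:", err)
```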

I also tried

salmon alevin -l ISR --chromium --featureStart 19 --featureLength 21 --tgMap guide_to_gene.tsv

But I get the following output:

`
[2020-06-03 13:47:17.330] [alevinLog] [info] Found 64 transcripts(+0 decoys, +0 short and +0 duplicate names in the index)
[2020-06-03 13:47:17.330] [alevinLog] [info] Filled with 64 txp to gene entries
[2020-06-03 13:47:17.330] [alevinLog] [info] Found all transcripts to gene mappings
[2020-06-03 13:47:17.336] [alevinLog] [info] Processing barcodes files (if Present)

processed 52 Million barcodes

[2020-06-03 13:48:30.047] [alevinLog] [info] Done barcode density calculation.
[2020-06-03 13:48:30.047] [alevinLog] [info] # Barcodes Used: 52200250 / 52200250.
[2020-06-03 13:48:33.285] [alevinLog] [info] Knee found left boundary at 1174
[2020-06-03 13:48:34.501] [alevinLog] [info] Gauss Corrected Boundary at 148
[2020-06-03 13:48:34.501] [alevinLog] [info] Learned InvCov: 985.935 normfactor: 763.254
[2020-06-03 13:48:34.501] [alevinLog] [info] Total 349(has 201 low confidence) barcodes
[2020-06-03 13:48:35.369] [alevinLog] [info] Done True Barcode Sampling
[2020-06-03 13:48:35.441] [alevinLog] [warning] Total 73.3629% reads will be thrown away because of noisy Cellular barcodes.
[2020-06-03 13:48:35.454] [alevinLog] [info] Done populating Z matrix
[2020-06-03 13:48:35.455] [alevinLog] [info] Total 4286 CB got sequence corrected
[2020-06-03 13:48:35.455] [alevinLog] [info] Done indexing Barcodes
[2020-06-03 13:48:35.455] [alevinLog] [info] Total Unique barcodes found: 604589
[2020-06-03 13:48:35.455] [alevinLog] [info] Used Barcodes except Whitelist: 4282
[2020-06-03 13:48:35.558] [alevinLog] [info] Done with Barcode Processing; Moving to Quantify
...
processed 52 Million fragments
hits: 0, hits per frag: 0

[2020-06-03 13:49:37.892] [jointLog] [info] Computed 0 rich equivalence classes for further processing
[2020-06-03 13:49:37.892] [jointLog] [info] Counted 0 total reads in the equivalence classes
[2020-06-03 13:49:37.893] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
[2020-06-03 13:49:37.893] [jointLog] [warning] Found 370 reads with N in the UMI sequence and ignored the reads.
Please report on github if this number is too large
[2020-06-03 13:49:37.893] [jointLog] [info] Mapping rate = 0%

[2020-06-03 13:49:37.893] [jointLog] [info] finished quantifyLibrary()
[2020-06-03 13:49:37.899] [alevinLog] [info] Starting optimizer

[2020-06-03 13:49:38.613] [alevinLog] [warning] mrna file not provided; using is 1 less feature for whitelisting
[2020-06-03 13:49:38.613] [alevinLog] [warning] rrna file not provided; using is 1 less feature for whitelisting
[2020-06-03 13:49:38.614] [alevinLog] [info] Total 0.00 UMI after deduplicating.
[2020-06-03 13:49:38.614] [alevinLog] [info] Total 0 BiDirected Edges.
[2020-06-03 13:49:38.614] [alevinLog] [info] Total 0 UniDirected Edges.
[2020-06-03 13:49:38.614] [alevinLog] [warning] Skipped 348 barcodes due to No mapped read
[2020-06-03 13:49:38.614] [alevinLog] [info] Clearing EqMap; Might take some time.
[2020-06-03 13:49:38.620] [alevinLog] [warning] Num Low confidence barcodes too less 1 < 200.Can't performing whitelisting; Skipping
[2020-06-03 13:49:38.620] [alevinLog] [info] Finished optimizer
Floating point exception (core dumped)
`
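Since both runs report zero hits, one sanity check that does not depend on alevin is to slice the expected guide window out of Read 2 and count exact matches against the guide list. This is only a sketch: it assumes the window starts at 0-based offset 19 and spans 21 bases (mirroring the --featureStart and --featureLength values above), that guides appear on the forward strand without mismatches, and it uses hypothetical guide names and synthetic reads.

```python
from collections import Counter


def count_guide_hits(read2_seqs, guides, feature_start=19, feature_length=21):
    """Count exact matches of the Read 2 feature window against known guides.

    `guides` maps guide sequence -> guide name. Returns (hits, unmatched),
    where hits is a Counter keyed by guide name.
    """
    hits = Counter()
    unmatched = 0
    for seq in read2_seqs:
        window = seq[feature_start:feature_start + feature_length]
        name = guides.get(window)
        if name is None:
            unmatched += 1
        else:
            hits[name] += 1
    return hits, unmatched


# Hypothetical guide table and synthetic 58-base reads:
# 19-base constant region + 21-base guide + 18 bases of padding.
guides = {"ACGTACGTACGTACGTACGTA": "guide_A"}
constant = "G" * 19
padding = "T" * 18
reads = [
    constant + "ACGTACGTACGTACGTACGTA" + padding,
    constant + "ACGTACGTACGTACGTACGTA" + padding,
    constant + "C" * 21 + padding,  # window that matches no guide
]
hits, unmatched = count_guide_hits(reads, guides)
print(dict(hits), unmatched)
```

If real reads give mostly unmatched windows, the offset or strand assumption is a likely suspect. If they match well, the problem is more likely on the indexing or mapping side.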

Any suggestions on how to get this working are highly appreciated!

Thanks

Labels: alevin (issue is primarily related to alevin), bug, fixed in develop (this bug has been fixed in develop and the issue will be closed when merged into master)