Hi, I'm a little confused about how the search window is adjusted for positions near the end of the sequence.
I understand that these positions need to be kept so that a sequence produces as many strobemers as k-mers. But in sequence mapping/searching scenarios, unlike k-mers, the strobemers near the end of a query sequence can differ from those at the same positions in the reference sequence, because the search window is incomplete there.
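To make the edge effect concrete, here is a small Python sketch. It reuses the window rule from `seq_to_randstrobes2_iter` and, as an assumption, picks the second strobe as the position minimizing `(h[p] + h[i]) % q`. The helper names `window` and `select_m2`, the toy hash values, and `q = 2**16` are all hypothetical. The query is the first five positions of the reference, so any mismatch comes only from the truncated window.

```python
def window(p, n, w_min, w_max):
    # Window rule quoted from seq_to_randstrobes2_iter (n = len(hash_seq_list))
    if p + w_max <= n:
        start = p + w_min
    else:
        start = max((p + w_min) - (p + w_max - n), p)
    end = min(p + w_max, n)
    return start, end


def select_m2(hashes, p, w_min, w_max, q=2**16):
    # ASSUMPTION: second strobe = argmin over the window of (h[p] + h[i]) % q
    start, end = window(p, len(hashes), w_min, w_max)
    return min(range(start, end), key=lambda i: (hashes[p] + hashes[i]) % q)


# Toy hash values standing in for k-mer hashes (hypothetical data)
reference = [5, 9, 2, 7, 1, 0, 3]
query = reference[:5]  # the query matches the first 5 positions of the reference
w_min, w_max = 1, 3

# Positions whose second strobe differs between query and reference
mismatched = [
    p for p in range(len(query))
    if select_m2(query, p, w_min, w_max) != select_m2(reference, p, w_min, w_max)
]
print(mismatched)  # [3, 4]
```

With this toy input, only the last two query positions get a different second strobe than the reference at the same positions; every position whose full window fits inside the query agrees.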
In the function `seq_to_randstrobes2_iter`:

```python
window_p_start = p + strobe_w_min_offset if p + strobe_w_max_offset <= len(hash_seq_list) else max( (p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p )
window_p_end = min(p + strobe_w_max_offset, len(hash_seq_list))
```
For positions near the end of the sequence (`p + strobe_w_max_offset > len(hash_seq_list)`), the expression `max( (p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p )` simplifies to `max( len(hash_seq_list) - (strobe_w_max_offset - strobe_w_min_offset), p)`, because the two `p` terms cancel.
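A quick numerical check of that simplification, as a sketch: `window_start` copies the quoted start rule, `simplified_start` is the simplified form, and `n`, `w_min`, `w_max` stand in for `len(hash_seq_list)`, `strobe_w_min_offset` and `strobe_w_max_offset` (the values 20, 3 and 8 are arbitrary, and both helper names are hypothetical).

```python
def window_start(p, n, w_min, w_max):
    # Start of the m2 window as quoted from seq_to_randstrobes2_iter
    if p + w_max <= n:
        return p + w_min
    return max((p + w_min) - (p + w_max - n), p)


def simplified_start(p, n, w_min, w_max):
    # Near the end the p terms cancel: (p + w_min) - (p + w_max - n) = n - (w_max - w_min)
    return max(n - (w_max - w_min), p)


n, w_min, w_max = 20, 3, 8
sizes = {}
for p in range(n):
    start = window_start(p, n, w_min, w_max)
    end = min(p + w_max, n)
    if p + w_max > n:
        # The simplified form must agree wherever the else-branch is taken
        assert start == simplified_start(p, n, w_min, w_max)
    sizes[p] = end - start

print(sizes)
```

Under these values the window keeps its nominal size `w_max - w_min = 5` for every `p` up to `n - (w_max - w_min) = 15`; beyond that the `p` term in `max(..., p)` dominates and the window shrinks to 4, 3, 2 and 1.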
As I understand it, this keeps the size of the search window and shifts the window to the left (box A in the figure below). Am I right? Have you tried the approach in box B?
Also, for order 3, the windows of m2 and m3 would overlap in some regions. Is this OK?
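The overlap can be quantified with a sketch. Two things here are assumptions for illustration, not taken from the repository: the nominal order-3 windows (`m2` in `[p + w_min, p + w_max)`, `m3` in `[p + w_max + w_min, p + 2*w_max)`), and applying the same shift-left rule to `m3` with floor `p`. `shift_left` and `overlap` are hypothetical helpers, and windows are treated as half-open, assuming they are consumed as `range(window_p_start, window_p_end)`.

```python
def shift_left(start, end, n, floor):
    """Clamp a window to the sequence end by shifting it left, keeping its size.

    For m2 with floor=p this reproduces the rule quoted from
    seq_to_randstrobes2_iter; applying it to m3 is an ASSUMPTION.
    """
    if end <= n:
        return start, end
    shift = end - n
    return max(start - shift, floor), n


def overlap(a, b):
    # Number of positions shared by two half-open windows [start, end)
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


n, w_min, w_max = 30, 2, 6
overlaps = {}
for p in range(n):
    # ASSUMED nominal order-3 windows, each clamped to the sequence end
    m2 = shift_left(p + w_min, p + w_max, n, floor=p)
    m3 = shift_left(p + w_max + w_min, p + 2 * w_max, n, floor=p)
    shared = overlap(m2, m3)
    if shared:
        overlaps[p] = shared

print(overlaps)
```

With `n = 30`, `w_min = 2` and `w_max = 6`, the two windows start sharing positions at `p = 21`, and for `p` from 24 to 26 they coincide completely (4 shared positions each), so m2 and m3 could be drawn from the same positions.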