
How to adjust the search window for positions near the end of the sequence #2

@shenwei356


Hi, I'm a little confused about how the search window is adjusted for positions near the end of the sequence.

I know that these positions need to be kept to produce the same number of strobemers as k-mers. But in sequence mapping/searching scenarios, unlike k-mers, the strobemers near the end of a sequence can differ from those at the same positions in the reference sequence, because their search window is incomplete.
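A minimal sketch of this effect, assuming a few things that are not from the repository: k-mers are hashed with `zlib.crc32` (Python's built-in `hash()` on strings is salted per process), the second strobe is the candidate minimising `(h1 + h2) % prime` (a common randstrobe choice, used here as an assumption), and one strobemer is emitted per k-mer position. The window bounds follow the end-of-sequence rule in `seq_to_randstrobes2_iter`. The query is a prefix of the reference, so every query k-mer also occurs in the reference at the same position.

```python
import zlib

def kmer_hashes(seq, k):
    # Deterministic CRC32 hashes (assumption; the repo uses Python's hash()).
    return [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]

def randstrobes2(seq, k, w_min, w_max, prime=997):
    """Hypothetical order-2 randstrobe sketch: one (strobe1, strobe2) pair per k-mer."""
    h = kmer_hashes(seq, k)
    n = len(h)
    pairs = []
    for p in range(n):
        # Window rule from seq_to_randstrobes2_iter, read as a half-open range.
        if p + w_max <= n:
            start = p + w_min
        else:
            start = max((p + w_min) - (p + w_max - n), p)
        end = min(p + w_max, n)
        # Assumed selection rule: minimise (h1 + h2) % prime; min() keeps the first on ties.
        q = min(range(start, end), key=lambda i: (h[p] + h[i]) % prime)
        pairs.append((seq[p:p + k], seq[q:q + k]))
    return pairs

ref = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGGATCCATGCAAGTC"
query = ref[:30]  # query is a prefix of the reference
k, w_min, w_max = 3, 2, 6
ref_pairs = randstrobes2(ref, k, w_min, w_max)
query_pairs = randstrobes2(query, k, w_min, w_max)

n_q = len(query) - k + 1
end_positions = [p for p in range(n_q) if p + w_max > n_q]
print("end-window positions:", end_positions)  # [23, 24, 25, 26, 27]
print("differ from reference:", [p for p in end_positions if query_pairs[p] != ref_pairs[p]])
```

Positions whose full window fits inside the query always agree with the reference, because both sequences see the same candidates. Only the end-window positions can disagree, and whether they do depends on the hashes. The very last query position has the one-element window `[p, p + 1)`, so its second strobe is its own k-mer.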

In the function seq_to_randstrobes2_iter:

```python
window_p_start = (
    p + strobe_w_min_offset
    if p + strobe_w_max_offset <= len(hash_seq_list)
    else max((p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p)
)
window_p_end = min(p + strobe_w_max_offset, len(hash_seq_list))
```

For positions near the end of the sequence (p + strobe_w_max_offset > len(hash_seq_list)), the expression

```python
max((p + strobe_w_min_offset) - (p + strobe_w_max_offset - len(hash_seq_list)), p)
```

simplifies to

```python
max(len(hash_seq_list) - (strobe_w_max_offset - strobe_w_min_offset), p)
```
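Spelling out the algebra, with the shorthand $n$ = len(hash_seq_list), $w_{\min}$ = strobe_w_min_offset and $w_{\max}$ = strobe_w_max_offset (names introduced here only for readability), for $p + w_{\max} > n$ the window ends at $n$, so its size follows directly:

$$
\begin{aligned}
\max\big((p + w_{\min}) - (p + w_{\max} - n),\ p\big) &= \max\big(n - (w_{\max} - w_{\min}),\ p\big),\\
\text{window size} = n - \max\big(n - (w_{\max} - w_{\min}),\ p\big) &= \min\big(w_{\max} - w_{\min},\ n - p\big).
\end{aligned}
$$

So the window keeps its full size $w_{\max} - w_{\min}$ while $p \le n - (w_{\max} - w_{\min})$. From $p = n - (w_{\max} - w_{\min})$ on, the window starts at $p$ itself, so the second strobe may be the same k-mer as the first, and for larger $p$ the window shrinks to $n - p$.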

As I understand it, this keeps the size of the search window and shifts the window to the left (box A in the figure below). Am I right? Have you tried the approach in box B?

[Figure: Screenshot_20210413_230241, search-window placements A and B]
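To make the comparison concrete, here is a small sketch contrasting the quoted shift-left rule with one hypothetical alternative that keeps the offsets and only clips the window at the sequence end. It is not known from the text whether this alternative is what box B shows. Windows are treated as half-open `[start, end)` ranges, and `compare` is a helper introduced only for this illustration.

```python
def shifted_window(p, n, w_min, w_max):
    # Rule quoted from seq_to_randstrobes2_iter (box A: keep size, shift left).
    start = p + w_min if p + w_max <= n else max((p + w_min) - (p + w_max - n), p)
    return start, min(p + w_max, n)

def truncated_window(p, n, w_min, w_max):
    # Hypothetical alternative: keep the offsets, only clip the end at n.
    return p + w_min, min(p + w_max, n)

def compare(n, w_min, w_max):
    """Rows of (p, shifted window, truncated window size, extra shifted candidates)."""
    rows = []
    for p in range(n):
        s_start, s_end = shifted_window(p, n, w_min, w_max)
        t_start, t_end = truncated_window(p, n, w_min, w_max)
        # Candidates before p + w_min: a full-length window at p (e.g. in a
        # longer reference sequence) would never consider these positions.
        extra = max(0, min(s_end, p + w_min) - s_start)
        rows.append((p, (s_start, s_end), max(0, t_end - t_start), extra))
    return rows

for row in compare(10, 2, 5)[4:]:
    print(row)
```

With n = 10, w_min = 2 and w_max = 5, the truncated window is empty for the last w_min positions (p = 8 and 9), so it would need some fallback or would produce fewer strobemers than k-mers. The shifted window always keeps at least one candidate, but from p = 6 on it includes positions before p + w_min that the reference's full window never sees.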

Also, for order 3, the windows of m2 and m3 would overlap in some regions. Is this OK?

[Figure: Screenshot_20210413_230706, overlapping m2 and m3 windows for order 3]
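A sketch of how such an overlap can arise, under two assumptions that are not taken from the repository's order-3 code: the full-length windows are `[p + w_min, p + w_max)` for m2 and `[p + w_max + w_min, p + 2*w_max)` for m3, and near the end each window is shifted left independently by its overhang, never starting before `p` (mirroring the order-2 rule). The helpers `shift_left`, `overlap` and `order3_windows` are hypothetical.

```python
def shift_left(start, end, n, floor):
    """Hypothetical end handling: keep the size, shift left to end at n, never below floor."""
    if end <= n:
        return start, end
    overhang = end - n
    return max(start - overhang, floor), n

def overlap(a, b):
    # Number of positions shared by two half-open windows.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def order3_windows(p, n, w_min, w_max):
    # Assumed full-length windows for m2 and m3, each shifted independently.
    m2 = shift_left(p + w_min, p + w_max, n, p)
    m3 = shift_left(p + w_max + w_min, p + 2 * w_max, n, p)
    return m2, m3

n, w_min, w_max = 20, 2, 5
for p in range(10, n):
    m2, m3 = order3_windows(p, n, w_min, w_max)
    print(p, m2, m3, overlap(m2, m3))
```

While both windows fit (p + 2*w_max <= n) they are disjoint, because m2 ends at p + w_max and m3 starts at p + w_max + w_min. Once m3 is pushed back against the sequence end, it slides into m2's range: here the overlap appears at p = 13 and the two windows are identical from p = 15 on.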
