Skip to content

Conversation

apcamargo
Copy link
Contributor

@apcamargo apcamargo commented Feb 15, 2025

This PR introduces the decode_phred function that takes in a string representing Phred-encoded quality scores and returns a tuple of integers representing the quality scores themselves. The function uses Phred+33 by default and allows the user to use Phred+64 via the base_64 parameter.

>>> from needletail import decode_phred
>>> decode_phred("#</</BBFFFBF<")
(2, 27, 14, 27, 14, 33, 33, 37, 37, 37, 33, 37, 27)

Copy link
Contributor

@audy audy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apcamargo I think it's probably worth adding this to SequenceRecord in the underlying Rust implementation.

@apcamargo
Copy link
Contributor Author

Thanks, @audy. I don't have much familiarity with the Rust library. Could you provide any pointers of where this should go? Maybe a method for SequenceRecord in record.rs?

@audy
Copy link
Contributor

audy commented Mar 7, 2025

Thanks, @audy. I don't have much familiarity with the Rust library. Could you provide any pointers of where this should go? Maybe a method for SequenceRecord in record.rs?

I think the best place is probably parser/record.rs. parser/fastq.rs has methods for returning data straight from the file buffer. But since this requires some transformation of that data, I think parser/record.rs makes the most sense.

@apcamargo
Copy link
Contributor Author

@audy I updated this PR to use the functions from the Rust library. I also moved the function out of the Record class to keep it consistent with reverse_complement. Tests were updated accordingly.

normalize is still the odd one out since it also exists as a method that modifies the object. Maybe we can think about making things more consistent in a future breaking update?

Copy link
Contributor

@audy audy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@audy
Copy link
Contributor

audy commented Apr 7, 2025

normalize is still the odd one out since it also exists as a method that modifies the object. Maybe we can think about making things more consistent in a future breaking update?

I think I would prefer a method that returns a new SequenceRecord with a normalized sequence (same in the underlying Rust implementation) rather than modifying the record in place. What do you think @apcamargo?

@audy audy merged commit d1fdc35 into onecodex:master Apr 7, 2025
16 checks passed
@apcamargo
Copy link
Contributor Author

Agreed! Let me know how you prefer to do this, since it would break the API

@apcamargo apcamargo deleted the python-phred-quality-score branch April 18, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants