generated from fastai/nbdev_template
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Closed
Labels
🐛 bugSomething isn't workingSomething isn't working📚 documentationImprovements or additions to documentationImprovements or additions to documentation
Description
The docstring says
- `"ffd"` (First Fit Decreasing): Slower but preserves sequence boundaries. Sequences are never cut in the
middle.
but the implemented algorithm searches the bin with the least remaining capacity, which is called Best-Fit-Decreasing in the paper https://arxiv.org/abs/2404.10830.
- The algorithm before the optimization:
Lines 507 to 517 in fef915e
# Optimized bin packing using a priority queue approach bins_by_remaining = defaultdict(list) # remaining_space -> [bin_indices] bins = [] # [(current_length, seq_indices)] for i, (seq_len, _start, _end) in enumerate(sequences): # Find bins with enough space using the dictionary placed = False for remaining in range(seq_len, seq_length + 1): if bins_by_remaining[remaining]: # Use the first available bin with this remaining space bin_idx = bins_by_remaining[remaining].pop() - In v0.19.0 https://github.com/huggingface/trl/blob/v0.19.0/trl/data_utils.py#L551-L561, it looks like Appendix B (An Illustration of optimized BFD Algorithm) of https://arxiv.org/pdf/2404.10830v2
Metadata
Metadata
Assignees
Labels
🐛 bugSomething isn't workingSomething isn't working📚 documentationImprovements or additions to documentationImprovements or additions to documentation