Releases: huggingface/xet-core
[v1.1.9] Bug Fixes: Parallelism optimizations, metadata updates
🚀 Performance Improvements:
• Improve parallelism in parutils by removing async_scoped
• Increase soft file limits for macOS
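As an illustration of the soft file limit change, here is a minimal Python sketch of raising the soft open-file limit toward the hard limit. It is not the xet-core implementation (which is Rust); the helper names and the `fallback` value are hypothetical, chosen only for this example.

```python
import resource


def target_soft_limit(soft, hard, fallback=10_240):
    """Pick the soft limit to request, given the current (soft, hard) pair.

    `fallback` is a hypothetical bound used only when the hard limit is
    reported as unlimited, since not every OS accepts an unlimited soft value.
    """
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard == resource.RLIM_INFINITY:
        return max(soft, fallback)
    return max(soft, hard)


def raise_open_file_limit():
    """Try to raise RLIMIT_NOFILE's soft limit; return the limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = target_soft_limit(soft, hard)
    if target != soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            return soft  # the OS refused; keep the existing limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_open_file_limit()` once at startup is enough, because the limit applies to the whole process.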
🐛 Bug Fixes:
• Update hf_xet PyPI metadata
🔧 Reliability & Maintenance:
• Improved debuggability with tokio console support
• Add CI builds for macOS
What's Changed
- parutils makeover remove async_scoped by @assafvayner in #454
- tokio console setup by @assafvayner in #458
- enforce linting on hf_xet by @assafvayner in #462
- Raise soft file handle limits to hard limits on OSX. by @hoytak in #453
- run_and_extract_custom: remove use of explicit tokio_retry without utility by @assafvayner in #460
- Use a valid SPDX identifier as license classifier by @ecederstrand in #464
- CI test on macos by @seanses in #473
- Update PyPI package metadata for hf-xet by @rajatarya in #472
- Update hf_xet/README.md for hf_xet project by @rajatarya in #475
- Bumping version to 1.1.9 by @rajatarya in #476
New Contributors
- @ecederstrand made their first contribution in #464
Full Changelog: v1.1.8...v1.1.9
v1.1.8 Bug Fixes
🚀 Performance Improvements:
• Client Caching - Reuses reqwest Client across RemoteClient objects to share connection pools
• Connection Limits - Limits idle connections to prevent resource exhaustion
🐛 Bug Fixes:
• Singleflight Fix - Critical fix preventing permanent error caching when owner tasks are dropped
• DataHash Serialization - Ensures consistent little-endian byte order across platforms
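The singleflight fix can be pictured with a small asyncio sketch. It assumes a simplified design in which the first caller for a key (the owner) runs the work and later callers wait on a shared future; the `SingleFlight` class is hypothetical and is not the xet-core code. The point it shows is that the entry is removed even when the owner is cancelled, so the failure is not cached for later callers.

```python
import asyncio


class SingleFlight:
    """Deduplicate concurrent calls that share a key (illustrative only)."""

    def __init__(self):
        self._calls = {}  # key -> future shared by the owner and its waiters

    async def run(self, key, fn):
        existing = self._calls.get(key)
        if existing is not None:
            # Waiter path: shield so cancelling a waiter leaves the shared
            # future untouched for everyone else.
            return await asyncio.shield(existing)

        fut = asyncio.get_running_loop().create_future()
        self._calls[key] = fut
        try:
            result = await fn()
        except asyncio.CancelledError:
            # Owner was dropped: wake current waiters with an error.
            if not fut.done():
                fut.set_exception(RuntimeError("owner task dropped"))
            raise
        except Exception as exc:
            fut.set_exception(exc)
            raise
        else:
            fut.set_result(result)
            return result
        finally:
            # Always remove the entry so the next caller starts a fresh call
            # instead of seeing a stale error.
            self._calls.pop(key, None)
            if fut.done() and not fut.cancelled():
                fut.exception()  # mark any stored exception as retrieved
```

Waiters that were already attached when the owner is dropped still see an error, but any call made afterwards becomes a new owner and retries the work.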
🔧 Reliability & Maintenance:
• Retry Logic Restoration - Restores retry logic accidentally removed in versions 1.1.6 and 1.1.7
What's Changed
- fix: singleflight owner task not removing Call from Group if dropped by @jgodlew in #447
- Add back retry for connection setup and sending request by @seanses in #455
- Fix DataHash hex string serde to little endian by @seanses in #445
- Clean up dependencies (no functionality change) by @seanses in #456
- Cache and reuse reqwest Client by @seanses in #457
- Limit number of idle connections by @hoytak in #459
- update version by @assafvayner in #461
Full Changelog: v1.1.7...v1.1.8
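To see why the DataHash change in this release pins the byte order, consider the sketch below. It assumes, purely for illustration, that a hash is held as a sequence of 64-bit words; the function names are hypothetical and the real DataHash layout may differ. Packing with an explicit little-endian format gives the same hex string on every platform, whereas native byte order would not.

```python
import struct


def hash_words_to_hex(words):
    """Serialize unsigned 64-bit words to hex with explicit little-endian order."""
    return struct.pack("<%dQ" % len(words), *words).hex()


def hex_to_hash_words(text):
    """Inverse of hash_words_to_hex: parse hex back into 64-bit words."""
    raw = bytes.fromhex(text)
    return list(struct.unpack("<%dQ" % (len(raw) // 8), raw))
```

The `<` prefix in the format string is what fixes the order; without it, `struct` uses the byte order of the machine running the code.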
v1.1.7
[v1.1.6] Bug Fixes: Proxy support, process safety, and more
✨ New Features and Improvements
- Proxy support, easing use behind corporate networks. (#413 by @hoytak; addresses #400 - thanks @albertodepaola and @goodsonjr for the initial reports)
- Improvements to hf_xet logging, providing a facility to log events to a formatted file (#428 by @hoytak)
🐛 Bug Fixes
- Process safety: make running after os.fork() safer. (#429 by @hoytak; addresses #415 - thanks @John6666cat for the report)
- Respect XDG_CACHE_HOME and ~/ when setting cache directories. (#426 by @hoytak; addresses #417 - thanks @half-duplex for the initial report)
- Lower the default NUM_RANGE_CONCURRENT_GETS value to 64 to better respect file descriptor limits (#438 by @assafvayner; addresses #436 - thanks @djholt and @gary149 for the reports)
- JWT token handling hardened with a buffer before expiration. (#405 by @jgodlew; addresses #404)
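The JWT hardening above amounts to treating a token as expired slightly before its real expiry, so a request cannot start with a token that lapses in flight. A minimal sketch follows; the function name and the 30-second buffer are hypothetical, not values taken from xet-core.

```python
import time

REFRESH_BUFFER_SECS = 30  # hypothetical buffer; the real value may differ


def token_needs_refresh(expires_at, now=None, buffer_secs=REFRESH_BUFFER_SECS):
    """Return True once the current time is within `buffer_secs` of expiry.

    `expires_at` and `now` are Unix timestamps in seconds.
    """
    if now is None:
        now = time.time()
    return now >= expires_at - buffer_secs
```

With a buffer of zero this reduces to a plain expiry check, which is the behavior the buffer is meant to improve on.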
What's Changed
- Streaming shard interface updates by @assafvayner in #392
- WASM poc by @seanses in #272
- Generic retry wrapper to consolidate and streamline retry logic. by @hoytak in #397
- Fix for retry failure due to non-clonability by @hoytak in #402
- Adding buffer to JWT token expiration check by @jgodlew in #405
- Updating chunk and shard cache default sizes by @jsulz in #406
- Simplified Client interface. by @hoytak in #408
- Add correctness tests for aggregate hash functions. by @hoytak in #412
- Enabling proxy support for reqwest by @hoytak in #413
- Thin wasm by @assafvayner in #411
- Move MDB v1 to reference test code; add standalone hash functions by @hoytak in #414
- Add verification hash and file hash functions by @assafvayner in #416
- Use v1 api paths by @assafvayner in #421
- Set shard size limit as max, not target min by @assafvayner in #420
- Remove footer from upload shard payload by @assafvayner in #419
- Errors on shard reading are now logged and ignored. by @hoytak in #424
- Add whether chunk should be checked against global dedup by @coyotte508 in #423
- Logging improvements by @hoytak in #428
- Export hmac function in thin wasm by @coyotte508 in #427
- Make hf_xet fork-exec safe by @hoytak in #429
- Revert use of v1 api paths by @assafvayner in #432
- Limit number of async worker threads on large CPUs by @hoytak in #431
- Respect XDG_CACHE_HOME and ~/ when setting cache directory. by @hoytak in #426
- Associate static semaphores with runtime by @hoytak in #433
- Remove logging from wasm lib by @coyotte508 in #434
Full Changelog: v1.1.5...v1.1.6
[v1.1.5] Bug Fixes: Cert issue fixes & optimizations
This release includes a fix for certificate issues in certain network environments and loading optimizations for dedup lookups.
🧱 Improvements
- Background shard loading (#384): Loads shard lookup tables in the background to reduce upload_files startup time. Author: Hoyt Koepke
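The background loading idea can be sketched as follows: start the load immediately on a worker thread and block only when a lookup actually needs the table. The `BackgroundTable` class and its `loader` callback are hypothetical and stand in for the Rust implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundTable:
    """Load a lookup table on a worker thread; wait only on first use."""

    def __init__(self, loader):
        executor = ThreadPoolExecutor(max_workers=1)
        self._future = executor.submit(loader)  # loading starts right away
        executor.shutdown(wait=False)  # already-submitted work still runs

    def lookup(self, key):
        # Blocks only if the background load has not finished yet.
        table = self._future.result()
        return table.get(key)
```

Construction returns at once, so startup work that does not need the table can proceed while it loads.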
🐛 Bug Fixes
- Certificate loading (#393): Switched to load_native_certs() for efficiency. Author: Hoyt Koepke
What's Changed
- Shard interface updates by @assafvayner in #382
- Background loading for shards by @hoytak in #384
- fix MDBFileInfo::deserialize_async in case of no verification entries by @assafvayner in #388
- Switch cert loading to use load_native_certs(); by @hoytak in #393
- Cargo.toml+lock version update by @rajatarya in #395
Full Changelog: v1.1.4...v1.1.5
[v1.1.4] Bug Fixes: Network Resilience and Performance Optimizations
📶 DNS Resolution & Network Connectivity
- Fixed DNS resolution issues: Implemented custom DNS resolver to force absolute DNS name resolution, addressing issues where DNS resolvers struggled with CAS server addresses and fell back to local search domains
- Enhanced TLS configuration: Updated reqwest to use rustls-tls by default with configurable TLS backends (native-tls, native-tls-vendored options available)
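The DNS fix above relies on the fact that a hostname ending in a dot is treated as absolute, so the resolver does not append local search domains. A minimal sketch of that normalization is shown below; the helper name is hypothetical and the actual resolver lives in the Rust client.

```python
import ipaddress


def to_absolute_dns_name(host):
    """Append a trailing dot so the name is resolved as fully qualified."""
    if not host or host.endswith("."):
        return host
    try:
        # IP literals are not DNS names; leave them unchanged.
        ipaddress.ip_address(host.strip("[]"))
        return host
    except ValueError:
        pass
    return host + "."
```

For example, `to_absolute_dns_name("cas.example.com")` yields `"cas.example.com."`, where `cas.example.com` is a placeholder hostname.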
🚀 Performance Optimizations
- Global download concurrency control: Changed download concurrency limit from per-file to global (default: 128 simultaneous connections) to prevent file handle exhaustion on macOS
- Optimized chunking operations: Converted core Chunk data type from Arc<[u8]> to bytes::Bytes for better memory efficiency and reduced copying. Separated boundary calculation logic from chunk building for future optimization work
- Updated shard cache size: Increased default shard cache limit to 16GB, effectively allowing deduplication against 16TB of data
- Streamlined upload payload: Removed footer serialization from upload xorb payload in remote_client for improved efficiency
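The difference between a per-file and a global download limit can be sketched with a single semaphore shared by every file. The function names below are hypothetical and the sleep stands in for a network request; only the default of 128 comes from the notes above.

```python
import asyncio

GLOBAL_DOWNLOAD_LIMIT = 128  # default stated in the release notes


async def download_file(num_ranges, limiter, stats):
    """Fetch all ranges of one file, each gated by the shared limiter."""

    async def fetch_range():
        async with limiter:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
            await asyncio.sleep(0.001)  # stand-in for a real range request
            stats["active"] -= 1

    await asyncio.gather(*(fetch_range() for _ in range(num_ranges)))


async def download_all(files, limit=GLOBAL_DOWNLOAD_LIMIT):
    """Download every file; return the peak number of concurrent requests."""
    limiter = asyncio.Semaphore(limit)  # one semaphore for all files
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(download_file(n, limiter, stats) for n in files.values())
    )
    return stats["peak"]
```

With a per-file semaphore the peak would scale with the number of files; sharing one keeps total open connections bounded regardless of how many files are in flight.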
🤗 Developer Experience
- Issue templates: Added comprehensive GitHub issue templates including bug report forms, feature request forms, and helpful links for better community engagement
What's Changed
- Update chunker to separate out calculation of next boundary by @hoytak in #368
- remove footer serialized from upload xorb payload on remote_client by @assafvayner in #372
- Adding issue templates to repo by @jsulz in #374
- Small optimizations for chunking / upload path by @hoytak in #371
- Switch reqwest to rustls-tls from default; use hickory-dns for dns resolution. by @hoytak in #378
- add ci steps to check cargo.lock is up to date by @assafvayner in #377
- Update shard cache default size. by @hoytak in #381
- Remove hickory-dns and use system dns provider by @hoytak in #380
- Fix/dns resolution by @Hugoch in #383
- Change download currency limit from local to global. by @hoytak in #385
- hf_xet Cargo.toml 1.1.4 by @assafvayner in #387
New Contributors
Full Changelog: v1.1.3...v1.1.4
[v1.1.3] Bug Fixes: Resumable Uploads, Shard cache limit, and more
✨ New Features and Improvements
- Resume Uploads for interrupted sessions
- Shard Cache now has size limits (thanks for filing @danielhanchen) Fixes #350
- Improved XORB compression method determination
- Debug symbols as a single artifact (with instructions on how to apply them!)
- Reduced binary sizes by collapsing versions of dependent rust crates
🐛 Bug Fixes
- Sync up Cargo.lock/Cargo.toml (thanks for filing @lahwaacz!) Fixes #342
- Improved Download resiliency
What's Changed
- Updates out-of-sync Cargo.lock in hf_xet/ by @hoytak in #341
- Incremental progress on upload_xorb with retry_wrapper by @hoytak in #333
- Track total processed bytes and total transferred bytes by @hoytak in #328
- Streamline and aggregate file updates for reporting to python by @hoytak in #340
- Merging Cargo.toml dependencies into workspace Cargo.toml by @jgodlew in #339
- Resume capability for interrupted upload file sessions. by @hoytak in #346
- download terms once and write where needed by @assafvayner in #320
- Allow disabling progress aggregation. by @hoytak in #347
- Drop unnecessarily strict check causing intermittent test failure. by @hoytak in #354
- Added tests for session resuming + fixed reporting issues. by @hoytak in #352
- Improved XORB compression method determination by @hoytak in #355
- Fix for abort-on-quit issue. by @hoytak in #359
- Limit shard cache directory by @hoytak in #353
- CI builds with dev, alpha, or beta tags have debug symbols and debug_assertions enabled by @hoytak in #356
- limit concurrent gets and update segment size tuner config by @assafvayner in #360
- Report completion speed with progress updates by @hoytak in #361
- Move batch file uploads to FileUploadSession. by @hoytak in #362
- Bugfix for issue with shard resume consolidation by @hoytak in #364
- Make shard session resume robust to multiple sessions. by @hoytak in #365
- chunk deserialization special error when chunk header is a xorb footer by @assafvayner in #363
- Debug Symbol cleanup and instructions by @bpronan in #348
- Rename Chunk in the MerkleDB implementation to ChunkInfo. by @hoytak in #367
- Reference correctness tests for chunker by @hoytak in #366
- Release 1.1.3 version bump by @rajatarya in #370
Full Changelog: v1.1.2...v1.1.3
v1.1.3-dev0
What's Changed
- Updates out-of-sync Cargo.lock in hf_xet/ by @hoytak in #341
- Incremental progress on upload_xorb with retry_wrapper by @hoytak in #333
- Track total processed bytes and total transferred bytes by @hoytak in #328
- Streamline and aggregate file updates for reporting to python by @hoytak in #340
- Merging Cargo.toml dependencies into workspace Cargo.toml by @jgodlew in #339
Full Changelog: v1.1.2...v1.1.3-dev0
[v1.1.2] Smol binaries, sdist, bug fixes
✨ New Features and Improvements
- Much Smaller Binaries: In this release we’ve dropped the installed binary size across all platforms (e.g., Linux went from ~96MB → ~14MB).
- sdist installation support: Now hf-xet can be compiled using best practices for Python package sdist installation. Thanks @tiran and @szalpal for the original bug reports!
🐛 Bug Fixes (retries, open-files, sdist, smaller binaries)
- More resilient uploads & downloads by adding retries to many error paths through download and upload. Fixes #300 #322 #311
- Optimizations around model compression selection and object serialization.
- Prevent "Too Many Open Files" error by limiting concurrent downloads.
- Build & release updates to support sdist, dbg symbols. Fixes #255 #304
- Code cleanup and refactoring around progress reporting.
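The retries added to upload and download error paths follow a common pattern: retry transient failures with exponential backoff and jitter, then give up after a fixed number of attempts. The sketch below illustrates that pattern only; the `retry` helper, its defaults, and the choice of `OSError` as the retryable class are assumptions, not the hf-xet retry policy.

```python
import random
import time


def retry(fn, attempts=5, base_delay=0.1, max_delay=5.0,
          retry_on=(OSError,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many concurrent clients.
            sleep(delay * random.uniform(0.5, 1.0))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.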
What's Changed
- Updating hf-xet version to 1.1.1 by @bpronan in #313
- Switch chunk cache to use async RWlock instead of std::sync mutex. by @hoytak in #306
- Fix windows build failure by @hoytak in #316
- Switch file size / byte tracking variables from usize to u64 by @hoytak in #314
- Added opt-test profile option to hf_xet by @hoytak in #307
- Optimize bg4 prediction by @hoytak in #308
- Chunk Partitioning by @ylow in #299
- Move CAS object serialization to before parallel upload gate. by @hoytak in #315
- Consolidate progress reporting code to one crate by @hoytak in #318
- Fixing the sdist install and adding better project information by @bpronan in #319
- Removed unused git-style progress_reporting crate. by @hoytak in #317
- don't bump num concurrent downloads live by @assafvayner in #326
- SDist fix v2 by @bpronan in #325
- Add incremental upload progress and total bytes to progress tracking. by @hoytak in #323
- Back out streaming upload of xorb. by @hoytak in #327
- Extracting debug symbols from wheels by @bpronan in #330
- add retries on body read errors during download by @assafvayner in #331
- Fix release build issues by @bpronan in #332
- hf-xet 1.1.2 release by @rajatarya in #334
- MacOS dbg symbols zipped in release workflow by @rajatarya in #335
- Update release.yml by @rajatarya in #336
- Update dbg assets to upload zip by @rajatarya in #337
- Removing zip step for dbg assets on windows by @rajatarya in #338
Full Changelog: v1.1.1...v1.1.2
v1.1.1 - reduced binary size
✨ New Features and Improvements
In this release, we've halved our installed binary size on Linux distributions and added some performance improvements during chunking and compression evaluation.
🐛 Bug Fixes
- Our installed binaries were bloated and consuming most of the AWS Lambda size budget (thanks to @jp-agenta and @ggiallo28 for the original issue reports)
What's Changed
- Updating hf-xet version to 1.1.0 by @bpronan in #285
- Make dedup critical crates compilation-compat with wasm by @seanses in #271
- Completion tracking for accurate upload and download progress reporting. by @hoytak in #219
- Adding session_id to requests and spans by @jgodlew in #291
- Simplify chunking backgrounding code. by @hoytak in #292
- Fix clippy issues in next rust version. by @hoytak in #298
- Replace passed-around threadpool refs with thread local variable by @hoytak in #297
- Connect detailed upload progress to hub by @hoytak in #301
- Revert "Revert "Reduce Usage of Compression Format Detection"" by @rajatarya in #279
- xtool query command by @seanses in #305
- Fix compilation issue due to api change by @seanses in #309
- Fixed race condition in dependency tracking. by @hoytak in #302
- Changed debug to minimal for python wheel. by @hoytak in #312
Full Changelog: v1.1.0...v1.1.1