Releases: huggingface/xet-core

[v1.1.9] Bug Fixes: Parallelism optimizations, metadata updates

27 Aug 23:04
7f53907

🚀 Performance Improvements:
• Improve parallelism in parutils by removing async_scoped
• Increase soft file limits for macOS

🐛 Bug Fixes:
• Update hf_xet PyPI metadata

🔧 Reliability & Maintenance:
• Improved debuggability with tokio console support
• Add CI builds for macOS

Full Changelog: v1.1.8...v1.1.9

v1.1.8 Bug Fixes

18 Aug 22:00
48be7b0

🚀 Performance Improvements:
• Client Caching - Reuses reqwest Client across RemoteClient objects to share connection pools
• Connection Limits - Limits idle connections to prevent resource exhaustion

🐛 Bug Fixes:
• Singleflight Fix - Critical fix preventing permanent error caching when owner tasks are dropped
• DataHash Serialization - Ensures consistent little-endian byte order across platforms
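
The DataHash fix above comes down to choosing an explicit byte order when a hash is turned into a hex string. The following is a minimal sketch in Python, assuming for illustration that a hash is four unsigned 64-bit words; the function names and the word layout are hypothetical and are not taken from xet-core:

```python
import struct

def words_to_hex(words):
    """Serialize four 64-bit words to hex using an explicit little-endian layout.

    The "<" prefix pins the byte order, so the output is the same on
    little-endian and big-endian hosts. (Hypothetical layout, for illustration.)
    """
    return struct.pack("<4Q", *words).hex()

def hex_to_words(text):
    """Inverse of words_to_hex: parse a 64-character hex string back into words."""
    return struct.unpack("<4Q", bytes.fromhex(text))

digest = (1, 2, 3, 4)
encoded = words_to_hex(digest)
assert hex_to_words(encoded) == digest
```

Using a native-order format instead of "<" would produce different strings on hosts with different endianness, which is the class of inconsistency the fix is described as removing.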

🔧 Reliability & Maintenance:
• Retry Logic Restoration - Restores retry logic accidentally removed in versions 1.1.6 and 1.1.7
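
As a rough illustration of what retrying connection setup and request sending can look like, here is a generic retry loop with exponential backoff and jitter. It is a sketch only: the function name, attempt count, and delays are assumptions and do not reflect the actual xet-core retry policy.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only ConnectionError is treated as retryable in this sketch; any other
    exception propagates immediately. All parameters are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (base, 2x base, 4x base, ...) plus random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.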

What's Changed

  • fix: singleflight owner task not removing Call from Group if dropped by @jgodlew in #447
  • Add back retry for connection setup and sending request by @seanses in #455
  • Fix DataHash hex string serde to little endian by @seanses in #445
  • Clean up dependencies (no functionality change) by @seanses in #456
  • Cache and reuse reqwest Client by @seanses in #457
  • Limit number of idle connections by @hoytak in #459
  • update version by @assafvayner in #461

Full Changelog: v1.1.7...v1.1.8

v1.1.7

06 Aug 00:30
9bbc0c6

What's Changed

  • Remove telemetry code; eliminate Mutex on logging setup. by @hoytak in #441
  • Changed default number of parallel downloads from 64 to 48. by @hoytak in #442
  • Updated version to v1.1.7 by @hoytak in #443

Full Changelog: v1.1.6...v1.1.7

[v1.1.6] Bug Fixes: Proxy support, process safety, and more

05 Aug 22:44
7becae3

✨ New Features and Improvements

  • Proxy support, easing use behind corporate networks. (#413 by @hoytak; addresses #400 - thanks @albertodepaola and @goodsonjr for the initial reports)
  • Improved hf_xet logging, including the ability to log events to a formatted file (#428 by @hoytak)

🐛 Bug Fixes

  • Process safety: make running after os.fork() safer. (#429 by @hoytak; addresses #415 - thanks @John6666cat for the report)
  • Respect XDG_CACHE_HOME and ~/ when setting cache directories. (#426 by @hoytak; addresses #417 - thanks @half-duplex for the initial report)
  • Lower the default NUM_RANGE_CONCURRENT_GETS value to 64 to better respect file descriptor limits (#438 by @assafvayner; addresses #436 - thanks @djholt and @gary149 for the reports)
  • JWT token handling hardened with a buffer before expiration. (#405 by @jgodlew; addresses #404)
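
The JWT hardening above can be pictured as treating a token as expired slightly before its real expiration time, so a request never starts with a token that lapses mid-flight. A minimal sketch, assuming a Unix-timestamp expiry; the function name and the 30-second buffer are hypothetical, not values from xet-core:

```python
import time

REFRESH_BUFFER_SECS = 30  # hypothetical buffer, for illustration only

def needs_refresh(expires_at, now=None, buffer_secs=REFRESH_BUFFER_SECS):
    """Return True when the token should be refreshed.

    expires_at and now are Unix timestamps in seconds. The token is
    considered stale once we are within buffer_secs of its expiration.
    """
    now = time.time() if now is None else now
    return now >= expires_at - buffer_secs
```

With these assumed values, a token expiring at t=1000 is refreshed from t=970 onward instead of at t=1000.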

Full Changelog: v1.1.5...v1.1.6

[v1.1.5] Bug Fixes: Cert issue fixes & optimizations

20 Jun 21:47
d55c6a2

This release includes a fix for certificate issues in certain network environments and loading optimizations for dedup lookups.

🧱 Improvements

  • Background shard loading (#384): Loads shard lookup tables in the background to reduce upload_files startup time. Author: Hoyt Koepke

🐛 Bug Fixes

  • Certificate loading (#393): Switched to load_native_certs() for efficiency. Author: Hoyt Koepke

Full Changelog: v1.1.4...v1.1.5

[v1.1.4] Bug Fixes: Network Resilience and Performance Optimizations

16 Jun 21:20
8f7e9c8

📶 DNS Resolution & Network Connectivity

  • Fixed DNS resolution issues: Implemented custom DNS resolver to force absolute DNS name resolution, addressing issues where DNS resolvers struggled with CAS server addresses and fell back to local search domains
  • Enhanced TLS configuration: Updated reqwest to use rustls-tls by default with configurable TLS backends (native-tls, native-tls-vendored options available)

🚀 Performance Optimizations

  • Global download concurrency control: Changed download concurrency limit from per-file to global (default: 128 simultaneous connections) to prevent file handle exhaustion on macOS
  • Optimized chunking operations: Converted core Chunk data type from Arc<[u8]> to bytes::Bytes for better memory efficiency and reduced copying. Separated boundary calculation logic from chunk building for future optimization work
  • Updated shard cache size: Increased default shard cache limit to 16GB, effectively allowing deduplication against 16TB of data
  • Streamlined upload payload: Removed footer serialization from upload xorb payload in remote_client for improved efficiency
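
To illustrate the difference between a per-file and a global concurrency limit, the sketch below shares one semaphore across every file being downloaded, so the total number of in-flight transfers stays bounded no matter how many files are active. It uses Python threads purely for illustration; the class and function names are hypothetical, the limit of 4 is arbitrary (the release notes cite a default of 128), and `time.sleep` stands in for a network transfer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class GlobalLimiter:
    """One shared limiter for all files; also records peak concurrency."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        self._slots.acquire()  # blocks while `limit` transfers are in flight
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        self._slots.release()
        return False

def fetch_range(limiter, file_name, index):
    with limiter:
        time.sleep(0.01)  # stand-in for fetching one byte range
        return (file_name, index)

def download_files(limiter, file_names, ranges_per_file=8):
    # More worker threads than slots, so the limiter is what bounds concurrency.
    with ThreadPoolExecutor(max_workers=32) as pool:
        futures = [
            pool.submit(fetch_range, limiter, name, i)
            for name in file_names
            for i in range(ranges_per_file)
        ]
        return [f.result() for f in futures]

limiter = GlobalLimiter(limit=4)
results = download_files(limiter, ["a.bin", "b.bin", "c.bin"])
```

With a per-file limit, three files would each be allowed their own quota of open connections; with the shared limiter, the quota applies to all of them together, which is what keeps file handle usage predictable.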

🤗 Developer Experience

  • Issue templates: Added comprehensive GitHub issue templates including bug report forms, feature request forms, and helpful links for better community engagement

What's Changed

  • Update chunker to separate out calculation of next boundary by @hoytak in #368
  • remove footer serialized from upload xorb payload on remote_client by @assafvayner in #372
  • Adding issue templates to repo by @jsulz in #374
  • Small optimizations for chunking / upload path by @hoytak in #371
  • Switch reqwest to rustls-tls from default; use hickory-dns for dns resolution. by @hoytak in #378
  • add ci steps to check cargo.lock is up to date by @assafvayner in #377
  • Update shard cache default size. by @hoytak in #381
  • Remove hickory-dns and use system dns provider by @hoytak in #380
  • Fix/dns resolution by @Hugoch in #383
  • Change download currency limit from local to global. by @hoytak in #385
  • hf_xet Cargo.toml 1.1.4 by @assafvayner in #387

Full Changelog: v1.1.3...v1.1.4

[v1.1.3] Bug Fixes: Resumable Uploads, Shard cache limit, and more

04 Jun 00:46
437f5fc

✨ New Features and Improvements

  • Resume Uploads for interrupted sessions
  • Shard Cache now has size limits (thanks for filing @danielhanchen) Fixes #350
  • Improved XORB compression method determination
  • Debug symbols as a single artifact (with instructions on how to apply them!)
  • Reduced binary sizes by collapsing versions of dependent rust crates

🐛 Bug Fixes

  • Sync up Cargo.lock/Cargo.toml (thanks for filing @lahwaacz!) Fixes #342
  • Improved Download resiliency

What's Changed

  • Updates out-of-sync Cargo.lock in hf_xet/ by @hoytak in #341
  • Incremental progress on upload_xorb with retry_wrapper by @hoytak in #333
  • Track total processed bytes and total transferred bytes by @hoytak in #328
  • Streamline and aggregate file updates for reporting to python by @hoytak in #340
  • Merging Cargo.toml dependencies into workspace Cargo.toml by @jgodlew in #339
  • Resume capability for interrupted upload file sessions. by @hoytak in #346
  • download terms once and write where needed by @assafvayner in #320
  • Allow disabling progress aggregation. by @hoytak in #347
  • Drop unnecessarily strict check causing intermittent test failure. by @hoytak in #354
  • Added tests for session resuming + fixed reporting issues. by @hoytak in #352
  • Improved XORB compression method determination by @hoytak in #355
  • Fix for abort-on-quit issue. by @hoytak in #359
  • Limit shard cache directory by @hoytak in #353
  • CI builds with dev, alpha, or beta tags have debug symbols and debug_assertions enabled by @hoytak in #356
  • limit concurrent gets and update segment size tuner config by @assafvayner in #360
  • Report completion speed with progress updates by @hoytak in #361
  • Move batch file uploads to FileUploadSession. by @hoytak in #362
  • Bugfix for issue with shard resume consolidation by @hoytak in #364
  • Make shard session resume robust to multiple sessions. by @hoytak in #365
  • chunk deserialization special error when chunk header is a xorb footer by @assafvayner in #363
  • Debug Symbol cleanup and instructions by @bpronan in #348
  • Rename Chunk in the MerkleDB implementation to ChunkInfo. by @hoytak in #367
  • Reference correctness tests for chunker by @hoytak in #366
  • Release 1.1.3 version bump by @rajatarya in #370

Full Changelog: v1.1.2...v1.1.3

v1.1.3-dev0

20 May 20:06
c465076
Pre-release

What's Changed

  • Updates out-of-sync Cargo.lock in hf_xet/ by @hoytak in #341
  • Incremental progress on upload_xorb with retry_wrapper by @hoytak in #333
  • Track total processed bytes and total transferred bytes by @hoytak in #328
  • Streamline and aggregate file updates for reporting to python by @hoytak in #340
  • Merging Cargo.toml dependencies into workspace Cargo.toml by @jgodlew in #339

Full Changelog: v1.1.2...v1.1.3-dev0

[v1.1.2] Smol binaries, sdist, bug fixes

16 May 20:44
b6bb555

✨ New Features and Improvements

  • Much Smaller Binaries: In this release we’ve reduced the installed binary size across all platforms (e.g., Linux went from ~96MB to ~14MB).
  • sdist installation support: Now hf-xet can be compiled using best practices for Python package sdist installation. Thanks @tiran and @szalpal for the original bug reports!

🐛 Bug Fixes (retries, open-files, sdist, smaller binaries)

  • More resilient uploads & downloads by adding retries to many error paths through download and upload. Fixes #300 #322 #311
  • Optimizations around model compression selection, object serialization.
  • Prevent "Too Many Open Files" error by limiting concurrent downloads.
  • Build & release updates to support sdist, dbg symbols. Fixes #255 #304
  • Code cleanup and refactoring around progress reporting.

Full Changelog: v1.1.1...v1.1.2

v1.1.1 - reduced binary size

12 May 21:33
6f24934

✨ New Features and Improvements

In this release, we've halved our installed binary size on Linux distributions and added some performance improvements during chunking and compression evaluation.

🐛 Bug Fixes

  • Our installed binaries were bloated and consumed most of the AWS Lambda size budget (thanks to @jp-agenta and @ggiallo28 for reporting the original issues)

What's Changed

  • Updating hf-xet version to 1.1.0 by @bpronan in #285
  • Make dedup critical crates compilation-compat with wasm by @seanses in #271
  • Completion tracking for accurate upload and download progress reporting. by @hoytak in #219
  • Adding session_id to requests and spans by @jgodlew in #291
  • Simplify chunking backgrounding code. by @hoytak in #292
  • Fix clippy issues in next rust version. by @hoytak in #298
  • Replace passed-around threadpool refs with thread local variable by @hoytak in #297
  • Connect detailed upload progress to hub by @hoytak in #301
  • Revert "Revert "Reduce Usage of Compression Format Detection"" by @rajatarya in #279
  • xtool query command by @seanses in #305
  • Fix compilation issue due to api change by @seanses in #309
  • Fixed race condition in dependency tracking. by @hoytak in #302
  • Changed debug to minimal for python wheel. by @hoytak in #312

Full Changelog: v1.1.0...v1.1.1