
Comparing changes

base repository: huggingface/xet-core
base: v1.1.2
head repository: huggingface/xet-core
compare: v1.1.3-dev0
  • 5 commits
  • 49 files changed
  • 3 contributors

Commits on May 19, 2025

  1. Updates out-of-sync Cargo.lock in hf_xet/ (#341)

    The version in hf_xet/Cargo.lock was not updated in the current main/
    hoytak authored May 19, 2025
    b364582
  2. Incremental progress on upload_xorb with retry_wrapper (#333)

    This PR implements incremental progress reporting on the upload_xorb
    function, reporting progress every 512KB of data uploaded.
    
    In addition, failed requests are retried using the same retry policy as
    the other clients. A request whose body is built with Body::wrap_stream
    cannot be cloned, so the retry middleware cannot replay it. To get
    around this, this PR adds a simple retry wrapper utility that retries
    the entire request instead of handling retries in the middleware layer.
    hoytak authored May 19, 2025
    0d17409
  3. Track total processed bytes and total transferred bytes (#328)

    With deduplication, tracking only the total processed bytes can give a
    misleading picture of upload progress: the user often sees a large jump
    whenever a deduplicated section is reached. This PR reports bytes
    uploaded or downloaded separately from bytes processed, which lets us
    accurately surface network utilization.
    
    For example, one possible way to show this would be:
    
    ```
    Data Processed      #####################╺━━━━━━━━━━━━━━━━  58% • 1.8/3.1 GB     • 731.6 MB/s • 0:00:10
    New Data Uploaded   ######╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   9% • 65.9/731.7 MB  • 131.0 MB/s         • -:--:--
    data_up/d2.dat      #############################╸━━━━━━━━  67% • 67.1/100.0 MB  • 568.1 MB/s • 0:00:01
    ```
    hoytak authored May 19, 2025
    bef6a9a
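
The retry wrapper described in #333 retries the entire request rather than replaying it inside the middleware. A minimal synchronous sketch of that idea follows, assuming the caller supplies a closure that builds a fresh request (and body) on every attempt. `RetryPolicy`, `retry_with_backoff`, and the backoff values are hypothetical; they are not xet-core's actual utility or retry policy, which may differ (for example, by being async).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry settings; the values used below are illustrative only.
#[derive(Clone, Copy, Debug)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub base_delay: Duration,
}

/// Runs `make_attempt` until it succeeds, the error is not retryable, or
/// `max_attempts` is reached. The closure is called afresh on every attempt,
/// so it can rebuild a request whose body cannot be cloned.
pub fn retry_with_backoff<T, E, F, R>(
    policy: RetryPolicy,
    mut make_attempt: F,
    is_retryable: R,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
    R: Fn(&E) -> bool,
{
    let mut attempt: u32 = 0;
    loop {
        match make_attempt(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 < policy.max_attempts && is_retryable(&err) => {
                // Exponential backoff: base_delay * 2^attempt (saturating on overflow).
                sleep(policy.base_delay.saturating_mul(2u32.saturating_pow(attempt)));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let policy = RetryPolicy { max_attempts: 4, base_delay: Duration::from_millis(1) };
    // Simulated upload: fails twice with a transient error, then succeeds.
    let result: Result<&str, String> = retry_with_backoff(
        policy,
        |attempt| {
            // A real client would construct the request and its streaming body here.
            if attempt < 2 {
                Err(format!("transient failure on attempt {attempt}"))
            } else {
                Ok("uploaded")
            }
        },
        |_err| true,
    );
    println!("{result:?}"); // Ok("uploaded")
}
```

Because the closure runs anew for each attempt, a non-clonable streaming body is simply recreated inside it, and the `is_retryable` predicate keeps permanent errors from being retried.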

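Commits #328 and #333 separate "bytes processed" from "bytes actually transferred" and report upload progress incrementally in 512 KB steps. The sketch below models both ideas with two atomic counters. `ProgressTracker`, `upload_chunk`, and the `is_duplicate` flag are hypothetical names, not xet-core's types; the network send is simulated, and 512 KB is assumed to mean 512 * 1024 bytes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed reporting step: progress is reported after each piece of at most 512 KB.
const REPORT_INTERVAL: usize = 512 * 1024;

/// Hypothetical tracker keeping "processed" and "transferred" byte counts separate.
#[derive(Default)]
pub struct ProgressTracker {
    processed: AtomicU64,
    transferred: AtomicU64,
}

impl ProgressTracker {
    pub fn processed(&self) -> u64 {
        self.processed.load(Ordering::Relaxed)
    }

    pub fn transferred(&self) -> u64 {
        self.transferred.load(Ordering::Relaxed)
    }

    /// Handles one chunk of file data. A deduplicated chunk counts as processed but is
    /// never sent, so it advances only `processed`. New data is "sent" in pieces of at
    /// most 512 KB, and `on_report` receives the running transferred total after each.
    pub fn upload_chunk(&self, data: &[u8], is_duplicate: bool, mut on_report: impl FnMut(u64)) {
        self.processed.fetch_add(data.len() as u64, Ordering::Relaxed);
        if is_duplicate {
            return;
        }
        for piece in data.chunks(REPORT_INTERVAL) {
            // A real client would write `piece` to the network here.
            let len = piece.len() as u64;
            let total = self.transferred.fetch_add(len, Ordering::Relaxed) + len;
            on_report(total);
        }
    }
}

fn main() {
    let tracker = ProgressTracker::default();
    let new_data = vec![0u8; 1_200_000]; // new data: reported in three pieces
    let dup_data = vec![0u8; 3_000_000]; // already stored remotely: processed only
    tracker.upload_chunk(&new_data, false, |sent| println!("transferred so far: {sent} bytes"));
    tracker.upload_chunk(&dup_data, true, |_| unreachable!("duplicates are not sent"));
    println!("processed = {}, transferred = {}", tracker.processed(), tracker.transferred());
    // processed = 4200000, transferred = 1200000
}
```

Keeping the two counters separate is what allows a display like the one in #328: a "Data Processed" bar that can jump ahead on deduplicated data and a "New Data Uploaded" bar that follows real network traffic.
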
Commits on May 20, 2025

  1. Streamline and aggregate file updates for reporting to python (#340)

    With the current incremental progress updates, the number of updates
    sent to Python is substantial, and each one has to acquire the GIL.
    This noticeably reduces upload speed on fast connections.
    
    This PR introduces an intermediate aggregation class that quickly
    collects all incoming progress updates, then sends the aggregated
    update list to hf_xet once every 200 ms. This eliminates the thread
    contention caused by frequent incremental updates while still
    reporting accurate progress to the user.
    hoytak authored May 20, 2025
    4faec0b
  2. Merging Cargo.toml dependencies into workspace Cargo.toml (#339)

    Co-authored-by: Brian Ronan <brian.ronan@huggingface.co>
    jgodlew and bpronan authored May 20, 2025
    c465076
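
The aggregation in #340 trades many small cross-language calls for one batched call per 200 ms. A minimal std-only sketch of that batching, assuming a hypothetical `ProgressAggregator` that sums per-file byte deltas under a Rust mutex and hands them to a `flush` callback (standing in for the call into Python) only once the interval has elapsed. How the real class schedules its flushes is not described here; in this sketch the caller passes the current time explicitly, which keeps the example deterministic.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical aggregator: many cheap `record` calls, few expensive flushes.
pub struct ProgressAggregator {
    interval: Duration,
    state: Mutex<State>,
}

struct State {
    pending: HashMap<String, u64>, // file name -> bytes accumulated since the last flush
    last_flush: Instant,
}

impl ProgressAggregator {
    pub fn new(interval: Duration, start: Instant) -> Self {
        let state = State { pending: HashMap::new(), last_flush: start };
        Self { interval, state: Mutex::new(state) }
    }

    /// Records one update; this only takes a short-lived Rust mutex, never the GIL.
    pub fn record(&self, file: &str, bytes: u64) {
        let mut state = self.state.lock().unwrap();
        *state.pending.entry(file.to_string()).or_insert(0) += bytes;
    }

    /// If at least `interval` has passed since the last flush and updates are pending,
    /// drains them and hands the batch to `flush`, which stands in for the single call
    /// into Python. Returns `true` if a flush happened.
    pub fn maybe_flush(&self, now: Instant, flush: impl FnOnce(Vec<(String, u64)>)) -> bool {
        let batch: Vec<(String, u64)> = {
            let mut state = self.state.lock().unwrap();
            if now.duration_since(state.last_flush) < self.interval || state.pending.is_empty() {
                return false;
            }
            state.last_flush = now;
            state.pending.drain().collect()
        };
        // The mutex is released before the expensive call, so `record` never waits on it.
        flush(batch);
        true
    }
}

fn main() {
    let start = Instant::now();
    let agg = ProgressAggregator::new(Duration::from_millis(200), start);
    // 1000 fine-grained updates arrive in quick succession...
    for _ in 0..1000 {
        agg.record("data_up/d2.dat", 65_536);
    }
    // ...50 ms in, the interval has not elapsed, so nothing is sent.
    assert!(!agg.maybe_flush(start + Duration::from_millis(50), |_| unreachable!()));
    // ...250 ms in, a single aggregated update replaces the 1000 individual ones.
    agg.maybe_flush(start + Duration::from_millis(250), |batch| println!("{batch:?}"));
    // prints [("data_up/d2.dat", 65536000)]
}
```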