Comparing changes

base repository: huggingface/xet-core
base: v1.1.0
head repository: huggingface/xet-core
compare: v1.1.1
  • 13 commits
  • 70 files changed
  • 6 contributors

Commits on Apr 29, 2025

  1. 6b5e280
  2. Make dedup critical crates compilation-compat with wasm (#271)

    1. Upgrade `rand` so we can use the new "wasm_js" feature for the
    underlying `getrandom` dependency.
    2. Change deprecated rand `gen`, `gen_range` functions.
    3. Restrict tokio features.
    4. Change `tempdir` to `tempfile` because `tempdir` uses a very old,
    non-wasm-compatible version of `rand`, and has been merged into the
    latter and archived.
    5. Clean out some unnecessary dependencies.
    6. Move a file from parutils to utils.
    seanses authored Apr 29, 2025
    0474cd7

Commits on May 1, 2025

  1. 8c9c34d

Commits on May 2, 2025

  1. Adding session_id to requests and spans (#291)

    * Creates a session_id whenever the `data_client` is used or a
    FileUploadSession/FileDownloader is created to be propagated to the new
    remote clients.
    * Adds a middleware to http clients to push the session_id into the
    `X-Xet-Session-Id` header for outgoing requests (CAS is already
    configured to accept this header).
    * Adds info-level spans to the key parts of xet-core for cases where
    xet-core is used as a library by long-running systems (e.g. migration
    service or internal systems) to aid debugging and tracing (essentially
    bringing #82 up to date, minus the hf_xet logging changes).
    jgodlew authored May 2, 2025
    f3edaa3
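
The header-stamping idea in #291 can be illustrated with a small sketch. This is not the xet-core implementation (which is in Rust); it is a Python analogy in which only the header name `X-Xet-Session-Id` comes from the commit message. The helper names (`new_session_id`, `with_session_id`, `send`) and the use of a UUID as the session id are assumptions made for illustration.

```python
import uuid
from typing import Callable, Dict

Headers = Dict[str, str]
Handler = Callable[[Headers], Headers]

SESSION_HEADER = "X-Xet-Session-Id"  # header name taken from the commit message


def new_session_id() -> str:
    # Assumption: any unique string would do; a UUID4 is used for illustration.
    return str(uuid.uuid4())


def with_session_id(session_id: str, next_handler: Handler) -> Handler:
    """Hypothetical middleware: stamps the session id on every outgoing request."""

    def handler(headers: Headers) -> Headers:
        stamped = dict(headers)  # copy so the caller's headers are not mutated
        stamped[SESSION_HEADER] = session_id
        return next_handler(stamped)

    return handler


def send(headers: Headers) -> Headers:
    # Stand-in for the real HTTP send; it just returns what would go out.
    return headers


session_id = new_session_id()
client = with_session_id(session_id, send)
sent = client({"Accept": "application/json"})
```

Creating the id once per session and attaching it in a wrapper keeps individual call sites unaware of the header, which appears to be the point of doing this in middleware.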

Commits on May 5, 2025

  1. Simplify chunking backgrounding code. (#292)

    PR to simplify the code that backgrounds the chunking process. Should
    have no functionality change.
    hoytak authored May 5, 2025
    26331d3

Commits on May 6, 2025

  1. Fix clippy issues in next rust version. (#298)

    This PR simply fixes clippy issues that are present in the next rust
    version. No functionality change.
    hoytak authored May 6, 2025
    8d4958d

Commits on May 7, 2025

  1. Replace passed-around threadpool refs with thread local variable (#297)

    Currently, we pass references to the threadpool around the code in order
    to use it. However, all of this code is currently on a worker thread of
    the tokio runtime used to create the threadpool.
    
    This PR simplifies this by using thread local storage; each worker
    thread sets a reference to the runtime on start that can be accessed at
    any time using ThreadPool::current().
    
    Fallback for running within an existing tokio runtime (e.g. with
    tokio::test) is also handled using the from_external() mechanism.
    
    There should be no functionality change, just code simplification.
    hoytak authored May 7, 2025
    31beb80
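
The thread-local pattern described in #297 can be sketched as follows. This is a Python analogy, not the Rust code from the PR: the class below only borrows the names `ThreadPool`, `current()` and `from_external()` from the commit message, and everything else (the `external` flag, `submit`, `shutdown`, the use of `ThreadPoolExecutor`) is assumed for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()  # per-thread slot holding the active pool


class ThreadPool:
    """Hypothetical Python stand-in for the Rust ThreadPool in the commit."""

    def __init__(self, workers=2, external=False):
        self.external = external
        self._executor = None
        if not external:
            # Each worker thread registers this pool in its thread-local
            # slot when it starts.
            self._executor = ThreadPoolExecutor(
                max_workers=workers, initializer=self._register
            )

    def _register(self):
        _local.pool = self

    @classmethod
    def from_external(cls):
        # Fallback: wrap a context that this pool did not create.
        return cls(external=True)

    @classmethod
    def current(cls):
        pool = getattr(_local, "pool", None)
        if pool is None:
            # Not on one of our worker threads, so use the fallback.
            pool = cls.from_external()
            _local.pool = pool
        return pool

    def submit(self, fn, *args):
        if self._executor is None:
            raise RuntimeError("external pools do not own worker threads")
        return self._executor.submit(fn, *args)

    def shutdown(self):
        if self._executor is not None:
            self._executor.shutdown(wait=True)


def task():
    # No pool reference is passed in; the worker looks it up itself.
    return ThreadPool.current()


pool = ThreadPool(workers=2)
seen_by_worker = pool.submit(task).result()
seen_by_main = ThreadPool.current()
pool.shutdown()
```

Here the worker finds the pool that started it, while the main thread, which was not started by the pool, gets the external fallback.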

Commits on May 8, 2025

  1. Connect detailed upload progress to hub (#301)

    This PR connects detailed upload progress to the hub in a backwards
    compatible way.
    
    It works by testing the number of arguments and argument names on the
    progress updating function. If the progress reporting function takes a
    single argument, this function calls it using the old method; if it has
    the appropriate arguments for detailed reporting -- `item_id,
    completed_bytes, total_bytes, update_increment` -- then it calls it
    using the new method. Additionally, if None is passed in, the progress
    reporting is disabled.
    hoytak authored May 8, 2025
    0ba75fe
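
The signature test described in #301 could look roughly like the sketch below. It is not the hf_xet code: the function names `classify_updater` and `report` are hypothetical, and passing the byte increment to single-argument callbacks is an assumption about what "the old method" means. Only the four detailed argument names come from the commit message.

```python
import inspect

# Argument names for detailed reporting, as listed in the commit message.
DETAILED_ARGS = ("item_id", "completed_bytes", "total_bytes", "update_increment")


def classify_updater(fn):
    """Decide how a progress callback should be called (hypothetical helper)."""
    if fn is None:
        return "disabled"
    params = tuple(inspect.signature(fn).parameters)
    if len(params) == 1:
        return "simple"
    if params == DETAILED_ARGS:
        return "detailed"
    raise TypeError(f"unsupported progress callback signature: {params}")


def report(fn, item_id, completed_bytes, total_bytes, update_increment):
    mode = classify_updater(fn)
    if mode == "simple":
        # Assumption: old-style callbacks receive only the increment.
        fn(update_increment)
    elif mode == "detailed":
        fn(
            item_id=item_id,
            completed_bytes=completed_bytes,
            total_bytes=total_bytes,
            update_increment=update_increment,
        )
    # mode == "disabled": nothing is reported.


simple_calls, detailed_calls = [], []


def old_style(n):
    simple_calls.append(n)


def new_style(item_id, completed_bytes, total_bytes, update_increment):
    detailed_calls.append((item_id, completed_bytes, total_bytes, update_increment))


report(old_style, "file-a", 10, 100, 10)
report(new_style, "file-a", 10, 100, 10)
report(None, "file-a", 10, 100, 10)
```

Inspecting the callable rather than adding a flag is what lets existing single-argument callbacks keep working unchanged.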

Commits on May 9, 2025

  1. Revert "Revert "Reduce Usage of Compression Format Detection"" (#279)

    Reverts #275
    
    Co-authored-by: Joseph Godlewski <jgodlewski@huggingface.co>
    rajatarya and jgodlew authored May 9, 2025
    719f367

Commits on May 10, 2025

  1. xtool query command (#305)

    A debugging utility to get file reconstruction info.
    seanses authored May 10, 2025
    dfc7f0e

Commits on May 12, 2025

  1. Fix compilation issue due to api change (#309)

    Fix a compilation issue caused by merging #305, which did not account
    for an API change.
    seanses authored May 12, 2025
    6b6dd70
  2. Fixed race condition in dependency tracking. (#302)

    Testing discovered two dependency tracking issues: 
    
    The first is when a file references a new xorb multiple times in
    non-contiguous locations. In this case, the logic will cause an
    assertion failure when debug_assertions are enabled, though likely the
    underlying logic is actually correct.
    
    The second is that tracking which xorbs are part of a given session and
    which xorbs have been uploaded previously was done by recording the xorb
    hashes when they are registered for upload. However, it turns out this
    needs to happen before registering the xorb as available for dedup;
    otherwise a race condition could cause a xorb to be incorrectly
    registered as already uploaded when in reality the xorb hash was in the
    queue and not actually added to the session registry yet. This causes
    the progress to be incorrectly counted as already completed.
    
    A clean fix is to get that information directly from the shard_manager,
    which now returns whether a dedup hit came from the session data or the
    cache, instead of tracking it separately. This fix allows newly cut
    xorbs to immediately be used for dedup by other threads and correctly
    tracks all the information across threads.
    hoytak authored May 12, 2025
    50fde9b
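
The idea behind the fix in #302, asking a single authority where a dedup hit came from instead of keeping a second registry that can lag behind, can be sketched as below. This is a Python analogy under stated assumptions: `ShardManager`, `DedupSource`, `register_session_xorb` and `lookup` are hypothetical names, and the real shard_manager in xet-core is Rust code with a different interface.

```python
import threading
from enum import Enum


class DedupSource(Enum):
    SESSION = "session"  # xorb was cut during the current upload session
    CACHE = "cache"      # xorb was already known from earlier uploads


class ShardManager:
    """Hypothetical stand-in: the one place that knows where a xorb lives."""

    def __init__(self, cached_xorbs):
        self._lock = threading.Lock()
        self._cache = set(cached_xorbs)
        self._session = set()

    def register_session_xorb(self, xorb_hash):
        # Registering makes the xorb visible for dedup and records its
        # origin in the same critical section, so the two cannot disagree.
        with self._lock:
            self._session.add(xorb_hash)

    def lookup(self, xorb_hash):
        # Returns where the hit came from, or None if there is no hit.
        with self._lock:
            if xorb_hash in self._session:
                return DedupSource.SESSION
            if xorb_hash in self._cache:
                return DedupSource.CACHE
            return None


manager = ShardManager(cached_xorbs={"old-xorb"})
manager.register_session_xorb("new-xorb")

hit_new = manager.lookup("new-xorb")
hit_old = manager.lookup("old-xorb")
hit_none = manager.lookup("unknown-xorb")
```

Because the origin is returned with the lookup itself, a caller never has to consult a separate "already uploaded" set that another thread may not have updated yet.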
  3. Changed debug to minimal for python wheel. (#312)

    Currently at O3 + minimal debug info; changing to Os + LTO drops the
    size to 35 MB.
    hoytak authored May 12, 2025
    6f24934