Comparing changes

base repository: huggingface/datasets
base: 3.3.2
...
head repository: huggingface/datasets
compare: 3.4.0
  • 11 commits
  • 75 files changed
  • 6 contributors

Commits on Feb 20, 2025

  1. set dev version (#7417)

    lhoestq authored Feb 20, 2025
    14233c0

Commits on Mar 4, 2025

  1. fix: None default with bool type on load creates typing error (#7426)

    * fix typing on load
    
    * fix docstring
    stephantul authored Mar 4, 2025
    6631dc0

Commits on Mar 5, 2025

  1. Use pyupgrade --py39-plus (#7428)

    * Use pyupgrade --py39-plus
    
    * Make style
    cyyever authored Mar 5, 2025
    26379d5
  2. Faster folder based builder + parquet support + allow repeated media + use torchvideo (#7424)
    
    * faster folder based builder + parquet support + allow repeated media
    
    * add _visit_with_path in features
    
    * support image/audio/video in nested data
    
    * docs
    
    * use filters even without metadata
    
    * minor
    
    * replace decord by torchcodec
    
    * switch to torchvision
    
    * update video docs
    
    * minor
    
    * fix tests
    
    * fix tests
    
    * fix tests
    
    * better webdataset docs
    
    * style
    
    * fix
    lhoestq authored Mar 5, 2025
    5c8869f

Commits on Mar 7, 2025

  1. Add with_split to DatasetDict.map (#7368)

    * Add: with_split
    
    * Add: support for 'with_split' parameter in DatasetDict.map method
    
    * Refactor: simplify dataset mapping in DatasetDict
    
    * Refactor: DatasetDict to bind function with split parameter
    
    * rm breakpoint
    
    * Enhance DatasetDict and IterableDatasetDict to support function binding with split parameter
    
    * Add: unbind
    
    * fix ci
    
    ---------
    
    Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
    jp1924 and lhoestq authored Mar 7, 2025
    f693f4e
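
The `with_split` commit above describes binding the split name to the mapped function. As a rough illustration of that binding idea only, here is a minimal sketch on plain dictionaries. It is not the library's implementation: `map_splits`, `tag`, and the keyword name `split` are hypothetical, and how `DatasetDict.map` actually passes the split name to the function is an assumption here.

```python
from functools import partial
from typing import Callable


def map_splits(
    splits: dict[str, list[dict]],
    function: Callable,
    with_split: bool = False,
) -> dict[str, list[dict]]:
    """Apply `function` to every row of every split (hypothetical helper)."""
    out = {}
    for split_name, rows in splits.items():
        # Bind the split name only when requested, so plain functions still work.
        fn = partial(function, split=split_name) if with_split else function
        out[split_name] = [fn(row) for row in rows]
    return out


def tag(row: dict, split: str) -> dict:
    # Record which split the row came from (illustrative only).
    return {**row, "split": split}


tagged = map_splits({"train": [{"x": 1}], "test": [{"x": 2}]}, tag, with_split=True)
```

Binding with `functools.partial` keeps the per-row call site identical whether or not the split name is requested.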

Commits on Mar 12, 2025

  1. Refactor string_to_dict to return None if there is no match instead of raising `ValueError` (#7435)
    
    * Refactor string_to_dict to return None if there is no match instead of raising ValueError
    
    Instead of wrapping each call in try-except to handle the no-match case, callers can check whether the return value is None; where a match is known to exist, they can assert that the return value is not None.
    
    * Allow for source_url_fields to be None
    
    they can be local file paths here
    
    https://github.com/huggingface/datasets/actions/runs/13683185040/job/38380924390?pr=7435#step:10:9731
    Matthew Hoffman authored Mar 12, 2025
    67ffdfb
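
The pattern described in the `string_to_dict` commit above (return None on no match, let the caller check) can be sketched as follows. This is a simplified stand-in, not the library's code: `string_to_dict_sketch` and the example pattern are hypothetical, and the real function's signature and matching rules may differ.

```python
import re
from typing import Optional


def string_to_dict_sketch(string: str, pattern: str) -> Optional[dict[str, str]]:
    """Extract `{name}` fields from `string`; return None if it does not match."""
    # re.split with a capture group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, string)
    return match.groupdict() if match else None


# The caller checks for None instead of catching ValueError.
fields = string_to_dict_sketch("train-00000.parquet", "{split}-{shard}.parquet")
if fields is None:
    print("no match")
else:
    print(fields["split"], fields["shard"])

# When a match is known to exist, an assertion documents that expectation.
known = string_to_dict_sketch("a-b", "{x}-{y}")
assert known is not None
```

Returning `Optional[dict]` moves the no-match case into the type signature, so a type checker can flag callers that forget to handle it.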

Commits on Mar 13, 2025

  1. Fix small bugs with async map (#7445)

    * fix async map resuming
    
    * fix with_indices
    
    * fix tests
    
    * fix tests
    
    * again
    lhoestq authored Mar 13, 2025
    f09db01

Commits on Mar 14, 2025

  1. Add IterableDataset.decode with multithreading (#7450)

    * add IterableDataset.decode with multithreading
    
    * graceful async ends
    
    * test
    
    * docs
    
    * fix tests
    lhoestq authored Mar 14, 2025
    7ad7379
  2. Fix resuming after ds.set_epoch(new_epoch) (#7451)

    * fix resuming with new epoch
    
    * more readable states
    
    * add test
    
    * make style
    lhoestq authored Mar 14, 2025
    e8ee24a
  3. minor docs changes (#7452)

    lhoestq authored Mar 14, 2025
    97ff626
  4. release: 3.4.0 (#7453)

    * release: 3.4.0
    
    * minor
    lhoestq authored Mar 14, 2025
    14fb15a