several breakages due to recent datasets  #542

Description

@stas00

It seems that datasets==2.16.0 and higher break evaluate:

$ cat test-evaluate.py
from evaluate import load
import os
import torch.distributed as dist

dist.init_process_group("nccl")

rank = int(os.environ.get("LOCAL_RANK", 0))
world_size = dist.get_world_size()

metric = load("accuracy",
              experiment_id="test4",
              num_process=world_size,
              process_id=rank)
metric.add_batch(predictions=[], references=[])

Problem 1. umask isn't being respected when creating lock files

As we are in a shared-group setting, we use umask 000,

but this script creates lock files with missing permissions:

-rw-r--r-- 1 [...]/metrics/accuracy/default/test4-2-rdv.lock

which is invalid, since umask 000 should have led to:

-rw-rw-rw- 1 [...]/metrics/accuracy/default/test4-2-rdv.lock

The same problem applies to all the other locks created during such a run, i.e. a few more .lock files in that directory.

This is the same issue that was reported and dealt with multiple times in datasets.

If I downgrade to datasets==2.15.0, the files are created correctly with:

-rw-rw-rw- 
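For illustration, here is a minimal standalone sketch (not evaluate's or filelock's actual code) of why the two permission sets above can differ: os.open applies the process umask to the requested mode, while an explicit chmod after creation bypasses the umask entirely, which is one way a lock file can end up -rw-r--r-- despite umask 000.

```python
import os
import stat
import tempfile

def create_respecting_umask(path):
    # os.open applies the process umask to the requested mode,
    # so with umask 000 this yields -rw-rw-rw- (0o666)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
    os.close(fd)

def create_with_fixed_mode(path):
    # an explicit chmod after creation ignores the umask,
    # pinning the file to -rw-r--r-- (0o644) regardless of it
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
    os.close(fd)
    os.chmod(path, 0o644)

if __name__ == "__main__":
    old = os.umask(0)  # the shared-group setting: umask 000
    try:
        with tempfile.TemporaryDirectory() as d:
            good = os.path.join(d, "good.lock")
            bad = os.path.join(d, "bad.lock")
            create_respecting_umask(good)
            create_with_fixed_mode(bad)
            print(oct(stat.S_IMODE(os.stat(good).st_mode)))  # 0o666
            print(oct(stat.S_IMODE(os.stat(bad).st_mode)))   # 0o644
    finally:
        os.umask(old)
```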

Problem 2. Expected to find locked file /data/huggingface/metrics/accuracy/default/test4-2-0.arrow.lock from process 1 but it doesn't exist.

$ python -u -m torch.distributed.run --nproc_per_node=2 --rdzv_endpoint localhost:6000  --rdzv_backend c10d test-evaluate.py
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Using the latest cached version of the module from /data/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Mon Jan 29 18:42:31 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
Using the latest cached version of the module from /data/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c447638f751434e8e65b14 (last modified on Mon Jan 29 18:42:31 2024) since it couldn't be found locally at evaluate-metric--accuracy, or remotely on the Hugging Face Hub.
Traceback (most recent call last):
  File "/home/stas/test/test-evaluate.py", line 14, in <module>
    metric.add_batch(predictions=[], references=[])
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 510, in add_batch
    self._init_writer()
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 656, in _init_writer
    self._check_all_processes_locks()  # wait for everyone to be ready
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 350, in _check_all_processes_locks
    raise ValueError(
ValueError: Expected to find locked file /data/huggingface/metrics/accuracy/default/test4-2-0.arrow.lock from process 0 but it doesn't exist.
Traceback (most recent call last):
  File "/home/stas/test/test-evaluate.py", line 14, in <module>
    metric.add_batch(predictions=[], references=[])
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 510, in add_batch
    self._init_writer()
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 659, in _init_writer
    self._check_rendez_vous()  # wait for master to be ready and to let everyone go
  File "/env/lib/conda/evaluate-test/lib/python3.9/site-packages/evaluate/module.py", line 362, in _check_rendez_vous
    raise ValueError(
ValueError: Expected to find locked file /data/huggingface/metrics/accuracy/default/test4-2-0.arrow.lock from process 1 but it doesn't exist.

The files are there:

-rw-rw-rw- 1 stas stas 0 Jan 29 22:14 /data/huggingface/metrics/accuracy/default/test4-2-0.arrow
-rw-r--r-- 1 stas stas 0 Jan 29 22:15 /data/huggingface/metrics/accuracy/default/test4-2-0.arrow.lock
-rw-rw-rw- 1 stas stas 0 Jan 29 22:14 /data/huggingface/metrics/accuracy/default/test4-2-1.arrow
-rw-r--r-- 1 stas stas 0 Jan 29 22:14 /data/huggingface/metrics/accuracy/default/test4-2-1.arrow.lock
-rw-r--r-- 1 stas stas 0 Jan 29 22:14 /data/huggingface/metrics/accuracy/default/test4-2-rdv.lock
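For reference, the per-rank lock-file names in the listing follow the pattern visible in the error messages, {experiment_id}-{num_process}-{process_id}.arrow.lock. A tiny helper reconstructing which files the rendezvous expects (purely illustrative; the function name is mine, not evaluate's):

```python
import os

def expected_lock_paths(cache_dir, experiment_id, num_process):
    # filename pattern inferred from the error messages and listing above:
    # {experiment_id}-{num_process}-{process_id}.arrow.lock, one per rank
    return [
        os.path.join(cache_dir, f"{experiment_id}-{num_process}-{rank}.arrow.lock")
        for rank in range(num_process)
    ]

if __name__ == "__main__":
    for p in expected_lock_paths("/data/huggingface/metrics/accuracy/default",
                                 "test4", 2):
        print(p)
    # /data/huggingface/metrics/accuracy/default/test4-2-0.arrow.lock
    # /data/huggingface/metrics/accuracy/default/test4-2-1.arrow.lock
```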

If I downgrade to datasets==2.15.0, the above code works again.

In short: datasets<2.16 works, datasets>=2.16 breaks.
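Until this is fixed, one way to fail fast is a defensive version check at startup. This sketch uses only importlib.metadata; the 2.16 cutoff comes from the report above, and the helper name is hypothetical:

```python
from importlib.metadata import version, PackageNotFoundError

def datasets_version_is_affected(v):
    # datasets<2.16 works, datasets>=2.16 breaks (per the report above);
    # compare only the numeric major/minor components
    major, minor = (int(part) for part in v.split(".")[:2])
    return (major, minor) >= (2, 16)

if __name__ == "__main__":
    try:
        v = version("datasets")
    except PackageNotFoundError:
        v = None
    if v is not None and datasets_version_is_affected(v):
        raise RuntimeError(
            f"datasets=={v} is known to break evaluate's distributed locks; "
            "pin datasets<2.16.0 as a workaround"
        )
```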

Using evaluate==0.4.1

Thank you!

@lhoestq

@williamberrios who reported this
