Dataset streaming example not working

### System Info

```shell
- `transformers` version: 4.18.0
- Platform: Linux-5.4.173.el7-x86_64-with-glibc2.10
- Python version: 3.8.12
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): 1.11.0a0+17540c5 (True)
- Tensorflow version (GPU?): 2.8.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.4.2 (gpu)
- Jax version: 0.3.10
- JaxLib version: 0.3.10
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
```


### Who can help?

@patrickvonplaten 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Following the guide in the [dataset-streaming](https://github.com/huggingface/transformers/tree/main/examples/research_projects/jax-projects/dataset-streaming) directory to train a model in streaming mode fails with the following error. According to the traceback, it happens while the script is collecting the evaluation samples, before training starts.

```
[11:11:16] - INFO - datasets_modules.datasets.oscar.84838bd49d2295f62008383b05620571535451d84545037bb94d6f3501651df2.oscar - generating examples from = https://s3.amazonaws.com/datasets.huggingface.co/oscar/1.0/unshuffled/deduplicated/en/en_part_480.txt.gz
Token indices sequence length is longer than the specified maximum sequence length for this model (1195 > 512). Running this sequence through the model will result in indexing errors
Traceback (most recent call last):
  File "./run_mlm_flax_stream.py", line 549, in <module>
    eval_samples = advance_iter_and_group_samples(training_iter, data_args.num_eval_samples, max_seq_length)
  File "./run_mlm_flax_stream.py", line 284, in advance_iter_and_group_samples
    samples = {k: samples[k] + tokenized_samples[k] for k in tokenized_samples.keys()}
  File "./run_mlm_flax_stream.py", line 284, in <dictcomp>
    samples = {k: samples[k] + tokenized_samples[k] for k in tokenized_samples.keys()}
TypeError: can only concatenate list (not "int") to list
```
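For reference, the message itself can be reproduced outside the script. The sketch below is only an assumption about the cause: it supposes that one of the values in `tokenized_samples` is a plain `int` (for example a leftover integer column such as a hypothetical `id`) rather than a list of token ids. The dictionary contents and both helper names are made up for illustration; only the dict comprehension is taken from the traceback.

```python
from collections import defaultdict


def concat_all(samples, tokenized_samples):
    # Same dict comprehension as the failing line in the traceback.
    return {k: samples[k] + tokenized_samples[k] for k in tokenized_samples.keys()}


def concat_lists_only(samples, tokenized_samples):
    # Possible workaround (an assumption, not the script's code):
    # skip any value that is not a list before concatenating.
    return {
        k: samples[k] + v
        for k, v in tokenized_samples.items()
        if isinstance(v, list)
    }


# Hypothetical tokenized sample: "id" stands in for any non-list column.
tokenized = {"input_ids": [101, 2023, 102], "id": 0}

try:
    concat_all(defaultdict(list), tokenized)
except TypeError as err:
    error_message = str(err)

merged = concat_lists_only(defaultdict(list), tokenized)
print(error_message)
print(merged)
```

If that assumption holds, dropping the non-list columns before grouping (or filtering them as in `concat_lists_only`) might avoid the crash, but I have not verified this against the actual OSCAR stream.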

### Expected behavior

```shell
Model training to start.
```

