System Info
- `transformers` version: 4.34.0
- Platform: macOS-13.5-arm64-arm-64bit
- Python version: 3.10.12
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: 0.20.3
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
In [1]: import transformers
In [2]: t0tt = transformers.AutoTokenizer.from_pretrained('bigscience/T0pp')
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
In [3]: t0tt.save_pretrained('saved-tokenizer')
Out[3]:
('saved-tokenizer/tokenizer_config.json',
'saved-tokenizer/special_tokens_map.json',
'saved-tokenizer/spiece.model',
'saved-tokenizer/added_tokens.json',
'saved-tokenizer/tokenizer.json')
In [4]: loaded_t0tt = transformers.AutoTokenizer.from_pretrained('saved-tokenizer')
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
In [6]: t0tt._eos_token
Out[6]: AddedToken("</s>", rstrip=True, lstrip=True, single_word=False, normalized=True, special=True)
In [7]: loaded_t0tt._eos_token
Out[7]: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True)
In [8]: t0tt.eos_token
Out[8]: '</s>'
In [9]: t0tt('hello </s> goodbye')
Out[9]: {'input_ids': [21820, 1, 23281, 1], 'attention_mask': [1, 1, 1, 1]}
In [10]: loaded_t0tt('hello </s> goodbye')
Out[10]: {'input_ids': [21820, 3, 1, 23281, 1], 'attention_mask': [1, 1, 1, 1, 1]}
Expected behavior
When a tokenizer is saved and then loaded back, it should
(1) behave the same on the same input, and
(2) preserve the configuration of each AddedToken (e.g. `rstrip`, `lstrip`, `normalized`).
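To make the discrepancy concrete: comparing the two `AddedToken` reprs above, the `rstrip`, `lstrip`, and `normalized` flags all flip from `True` to `False` after the save/load round trip, which presumably explains the extra token id in the reloaded tokenizer's output (whitespace around `</s>` is no longer stripped). Below is a minimal, self-contained sketch of that comparison using a plain dataclass as a stand-in for `AddedToken` (the dataclass and helper are illustrative, not the transformers API); the field values are taken directly from the session above.

```python
from dataclasses import dataclass, asdict


@dataclass
class AddedTokenState:
    """Illustrative stand-in mirroring the AddedToken fields shown above."""
    content: str
    rstrip: bool
    lstrip: bool
    single_word: bool
    normalized: bool
    special: bool


def roundtrip_diff(before: AddedTokenState, after: AddedTokenState) -> dict:
    """Return the fields whose values changed across a save/load cycle."""
    b, a = asdict(before), asdict(after)
    return {k: (b[k], a[k]) for k in b if b[k] != a[k]}


# Values copied from the IPython session: t0tt._eos_token vs loaded_t0tt._eos_token
original = AddedTokenState("</s>", rstrip=True, lstrip=True,
                           single_word=False, normalized=True, special=True)
reloaded = AddedTokenState("</s>", rstrip=False, lstrip=False,
                           single_word=False, normalized=False, special=True)

print(roundtrip_diff(original, reloaded))
# {'rstrip': (True, False), 'lstrip': (True, False), 'normalized': (True, False)}
```

A fix would presumably make `roundtrip_diff` come back empty, i.e. the serialized tokenizer files would carry these flags through unchanged.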