processor_nougat has wrong default data type #26597

### System Info

- `transformers` version: 4.34.0
- Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.27
- Python version: 3.8.0
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.3-post.1
- Accelerate version: 0.22.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): 2.13.1 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
- Jax version: 0.4.13

### Who can help?

 @amyeroberts @ArthurZucker 


### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

The Nougat processor raises a `ValueError` when called on an image. The test code I ran is below:

```python

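import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

# path or hub ID of the Nougat checkpoint (left blank in this report)
PRETRAINED_PATH_TO_NOUGAT = ""
processor = NougatProcessor.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)
model = VisionEncoderDecoderModel.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)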

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)
# prepare PDF image for the model
filepath = "/path/to/dummy/image.png"
image = Image.open(filepath)
pixel_values = processor(image, return_tensors="pt").pixel_values

# generate transcription (up to 512 new tokens)
outputs = model.generate(
    pixel_values.to(device),
    min_length=1,
    max_new_tokens=512,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = processor.post_process_generation(sequence, fix_markdown=False)

```

The error log is below:
```
Traceback (most recent call last):
  File "/home/ysocr/tests/test_generate.py", line 15, in <module>
    pixel_values = processor(image, return_tensors="pt").pixel_values
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/processing_nougat.py", line 91, in __call__
    inputs = self.image_processor(
  File "/home/venv/lib/python3.8/site-packages/transformers/image_processing_utils.py", line 546, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 505, in preprocess
    images = [
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 506, in <listcomp>
    to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
  File "/home/venv/lib/python3.8/site-packages/transformers/image_transforms.py", line 78, in to_channel_dimension_format
    target_channel_dim = ChannelDimension(channel_dim)
  File "/usr/lib/python3.8/enum.py", line 304, in __call__
    return cls.__new__(cls, value)
  File "/usr/lib/python3.8/enum.py", line 595, in __new__
    raise exc
  File "/usr/lib/python3.8/enum.py", line 579, in __new__
    result = cls._missing_(value)
  File "/home/venv/lib/python3.8/site-packages/transformers/utils/generic.py", line 433, in _missing_
    raise ValueError(
ValueError: ChannelDimension.FIRST is not a valid ChannelDimension, please select one of ['channels_first', 'channels_last']
```

After checking the code, I found that the default value of ``data_format`` causes this error. I believe the parameter should be declared as ``Optional[ChannelDimension] = ChannelDimension.FIRST`` rather than ``Optional["ChannelDimension"] = "ChannelDimension.FIRST"``: the quoted default is a plain string, not an enum member, and it matches neither ``channels_first`` nor ``channels_last``. It is also odd that the type annotations of ``resample`` and ``input_data_format`` are the quoted strings ``"PILImageResampling"`` and ``"ChannelDimension"``, respectively. See lines 55, 64, and 65.


https://github.com/huggingface/transformers/blob/6015f91a5a28548a597f8d24341d089fe04994e8/src/transformers/models/nougat/processing_nougat.py#L55-L66
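
To illustrate the failure without loading a model, here is a minimal sketch. The `ChannelDimension` class below is a stand-in I wrote for this report, not the real `transformers` class; its two values are taken from the error message above. The `resolve` helper is hypothetical and only mimics the enum lookup that the traceback shows in `to_channel_dimension_format`.

```python
from enum import Enum
from typing import Optional, Union


class ChannelDimension(str, Enum):
    """Stand-in enum (assumption): values copied from the error message."""

    FIRST = "channels_first"
    LAST = "channels_last"


def resolve(data_format: Optional[Union[str, ChannelDimension]]) -> ChannelDimension:
    """Hypothetical helper that mimics the lookup `ChannelDimension(channel_dim)`."""
    return ChannelDimension(data_format)


# The enum member and its string value both resolve to the same member.
assert resolve(ChannelDimension.FIRST) is ChannelDimension.FIRST
assert resolve("channels_first") is ChannelDimension.FIRST

# The quoted member name is only a string and is not a valid enum value.
try:
    resolve("ChannelDimension.FIRST")
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

If this reading is right, either `ChannelDimension.FIRST` or the plain string `"channels_first"` would work as the default. As an untested workaround, passing `data_format="channels_first"` explicitly in the `processor(...)` call might avoid the error, since `data_format` appears in the signature at the linked lines.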


I noticed that @ArthurZucker made these changes and added some comments. Is this a bug, or is it a design choice that I have misunderstood?

### Expected behavior

Ensure the nougat example works.

For reference, the lines from `processing_nougat.py` linked in the Reproduction section:

	resample: "PILImageResampling" = None, # noqa: F821
	do_thumbnail: bool = None,
	do_align_long_axis: bool = None,
	do_pad: bool = None,
	do_rescale: bool = None,
	rescale_factor: Union[int, float] = None,
	do_normalize: bool = None,
	image_mean: Optional[Union[float, List[float]]] = None,
	image_std: Optional[Union[float, List[float]]] = None,
	data_format: Optional["ChannelDimension"] = "ChannelDimension.FIRST", # noqa: F821
	input_data_format: Optional[Union[str, "ChannelDimension"]] = None, # noqa: F821
	text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
