Description
System Info
- transformers version: 4.34.0
- Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.27
- Python version: 3.8.0
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.3-post.1
- Accelerate version: 0.22.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): 2.13.1 (True)
- Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
- Jax version: 0.4.13
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The Nougat processor fails on a basic call. The test code I ran is pasted below:
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

PRETRAINED_PATH_TO_NOUGAT = ""
processor = NougatProcessor.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)
model = VisionEncoderDecoderModel.from_pretrained(PRETRAINED_PATH_TO_NOUGAT)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)
# prepare PDF image for the model
filepath = "/path/to/dummy/image.png"
image = Image.open(filepath)
pixel_values = processor(image, return_tensors="pt").pixel_values
# generate transcription (here we generate at most 512 new tokens)
outputs = model.generate(
    pixel_values.to(device),
    min_length=1,
    max_new_tokens=512,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)
sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
sequence = processor.post_process_generation(sequence, fix_markdown=False)
The error log is below:
Traceback (most recent call last):
  File "/home/ysocr/tests/test_generate.py", line 15, in <module>
    pixel_values = processor(image, return_tensors="pt").pixel_values
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/processing_nougat.py", line 91, in __call__
    inputs = self.image_processor(
  File "/home/venv/lib/python3.8/site-packages/transformers/image_processing_utils.py", line 546, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 505, in preprocess
    images = [
  File "/home/venv/lib/python3.8/site-packages/transformers/models/nougat/image_processing_nougat.py", line 506, in <listcomp>
    to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
  File "/home/venv/lib/python3.8/site-packages/transformers/image_transforms.py", line 78, in to_channel_dimension_format
    target_channel_dim = ChannelDimension(channel_dim)
  File "/usr/lib/python3.8/enum.py", line 304, in __call__
    return cls.__new__(cls, value)
  File "/usr/lib/python3.8/enum.py", line 595, in __new__
    raise exc
  File "/usr/lib/python3.8/enum.py", line 579, in __new__
    result = cls._missing_(value)
  File "/home/venv/lib/python3.8/site-packages/transformers/utils/generic.py", line 433, in _missing_
    raise ValueError(
ValueError: ChannelDimension.FIRST is not a valid ChannelDimension, please select one of ['channels_first', 'channels_last']
After checking the code, I found that the default value of data_format causes this error: the default is the string "ChannelDimension.FIRST" rather than the enum member. I believe the parameter should be declared as data_format: Optional[ChannelDimension] = ChannelDimension.FIRST rather than data_format: Optional["ChannelDimension"] = "ChannelDimension.FIRST". Besides, it is odd that resample and input_data_format are annotated with the quoted strings "PILImageResampling" and "ChannelDimension" respectively. See lines 55, 64, and 65.
transformers/src/transformers/models/nougat/processing_nougat.py
Lines 55 to 66 in 6015f91
resample: "PILImageResampling" = None,  # noqa: F821
do_thumbnail: bool = None,
do_align_long_axis: bool = None,
do_pad: bool = None,
do_rescale: bool = None,
rescale_factor: Union[int, float] = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
data_format: Optional["ChannelDimension"] = "ChannelDimension.FIRST",  # noqa: F821
input_data_format: Optional[Union[str, "ChannelDimension"]] = None,  # noqa: F821
text_pair: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
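To illustrate why the quoted default fails while the quoted annotations are harmless, here is a minimal sketch that does not import transformers. The ChannelDimension class below is a stand-in I wrote for illustration, with its two values taken from the error message; the function names are hypothetical. It suggests that a quoted annotation is only a forward reference, whereas a quoted default is an ordinary string that the enum lookup cannot resolve.

```python
from enum import Enum
from typing import Optional


class ChannelDimension(str, Enum):
    """Stand-in for the library enum; values copied from the error message."""

    FIRST = "channels_first"
    LAST = "channels_last"


def to_target_dim(channel_dim):
    # Mirrors the failing call shown in the traceback: ChannelDimension(channel_dim)
    return ChannelDimension(channel_dim)


def call_with_string_default(
    data_format: Optional["ChannelDimension"] = "ChannelDimension.FIRST",
):
    # The quoted annotation is fine; the quoted default is just a str.
    return to_target_dim(data_format)


def call_with_member_default(
    data_format: Optional["ChannelDimension"] = ChannelDimension.FIRST,
):
    # Same quoted annotation, but the default is a real enum member.
    return to_target_dim(data_format)


try:
    call_with_string_default()
    string_default_error = None
except ValueError as exc:
    # The lookup by value fails because no member has this string as its value.
    string_default_error = str(exc)

member_result = call_with_member_default()      # resolves to ChannelDimension.FIRST
value_result = to_target_dim("channels_first")  # a valid value string also resolves
```

If this stand-in reflects the library's behavior, only the default value on the data_format line would need to change; the quoted annotations could stay as they are.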
I noticed that @ArthurZucker made these changes and added some comments. Is this a bug, or is it an intentional design that I have misunderstood?
Expected behavior
Ensure the nougat example works.
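Until the default is corrected, a possible workaround is to pass data_format explicitly so the faulty default is never used. The helper below is a hypothetical sketch, not part of the library: it assumes the processor forwards data_format to the image processor (as the traceback suggests) and uses "channels_first", one of the two values the error message lists as valid. I have not verified it against every version.

```python
def nougat_pixel_values(processor, image, data_format="channels_first"):
    """Call the processor with an explicit, valid data_format (hypothetical helper)."""
    # Passing data_format overrides the string default "ChannelDimension.FIRST".
    inputs = processor(image, return_tensors="pt", data_format=data_format)
    return inputs.pixel_values
```

In the reproduction script this would replace the direct call, for example pixel_values = nougat_pixel_values(processor, image).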