Wrong charset when CodedCharacterSet=ESC - A #614

@kenwa

According to https://en.wikipedia.org/wiki/ISO/IEC_2022#cite_note-14.3.2-90, ISO-8859-1 should be used when CodedCharacterSet is any of:

  • ESC % A
  • ESC . A
  • ESC - A

Currently, only the first two escape sequences are supported.

The fix seems to be as simple as adding a new constant to Iso2022Converter:

private static final byte MINUS_SIGN = 0x2D;

and adding an extra if clause to com.drew.metadata.iptc.Iso2022Converter#convertISO2022CharsetToJavaCharset:

if (bytes.length > 2 && bytes[0] == ESC && bytes[1] == MINUS_SIGN && bytes[2] == LATIN_CAPITAL_A) return ISO_8859_1;

Iso2022ConverterTest.java should also be extended with:

assertEquals("ISO-8859-1", Iso2022Converter.convertISO2022CharsetToJavaCharset(new byte[]{0x1B, (byte)0x2D, (byte)0x41}));

A pull request has been created: #615
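
For illustration, the proposed check can be tried outside the library with a minimal standalone sketch. The class name Iso2022EscapeSketch and the method charsetFor are hypothetical and are not part of metadata-extractor. The sketch only encodes the three escape sequences listed in this issue and assumes, as stated above, that all of them should map to ISO-8859-1. It is not the actual body of convertISO2022CharsetToJavaCharset.

```java
// Hypothetical standalone sketch; not the library's Iso2022Converter.
final class Iso2022EscapeSketch {
    private static final byte ESC = 0x1B;
    private static final byte PERCENT_SIGN = 0x25;
    private static final byte MINUS_SIGN = 0x2D;   // the new constant proposed in this issue
    private static final byte FULL_STOP = 0x2E;
    private static final byte LATIN_CAPITAL_A = 0x41;

    private static final String ISO_8859_1 = "ISO-8859-1";

    /**
     * Returns "ISO-8859-1" for ESC % A, ESC . A and ESC - A (the mapping
     * claimed in this issue), or null for anything else.
     */
    static String charsetFor(byte[] bytes) {
        if (bytes == null || bytes.length < 3 || bytes[0] != ESC) {
            return null;
        }
        boolean knownIntermediate = bytes[1] == PERCENT_SIGN
                || bytes[1] == FULL_STOP
                || bytes[1] == MINUS_SIGN;
        if (knownIntermediate && bytes[2] == LATIN_CAPITAL_A) {
            return ISO_8859_1;
        }
        return null;
    }

    public static void main(String[] args) {
        // ESC - A, the sequence this issue asks to support
        System.out.println(charsetFor(new byte[]{0x1B, 0x2D, 0x41}));
    }
}
```

The length check comes first so that short or empty values return null instead of throwing, mirroring the bytes.length > 2 guard in the proposed if clause.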
