Wrong charset when CodedCharacterSet=ESC - A #614

According to https://en.wikipedia.org/wiki/ISO/IEC_2022#cite_note-14.3.2-90, ISO-8859-1 should be used when CodedCharacterSet is any of the following:

- ESC % A
- ESC . A
- ESC - A

Currently, only the first two escape sequences are supported.
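To make the byte values concrete, here is a minimal standalone sketch of the mapping this issue asks for. It is not the library's code: the class `EscapeSequenceSketch` and the method `charsetFor` are hypothetical names, and the sketch simply treats all three sequences listed above as ISO-8859-1, as the issue proposes.

```java
final class EscapeSequenceSketch {
    private static final byte ESC = 0x1B;
    private static final byte PERCENT_SIGN = 0x25;    // '%'
    private static final byte MINUS_SIGN = 0x2D;      // '-'
    private static final byte DOT = 0x2E;             // '.'
    private static final byte LATIN_CAPITAL_A = 0x41; // 'A'

    /**
     * Hypothetical helper: returns "ISO-8859-1" for ESC % A, ESC . A and ESC - A,
     * and null for anything else.
     */
    static String charsetFor(byte[] bytes) {
        if (bytes.length > 2 && bytes[0] == ESC && bytes[2] == LATIN_CAPITAL_A) {
            byte intermediate = bytes[1];
            if (intermediate == PERCENT_SIGN || intermediate == DOT || intermediate == MINUS_SIGN) {
                return "ISO-8859-1";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // ESC - A, the sequence this issue is about
        System.out.println(charsetFor(new byte[]{0x1B, 0x2D, 0x41}));
    }
}
```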

The fix seems to be as simple as adding a new constant to Iso2022Converter:

```java
// '-' (hyphen-minus), the intermediate byte of ESC - A
private static final byte MINUS_SIGN = 0x2D;
```

and adding an extra if clause to com.drew.metadata.iptc.Iso2022Converter#convertISO2022CharsetToJavaCharset:

```java
// ESC - A: treat as ISO-8859-1
if (bytes.length > 2 && bytes[0] == ESC && bytes[1] == MINUS_SIGN && bytes[2] == LATIN_CAPITAL_A)
    return ISO_8859_1;
```

Iso2022ConverterTest.java should also be extended with:

```java
assertEquals("ISO-8859-1", Iso2022Converter.convertISO2022CharsetToJavaCharset(new byte[]{0x1B, (byte)0x2D, (byte)0x41}));
```

A pull request has been created: https://github.com/drewnoakes/metadata-extractor/pull/615
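
To illustrate why recognising the sequence matters, the sketch below decodes the same bytes with two different charsets using only the Java standard library. The class name `CharsetImpactSketch` and the sample bytes are made up for illustration, and UTF-8 is used purely as an example of a mismatched charset; this issue does not state which charset the library actually falls back to when the escape sequence is not recognised.

```java
import java.nio.charset.StandardCharsets;

final class CharsetImpactSketch {
    /** Decodes the bytes as ISO-8859-1, where every byte maps to exactly one character. */
    static String decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the bytes as UTF-8 (assumed here only as an example of a wrong charset). */
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Café" encoded as ISO-8859-1: 0xE9 is 'é'
        byte[] caption = {0x43, 0x61, 0x66, (byte) 0xE9};

        System.out.println(decodeLatin1(caption)); // decodes the accented character correctly
        System.out.println(decodeUtf8(caption));   // 0xE9 alone is not valid UTF-8, so the text is corrupted
    }
}
```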

