Description
I'm having an issue similar to #425, but slightly different. The header looks like this:
Content-Disposition: attachment;filename="中文文件名.rar"
Of course this still violates the RFC, but it seems to be common in China, and Chrome/IE/Edge/curl all handle it. Firefox does not.
You said in #425 that you're ready to accept a patch, so I looked into the code a bit. It looks like only a few lines need to change around here. There is already a UTF-8 validation routine there called utf8dfa, but I don't know how to access options yet.
I'm proposing changing the default encoding from ISO-8859-1 to UTF-8 without an option. It shouldn't break anything, since UTF-8 is designed that way.
Update: sorry, I was thinking about ASCII. I should change the issue title to "an option to default to UTF-8 instead of ISO-8859-1".
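To illustrate why the update above matters, here is a minimal sketch of how ISO-8859-1 bytes map to UTF-8. Bytes below 0x80 (ASCII) are identical in both encodings, but bytes from 0x80 upward become two-byte sequences, so the two defaults disagree on any non-ASCII filename. The function name `latin1_to_utf8` is hypothetical and is not taken from the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert n ISO-8859-1 bytes to UTF-8.
 * `out` must have room for at least 2 * n bytes.
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out) {
    size_t written = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII range: same single byte in both encodings. */
            out[written++] = b;
        } else {
            /* U+0080..U+00FF: encoded as two bytes in UTF-8. */
            out[written++] = (unsigned char)(0xC0 | (b >> 6));
            out[written++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return written;
}
```

For example, the ISO-8859-1 byte 0xE9 comes out as the two bytes 0xC3 0xA9, while the ASCII byte 0x41 is copied unchanged.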
BTW, utf8dfa is actually not used correctly there:
Line 1115 in df19921
if (utf8dfa(&dfa_state, &dfa_code, *p) == UTF8_REJECT) {
`*p` is a `const char`. It should be explicitly converted to `unsigned const char` before it is implicitly converted to `uint32_t`. Otherwise, where `char` is signed, a value such as -1 will be wrongly converted to `0xffffffff` instead of `0x000000ff`, which would result in a disastrous out-of-bounds array index in utf8dfa.
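The conversion problem can be reproduced in isolation. The sketch below is not the project's code: the function names are made up for illustration, and `signed char` is used explicitly to model a platform where plain `char` is signed (the signedness of plain `char` is implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Models the implicit conversion when plain char is signed:
 * a negative value wraps modulo 2^32. */
static uint32_t widen_signed(signed char c) {
    return (uint32_t)c;
}

/* Converts through unsigned char first, so the result stays in 0..255. */
static uint32_t widen_via_unsigned(signed char c) {
    return (uint32_t)(unsigned char)c;
}

/* Hypothetical self-check covering the values mentioned in the text. */
static void check_widening(void) {
    assert(widen_signed((signed char)-1) == 0xffffffffu);
    assert(widen_via_unsigned((signed char)-1) == 0x000000ffu);
    /* ASCII values are unaffected either way. */
    assert(widen_signed('A') == widen_via_unsigned('A'));
}
```

Only the second form yields a value that is safe to use as an index into a 256-entry table.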
Luckily, if you look two lines up, the utf8dfa call is enclosed in an `if (inRFC5987AttrChar(*p)) {` block, so the bug will never trigger. That also renders this check entirely pointless, since printable ASCII is automatically valid UTF-8.