    Correct interpretation of utf-8 0xf8-0xff · 961c0e6b
    William A Rowe Jr authored
    In consuming this useful string utility, it was discovered
    that the interpretation of leading byte codes 0xf8-0xff
    conformed to neither the RFC 3629 nor the ISO/IEC 10646
    definition of utf-8.
    
    The IETF RFC describes only 1- to 4-byte encodings (a limited
    number of 4-byte encodings at that), and plainly states in
    Section 1, Introduction:
       o  The octet values C0, C1, F5 to FF never appear.
    
    Alternatively, the ISO definition "R.2 Specification of UTF-8"
    presented in the original IETF RFC 2279 clearly defines the
    meaning of leading byte values F5 through FD, and RFC 3629
    Section 10 (Security Considerations), paragraph 3, calls out
    this alternate reading (alternative to "never appear"). F5-F7
    begin an invalid (in the domain of Unicode code points) 4-byte
    UTF-8 sequence (similar to F0-F4), while F8-FB begin a 5-byte
    sequence, and FC and FD begin a 6-byte sequence.
    
    The current code is wrong in that it neither treats the codes
    F8-FF as invalid 1-byte characters nor treats the codes F8-FD
    as beginning sequences of the correct number of bytes. No valid
    parser will advance 4 bytes past these lead bytes. Most will
    treat these as a 5- or 6-byte encoding of a 32-bit character
    and may then treat the resulting character as invalid, while
    some parsers may reject every leading F5-FF byte as a single
    byte of erroneous input, followed by each invalid continuation
    byte.
    
    We propose the conventional reading of F8-FD as 5- and 6-byte
    sequences as originally defined, while FE-FF must be read
    as single-byte invalid code points.
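
    As a minimal sketch of the proposed reading (not the code
    changed by this commit), the lead-byte ranges can be mapped to
    sequence lengths as below. The name Utf8LeadByteLength is
    hypothetical, and the handling of 00-F4 and of stray
    continuation bytes (80-BF) is an assumption made only to keep
    the example complete.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only; not part of protobuf.
// Returns how many bytes a parser should consume for a sequence that
// starts with `lead`, under the reading proposed in this commit message.
inline std::size_t Utf8LeadByteLength(unsigned char lead) {
  if (lead < 0x80) return 1;  // 00-7F: single-byte (ASCII)
  if (lead < 0xC0) return 1;  // 80-BF: stray continuation byte (assumed: skip one)
  if (lead < 0xE0) return 2;  // C0-DF: 2-byte sequence
  if (lead < 0xF0) return 3;  // E0-EF: 3-byte sequence
  if (lead < 0xF8) return 4;  // F0-F7: 4-byte (F5-F7 lie outside Unicode)
  if (lead < 0xFC) return 5;  // F8-FB: 5-byte sequence (original ISO form)
  if (lead < 0xFE) return 6;  // FC-FD: 6-byte sequence (original ISO form)
  return 1;                   // FE-FF: single invalid byte
}
```

    The length only says how far to advance; whether the decoded
    value is a valid Unicode code point is a separate check.
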
    Signed-off-by: William A Rowe Jr <wrowe@pivotal.io>
    Signed-off-by: Yechiel Kalmenson <ykalmenson@pivotal.io>
Name
google/protobuf
solaris
Makefile.am
README.md
libprotobuf-lite.map
libprotobuf.map
libprotoc.map