Minor update to encoding documentation

7cfe718d · Milo Yip · e590e075 · 7cfe718d
Commit 7cfe718d authored Jul 15, 2014 by Milo Yip

Showing with 5 additions and 6 deletions

encoding.md doc/encoding.md +5 -6

--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -6,8 +6,7 @@ According to [ECMA-404](http://www.ecma-international.org/publications/files/ECM

 The earlier [RFC4627](http://www.ietf.org/rfc/rfc4627.txt) stated that,

-> (in §3) JSON text SHALL be encoded in Unicode.  The default encoding is
-   UTF-8.
+> (in §3) JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8.

 > (in §6) JSON may be represented using UTF-8, UTF-16, or UTF-32. When JSON is written in UTF-8, JSON is 8bit compatible.  When JSON is written in UTF-16 or UTF-32, the binary content-transfer-encoding must be used.

@@ -28,9 +27,9 @@ Those unique numbers are called code points, which is in the range `0x0` to `0x1

 There are various encodings for storing Unicode code points. These are called Unicode Transformation Format (UTF). RapidJSON supports the most commonly used UTFs, including

-* UTF-8: 8-bit variable-width encoding. It maps a code point to 1-4 bytes.
-* UTF-16: 16-bit variable-width encoding. It maps a code point to 1-2 16-bit code units (i.e., 2-4 bytes).
-* UTF-32: 32-bit fixed-width encoding. It directly maps a code point to 1 32-bit code unit (i.e. 4 bytes).
+* UTF-8: 8-bit variable-width encoding. It maps a code point to 1–4 bytes.
+* UTF-16: 16-bit variable-width encoding. It maps a code point to 1–2 16-bit code units (i.e., 2–4 bytes).
+* UTF-32: 32-bit fixed-width encoding. It directly maps a code point to a single 32-bit code unit (i.e. 4 bytes).

 For UTF-16 and UTF-32, the byte order (endianness) does matter. Within computer memory, they are often stored in the computer's endianness. However, when it is stored in file or transferred over network, we need to state the byte order of the byte sequence, either little-endian (LE) or big-endian (BE). 

@@ -78,7 +77,7 @@ For a detail example, please check the example in [DOM's Encoding](doc/stream.md

 ## Character Type {#CharacterType}

-As shown in the declaration, each encoding has a `CharType` template parameter. Actually, it may be a little bit confusing, but each `CharType` stores a code unit, not a character (code point). As mentioned in previous section, a code point may be encoded to 1-4 code units for UTF-8.
+As shown in the declaration, each encoding has a `CharType` template parameter. Actually, it may be a little bit confusing, but each `CharType` stores a code unit, not a character (code point). As mentioned in previous section, a code point may be encoded to 1–4 code units for UTF-8.

 For `UTF16(LE|BE)`, `UTF32(LE|BE)`, the `CharType` must be integer type of at least 2 and 4 bytes  respectively.
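
Note on the code-unit ranges touched by this diff: the "1–4 bytes" and "1–2 16-bit code units" figures can be checked with a small sketch. The code below is not part of RapidJSON; the function names `Utf8CodeUnits`, `Utf16CodeUnits`, and `Utf32CodeUnits` are hypothetical helpers introduced only for illustration. They return how many code units each UTF needs for a given code point, assuming surrogate code points (`0xD800` to `0xDFFF`) and values above `0x10FFFF` are treated as invalid and reported as 0.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not RapidJSON API.
// Each returns the number of code units needed to encode one code point,
// or 0 if the value is not an encodable Unicode scalar value.

// True for values that no UTF can encode: surrogates and out-of-range values.
inline bool IsInvalidCodePoint(std::uint32_t cp) {
    return (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF;
}

// UTF-8: 1 to 4 code units of 8 bits each.
inline std::size_t Utf8CodeUnits(std::uint32_t cp) {
    if (IsInvalidCodePoint(cp)) return 0;
    if (cp <= 0x7F)   return 1;  // ASCII range
    if (cp <= 0x7FF)  return 2;
    if (cp <= 0xFFFF) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// UTF-16: 1 code unit inside the BMP, otherwise a surrogate pair (2 units).
inline std::size_t Utf16CodeUnits(std::uint32_t cp) {
    if (IsInvalidCodePoint(cp)) return 0;
    return cp <= 0xFFFF ? 1 : 2;
}

// UTF-32: always exactly 1 code unit of 32 bits.
inline std::size_t Utf32CodeUnits(std::uint32_t cp) {
    return IsInvalidCodePoint(cp) ? 0 : 1;
}
```

For example, U+20AC (the euro sign) takes 3 code units in UTF-8 and 1 in UTF-16, while U+1F600 takes 4 in UTF-8 and 2 in UTF-16. This is why a `CharType` holds a code unit rather than a whole character.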