- 11 Dec, 2017 3 commits
-
-
Kenton Varda authored
So, don't use UTF-8 literals when trying to represent invalid byte sequences. Just use standard string literals.
-
Kenton Varda authored
Different platforms have different sizes for wchar_t. For example:
* Linux: 32-bit (originally intended as UCS-4, rarely used in practice)
* Windows: 16-bit (originally intended as UCS-2, but now probably treated as UTF-16)
* BeOS: 8-bit (strictly intended to be UTF-8)
For KJ purposes, we'll assume wchar_t arrays use the UTF encoding appropriate to their size, whatever that may be on the target platform. This is mainly being added because the Win32 API uses wchar_t heavily.
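The size-based assumption above could be sketched roughly as follows. This is only an illustration: `WideEncoding` and `wideEncoding()` are hypothetical names chosen for this example, not KJ's API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; not part of KJ.
enum class WideEncoding { Utf8, Utf16, Utf32 };

// Pick the UTF encoding whose code unit width matches wchar_t on this platform.
constexpr WideEncoding wideEncoding() {
  static_assert(sizeof(wchar_t) == 1 || sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
                "unexpected wchar_t size");
  return sizeof(wchar_t) == 1 ? WideEncoding::Utf8
       : sizeof(wchar_t) == 2 ? WideEncoding::Utf16
                              : WideEncoding::Utf32;
}
```

Because the choice depends only on `sizeof(wchar_t)`, it is resolved at compile time and needs no per-platform `#ifdef`.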
-
Kenton Varda authored
This allows arbitrary char16 arrays to round-trip through UTF-8 without losing information, even if the char16 arrays are not valid UTF-16. This is necessary e.g. for filesystem manipulation on Windows, where filenames contain 16-bit characters but valid UTF-16 is not enforced. Invalid UTF-16 represented in UTF-8 is affectionately known as WTF-8: http://simonsapin.github.io/wtf-8/
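To illustrate the round-trip property, here is a minimal standalone sketch of WTF-8 encoding and decoding. It does not use KJ; `encodeWtf8()` and `decodeWtf8()` are hypothetical names, and the decoder assumes well-formed input rather than validating it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode possibly-invalid UTF-16 as WTF-8: valid surrogate pairs become 4-byte
// sequences; unpaired surrogates are written as ordinary 3-byte sequences.
std::string encodeWtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t c = in[i];
    bool isLead = c >= 0xD800 && c <= 0xDBFF;
    if (isLead && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Combine a lead/trail pair into one supplementary code point.
      c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    }
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else if (c < 0x800) {
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
      out += static_cast<char>(0xE0 | (c >> 12));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (c >> 18));
      out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}

// Decode WTF-8 back to 16-bit units. Assumes well-formed input (no truncated
// or overlong sequences); a real decoder would have to check for those.
std::u16string decodeWtf8(const std::string& in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    std::uint32_t b = static_cast<unsigned char>(in[i]);
    std::uint32_t c;
    int extra;  // number of continuation bytes that follow the lead byte
    if (b < 0x80)      { c = b;        extra = 0; }
    else if (b < 0xE0) { c = b & 0x1F; extra = 1; }
    else if (b < 0xF0) { c = b & 0x0F; extra = 2; }
    else               { c = b & 0x07; extra = 3; }
    ++i;
    for (int k = 0; k < extra && i < in.size(); ++k, ++i) {
      c = (c << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
    }
    if (c >= 0x10000) {
      // Split supplementary code points back into a surrogate pair.
      c -= 0x10000;
      out += static_cast<char16_t>(0xD800 + (c >> 10));
      out += static_cast<char16_t>(0xDC00 + (c & 0x3FF));
    } else {
      out += static_cast<char16_t>(c);  // includes lone surrogates, unchanged
    }
  }
  return out;
}
```

A strict UTF-8 encoder would have to reject or replace a lone surrogate such as 0xD800. Here it simply becomes the three bytes ED A0 80, which is what lets arbitrary 16-bit filenames survive the trip.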
-
- 04 Dec, 2017 1 commit
-
-
Harris Hancock authored
This change modifies decodeBase64() to report errors as required by the WHATWG HTML spec's atob() JavaScript function. Notably, it reports errors for non-whitespace characters outside of the valid base64 character range ([+/0-9A-Za-z=]), and performs sanity checks on padding and input length. I took care to keep the algorithm single-pass, and to support streaming via multiple calls of base64_decode_block(), though we don't currently expose that functionality.
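The kind of checking described above might look roughly like the sketch below. It is a standalone illustration, not the actual decodeBase64() or base64_decode_block() code: `strictDecodeBase64()` and `base64Value()` are hypothetical names, and for brevity the sketch strips whitespace in a separate pass instead of staying single-pass.

```cpp
#include <cassert>
#include <string>

// Map a base64 alphabet character to its 6-bit value, or -1 if it is invalid.
static int base64Value(char ch) {
  if (ch >= 'A' && ch <= 'Z') return ch - 'A';
  if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
  if (ch >= '0' && ch <= '9') return ch - '0' + 52;
  if (ch == '+') return 62;
  if (ch == '/') return 63;
  return -1;
}

// Hypothetical strict decoder in the spirit of atob(). Returns false on error;
// `out` may hold partial data in that case.
bool strictDecodeBase64(const std::string& in, std::string& out) {
  // Drop ASCII whitespace before validating anything else.
  std::string data;
  for (char ch : in) {
    if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\f' || ch == '\r') continue;
    data += ch;
  }
  // One or two trailing '=' are accepted only if the length is a multiple of 4.
  if (data.size() % 4 == 0) {
    for (int k = 0; k < 2 && !data.empty() && data.back() == '='; ++k) data.pop_back();
  }
  // A single leftover character cannot encode a whole byte.
  if (data.size() % 4 == 1) return false;

  out.clear();
  unsigned buffer = 0;
  int bits = 0;
  for (char ch : data) {
    int v = base64Value(ch);
    if (v < 0) return false;  // also rejects '=' anywhere but the end
    buffer = (buffer << 6) | static_cast<unsigned>(v);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += static_cast<char>((buffer >> bits) & 0xFF);
    }
  }
  return true;
}
```

With this sketch, `"aGVs bG8=\n"` decodes to `hello`, while `"aGV$bG8="` and a lone `"a"` are rejected.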
-
- 14 Oct, 2017 1 commit
-
-
Edward Catmur authored
If we finish decoding in step_a state, there is no current output character, so reading *plainchar will either be an uninitialized read or (if the output buffer is minimally sized) a past-the-end read. Detected by -fsanitize=address.
-
- 12 Oct, 2017 2 commits
-
-
Kenton Varda authored
-
Edward Catmur authored
If we finish decoding in step_a state, there is no current output character, so reading *plainchar will either be an uninitialized read or (if the output buffer is minimally sized) a past-the-end read. Detected by -fsanitize=address.
-
- 30 May, 2017 1 commit
-
-
Kenton Varda authored
-
- 23 May, 2017 4 commits
-
-
Kenton Varda authored
-
Kenton Varda authored
-
Kenton Varda authored
- Rename UtfResult -> EncodingResult
- Make it usable like a Maybe, so that we don't need separate "try" functions.
- Check errors in hex decoding and URI decoding.
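As a rough illustration of the idea (one decode function whose result carries an error flag, instead of a separate "try" variant), consider the sketch below. `EncodingResultSketch` and `decodeHexSketch()` are hypothetical stand-ins written for this example; the real KJ types and signatures may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in: behaves like the decoded value itself, plus a flag
// that callers can inspect when they care about errors.
template <typename T>
struct EncodingResultSketch : T {
  EncodingResultSketch(T value, bool errors)
      : T(std::move(value)), hadErrors(errors) {}
  bool hadErrors;
};

// Convert one hex digit to its value, or -1 if the character is not hex.
static int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode pairs of hex digits into bytes. Invalid digits are treated as zero
// and an odd-length input drops its last digit; both cases set hadErrors.
EncodingResultSketch<std::string> decodeHexSketch(const std::string& text) {
  std::string bytes;
  bool hadErrors = text.size() % 2 != 0;
  for (std::size_t i = 0; i + 1 < text.size(); i += 2) {
    int hi = hexDigit(text[i]);
    int lo = hexDigit(text[i + 1]);
    if (hi < 0) { hadErrors = true; hi = 0; }
    if (lo < 0) { hadErrors = true; lo = 0; }
    bytes += static_cast<char>((hi << 4) | lo);
  }
  return EncodingResultSketch<std::string>(std::move(bytes), hadErrors);
}
```

A caller that tolerates bad input can use the result directly as a string, while a strict caller checks `hadErrors` first.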
-
Kenton Varda authored
-
- 22 May, 2017 1 commit
-
-
Kenton Varda authored
In particular: UTF-{8,16,32}, Hex, URI encoding, and Base64
-