  1. 11 Dec, 2017 3 commits
    • Fix MSVC: \x sequences in UTF-8 literals are treated as \u sequences, not bytes. · 3c091037
      Kenton Varda authored
      So, don't use UTF-8 literals when trying to represent invalid byte sequences. Just use standard string literals.
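
A minimal sketch of the workaround described above, using only standard C++. The names `INVALID_UTF8`, `invalidUtf8Length`, and `invalidUtf8Byte` are hypothetical and are not taken from the KJ test suite. The point is that an ordinary (non-`u8`) narrow literal stores each `\x` escape as a single byte, which is what a test for invalid input needs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical test input: a deliberately invalid UTF-8 sequence
// (an overlong encoding). Because this is an ordinary narrow literal
// rather than a u8 literal, each \x escape denotes exactly one byte.
static const char INVALID_UTF8[] = "\xc0\x80";

// Number of bytes in the literal, excluding the terminating NUL.
inline std::size_t invalidUtf8Length() {
  return sizeof(INVALID_UTF8) - 1;
}

// Returns byte i of the literal as an unsigned value.
inline unsigned char invalidUtf8Byte(std::size_t i) {
  assert(i < invalidUtf8Length());
  return static_cast<unsigned char>(INVALID_UTF8[i]);
}
```

Writing the same bytes with a `u8` prefix is what the commit message reports as unreliable on MSVC, so the sketch avoids the prefix entirely.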
    • Support encoding to and from wchar_t arrays. · ff9c3321
      Kenton Varda authored
      Different platforms have different sizes for wchar_t. For example:
      
      * Linux: 32-bit (originally intended as UCS-4, rarely used in practice)
      * Windows: 16-bit (originally intended as UCS-2, but now probably treated as UTF-16)
      * BeOS: 8-bit (strictly intended to be UTF-8)
      
      For KJ purposes, we'll assume wchar_t arrays use the UTF encoding appropriate to their size, whatever that may be on the target platform.
      
      This is mainly being added because the Win32 API uses wchar_t heavily.
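
The size-based rule above can be sketched as a compile-time mapping from code-unit width to UTF encoding. This is an illustration only: `WideEncoding`, `encodingForUnitSize`, and `WCHAR_ENCODING` are hypothetical names, not part of the KJ API.

```cpp
#include <cstddef>

// Hypothetical tag for the encoding implied by a code-unit width.
enum class WideEncoding { UTF8, UTF16, UTF32 };

// Map a code-unit size in bytes to the UTF encoding with that unit width.
// Any size other than 1 or 2 is treated as 4 here; the static_assert
// below rules out other sizes for wchar_t.
constexpr WideEncoding encodingForUnitSize(std::size_t bytes) {
  return bytes == 1 ? WideEncoding::UTF8
       : bytes == 2 ? WideEncoding::UTF16
       :              WideEncoding::UTF32;
}

static_assert(sizeof(wchar_t) == 1 || sizeof(wchar_t) == 2 ||
                  sizeof(wchar_t) == 4,
              "unexpected wchar_t size");

// The encoding assumed for wchar_t arrays on the target platform.
constexpr WideEncoding WCHAR_ENCODING = encodingForUnitSize(sizeof(wchar_t));
```

Resolving the choice at compile time means callers never branch on the platform at run time; the target's `sizeof(wchar_t)` alone decides which encoder applies.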
    • Extend Unicode encoders to support 'WTF-8'. · 5483d8f7
      Kenton Varda authored
      This allows arbitrary char16 arrays to round-trip through UTF-8 without losing information, even if the char16 arrays are not valid UTF-16.
      
      This is necessary e.g. for filesystem manipulation on Windows, where filenames contain 16-bit characters but valid UTF-16 is not enforced.
      
      Invalid UTF-16 represented in UTF-8 is affectionately known as WTF-8: http://simonsapin.github.io/wtf-8/
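
To make the round-trip idea concrete, here is a minimal sketch of a WTF-8 style encoder for 16-bit code units. It is not the KJ implementation; `appendCodePoint` and `encodeWtf8Sketch` are hypothetical names. Valid surrogate pairs are combined into one 4-byte sequence, while an unpaired surrogate is written with the ordinary 3-byte layout instead of being rejected or replaced.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point using the generalized UTF-8 bit layout.
// Surrogate code points are deliberately NOT rejected here.
inline void appendCodePoint(std::string& out, std::uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Encode an arbitrary array of 16-bit units, valid UTF-16 or not.
inline std::string encodeWtf8Sketch(const char16_t* units, std::size_t n) {
  std::string out;
  for (std::size_t i = 0; i < n; ++i) {
    std::uint32_t u = units[i];
    // A high surrogate followed by a low surrogate forms one code point.
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n) {
      std::uint32_t lo = units[i + 1];
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        appendCodePoint(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
        ++i;  // consume the low surrogate as well
        continue;
      }
    }
    // Everything else, including unpaired surrogates, is encoded as-is.
    appendCodePoint(out, u);
  }
  return out;
}
```

Because every 16-bit unit maps to a distinct byte sequence, a matching decoder can recover the original array exactly, which is the property needed for Windows filenames.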
  2. 04 Dec, 2017 1 commit
    • decodeBase64() reports errors required by HTML spec · f3e0ed22
      Harris Hancock authored
      This change modifies decodeBase64() to report errors as required by the WHATWG HTML spec's atob() JavaScript function. Notably, it reports errors for non-whitespace characters outside of the valid base64 character range ([+/0-9A-Za-z=]), and performs sanity checks on padding and input length.
      
      I took care to keep the algorithm single-pass, and to support streaming via multiple calls of base64_decode_block(), though we don't currently expose that functionality.
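
The kinds of checks described above can be sketched as a single-pass validator. This is not the code from the commit: `isWellFormedBase64Sketch` and its helpers are hypothetical, and the exact padding and length rules are an assumption modeled on the forgiving-base64 rules that `atob()` follows (whitespace ignored, at most two trailing `=`, and a data length that is never 1 modulo 4).

```cpp
#include <cstddef>
#include <string>

// True for the 64 data characters of the standard base64 alphabet.
inline bool isBase64DataChar(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '+' || c == '/';
}

// ASCII whitespace that is skipped rather than treated as an error.
inline bool isAsciiWhitespace(char c) {
  return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// Single pass over the input: counts data and padding characters and
// fails fast on anything that is neither whitespace, data, nor padding.
inline bool isWellFormedBase64Sketch(const std::string& in) {
  std::size_t data = 0;
  std::size_t pad = 0;
  for (char c : in) {
    if (isAsciiWhitespace(c)) continue;
    if (c == '=') {
      if (++pad > 2) return false;  // at most two padding characters
      continue;
    }
    // Reject characters outside the alphabet, and data after padding.
    if (!isBase64DataChar(c) || pad > 0) return false;
    ++data;
  }
  // Padded input must fill whole 4-character groups; unpadded input
  // may not leave a single dangling character.
  if (pad > 0) return (data + pad) % 4 == 0;
  return data % 4 != 1;
}
```

Keeping only two counters is what lets the check stay single-pass and, in principle, be split across multiple calls by carrying `data` and `pad` between them.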
  3. 14 Oct, 2017 1 commit
    • Don't read past the end of the decode out buffer. · c2fbfc70
      Edward Catmur authored
      If we finish decoding in step_a state, there is no current output character, so reading *plainchar will either be an uninitialized read or (if the output buffer is minimally sized) a past-the-end read.
      
      Detected by -fsanitize=address.
  4. 12 Oct, 2017 2 commits
  5. 30 May, 2017 1 commit
  6. 23 May, 2017 4 commits
  7. 22 May, 2017 1 commit