Commits · cb482fe34143f035654304d7faa838ed988ba0be · submodule / capnproto

11 Dec, 2017 3 commits

Fix MSVC: \x sequences in UTF-8 literals are treated as \u sequences, not bytes. · 3c091037
Kenton Varda authored 7 years ago

So, don't use UTF-8 literals when trying to represent invalid byte sequences. Just use standard string literals.

3c091037
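A minimal sketch of the workaround described in this commit: the invalid byte sequence is written as an ordinary narrow string literal, where each `\x` escape denotes exactly one byte. The names `invalidUtf8Sample` and `byteAt` are hypothetical and not taken from the KJ sources.

```cpp
#include <cstddef>
#include <string>

// An ordinary (non-u8) string literal: each \x escape is one byte, so this is
// the 3-byte sequence FF C0 80, which is not valid UTF-8.
// The explicit length avoids relying on NUL termination.
inline std::string invalidUtf8Sample() {
  return std::string("\xff\xc0\x80", 3);
}

// Hypothetical helper: read a byte as an unsigned value so comparisons with
// constants like 0xff behave the same whether char is signed or unsigned.
inline unsigned byteAt(const std::string& s, std::size_t i) {
  return static_cast<unsigned char>(s[i]);
}
```

Writing the same bytes in a `u8"..."` literal is what the commit avoids, since MSVC is reported to treat those escapes as code points rather than raw bytes.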

Support encoding to and from wchar_t arrays. · ff9c3321

Kenton Varda authored 7 years ago

Different platforms have different sizes for wchar_t. For example:

* Linux: 32-bit (originally intended as UCS-4, rarely used in practice)
* Windows: 16-bit (originally intended as UCS-2, but now probably treated as UTF-16)
* BeOS: 8-bit (strictly intended to be UTF-8)

For KJ purposes, we'll assume wchar_t arrays use the UTF encoding appropriate to their size, whatever that may be on the target platform.

This is mainly being added because the Win32 API uses wchar_t heavily.

ff9c3321
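The size-based rule in this commit message could be expressed at compile time roughly as follows. This is only an illustration of the idea; `WideEncoding`, `encodingForUnitSize`, and `kWcharEncoding` are hypothetical names, not KJ identifiers.

```cpp
#include <cstddef>

enum class WideEncoding { Utf8, Utf16, Utf32 };

// Map a code-unit size in bytes to the UTF encoding of that width.
constexpr WideEncoding encodingForUnitSize(std::size_t bytes) {
  return bytes == 1 ? WideEncoding::Utf8
       : bytes == 2 ? WideEncoding::Utf16
                    : WideEncoding::Utf32;
}

// Only the three sizes named in the commit message are expected.
static_assert(sizeof(wchar_t) == 1 || sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
              "unexpected wchar_t size");

// The encoding assumed for wchar_t arrays on the platform being compiled for.
constexpr WideEncoding kWcharEncoding = encodingForUnitSize(sizeof(wchar_t));
```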

Extend Unicode encoders to support 'WTF-8'. · 5483d8f7

Kenton Varda authored 7 years ago

This allows arbitrary char16 arrays to round-trip through UTF-8 without losing information, even if the char16 arrays are not valid UTF-16.

This is necessary e.g. for filesystem manipulation on Windows, where filenames contain 16-bit characters but valid UTF-16 is not enforced.

Invalid UTF-16 represented in UTF-8 is affectionately known as WTF-8: http://simonsapin.github.io/wtf-8/

5483d8f7
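To illustrate the round-trip property, here is a small sketch of WTF-8 encoding: well-formed surrogate pairs are combined into one code point, while unpaired surrogates are encoded as three bytes instead of being rejected or replaced. The function names are hypothetical and this is not the KJ implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point (up to U+10FFFF) using the generalized UTF-8 layout.
// Surrogate code points (U+D800..U+DFFF) are deliberately not rejected.
inline void appendCodePoint(std::string& out, std::uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Encode arbitrary 16-bit code units (not necessarily valid UTF-16) as WTF-8.
inline std::string encodeWtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t unit = in[i];
    bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Proper surrogate pair: combine into a supplementary code point.
      std::uint32_t low = in[i + 1];
      unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
      ++i;
    }
    // An unpaired surrogate falls through and is encoded as-is.
    appendCodePoint(out, unit);
  }
  return out;
}
```

A lone `0xD800` becomes the bytes `ED A0 80`, which strict UTF-8 forbids but which a matching decoder can map back to the original 16-bit unit.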

04 Dec, 2017 1 commit

decodeBase64() reports errors required by HTML spec · f3e0ed22

Harris Hancock authored 7 years ago

This change modifies decodeBase64() to report errors as required by the WHATWG HTML spec's atob() JavaScript function. Notably, it reports errors for non-whitespace characters outside of the valid base64 character range ([+/0-9A-Za-z=]), and performs sanity checks on padding and input length.

I took care to keep the algorithm single-pass, and to support streaming via multiple calls of base64_decode_block(), though we don't currently expose that functionality.

f3e0ed22
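The checks described in this commit can be sketched as below. For clarity the sketch strips whitespace and padding in separate steps, so it is not the single-pass, streaming-capable algorithm the commit describes, and `decodeBase64Strict` and `base64Value` are hypothetical names. It assumes C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>

// Value of one base64 alphabet character, or -1 if it is outside the alphabet.
inline int base64Value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Returns the decoded bytes, or std::nullopt where atob() would report an error.
inline std::optional<std::string> decodeBase64Strict(const std::string& input) {
  // 1. Drop ASCII whitespace.
  std::string data;
  for (char c : input) {
    if (c != ' ' && c != '\t' && c != '\n' && c != '\f' && c != '\r') data += c;
  }
  // 2. Padding is only accepted when the length is a multiple of 4,
  //    and then at most two trailing '=' are removed.
  if (data.size() % 4 == 0) {
    for (int i = 0; i < 2 && !data.empty() && data.back() == '='; ++i) data.pop_back();
  }
  // 3. A single leftover character cannot encode a whole byte.
  if (data.size() % 4 == 1) return std::nullopt;

  // 4. Decode 6 bits at a time; any remaining character outside the
  //    alphabet (including a misplaced '=') is an error.
  std::string out;
  unsigned buffer = 0;
  int bits = 0;
  for (char c : data) {
    int value = base64Value(c);
    if (value < 0) return std::nullopt;
    buffer = (buffer << 6) | static_cast<unsigned>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += static_cast<char>((buffer >> bits) & 0xFF);
    }
  }
  return out;
}
```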

14 Oct, 2017 1 commit

Don't read past the end of the decode out buffer. · c2fbfc70

Edward Catmur authored 7 years ago

If we finish decoding in step_a state, there is no current output character, so reading *plainchar will either be an uninitialized read or (if the output buffer is minimally sized) a past-the-end read.

Detected by -fsanitize=address.

c2fbfc70
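One way to express the guard this commit describes is to save the partially assembled output byte only when the decoder is in a state that has one. The sketch below is a simplified illustration under that assumption; `Step`, `DecodeState`, and `saveState` are hypothetical and do not reproduce the actual decoder code.

```cpp
enum class Step { A, B, C, D };

struct DecodeState {
  Step step = Step::A;
  char partial = 0;  // partially assembled output byte; unused in step A
};

// Hypothetical end-of-block bookkeeping. `cursor` points at the output byte
// currently being assembled.
inline void saveState(DecodeState& state, Step step, const char* cursor) {
  state.step = step;
  // In step A no output byte has been started, so `cursor` may point at
  // uninitialized memory or one past the end of the buffer and must not be
  // dereferenced.
  state.partial = (step == Step::A) ? 0 : *cursor;
}
```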

12 Oct, 2017 2 commits

Revert "Don't read past the end of the base64 decode out buffer." · 5e41df4b
Kenton Varda authored 7 years ago

5e41df4b

Don't read past the end of the decode out buffer. · 0771d33b

Edward Catmur authored 7 years ago

If we finish decoding in step_a state, there is no current output character, so reading *plainchar will either be an uninitialized read or (if the output buffer is minimally sized) a past-the-end read.

Detected by -fsanitize=address.

0771d33b

30 May, 2017 1 commit
- The URI standard says to prefer upper-case hex for percent encoding. · 18e3c9f1
  Kenton Varda authored 7 years ago
  
  18e3c9f1
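As an illustration of the upper-case preference (RFC 3986 recommends uppercase hexadecimal digits for percent-encodings), a minimal encoder might look like this. `percentEncode` is a hypothetical name and the sketch escapes everything outside the RFC 3986 unreserved set, which may be stricter than the KJ function.

```cpp
#include <string>

// Percent-encode every byte outside ALPHA / DIGIT / "-" / "." / "_" / "~",
// emitting upper-case hex digits.
inline std::string percentEncode(const std::string& in) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') ||
                      c == '-' || c == '.' || c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}
```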
23 May, 2017 4 commits
- Try again to placate MSVC. · 745049be
  Kenton Varda authored 7 years ago
  
  745049be
- Fix CI breakages. · bc566b58
  Kenton Varda authored 7 years ago
  
  bc566b58
- Improve KJ encoding lib error handling: · df52bf86
  Kenton Varda authored 7 years ago
  
  - Rename UtfResult -> EncodingResult
  - Make it usable like a Maybe, so that we don't need separate "try" functions.
  - Check errors in hex decoding and URI decoding.
  
  df52bf86
- Add CEscape to encodings. · 03800dfa
  Kenton Varda authored 7 years ago
  
  03800dfa
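The error-handling change in df52bf86 above can be pictured with a result type that carries both the decoded value and an error flag, so one function serves callers who want best-effort output and callers who want to check for failure. This is a loose sketch of that idea, not KJ's definition; `EncodingResultSketch`, `hexDigitValue`, and `decodeHexSketch` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The decoded value (as the base class) plus a flag recording whether the
// input contained errors.
template <typename T>
struct EncodingResultSketch : public T {
  EncodingResultSketch(T&& value, bool errors)
      : T(std::move(value)), hadErrors(errors) {}
  const bool hadErrors;
};

// Value of one hex digit, or -1 if the character is not a hex digit.
inline int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode hex text, flagging odd length or non-hex characters as errors.
// Invalid digits are treated as 0 so a best-effort result is still produced.
inline EncodingResultSketch<std::vector<unsigned char>> decodeHexSketch(
    const std::string& text) {
  std::vector<unsigned char> bytes;
  bool errors = text.size() % 2 != 0;
  for (std::size_t i = 0; i + 1 < text.size(); i += 2) {
    int hi = hexDigitValue(text[i]);
    int lo = hexDigitValue(text[i + 1]);
    if (hi < 0 || lo < 0) {
      errors = true;
      if (hi < 0) hi = 0;
      if (lo < 0) lo = 0;
    }
    bytes.push_back(static_cast<unsigned char>((hi << 4) | lo));
  }
  return EncodingResultSketch<std::vector<unsigned char>>(std::move(bytes), errors);
}
```

A caller can use the result directly as a byte vector, or inspect `hadErrors` first when malformed input should be rejected.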
22 May, 2017 1 commit
- Add KJ utility functions to encode/decode blobs in common formats. · f74555b4
  Kenton Varda authored 7 years ago
  
  In particular: UTF-{8,16,32}, Hex, URI encoding, and Base64
  
  f74555b4