Doc updates.

Commit ac4f403c authored Aug 29, 2013 by Kenton Varda
Showing with 82 additions and 16 deletions

cxx.md doc/cxx.md +42 -6

encoding.md doc/encoding.md +24 -2

install.md doc/install.md +5 -4

language.md doc/language.md +0 -0

otherlang.md doc/otherlang.md +11 -4

--- a/doc/cxx.md
+++ b/doc/cxx.md
@@ -162,6 +162,9 @@ together -- KJ is simply the stuff which is not specific to Cap'n Proto serializ
 useful to others independently of Cap'n Proto.  For now, the two are distributed together.  The
 name "KJ" has no particular meaning; it was chosen to be short and easy-to-type.

+As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library.  You may need
+to explicitly link against both libraries:  `-lcapnp -lkj`
+
 ## Generating Code

 To generate C++ code from your `.capnp` [interface definition](language.html), run:
@@ -170,6 +173,8 @@ To generate C++ code from your `.capnp` [interface definition](language.html), r

 This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.

+To use this code in your app, you must link against both `libcapnp` and `libkj`.
+
 ### Setting a Namespace

 You probably want your generated types to live in a C++ namespace.  You will need to import
@@ -278,14 +283,26 @@ void setMyListField(::capnp::List<double>::Reader value);
 ::capnp::List<double>::Builder initMyListField(size_t size);
 {% endhighlight %}

+### Groups
+
+Groups look a lot like a combination of a nested type and a field of that type, except that you
+cannot set, adopt, or disown a group -- you can only get and init it.
+
 ### Unions

-For each union `foo` declared in the struct, the struct's reader and builder have a method
-`getFoo()` which returns a reader/builder for the union.  The union reader/builder has accessors
-for each field exactly like a struct's accessors.  It also has an accessor `which()` which returns
-an enum indicating which member of the union is currently set.  Setting any member of the union
-updates the value returned by `which()`.  Getting a member other than the currently-set member
-crashes in debug mode or returns garbage when `NDEBUG` is defined.
+A named union (as opposed to an unnamed one) works just like a group, except with some additions:
+
+* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
+  if `foo` is the currently-set field in the union.
+* The union reader and builder also have a method `which()` that returns an enum value indicating
+  which field is currently set.
+* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
+* Calling the get or disown accessors on a field that isn't currently set will throw an
+  exception in debug mode or return garbage when `NDEBUG` is defined.
+
+Unnamed unions differ from named unions only in that the accessor methods from the union's members
+are added directly to the containing type's reader and builder, rather than generating a nested
+type.
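The union rules above can be illustrated with a small hand-written model. This is not code generated by the Cap'n Proto compiler. The class `ShapeUnionModel` and its members `circle` and `square` are hypothetical, chosen only to show how `which()`, `isFoo()`, set, and get interact. One deliberate simplification: the model always throws on a mismatched get, whereas the text above says generated code does so only in debug builds.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical toy model of a named union with two members, `circle` and
// `square`. NOT generated code; it only mimics the documented semantics.
class ShapeUnionModel {
public:
  enum class Which : std::uint16_t { CIRCLE, SQUARE };

  // which() reports the currently-set member; isFoo() tests for one member.
  Which which() const { return which_; }
  bool isCircle() const { return which_ == Which::CIRCLE; }
  bool isSquare() const { return which_ == Which::SQUARE; }

  // Setting a member makes it the currently-set member.
  void setCircle(double radius) { which_ = Which::CIRCLE; value_ = radius; }
  void setSquare(double side) { which_ = Which::SQUARE; value_ = side; }

  // Getting a member that is not currently set is an error. This model always
  // throws; generated code only throws in debug mode.
  double getCircle() const { check(Which::CIRCLE); return value_; }
  double getSquare() const { check(Which::SQUARE); return value_; }

private:
  void check(Which expected) const {
    if (which_ != expected) throw std::logic_error("union member is not currently set");
  }

  Which which_ = Which::CIRCLE;  // initial member: an assumption of this model
  double value_ = 0;
};
```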

 See the [example](#example_usage) at the top of the page for an example of unions.

@@ -337,6 +354,12 @@ implement `size()` and `operator[]` methods.  `Builder::operator[]` even returns
 (unlike with `List<T>`).  `Text::Reader` additionally has a method `cStr()` which returns a
 NUL-terminated `const char*`.

+As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
+type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format.  This is
+accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
+on this rather-bulky header.  In fact, any class which defines a `.c_str()` method will be
+implicitly convertible in this way.  Unfortunately, this trick doesn't work on GCC 4.7.
+
 ### Interfaces

 Interfaces (RPC) are not yet implemented at this time.
@@ -543,6 +566,11 @@ Notes about the dynamic API:
  use the Dynamic API to manipulate objects of these types.  `MessageBuilder` and `MessageReader`
  have methods for accessing the message root using a dynamic schema.

+* While `SchemaLoader` loads binary schemas, you can also parse directly from text using
+  `SchemaParser` (`capnp/schema-parser.h`).  However, this requires linking against `libcapnpc`
+  (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient.  If
+  you can arrange to use only binary schemas at runtime, you'll be better off.
+
 * Unlike with Protobufs, there is no "global registry" of compiled-in types.  To get the schema
  for a compiled-in type, use `capnp::Schema::from<MyType>()`.

@@ -552,6 +580,14 @@ Notes about the dynamic API:
  dynamic API or the schema API, you do not even need to link their implementations into your
  executable.

+* The dynamic API performs type checks at runtime.  In case of error, it will throw an exception.
+  If you compile with `-fno-exceptions`, it will crash instead.  Correct usage of the API should
+  never throw, but bugs happen.  Enabling and catching exceptions will make your code more robust.
+
+* Loading user-provided schemas has security implications: it greatly increases the attack
+  surface of the Cap'n Proto library.  In particular, it is easy for an attacker to trigger
+  exceptions.  To protect yourself, you are strongly advised to enable exceptions and catch them.
+
 ## Orphans

 An "orphan" is a Cap'n Proto object that is disconnected from the message structure.  That is,

--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -316,14 +316,36 @@ In addition to the above, there are two tag values which are treated specially:
 * 0x00:  The tag is followed by a single byte which indicates a count of consecutive zero-valued
  words, minus 1.  E.g. if the tag 0x00 is followed by 0x05, the sequence unpacks to 6 words of
  zero.
-* 0xff:  The tag is followed by the bytes of the word as described above, but after those bytes is
-  another byte with value N.  Following that byte is N unpacked words that should be copied
+
+  Or, put another way: the tag is first decoded as if it were not special.  Since none of the bits
+  are set, it is followed by no bytes and expands to a word full of zeros.  After that, the next
+  byte is interpreted as a count of _additional_ words that are also all-zero.
+
+* 0xff:  The tag is followed by the bytes of the word (as if it weren't special), but after those
+  bytes is another byte with value N.  Following that byte is N unpacked words that should be copied
  directly.  These unpacked words may or may not contain zeros -- it is up to the compressor to
  decide when to end the unpacked span and return to packing each word.  The purpose of this rule
  is to minimize the impact of packing on data that doesn't contain any zeros -- in particular,
  long text blobs.  Because of this rule, the worst-case space overhead of packing is 2 bytes per
  2 KiB of input (256 words = 2KiB).
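Both special tags can be decoded with the same loop as ordinary tags: first expand the word as if the tag were not special, then read one count byte if the tag was 0x00 or 0xff. The sketch below, with the hypothetical name `unpackWords`, illustrates these rules and is not the C++ reference implementation. It assumes the usual tag layout, where the least-significant bit corresponds to the first byte of the word, and it throws if the input ends in the middle of a word or span.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decodes packed bytes back into whole 8-byte words.
std::vector<std::uint8_t> unpackWords(const std::vector<std::uint8_t>& in) {
  std::vector<std::uint8_t> out;
  std::size_t i = 0;
  // Reads the next input byte, failing cleanly on truncated input.
  auto next = [&]() -> std::uint8_t {
    if (i >= in.size()) throw std::runtime_error("truncated packed input");
    return in[i++];
  };
  while (i < in.size()) {
    const std::uint8_t tag = next();
    // Decode the tag as if it were not special: bit b (least significant
    // first) says whether byte b of the word is present or is zero.
    for (int b = 0; b < 8; ++b) {
      out.push_back((tag & (1u << b)) ? next() : std::uint8_t{0});
    }
    if (tag == 0x00) {
      // The next byte counts *additional* all-zero words.
      const std::size_t count = next();
      out.resize(out.size() + count * 8, 0);
    } else if (tag == 0xff) {
      // The next byte counts words to copy verbatim.
      const std::size_t count = next();
      for (std::size_t k = 0; k < count * 8; ++k) out.push_back(next());
    }
  }
  return out;
}
```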

+Examples:
+
+    unpacked (hex):  00 (x 32 bytes)
+    packed (hex):  00 03
+
+    unpacked (hex):  8a (x 32 bytes)
+    packed (hex):  ff 8a (x 8 bytes) 03 8a (x 24 bytes)
+
+Notice that both of the special cases begin by treating the tag as if it weren't special.  This
+is intentionally designed to make encoding faster:  you can compute the tag value and encode the
+bytes in a single pass through the input word.  Only after you've finished with that word do you
+need to check whether the tag ended up being 0x00 or 0xff.
+
+It is possible to write both an encoder and a decoder which only branch at the end of each word,
+and only to handle the two special tags.  It is not necessary to branch on every byte.  See the
+C++ reference implementation for an example.
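A sketch of an encoder along these lines, using the hypothetical name `packWords`. For each word it writes every byte unconditionally but only advances the output position past nonzero bytes, so building the tag needs no data-dependent branch per byte; the checks for 0x00 and 0xff happen once the word is done. The rule for ending a 0xff span (stop at a word with two or more zero bytes, or after 255 words) is an assumption of this sketch, since the format leaves that choice to the compressor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of zero bytes in the 8-byte word starting at `word`.
inline int countZeroBytes(const std::uint8_t* word) {
  int n = 0;
  for (int b = 0; b < 8; ++b) n += (word[b] == 0);
  return n;
}

// Packs `in` word by word. A trailing partial word (in.size() % 8 bytes) is ignored.
std::vector<std::uint8_t> packWords(const std::vector<std::uint8_t>& in) {
  std::vector<std::uint8_t> out;
  const std::size_t words = in.size() / 8;
  std::size_t w = 0;
  while (w < words) {
    const std::uint8_t* word = in.data() + w * 8;
    ++w;

    // Single pass over the word: always store the byte, but only advance
    // past nonzero bytes while accumulating the tag bits.
    const std::size_t tagPos = out.size();
    out.resize(tagPos + 9);  // room for the tag plus up to 8 data bytes
    std::size_t pos = tagPos + 1;
    unsigned tag = 0;
    for (int b = 0; b < 8; ++b) {
      const unsigned nonzero = (word[b] != 0);
      out[pos] = word[b];
      pos += nonzero;
      tag |= nonzero << b;
    }
    out.resize(pos);
    out[tagPos] = static_cast<std::uint8_t>(tag);

    // Only now check whether the tag turned out to be special.
    if (tag == 0x00) {
      // Count additional all-zero words, at most 255.
      std::size_t run = 0;
      while (w < words && run < 255 && countZeroBytes(in.data() + w * 8) == 8) {
        ++run;
        ++w;
      }
      out.push_back(static_cast<std::uint8_t>(run));
    } else if (tag == 0xff) {
      // Assumed policy: copy words verbatim until one has two or more zeros.
      const std::size_t start = w;
      while (w < words && w - start < 255 && countZeroBytes(in.data() + w * 8) <= 1) ++w;
      out.push_back(static_cast<std::uint8_t>(w - start));
      out.insert(out.end(), in.data() + start * 8, in.data() + w * 8);
    }
  }
  return out;
}
```

Applied to the two examples above, this produces `00 03` and `ff 8a (x 8 bytes) 03 8a (x 24 bytes)` respectively.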
+
 Packing is normally applied on top of the standard stream framing described in the previous
 section.


--- a/doc/install.md
+++ b/doc/install.md
@@ -101,11 +101,12 @@ cd capnproto-c++-0.2.1
 make -j6 check
 sudo make install</code></pre>

-This will install `capnp`, the Cap'n Proto command-line tool.  It will also install `libcapnp` in
-`/usr/local/lib` and headers in `/usr/local/include/capnp` and `/usr/local/include/kj`.
+This will install `capnp`, the Cap'n Proto command-line tool.  It will also install `libcapnp`,
+`libcapnpc`, and `libkj` in `/usr/local/lib` and headers in `/usr/local/include/capnp` and
+`/usr/local/include/kj`.

-On Linux, if running `capnp` immediately after installation produces an error saying that the
-`libcapnp` library does not exist, run `sudo ldconfig` and try again.
+On Linux, if running `capnp` immediately after installation produces an error complaining about
+missing libraries, run `sudo ldconfig` and try again.

 ### Building from Git with Autotools


--- a/doc/language.md
+++ b/doc/language.md
--- a/doc/otherlang.md
+++ b/doc/otherlang.md
@@ -87,7 +87,14 @@ support Cap'n Proto in a dynamic language, then, is to wrap the C++ library, in
 [C++ dynamic API](cxx.html#dynamic_reflection).  This way you get reasonable performance while
 still avoiding the need to generate any code specific to each schema.

-Of course, you still need to parse the schema.  Version 0.3 of Cap'n Proto will introduce a public
-C++ API to the schema parser which your bindings can invoke.  By the time you read this, the API
-is probably already available at git head, or will be within a few days;
-[send us a note](https://groups.google.com/group/capnproto) if you want to try it out.
+To parse the schema files, use the `capnp::SchemaParser` class (defined in `capnp/schema-parser.h`).
+This way, schemas are loaded at the same time as all the rest of the program's code -- at startup.
+An advanced implementation might consider caching the compiled schemas in binary format, then
+loading the cached version using `capnp::SchemaLoader`, similar to the way e.g. Python caches
+compiled source files as `.pyc` bytecode, but that's up to you.
+
+### Testing Your Implementation
+
+The easiest way to test that you've implemented the spec correctly is to use the `capnp` tool
+to [encode](capnp-tool.html#encoding_messages) test inputs and
+[decode](capnp-tool.html#decoding_messages) outputs.