Doc updates.

Commit ac4f403c authored Aug 29, 2013 by Kenton Varda
Showing with 216 additions and 49 deletions

cxx.md doc/cxx.md +42 -6

encoding.md doc/encoding.md +24 -2

install.md doc/install.md +5 -4

language.md doc/language.md +134 -33

otherlang.md doc/otherlang.md +11 -4

--- a/doc/cxx.md
+++ b/doc/cxx.md
@@ -162,6 +162,9 @@ together -- KJ is simply the stuff which is not specific to Cap'n Proto serializ
 useful to others independently of Cap'n Proto.  For now, the two are distributed together.  The
 name "KJ" has no particular meaning; it was chosen to be short and easy-to-type.
+As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library.  You may need
+to explicitly link against libraries:  `-lcapnp -lkj`
 ## Generating Code
 To generate C++ code from your `.capnp` [interface definition](language.html), run:
@@ -170,6 +173,8 @@ To generate C++ code from your `.capnp` [interface definition](language.html), r
 This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.
+To use this code in your app, you must link against both `libcapnp` and `libkj`.
 ### Setting a Namespace
 You probably want your generated types to live in a C++ namespace.  You will need to import
@@ -278,14 +283,26 @@ void setMyListField(::capnp::List<double>::Reader value);
 ::capnp::List<double>::Builder initMyListField(size_t size);
 {% endhighlight %}
+### Groups
+Groups look a lot like a combination of a nested type and a field of that type, except that you
+cannot set, adopt, or disown a group -- you can only get and init it.
 ### Unions
-For each union `foo` declared in the struct, the struct's reader and builder have a method
-`getFoo()` which returns a reader/builder for the union.  The union reader/builder has accessors
-for each field exactly like a struct's accessors.  It also has an accessor `which()` which returns
-an enum indicating which member of the union is currently set.  Setting any member of the union
-updates the value returned by `which()`.  Getting a member other than the currently-set member
-crashes in debug mode or returns garbage when `NDEBUG` is defined.
+A named union (as opposed to an unnamed one) works just like a group, except with some additions:
+* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
+  if `foo` is the currently-set field in the union.
+* The union reader and builder also have a method `which()` that returns an enum value indicating
+  which field is currently set.
+* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
+* Calling the get or disown accessors on a field that isn't currently set will throw an
+  exception in debug mode or return garbage when `NDEBUG` is defined.
+Unnamed unions differ from named unions only in that the accessor methods from the union's members
+are added directly to the containing type's reader and builder, rather than generating a nested
+type.
 See the [example](#example_usage) at the top of the page for an example of unions.
@@ -337,6 +354,12 @@ implement `size()` and `operator[]` methods.  `Builder::operator[]` even returns
 (unlike with `List<T>`).  `Text::Reader` additionally has a method `cStr()` which returns a
 NUL-terminated `const char*`.
+As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
+type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format.  This is
+accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
+on this rather-bulky header.  In fact, any class which defines a `.c_str()` method will be
+implicitly convertible in this way.  Unfortunately, this trick doesn't work on GCC 4.7.
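The conversion trick described above can be sketched in a few lines. This is only an illustration under stated assumptions: `MiniStringPtr`, `lengthOf`, and `roundTrip` are hypothetical names invented here, not part of KJ, and the real `kj::StringPtr` may be implemented differently. The class itself never mentions `std::string`; it accepts any type with a `c_str()` method and converts to any such type that can be built from a `const char*`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>   // needed only by roundTrip() below, not by MiniStringPtr
#include <utility>  // std::declval

// Hypothetical stand-in for a string pointer type; not the real kj::StringPtr.
class MiniStringPtr {
public:
  MiniStringPtr(const char* s) : ptr_(s), size_(std::strlen(s)) {}

  // Implicit conversion FROM any type that has a c_str() method.
  // The defaulted template parameter removes this overload for other types.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  MiniStringPtr(const T& t) : MiniStringPtr(t.c_str()) {}

  // Implicit conversion TO any type that has a c_str() method and can be
  // constructed from a NUL-terminated const char*.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  operator T() const { return T(ptr_); }

  const char* cStr() const { return ptr_; }
  std::size_t size() const { return size_; }

private:
  const char* ptr_;   // not owned; must outlive this object
  std::size_t size_;
};

// Takes the pointer type by value, so callers may pass a std::string directly.
std::size_t lengthOf(MiniStringPtr s) { return s.size(); }

// Converts in both directions using only implicit conversions.
std::string roundTrip(const std::string& in) {
  MiniStringPtr p = in;  // std::string -> MiniStringPtr
  std::string out = p;   // MiniStringPtr -> std::string
  return out;
}
```

Because the templates only require a `c_str()` method, the same conversions would apply to any other string class with that method, which matches the behavior the paragraph describes.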
 ### Interfaces
 Interfaces (RPC) are not yet implemented at this time.
@@ -543,6 +566,11 @@ Notes about the dynamic API:
  use the Dynamic API to manipulate objects of these types.  `MessageBuilder` and `MessageReader`
  have methods for accessing the message root using a dynamic schema.
+* While `SchemaLoader` loads binary schemas, you can also parse directly from text using
+  `SchemaParser` (`capnp/schema-parser.h`).  However, this requires linking against `libcapnpc`
+  (in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient.  If
+  you can arrange to use only binary schemas at runtime, you'll be better off.
 * Unlike with Protobufs, there is no "global registry" of compiled-in types.  To get the schema
  for a compiled-in type, use `capnp::Schema::from<MyType>()`.
@@ -552,6 +580,14 @@ Notes about the dynamic API:
  dynamic API or the schema API, you do not even need to link their implementations into your
  executable.
+* The dynamic API performs type checks at runtime.  In case of error, it will throw an exception.
+  If you compile with `-fno-exceptions`, it will crash instead.  Correct usage of the API should
+  never throw, but bugs happen.  Enabling and catching exceptions will make your code more robust.
+* Loading user-provided schemas has security implications: it greatly increases the attack
+  surface of the Cap'n Proto library.  In particular, it is easy for an attacker to trigger
+  exceptions.  To protect yourself, you are strongly advised to enable exceptions and catch them.
 ## Orphans
 An "orphan" is a Cap'n Proto object that is disconnected from the message structure.  That is,

--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -316,14 +316,36 @@ In addition to the above, there are two tag values which are treated specially:
 * 0x00:  The tag is followed by a single byte which indicates a count of consecutive zero-valued
  words, minus 1.  E.g. if the tag 0x00 is followed by 0x05, the sequence unpacks to 6 words of
  zero.
-* 0xff:  The tag is followed by the bytes of the word as described above, but after those bytes is
-  another byte with value N.  Following that byte is N unpacked words that should be copied
+  Or, put another way: the tag is first decoded as if it were not special.  Since none of the bits
+  are set, it is followed by no bytes and expands to a word full of zeros.  After that, the next
+  byte is interpreted as a count of _additional_ words that are also all-zero.
+* 0xff:  The tag is followed by the bytes of the word (as if it weren't special), but after those
+  bytes is another byte with value N.  Following that byte is N unpacked words that should be copied
  directly.  These unpacked words may or may not contain zeros -- it is up to the compressor to
  decide when to end the unpacked span and return to packing each word.  The purpose of this rule
  is to minimize the impact of packing on data that doesn't contain any zeros -- in particular,
  long text blobs.  Because of this rule, the worst-case space overhead of packing is 2 bytes per
  2 KiB of input (256 words = 2KiB).
+Examples:
+    unpacked (hex):  00 (x 32 bytes)
+    packed (hex):  00 03
+    unpacked (hex):  8a (x 32 bytes)
+    packed (hex):  ff 8a (x 8 bytes) 03 8a (x 24 bytes)
+Notice that both of the special cases begin by treating the tag as if it weren't special.  This
+is intentionally designed to make encoding faster:  you can compute the tag value and encode the
+bytes in a single pass through the input word.  Only after you've finished with that word do you
+need to check whether the tag ended up being 0x00 or 0xff.
+It is possible to write both an encoder and a decoder which only branch at the end of each word,
+and only to handle the two special tags.  It is not necessary to branch on every byte.  See the
+C++ reference implementation for an example.
 Packing is normally applied on top of the standard stream framing described in the previous
 section.
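The packing rules above can be sketched as a small encoder and decoder. This is a minimal illustration, not the C++ reference implementation: the names `Bytes`, `zeroBytes`, `pack`, and `unpack` are made up for this sketch, the input is assumed to be a whole number of 8-byte words, and the tag is assumed to map its least-significant bit to the first byte of the word. The choice to end an unpacked span at the first word containing any zero byte is just one simple policy; the text leaves that decision to the compressor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Number of zero bytes in the 8-byte word starting at `pos`.
int zeroBytes(const Bytes& in, std::size_t pos) {
  int n = 0;
  for (int i = 0; i < 8; ++i) n += (in[pos + i] == 0);
  return n;
}

// Packs `in`, whose size must be a multiple of 8 bytes (whole words).
Bytes pack(const Bytes& in) {
  assert(in.size() % 8 == 0);
  Bytes out;
  std::size_t pos = 0;
  while (pos < in.size()) {
    // Single pass over the word: build the tag and emit the non-zero bytes.
    const std::size_t tagIndex = out.size();
    out.push_back(0);  // placeholder for the tag
    std::uint8_t tag = 0;
    for (int i = 0; i < 8; ++i) {
      const std::uint8_t b = in[pos + i];
      if (b != 0) {
        tag = static_cast<std::uint8_t>(tag | (1u << i));
        out.push_back(b);
      }
    }
    out[tagIndex] = tag;
    pos += 8;

    // Only now check whether the tag turned out to be special.
    if (tag == 0x00) {
      // Count up to 255 additional all-zero words.
      std::uint8_t extra = 0;
      while (extra < 255 && pos < in.size() && zeroBytes(in, pos) == 8) {
        ++extra;
        pos += 8;
      }
      out.push_back(extra);
    } else if (tag == 0xff) {
      // Copy up to 255 following words verbatim (policy: words with no zeros).
      const std::size_t countIndex = out.size();
      out.push_back(0);  // placeholder for the word count
      std::uint8_t count = 0;
      while (count < 255 && pos < in.size() && zeroBytes(in, pos) == 0) {
        const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
        out.insert(out.end(), first, first + 8);
        ++count;
        pos += 8;
      }
      out[countIndex] = count;
    }
  }
  return out;
}

// Reverses pack(). Throws std::out_of_range (via at()) on truncated input.
Bytes unpack(const Bytes& in) {
  Bytes out;
  std::size_t pos = 0;
  while (pos < in.size()) {
    const std::uint8_t tag = in[pos++];
    for (int i = 0; i < 8; ++i) {
      out.push_back((tag & (1u << i)) ? in.at(pos++) : std::uint8_t{0});
    }
    if (tag == 0x00) {
      const std::size_t words = in.at(pos++);
      out.insert(out.end(), words * 8, std::uint8_t{0});
    } else if (tag == 0xff) {
      const std::size_t words = in.at(pos++);
      for (std::size_t k = 0; k < words * 8; ++k) out.push_back(in.at(pos++));
    }
  }
  return out;
}
```

Calling `pack` on 32 zero bytes and on 32 bytes of `8a` reproduces the two examples listed above, and `unpack(pack(x))` returns `x` for any whole-word input. Note that both functions branch on the tag only once per word, after the per-byte loop has finished.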

--- a/doc/install.md
+++ b/doc/install.md
@@ -101,11 +101,12 @@ cd capnproto-c++-0.2.1
 make -j6 check
 sudo make install</code></pre>
-This will install `capnp`, the Cap'n Proto command-line tool.  It will also install `libcapnp` in
-`/usr/local/lib` and headers in `/usr/local/include/capnp` and `/usr/local/include/kj`.
+This will install `capnp`, the Cap'n Proto command-line tool.  It will also install `libcapnp`,
+`libcapnpc`, and `libkj` in `/usr/local/lib` and headers in `/usr/local/include/capnp` and
+`/usr/local/include/kj`.
-On Linux, if running `capnp` immediately after installation produces an error saying that the
-`libcapnp` library does not exist, run `sudo ldconfig` and try again.
+On Linux, if running `capnp` immediately after installation produces an error complaining about
+missing libraries, run `sudo ldconfig` and try again.
 ### Building from Git with Autotools

--- a/doc/language.md
+++ b/doc/language.md
@@ -12,7 +12,7 @@ manipulate that message type in your desired language.
 For example:
-{% highlight python %}
+{% highlight capnproto %}
 # unique file ID, generated by `capnp id`
 @0xdbb9ad1f14bf0b36;
@@ -137,45 +137,60 @@ union declarations do not look like types.
 struct Person {
  # ...
-  employment @4 union {
-    unemployed @5 :Void;
-    employer @6 :Company;
-    school @7 :School;
-    selfEmployed @8 :Void;
+  employment :union {
+    unemployed @4 :Void;
+    employer @5 :Company;
+    school @6 :School;
+    selfEmployed @7 :Void;
    # We assume that a person is only one of these.
  }
 }
 {% endhighlight %}
+Additionally, unions can be unnamed.  Each struct can contain no more than one unnamed union.  Use
+unnamed unions in cases where you would struggle to think of an appropriate name for the union,
+because the union represents the main body of the struct.
+{% highlight capnp %}
+struct Shape {
+  area @0 :Float64;
+  union {
+    circle @1 :Float64;      # radius
+    square @2 :Float64;      # width
+  }
+}
+{% endhighlight %}
 Notes:
-* Unions and their members are numbered in the same number space as fields of the containing
-  struct. Remember that the purpose of the numbers is to indicate the evolution order of the
-  struct. The system needs to know when the union and each of its members was declared relative to
-  the non-union fields. Also note that no more than one element of the union is allowed to have a
-  number less than the union's number, as unionizing two or more pre-existing fields would change
-  their layout.
+* Union members are numbered in the same number space as fields of the containing struct.
+  Remember that the purpose of the numbers is to indicate the evolution order of the
+  struct. The system needs to know when the union fields were declared relative to the non-union
+  fields.
 * Notice that we used the "useless" `Void` type here. We don't have any extra information to store
  for the `unemployed` or `selfEmployed` cases, but we still want the union to distinguish these
  states from others.
+* By default, when a struct is initialized, the lowest-numbered field in the union is "set".  If
+  you do not want any field set by default, simply declare a field called "unset" and make it the
+  lowest-numbered field.
+* You can move an existing field into a new union without breaking compatibility with existing
+  data, as long as all of the other fields in the union are new.  Since the existing field is
+  necessarily the lowest-numbered in the union, it will be the union's default field.
 **Wait, why aren't unions first-class types?**
 Requiring unions to be declared inside a struct, rather than living as free-standing types, has
 some important advantages:
-* If unions were first-class types, then either (a) all unions would have to have a fixed size of
-  18 bytes (a data word, a pointer, and a 2-byte tag) regardless of their members; or (b) unions
-  would have to be separate objects embedded by pointer, adding 10-14 bytes of overhead to every
-  union (the pointer plus 2-6 bytes lost to padding).
-  If neither of these conditions were true, then adding a new field to a union would potentially
-  alter the layout of any struct containing an instance of that union in a backwards-incompatible
-  way.  On the other hand, if each union type is bound to its containing struct, then its fields
-  can be numbered in the same space as the struct's fields, which allows the layout algorithm to
-  extend the union for new fields without disrupting the positioning of existing fields.  All in
-  all, space is saved.
+* If unions were first-class types, then union members would clearly have to be numbered separately
+  from the containing type's fields.  This means that the compiler, when deciding how to position
+  the union in its containing struct, would have to conservatively assume that any kind of new
+  field might be added to the union in the future.  To support this, all unions would have to
+  be allocated as separate objects embedded by pointer, wasting space.
 * A free-standing union would be a liability for protocol evolution, because no additional data
  can be attached to it later on.  Consider, for example, a type which represents a parser token.
@@ -196,8 +211,73 @@ some important advantages:
 Cap'n Proto's unconventional approach to unions provides these advantages without any real down
 side:  where you would conventionally define a free-standing union type, in Cap'n Proto you
-may simply define a struct type that contains only that union, and you have achieved the same
-effect.  Thus, aside from being slightly unintuitive, it is strictly superior.
+may simply define a struct type that contains only that union (probably unnamed), and you have
+achieved the same effect.  Thus, aside from being slightly unintuitive, it is strictly superior.
+### Groups
+A group is a set of fields that are encapsulated in their own scope.
+{% highlight capnp %}
+struct Person {
+  # ...
+  # Note:  This is a terrible way to use groups, and meant
+  #   only to demonstrate the syntax.
+  address :group {
+    houseNumber @8 :UInt32;
+    street @9 :Text;
+    city @10 :Text;
+    country @11 :Text;
+  }
+}
+{% endhighlight %}
+Interface-wise, the above group behaves as if you had defined a nested struct called `Address` and
+then a field `address :Address`.  However, a group is _not_ a separate object from its containing
+struct: the fields are numbered in the same space as the containing struct's fields, and are laid
+out exactly the same as if they hadn't been grouped at all.  Essentially, a group is just a
+namespace.
+Groups on their own (as in the above example) are useless, almost as much so as the `Void` type.
+They become interesting when used together with unions.
+{% highlight capnp %}
+struct Shape {
+  area @0 :Float64;
+  union {
+    circle :group {
+      radius @1 :Float64;
+    }
+    rectangle :group {
+      width @2 :Float64;
+      height @3 :Float64;
+    }
+  }
+}
+{% endhighlight %}
+There are two main reasons to use groups with unions:
+1. They are often more self-documenting.  Notice that `radius` is now a member of `circle`, so
+   we don't need a comment to explain that the value of `circle` is its radius.
+2. You can add additional members later on, without breaking compatibility.  Notice how we upgraded
+   `square` to `rectangle` above, adding a `height` field.  This definition is actually
+   wire-compatible with the previous version of the `Shape` example from the "union" section
+   (aside from the fact that `height` will always be zero when reading old data -- hey, it's not
+   a perfect example).  In real-world use, it is common to realize after the fact that you need to
+   add some information to a struct that only applies when one particular union field is set.
+   Without the ability to upgrade to a group, you would have to define the new field separately,
+   and have it waste space when not relevant.
+Note that a named union is actually exactly equivalent to a named group containing an unnamed
+union.
+**Wait, weren't groups considered a misfeature in Protobufs?  Why did you do this again?**
+They are useful in unions, which Protobufs did not have.  Meanwhile, you cannot have a "repeated
+group" in Cap'n Proto, which was the case that got into the most trouble with Protobufs.
 ### Dynamically-typed Fields
@@ -481,17 +561,38 @@ A protocol can be changed in the following ways without breaking backwards-compa
 * New types, constants, and aliases can be added anywhere, since they obviously don't affect the
  encoding of any existing type.
-* New fields, values, and methods may be added to structs, enums, and interfaces, respectively,
-  with the numbering rules described earlier.
+* New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
+  as long as each new member's number is larger than all previous members.  Similarly, new fields
+  may be added to existing groups and unions.
 * New parameters may be added to a method.  The new parameters must be added to the end of the
  parameter list and must have default values.
-* Any symbolic name can be changed, as long as the ordinal numbers stay the same.
-* Types definitions can be moved to different scopes.
+* Members can be re-arranged in the source code, so long as their numbers stay the same.
+* Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same.  Note
+  that type declarations have an implicit ID generated based on their name and parent's ID, but
+  you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
+  declare it explicitly after your rename.
+* Type definitions can be moved to different scopes, as long as the type ID is declared
+  explicitly.
 * A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
  `List(U)`, where `U` is a struct type whose `@0` field is of type `T`.  This rule is useful when
  you realize too late that you need to attach some extra data to each element of your list.
  Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
+* A field can be moved into a group or a union, as long as the group/union and all other fields
+  within it are new.  In other words, a field can be replaced with a group or union containing an
+  equivalent field and some new fields.
-Any other change should be assumed NOT to be safe.  Also, these rules only apply to the Cap'n Proto
-native encoding.  It is sometimes useful to transcode Cap'n Proto types to other formats, like
-JSON, which may have different rules (e.g., field names cannot change in JSON).
+Any other change should be assumed NOT to be safe.  In particular:
+* You cannot change a field, method, or enumerant's number.
+* You cannot change a field or method parameter's type or default value, except as described above.
+* You cannot change a type's ID.
+* You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
+  generated based in part on the type name.
+* You cannot move a type to a different scope or file unless it has an explicit ID, as the implicit
+  ID is based in part on the scope's ID.
+* You cannot move an existing field into or out of an existing union, nor can you form a new union
+  containing more than one existing field.
+Also, these rules only apply to the Cap'n Proto native encoding.  It is sometimes useful to
+transcode Cap'n Proto types to other formats, like JSON, which may have different rules (e.g.,
+field names cannot change in JSON).
--- a/doc/otherlang.md
+++ b/doc/otherlang.md
@@ -87,7 +87,14 @@ support Cap'n Proto in a dynamic language, then, is to wrap the C++ library, in
 [C++ dynamic API](cxx.html#dynamic_reflection).  This way you get reasonable performance while
 still avoiding the need to generate any code specific to each schema.
-Of course, you still need to parse the schema.  Version 0.3 of Cap'n Proto will introduce a public
-C++ API to the schema parser which your bindings can invoke.  By the time you read this, the API
-is probably already available at git head, or will be within a few days;
-[send us a note](https://groups.google.com/group/capnproto) if you want to try it out.
+To parse the schema files, use the `capnp::SchemaParser` class (defined in `capnp/schema-parser.h`).
+This way, schemas are loaded at the same time as all the rest of the program's code -- at startup.
+An advanced implementation might consider caching the compiled schemas in binary format, then
+loading the cached version using `capnp::SchemaLoader`, similar to the way e.g. Python caches
+compiled source files as `.pyc` bytecode, but that's up to you.
+### Testing Your Implementation
+The easiest way to test that you've implemented the spec correctly is to use the `capnp` tool
+to [encode](http://localhost:4000/capnproto/capnp-tool.html#encoding_messages) test inputs and
+[decode](http://localhost:4000/capnproto/capnp-tool.html#decoding_messages) outputs.