Commit ac4f403c authored by Kenton Varda

Doc updates.

parent df371fc2
......@@ -162,6 +162,9 @@ together -- KJ is simply the stuff which is not specific to Cap'n Proto serializ
useful to others independently of Cap'n Proto. For now, the two are distributed together. The
name "KJ" has no particular meaning; it was chosen to be short and easy-to-type.
As of v0.3, KJ is distributed with Cap'n Proto but built as a separate library. You may need
to explicitly link against libraries: `-lcapnp -lkj`
## Generating Code
To generate C++ code from your `.capnp` [interface definition](language.html), run:
......@@ -170,6 +173,8 @@ To generate C++ code from your `.capnp` [interface definition](language.html), r
This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.
To use this code in your app, you must link against both `libcapnp` and `libkj`.
### Setting a Namespace
You probably want your generated types to live in a C++ namespace. You will need to import
......@@ -278,14 +283,26 @@ void setMyListField(::capnp::List<double>::Reader value);
::capnp::List<double>::Builder initMyListField(size_t size);
{% endhighlight %}
### Groups
Groups look a lot like a combination of a nested type and a field of that type, except that you
cannot set, adopt, or disown a group -- you can only get and init it.
### Unions
For each union `foo` declared in the struct, the struct's reader and builder have a method
`getFoo()` which returns a reader/builder for the union. The union reader/builder has accessors
for each field exactly like a struct's accessors. It also has an accessor `which()` which returns
an enum indicating which member of the union is currently set. Setting any member of the union
updates the value returned by `which()`. Getting a member other than the currently-set member
crashes in debug mode or returns garbage when `NDEBUG` is defined.
A named union (as opposed to an unnamed one) works just like a group, except with some additions:
* For each field `foo`, the union reader and builder have a method `isFoo()` which returns true
if `foo` is the currently-set field in the union.
* The union reader and builder also have a method `which()` that returns an enum value indicating
which field is currently set.
* Calling the set, init, or adopt accessors for a field makes it the currently-set field.
* Calling the get or disown accessors on a field that isn't currently set will throw an
exception in debug mode or return garbage when `NDEBUG` is defined.
Unnamed unions differ from named unions only in that the accessor methods from the union's members
are added directly to the containing type's reader and builder, rather than generating a nested
type.
See the [example](#example_usage) at the top of the page for an example of unions.
......@@ -337,6 +354,12 @@ implement `size()` and `operator[]` methods. `Builder::operator[]` even returns
(unlike with `List<T>`). `Text::Reader` additionally has a method `cStr()` which returns a
NUL-terminated `const char*`.
As a special convenience, if you are using GCC 4.8+ or Clang, `Text::Reader` (and its underlying
type, `kj::StringPtr`) can be implicitly converted to and from `std::string` format. This is
accomplished without actually `#include`ing `<string>`, since some clients do not want to rely
on this rather-bulky header. In fact, any class which defines a `.c_str()` method will be
implicitly convertible in this way. Unfortunately, this trick doesn't work on GCC 4.7.
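The conversion trick described above can be sketched in isolation. This is a minimal illustration, not KJ's actual source: `TextView` and `roundTrip` are hypothetical names invented for this example, and the sketch assumes a modern C++11-or-later compiler. The class never names `std::string`; both conversions are templates that only participate when the other type has a `.c_str()` method. The `<string>` include is there solely for the demo function at the bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>   // needed only by roundTrip() below, not by TextView
#include <utility>

// Hypothetical stand-in for a non-owning, NUL-terminated string pointer.
class TextView {
public:
  TextView(const char* s) : ptr(s), len(0) {
    assert(s != nullptr);
    len = std::strlen(s);
  }

  // Accept any type T that has a .c_str() method. If T has no such
  // method, the decltype fails and this overload is silently discarded.
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  TextView(const T& s) : TextView(s.c_str()) {}

  // Convert to any type T that has a .c_str() method and can be
  // constructed from a const char* (std::string is one such type).
  template <typename T,
            typename = decltype(std::declval<const T&>().c_str())>
  operator T() const { return T(ptr); }

  const char* cStr() const { return ptr; }
  std::size_t size() const { return len; }

private:
  const char* ptr;
  std::size_t len;
};

// Demo: both directions are implicit.
std::string roundTrip(const std::string& s) {
  TextView view = s;        // std::string -> TextView
  std::string copy = view;  // TextView -> std::string
  return copy;
}
```

Note that the view does not own its bytes, so the source string must outlive it, as `s` does inside `roundTrip`.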
### Interfaces
Interfaces (RPC) are not yet implemented.
......@@ -543,6 +566,11 @@ Notes about the dynamic API:
use the Dynamic API to manipulate objects of these types. `MessageBuilder` and `MessageReader`
have methods for accessing the message root using a dynamic schema.
* While `SchemaLoader` loads binary schemas, you can also parse directly from text using
`SchemaParser` (`capnp/schema-parser.h`). However, this requires linking against `libcapnpc`
(in addition to `libcapnp` and `libkj`) -- this code is bulky and not terribly efficient. If
you can arrange to use only binary schemas at runtime, you'll be better off.
* Unlike with Protobufs, there is no "global registry" of compiled-in types. To get the schema
for a compiled-in type, use `capnp::Schema::from<MyType>()`.
......@@ -552,6 +580,14 @@ Notes about the dynamic API:
dynamic API or the schema API, you do not even need to link their implementations into your
executable.
* The dynamic API performs type checks at runtime. In case of error, it will throw an exception.
If you compile with `-fno-exceptions`, it will crash instead. Correct usage of the API should
never throw, but bugs happen. Enabling and catching exceptions will make your code more robust.
* Loading user-provided schemas has security implications: it greatly increases the attack
surface of the Cap'n Proto library. In particular, it is easy for an attacker to trigger
exceptions. To protect yourself, you are strongly advised to enable exceptions and catch them.
## Orphans
An "orphan" is a Cap'n Proto object that is disconnected from the message structure. That is,
......
......@@ -316,14 +316,36 @@ In addition to the above, there are two tag values which are treated specially:
* 0x00: The tag is followed by a single byte which indicates a count of consecutive zero-valued
words, minus 1. E.g. if the tag 0x00 is followed by 0x05, the sequence unpacks to 6 words of
zero.
* 0xff: The tag is followed by the bytes of the word as described above, but after those bytes is
another byte with value N. Following that byte is N unpacked words that should be copied
Or, put another way: the tag is first decoded as if it were not special. Since none of the bits
are set, it is followed by no bytes and expands to a word full of zeros. After that, the next
byte is interpreted as a count of _additional_ words that are also all-zero.
* 0xff: The tag is followed by the bytes of the word (as if it weren't special), but after those
bytes is another byte with value N. Following that byte is N unpacked words that should be copied
directly. These unpacked words may or may not contain zeros -- it is up to the compressor to
decide when to end the unpacked span and return to packing each word. The purpose of this rule
is to minimize the impact of packing on data that doesn't contain any zeros -- in particular,
long text blobs. Because of this rule, the worst-case space overhead of packing is 2 bytes per
2 KiB of input (256 words = 2 KiB).
Examples:
unpacked (hex): 00 (x 32 bytes)
packed (hex): 00 03
unpacked (hex): 8a (x 32 bytes)
packed (hex): ff 8a (x 8 bytes) 03 8a (x 24 bytes)
Notice that both of the special cases begin by treating the tag as if it weren't special. This
is intentionally designed to make encoding faster: you can compute the tag value and encode the
bytes in a single pass through the input word. Only after you've finished with that word do you
need to check whether the tag ended up being 0x00 or 0xff.
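A minimal sketch of an encoder built this way follows. It is not the reference implementation: `packWords`, `isZeroWord`, `hasNoZeroByte`, and `Bytes` are hypothetical names chosen for this example. It assumes the tag's least-significant bit corresponds to the first byte of the word, as the encoding spec states outside this excerpt. The policy for ending an unpacked span (stop at the first word containing any zero byte) is this sketch's own simple choice; the spec leaves that decision to the compressor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// True if all 8 bytes of the word are zero.
static bool isZeroWord(const std::uint8_t* word) {
  for (int i = 0; i < 8; i++) if (word[i] != 0) return false;
  return true;
}

// True if none of the 8 bytes of the word is zero.
static bool hasNoZeroByte(const std::uint8_t* word) {
  for (int i = 0; i < 8; i++) if (word[i] == 0) return false;
  return true;
}

// Packs `in`, which must consist of whole 8-byte words.
Bytes packWords(const Bytes& in) {
  assert(in.size() % 8 == 0);
  Bytes out;
  const std::size_t wordCount = in.size() / 8;
  std::size_t w = 0;
  while (w < wordCount) {
    const std::uint8_t* word = in.data() + w * 8;

    // Single pass over the word with no per-byte `if`: every byte is
    // written, but the output cursor only advances past non-zero bytes.
    const std::size_t tagPos = out.size();
    out.resize(tagPos + 9);  // room for the tag plus up to 8 bytes
    std::size_t n = tagPos + 1;
    std::uint8_t tag = 0;
    for (int i = 0; i < 8; i++) {
      const std::uint8_t b = word[i];
      const unsigned nonzero = (b != 0);
      out[n] = b;
      n += nonzero;
      tag = static_cast<std::uint8_t>(tag | (nonzero << i));
    }
    out[tagPos] = tag;
    out.resize(n);
    ++w;

    // Only now check whether the tag turned out to be special.
    if (tag == 0x00) {
      std::uint8_t extra = 0;  // additional all-zero words, at most 255
      while (w < wordCount && extra < 255 && isZeroWord(in.data() + w * 8)) {
        ++extra;
        ++w;
      }
      out.push_back(extra);
    } else if (tag == 0xff) {
      const std::size_t start = w;  // words copied verbatim, at most 255
      while (w < wordCount && w - start < 255 &&
             hasNoZeroByte(in.data() + w * 8)) {
        ++w;
      }
      out.push_back(static_cast<std::uint8_t>(w - start));
      out.insert(out.end(),
                 in.begin() + static_cast<std::ptrdiff_t>(start * 8),
                 in.begin() + static_cast<std::ptrdiff_t>(w * 8));
    }
  }
  return out;
}
```

With this policy, the two examples given earlier pack to `00 03` and to `ff`, eight `8a` bytes, `03`, then twenty-four `8a` bytes. A 2 KiB input with no zero bytes packs to 2050 bytes, matching the stated worst-case overhead.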
It is possible to write both an encoder and a decoder which only branch at the end of each word,
and only to handle the two special tags. It is not necessary to branch on every byte. See the
C++ reference implementation for an example.
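For the decoding direction, a minimal sketch might look like the following. It is not the reference implementation: `unpackWords` and `Bytes` are hypothetical names, and the sketch assumes the tag's least-significant bit corresponds to the first byte of the word, as the encoding spec states outside this excerpt. For readability it tests each tag bit with an `if` rather than using the branch-free style described above, and it reduces input validation to `assert`; a real decoder must reject truncated or malformed input gracefully.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Expands packed data back into whole 8-byte words.
Bytes unpackWords(const Bytes& in) {
  Bytes out;
  std::size_t pos = 0;
  while (pos < in.size()) {
    const std::uint8_t tag = in[pos++];

    // Decode the tag as if it were not special: each set bit consumes
    // one input byte, each clear bit yields a zero byte.
    for (int i = 0; i < 8; i++) {
      if (tag & (1u << i)) {
        assert(pos < in.size());
        out.push_back(in[pos++]);
      } else {
        out.push_back(0);
      }
    }

    // After the word is done, handle the two special tags.
    if (tag == 0x00) {
      // The next byte counts additional all-zero words.
      assert(pos < in.size());
      const std::size_t extra = in[pos++];
      out.insert(out.end(), extra * 8, std::uint8_t{0});
    } else if (tag == 0xff) {
      // The next byte N is followed by N words to copy verbatim.
      assert(pos < in.size());
      const std::size_t n = static_cast<std::size_t>(in[pos++]) * 8;
      assert(in.size() - pos >= n);
      out.insert(out.end(),
                 in.begin() + static_cast<std::ptrdiff_t>(pos),
                 in.begin() + static_cast<std::ptrdiff_t>(pos + n));
      pos += n;
    }
  }
  return out;
}
```

Feeding it the packed bytes from the two examples given earlier reproduces 32 bytes of `00` and 32 bytes of `8a` respectively.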
Packing is normally applied on top of the standard stream framing described in the previous
section.
......
......@@ -101,11 +101,12 @@ cd capnproto-c++-0.2.1
make -j6 check
sudo make install</code></pre>
This will install `capnp`, the Cap'n Proto command-line tool. It will also install `libcapnp` in
`/usr/local/lib` and headers in `/usr/local/include/capnp` and `/usr/local/include/kj`.
This will install `capnp`, the Cap'n Proto command-line tool. It will also install `libcapnp`,
`libcapnpc`, and `libkj` in `/usr/local/lib` and headers in `/usr/local/include/capnp` and
`/usr/local/include/kj`.
On Linux, if running `capnp` immediately after installation produces an error saying that the
`libcapnp` library does not exist, run `sudo ldconfig` and try again.
On Linux, if running `capnp` immediately after installation produces an error complaining about
missing libraries, run `sudo ldconfig` and try again.
### Building from Git with Autotools
......
......@@ -12,7 +12,7 @@ manipulate that message type in your desired language.
For example:
{% highlight python %}
{% highlight capnproto %}
# unique file ID, generated by `capnp id`
@0xdbb9ad1f14bf0b36;
......@@ -137,45 +137,60 @@ union declarations do not look like types.
struct Person {
# ...
employment @4 union {
unemployed @5 :Void;
employer @6 :Company;
school @7 :School;
selfEmployed @8 :Void;
employment :union {
unemployed @4 :Void;
employer @5 :Company;
school @6 :School;
selfEmployed @7 :Void;
# We assume that a person is only one of these.
}
}
{% endhighlight %}
Additionally, unions can be unnamed. Each struct can contain no more than one unnamed union. Use
unnamed unions in cases where you would struggle to think of an appropriate name for the union,
because the union represents the main body of the struct.
{% highlight capnp %}
struct Shape {
area @0 :Float64;
union {
circle @1 :Float64; # radius
square @2 :Float64; # width
}
}
{% endhighlight %}
Notes:
* Unions and their members are numbered in the same number space as fields of the containing
struct. Remember that the purpose of the numbers is to indicate the evolution order of the
struct. The system needs to know when the union and each of its members was declared relative to
the non-union fields. Also note that no more than one element of the union is allowed to have a
number less than the union's number, as unionizing two or more pre-existing fields would change
their layout.
* Union members are numbered in the same number space as fields of the containing struct.
Remember that the purpose of the numbers is to indicate the evolution order of the
struct. The system needs to know when the union fields were declared relative to the non-union
fields.
* Notice that we used the "useless" `Void` type here. We don't have any extra information to store
for the `unemployed` or `selfEmployed` cases, but we still want the union to distinguish these
states from others.
* By default, when a struct is initialized, the lowest-numbered field in the union is "set". If
you do not want any field set by default, simply declare a field called "unset" and make it the
lowest-numbered field.
* You can move an existing field into a new union without breaking compatibility with existing
data, as long as all of the other fields in the union are new. Since the existing field is
necessarily the lowest-numbered in the union, it will be the union's default field.
**Wait, why aren't unions first-class types?**
Requiring unions to be declared inside a struct, rather than living as free-standing types, has
some important advantages:
* If unions were first-class types, then either (a) all unions would have to have a fixed size of
18 bytes (a data word, a pointer, and a 2-byte tag) regardless of their members; or (b) unions
would have to be separate objects embedded by pointer, adding 10-14 bytes of overhead to every
union (the pointer plus 2-6 bytes lost to padding).
If neither of these conditions were true, then adding a new field to a union would potentially
alter the layout of any struct containing an instance of that union in a backwards-incompatible
way. On the other hand, if each union type is bound to its containing struct, then its fields
can be numbered in the same space as the struct's fields, which allows the layout algorithm to
extend the union for new fields without disrupting the positioning of existing fields. All in
all, space is saved.
* If unions were first-class types, then union members would clearly have to be numbered separately
from the containing type's fields. This means that the compiler, when deciding how to position
the union in its containing struct, would have to conservatively assume that any kind of new
field might be added to the union in the future. To support this, all unions would have to
be allocated as separate objects embedded by pointer, wasting space.
* A free-standing union would be a liability for protocol evolution, because no additional data
can be attached to it later on. Consider, for example, a type which represents a parser token.
......@@ -196,8 +211,73 @@ some important advantages:
Cap'n Proto's unconventional approach to unions provides these advantages without any real down
side: where you would conventionally define a free-standing union type, in Cap'n Proto you
may simply define a struct type that contains only that union, and you have achieved the same
effect. Thus, aside from being slightly unintuitive, it is strictly superior.
may simply define a struct type that contains only that union (probably unnamed), and you have
achieved the same effect. Thus, aside from being slightly unintuitive, it is strictly superior.
### Groups
A group is a set of fields that are encapsulated in their own scope.
{% highlight capnp %}
struct Person {
# ...
# Note: This is a terrible way to use groups, and meant
# only to demonstrate the syntax.
address :group {
houseNumber @8 :UInt32;
street @9 :Text;
city @10 :Text;
country @11 :Text;
}
}
{% endhighlight %}
Interface-wise, the above group behaves as if you had defined a nested struct called `Address` and
then a field `address :Address`. However, a group is _not_ a separate object from its containing
struct: the fields are numbered in the same space as the containing struct's fields, and are laid
out exactly the same as if they hadn't been grouped at all. Essentially, a group is just a
namespace.
Groups on their own (as in the above example) are useless, almost as much so as the `Void` type.
They become interesting when used together with unions.
{% highlight capnp %}
struct Shape {
area @0 :Float64;
union {
circle :group {
radius @1 :Float64;
}
rectangle :group {
width @2 :Float64;
height @3 :Float64;
}
}
}
{% endhighlight %}
There are two main reasons to use groups with unions:
1. They are often more self-documenting. Notice that `radius` is now a member of `circle`, so
we don't need a comment to explain that the value of `circle` is its radius.
2. You can add additional members later on, without breaking compatibility. Notice how we upgraded
`square` to `rectangle` above, adding a `height` field. This definition is actually
wire-compatible with the previous version of the `Shape` example from the "union" section
(aside from the fact that `height` will always be zero when reading old data -- hey, it's not
a perfect example). In real-world use, it is common to realize after the fact that you need to
add some information to a struct that only applies when one particular union field is set.
Without the ability to upgrade to a group, you would have to define the new field separately,
and have it waste space when not relevant.
Note that a named union is actually exactly equivalent to a named group containing an unnamed
union.
**Wait, weren't groups considered a misfeature in Protobufs? Why did you do this again?**
They are useful in unions, which Protobufs did not have. Meanwhile, you cannot have a "repeated
group" in Cap'n Proto, which was the case that got into the most trouble with Protobufs.
### Dynamically-typed Fields
......@@ -481,17 +561,38 @@ A protocol can be changed in the following ways without breaking backwards-compa
* New types, constants, and aliases can be added anywhere, since they obviously don't affect the
encoding of any existing type.
* New fields, values, and methods may be added to structs, enums, and interfaces, respectively,
with the numbering rules described earlier.
* New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
as long as each new member's number is larger than all previous members. Similarly, new fields
may be added to existing groups and unions.
* New parameters may be added to a method. The new parameters must be added to the end of the
parameter list and must have default values.
* Any symbolic name can be changed, as long as the ordinal numbers stay the same.
* Type definitions can be moved to different scopes.
* Members can be re-arranged in the source code, so long as their numbers stay the same.
* Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same. Note
that type declarations have an implicit ID generated based on their name and parent's ID, but
you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
declare it explicitly after your rename.
* Type definitions can be moved to different scopes, as long as the type ID is declared
explicitly.
* A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
`List(U)`, where `U` is a struct type whose `@0` field is of type `T`. This rule is useful when
you realize too late that you need to attach some extra data to each element of your list.
Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
Any other change should be assumed NOT to be safe. Also, these rules only apply to the Cap'n Proto
native encoding. It is sometimes useful to transcode Cap'n Proto types to other formats, like
JSON, which may have different rules (e.g., field names cannot change in JSON).
* A field can be moved into a group or a union, as long as the group/union and all other fields
within it are new. In other words, a field can be replaced with a group or union containing an
equivalent field and some new fields.
Any other change should be assumed NOT to be safe. In particular:
* You cannot change a field, method, or enumerant's number.
* You cannot change a field or method parameter's type or default value, except as described above.
* You cannot change a type's ID.
* You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
generated based in part on the type name.
* You cannot move a type to a different scope or file unless it has an explicit ID, as the implicit
ID is based in part on the scope's ID.
* You cannot move an existing field into or out of an existing union, nor can you form a new union
containing more than one existing field.
Also, these rules only apply to the Cap'n Proto native encoding. It is sometimes useful to
transcode Cap'n Proto types to other formats, like JSON, which may have different rules (e.g.,
field names cannot change in JSON).
......@@ -87,7 +87,14 @@ support Cap'n Proto in a dynamic language, then, is to wrap the C++ library, in
[C++ dynamic API](cxx.html#dynamic_reflection). This way you get reasonable performance while
still avoiding the need to generate any code specific to each schema.
Of course, you still need to parse the schema. Version 0.3 of Cap'n Proto will introduce a public
C++ API to the schema parser which your bindings can invoke. By the time you read this, the API
is probably already available at git head, or will be within a few days;
[send us a note](https://groups.google.com/group/capnproto) if you want to try it out.
To parse the schema files, use the `capnp::SchemaParser` class (defined in `capnp/schema-parser.h`).
This way, schemas are loaded at the same time as all the rest of the program's code -- at startup.
An advanced implementation might consider caching the compiled schemas in binary format, then
loading the cached version using `capnp::SchemaLoader`, similar to the way e.g. Python caches
compiled source files as `.pyc` bytecode, but that's up to you.
### Testing Your Implementation
The easiest way to test that you've implemented the spec correctly is to use the `capnp` tool
to [encode](http://localhost:4000/capnproto/capnp-tool.html#encoding_messages) test inputs and
[decode](http://localhost:4000/capnproto/capnp-tool.html#decoding_messages) outputs.