Commit 02f54b0c authored by Kenton Varda

Docs written, need review.

parent 1590c336
TODO: Documentation <img src='http://kentonv.github.com/capnproto/images/infinity-times-faster.png' style='width:334px; height:306px; float: right;'>
Cap'n Proto is an insanely fast data interchange format and capability-based RPC system. Think
JSON, except binary. Or think [Protocol Buffers](http://protobuf.googlecode.com), except faster.
In fact, in benchmarks, Cap'n Proto is INFINITY TIMES faster than Protocol Buffers.
[Read more...](http://kentonv.github.com/capnproto/)
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
<!-- HEADER --> <!-- HEADER -->
<div id="header_wrap" class="outer"> <div id="header_wrap" class="outer">
<header class="inner"> <header class="inner">
<a id="discuss_banner" href="https://groups.google.com/group/capnproto">Discuss on Groups</a>
<a id="forkme_banner" href="https://github.com/kentonv/capnproto">View on GitHub</a> <a id="forkme_banner" href="https://github.com/kentonv/capnproto">View on GitHub</a>
<h1 id="project_title">Cap'n Proto</h1> <h1 id="project_title">Cap'n Proto</h1>
......
---
layout: page
---
# C++ Runtime
The Cap'n Proto C++ runtime implementation provides an easy-to-use interface for manipulating
messages backed by fast pointer arithmetic.
## Example Usage
For the Cap'n Proto definition:
{% highlight capnp %}
struct Person {
id @0 :UInt32;
name @1 :Text;
email @2 :Text;
}
struct AddressBook {
people @0 :List(Person);
}
{% endhighlight %}
You might write code like:
{% highlight c++ %}
#include "addressbook.capnp.h"
#include <capnproto/message.h>
#include <capnproto/serialize-packed.h>
#include <iostream>
void writeAddressBook(int fd) {
::capnproto::MallocMessageBuilder message;
AddressBook::Builder addressBook = message.initRoot<AddressBook>();
::capnproto::List<Person>::Builder people = addressBook.initPeople(2);
Person::Builder alice = people[0];
alice.setId(123);
alice.setName("Alice");
alice.setEmail("alice@example.com");
Person::Builder bob = people[1];
bob.setId(456);
bob.setName("Bob");
bob.setEmail("bob@example.com");
writePackedMessageToFd(fd, message);
}
void printAddressBook(int fd) {
::capnproto::PackedFdMessageReader message(fd);
AddressBook::Reader addressBook = message.getRoot<AddressBook>();
for (Person::Reader person : addressBook.getPeople()) {
std::cout << person.getName() << ": " << person.getEmail() << std::endl;
}
}
{% endhighlight %}
## C++ Feature Usage: C++11, Exceptions
This implementation makes use of C++11 features. If you are using GCC, you will need at least
version 4.7 to compile Cap'n Proto, with `--std=gnu++0x`. Other compilers have not been tested at
this time. In general, you do not need to understand C++11 features to _use_ Cap'n Proto; it all
happens under the hood.
This implementation prefers to handle errors using exceptions. Exceptions are only used in
circumstances that should never occur in normal operation. For example, exceptions are thrown
on assertion failures (indicating bugs in the code), network failures (indicating incorrect
configuration), and invalid input. Exceptions thrown by Cap'n Proto are never part of the
interface and never need to be caught in correct usage. The purpose of throwing exceptions is to
allow higher-level code a chance to recover from unexpected circumstances without disrupting other
work happening in the same process. For example, a server that handles requests from multiple
clients should, on exception, return an error to the client that caused the exception and close
that connection, but should continue handling other connections normally.
When Cap'n Proto code might throw an exception from a destructor, it first checks
`std::uncaught_exception()` to ensure that this is safe. If another exception is already active,
the new exception is assumed to be a side-effect of the main exception, and is either silently
swallowed or reported on a side channel.
In recognition of the fact that some teams prefer not to use exceptions, and that even enabling
exceptions in the compiler introduces overhead, Cap'n Proto allows you to disable them entirely
by registering your own exception callback. The callback will be called in place of throwing an
exception. The callback may abort the process, and is required to do so in certain circumstances
(e.g. when a fatal bug is detected). If the callback returns normally, Cap'n Proto will attempt
to continue by inventing "safe" values. This will lead to garbage output, but at least the program
will not crash. Your exception callback should set some sort of a flag indicating that an error
occurred, and somewhere up the stack you should check for that flag and cancel the operation.
(TODO: Document how to register this callback; this is not actually implemented as of this
writing.)
## Generating Code
To generate C++ code from your `.capnp` [interface definition](language.html), run:
capnpc myproto.capnp
This will create `myproto.capnp.h` and `myproto.capnp.c++` in the same directory as `myproto.capnp`.
_TODO: This will become more complicated later as we add support for more languages and such._
## Primitive Types
Primitive types map to the obvious C++ types:
* `Bool` -> `bool`
* `IntNN` -> `intNN_t`
* `UIntNN` -> `uintNN_t`
* `Float32` -> `float`
* `Float64` -> `double`
* `Void` -> `::capnproto::Void` (An enum with one value: `::capnproto::Void::VOID`)
## Structs
For each struct `Foo` in your interface, a C++ type named `Foo` is generated. This type itself is
really just a namespace; it contains two important inner classes: `Reader` and `Builder`.
`Reader` represents a read-only instance of `Foo` while `Builder` represents a writable instance
(usually, one that you are building). Both classes behave like pointers, in that you can pass them
by value and they do not own the underlying data that they operate on. In other words,
`Foo::Builder` is like a pointer to a `Foo` while `Foo::Reader` is like a const pointer to a `Foo`.
For every field `bar` defined in `Foo`, `Foo::Reader` has a method `getBar()`. For primitive types,
`get` just returns the type, but for structs, lists, and blobs, it returns a `Reader` for the
type.
{% highlight c++ %}
// Example Reader methods:
// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();
// myTextField @1 :Text;
::capnproto::Text::Reader getMyTextField();
// (Note that Text::Reader may be implicitly cast to const char* and
// std::string.)
// myStructField @2 :MyStruct;
MyStruct::Reader getMyStructField();
// myListField @3 :List(Float64);
::capnproto::List<double>::Reader getMyListField();
{% endhighlight %}
`Foo::Builder`, meanwhile, has two or three methods for each field `bar`:
* `getBar()`: For primitives, returns the value. For composites, returns a Builder for the
composite. If a composite field has not been initialized (its pointer is null), it will be
initialized to a copy of the field's default value before returning.
* `setBar(x)`: For primitives, sets the value to `x`. For composites, sets the value to a copy of
`x`, which must be a Reader for the type.
* `initBar(n)`: Only for lists (including blobs). Sets the field to a newly-allocated list
of size n and returns a Builder for it. The elements of the list are initialized to their empty
state (zero for numbers, default values for structs).
* `initBar()`: Only for structs. Sets the field to a newly-allocated struct and returns a
Builder for it. Note that the newly-allocated struct is initialized to the default value for
the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it
has one).
{% highlight c++ %}
// Example Builder methods:
// myPrimitiveField @0 :Int32;
int32_t getMyPrimitiveField();
void setMyPrimitiveField(int32_t value);
// myTextField @1 :Text;
::capnproto::Text::Builder getMyTextField();
void setMyTextField(::capnproto::Text::Reader value);
::capnproto::Text::Builder initMyTextField(size_t size);
// (Note that Text::Reader is implicitly constructable from const char*
// and std::string, and Text::Builder can be implicitly cast to
// these types.)
// myStructField @2 :MyStruct;
MyStruct::Builder getMyStructField();
void setMyStructField(MyStruct::Reader value);
MyStruct::Builder initMyStructField();
// myListField @3 :List(Float64);
::capnproto::List<double>::Builder getMyListField();
void setMyListField(::capnproto::List<double>::Reader value);
::capnproto::List<double>::Builder initMyListField(size_t size);
{% endhighlight %}
## Lists
Lists are represented by the type `capnproto::List<T>`, where `T` is any of the primitive types,
any Cap'n Proto user-defined type, `capnproto::Text`, `capnproto::Data`, or `capnproto::List<T>`.
The type `List<T>` itself is not instantiatable, but has two inner classes: `Reader` and `Builder`.
As with structs, these types behave like pointers to read-only and read-write data, respectively.
Both `Reader` and `Builder` implement `size()`, `operator[]`, `begin()`, and `end()`, as good C++
containers should. Note, though, that `operator[]` is read-only -- you cannot use it to assign
the element, because that would require returning a reference, which is impossible because the
underlying data may not be in your CPU's native format (e.g., wrong byte order). Instead, to
assign an element of a list, you must use `builder.set(index, value)`.
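To see why `operator[]` cannot return a reference, consider the following sketch. `UInt32ListBuilder` is a hypothetical class written for this page, not the real `capnproto::List` implementation; it assumes elements are stored little-endian in a byte buffer, so a value has to be decoded on read and encoded on write.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; not the real capnproto::List implementation.
// Elements live in a byte buffer in little-endian order, whatever the host CPU.
class UInt32ListBuilder {
public:
  explicit UInt32ListBuilder(std::size_t count) : bytes_(count * 4, 0) {}

  std::size_t size() const { return bytes_.size() / 4; }

  // Returns a decoded copy. There is no native uint32_t in the buffer that a
  // reference could point to.
  std::uint32_t operator[](std::size_t index) const {
    const std::uint8_t* p = &bytes_[index * 4];
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
  }

  // Encodes the value into wire order, least-significant byte first.
  void set(std::size_t index, std::uint32_t value) {
    std::uint8_t* p = &bytes_[index * 4];
    for (int i = 0; i < 4; ++i) p[i] = std::uint8_t(value >> (8 * i));
  }

  const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
  std::vector<std::uint8_t> bytes_;
};
```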
For `List<Foo>` where `Foo` is a non-primitive type, the type returned by `operator[]` and
`iterator::operator*()` is `Foo::Reader` (for `List<Foo>::Reader`) or `Foo::Builder`
(for `List<Foo>::Builder`). The builder's `set` method takes a `Foo::Reader` as its second
parameter.
For lists of lists or lists of blobs, the builder also has a method `init(index, size)` which sets
the element at the given index to a newly-allocated value with the given size and returns a builder
for it. Struct lists do not have an `init` method because all elements are initialized to empty
values when the list is created.
## Enums
Cap'n Proto enums become C++11 "enum classes". This means they behave like any other enum, but
the enum's values are scoped within the type. E.g. for an enum `Foo` with value `bar`, you must
refer to the value as `Foo::BAR`. The enum class's base type is `uint16_t`.
To match prevailing C++ style, an enum's value names are converted to UPPERCASE_WITH_UNDERSCORES
(whereas in the definition language you'd write them in camelCase).
Keep in mind when writing `switch` blocks that an enum read off the wire may have a numeric
value that is not listed in its definition. This may be the case if the sender is using a newer
version of the protocol, or if the message is corrupt or malicious.
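One way to handle this is sketched below. The `Color` enum and the `describe` function are hypothetical, written by hand to resemble the enum classes described above; the fallback after the `switch` catches any value that is not listed in the definition.

```cpp
#include <cstdint>
#include <string>

// Hypothetical enum, shaped like the enum classes described above.
enum class Color : std::uint16_t { RED, GREEN };

std::string describe(Color c) {
  switch (c) {
    case Color::RED:   return "red";
    case Color::GREEN: return "green";
  }
  // Reached when the wire value matches no known enumerant, e.g. because the
  // sender uses a newer protocol version or the message is corrupt.
  return "unknown";
}
```

Placing the fallback after the `switch` instead of in a `default:` label lets compilers that warn about unhandled enumerants keep doing so when the enum grows.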
## Blobs (Text and Data)
Blobs are manipulated using the classes `capnproto::Text` and `capnproto::Data`. These classes are,
again, just containers for inner classes `Reader` and `Builder`. These classes are iterable and
implement `data()`, `size()`, and `operator[]` methods, similar to `std::string`.
`Builder::operator[]` even returns a reference (unlike with `List<T>`). `Text::Reader`
additionally has a method `c_str()` which returns a NUL-terminated `const char*`.
These classes strive to be easy to convert to other common representations of raw data.
Blob readers and builders can be implicitly converted to any class which takes
`(const char*, size_t)` as its constructor parameters, and from any class which has
`const char* data()` and `size_t size()` methods (in particular, `std::string`). Text
readers and builders can additionally be implicitly converted to and from NUL-terminated
`const char*`s.
Because of this, callers often don't need to know anything about the blob classes. If you use
`std::string` to represent blobs in your own code, or NUL-terminated character arrays for text,
just pretend that's what Cap'n Proto uses too.
## Interfaces
Interfaces (RPC) are not yet implemented.
## Messages and I/O
To create a new message, you must start by creating a `capnproto::MessageBuilder`
(`capnproto/message.h`). This is an abstract type which you can implement yourself, but most users
will want to use `capnproto::MallocMessageBuilder`. Once your message is constructed, write it to
a file descriptor with `capnproto::writeMessageToFd(fd, builder)` (`capnproto/serialize.h`) or
`capnproto::writePackedMessageToFd(fd, builder)` (`capnproto/serialize-packed.h`).
To read a message, you must create a `capnproto::MessageReader`, which is another abstract type.
Implementations are specific to the input source. You can use `capnproto::StreamFdMessageReader`
(`capnproto/serialize.h`) or `capnproto::PackedFdMessageReader` (`capnproto/serialize-packed.h`)
to read from file descriptors; both take the file descriptor as a constructor argument.
Note that if your stream contains additional data after the message, `PackedFdMessageReader` may
accidentally read some of that data, since it does buffered I/O. To make this work correctly, you
will need to set up a multi-use buffered stream. Buffered I/O may also be a good idea with
`StreamFdMessageReader` and also when writing, for performance reasons. See `capnproto/io.h` for
details.
There is an [example](#example_usage) of all this at the beginning of this page.
## Reference
The runtime library contains lots of useful features not described on this page. For now, the
best reference is the header files. See:
capnproto/list.h
capnproto/blob.h
capnproto/io.h
capnproto/serialize.h
capnproto/serialize-packed.h
...@@ -222,3 +222,58 @@ to an object in a segment that is full. If you can't allocate even one word in ...@@ -222,3 +222,58 @@ to an object in a segment that is full. If you can't allocate even one word in
the target resides, then you will need to allocate a landing pad in some other segment, and use the target resides, then you will need to allocate a landing pad in some other segment, and use
this double-far approach. This should be exceedingly rare in practice since pointers are normally this double-far approach. This should be exceedingly rare in practice since pointers are normally
set to point to _new_ objects. set to point to _new_ objects.
## Serialization Over a Stream
When transmitting a message, the segments must be framed in some way, i.e. to communicate the
number of segments and their sizes before communicating the actual data. The best framing approach
may differ depending on the medium -- for example, messages read via `mmap` or shared memory may
call for a different approach than messages sent over a socket or a pipe. Cap'n Proto does not
attempt to specify a framing format for every situation. However, since byte streams are by far
the most common transmission medium, Cap'n Proto does define and implement a recommended framing
format for them.
When transmitting over a stream, the following should be sent. All integers are unsigned and
little-endian.
* (4 bytes) The number of segments, minus one (since there is always at least one segment).
* (N * 4 bytes) The size of each segment, in words.
* (0 or 4 bytes) Padding up to the next word boundary.
* The content of each segment, in order.
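The header portion of this framing can be sketched as follows. `buildFrameHeader` and `appendUInt32LE` are hypothetical helpers written for illustration, not functions from the Cap'n Proto library, and the sketch assumes 8-byte words.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Appends a 32-bit unsigned integer in little-endian byte order.
static void appendUInt32LE(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(std::uint8_t(v >> (8 * i)));
}

// Builds the framing header that precedes the segment contents.
// segmentSizes holds the size of each segment in 8-byte words.
std::vector<std::uint8_t> buildFrameHeader(
    const std::vector<std::uint32_t>& segmentSizes) {
  if (segmentSizes.empty()) {
    throw std::invalid_argument("a message has at least one segment");
  }
  std::vector<std::uint8_t> header;
  // Segment count, minus one.
  appendUInt32LE(header, std::uint32_t(segmentSizes.size() - 1));
  // Size of each segment, in words.
  for (std::uint32_t size : segmentSizes) appendUInt32LE(header, size);
  // Pad with 4 zero bytes if the header does not end on a word boundary.
  if (header.size() % 8 != 0) appendUInt32LE(header, 0);
  return header;
}
```

With one segment the header is exactly one word, so no padding is needed; with two segments the header is 12 bytes and gets 4 bytes of padding.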
## Packing
For cases where bandwidth usage matters, Cap'n Proto defines a simple compression scheme called
"packing". This scheme is based on the observation that Cap'n Proto messages contain lots of
zero bytes: padding bytes, unset fields, and high-order bytes of small-valued integers.
In packed format, each word of the message is reduced to a tag byte followed by zero to eight
content bytes. The bits of the tag byte correspond to the bytes of the unpacked word, with the
least-significant bit corresponding to the first byte. Each zero bit indicates that the
corresponding byte is zero. The non-zero bytes are packed following the tag.
For example, here is some typical Cap'n Proto data (a struct pointer (offset = 2, data size = 3,
pointer count = 2) followed by a text pointer (offset = 6, length = 53)) and its packed form:
unpacked (hex): 08 00 00 00 03 00 02 00 19 00 00 00 aa 01 00 00
packed (hex): 51 08 03 02 31 19 aa 01
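The per-word rule can be sketched in a few lines. `packWord` is a hypothetical helper, not part of the library, and it deliberately ignores the special treatment of tags 0x00 and 0xff, so it is not a complete packer.

```cpp
#include <cstdint>
#include <vector>

// Packs one 8-byte word into a tag byte followed by the word's non-zero bytes.
// Sketch only: the special handling of tags 0x00 and 0xff is not covered.
std::vector<std::uint8_t> packWord(const std::uint8_t (&word)[8]) {
  std::vector<std::uint8_t> out(1, 0);  // out[0] is the tag byte
  for (int i = 0; i < 8; ++i) {
    if (word[i] != 0) {
      out[0] |= std::uint8_t(1u << i);  // bit i of the tag marks byte i as non-zero
      out.push_back(word[i]);
    }
  }
  return out;
}
```

Applied to the two words of the example, this yields `51 08 03 02` and `31 19 aa 01`.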
In addition to the above, there are two tag values which are treated specially: 0x00 and 0xff.
* 0x00: The tag is followed by a single byte which indicates a count of consecutive zero-valued
words, minus 1. E.g. if the tag 0x00 is followed by 0x05, the sequence unpacks to 6 words of
zero.
* 0xff: The tag is followed by the bytes of the word as described above, but after those bytes is
another byte with value N. Following that byte is N unpacked words that should be copied
directly. These unpacked words may or may not contain zeros -- it is up to the compressor to
decide when to end the unpacked span and return to packing each word. The purpose of this rule
is to minimize the impact of packing on data that doesn't contain any zeros -- in particular,
long text blobs. Because of this rule, the worst-case space overhead of packing is 2 bytes per
2 KiB of input (256 words = 2KiB).
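Putting the rules together, a decoder might look like the sketch below. The `unpack` function is hypothetical and works on an in-memory buffer; it is meant to illustrate the format as described here, not to mirror the library's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Expands a packed buffer into unpacked bytes, following the rules above.
std::vector<std::uint8_t> unpack(const std::vector<std::uint8_t>& in) {
  std::vector<std::uint8_t> out;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t tag = in[i++];
    // Each tag bit, least-significant first, says whether that byte is present.
    for (int bit = 0; bit < 8; ++bit) {
      if (tag & (1u << bit)) {
        if (i >= in.size()) throw std::runtime_error("truncated word");
        out.push_back(in[i++]);
      } else {
        out.push_back(0);
      }
    }
    if (tag == 0x00) {
      // Count byte: number of additional all-zero words.
      if (i >= in.size()) throw std::runtime_error("missing zero-run count");
      const std::size_t n = in[i++];
      out.insert(out.end(), n * 8, 0);
    } else if (tag == 0xff) {
      // Count byte: number of unpacked words to copy verbatim.
      if (i >= in.size()) throw std::runtime_error("missing literal count");
      const std::size_t bytes = std::size_t(in[i++]) * 8;
      if (in.size() - i < bytes) throw std::runtime_error("truncated literal run");
      const auto first = in.begin() + static_cast<std::ptrdiff_t>(i);
      out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(bytes));
      i += bytes;
    }
  }
  return out;
}
```

Feeding it `51 08 03 02 31 19 aa 01` reproduces the 16 unpacked bytes of the example, and `00 05` expands to 6 words (48 bytes) of zeros.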
## Compression
When Cap'n Proto messages may contain repetitive data (especially large text blobs), it makes sense
to apply a standard compression algorithm in addition to packing. When CPU time is still
important, we recommend Google's [Snappy](https://code.google.com/p/snappy/). Otherwise,
[zlib](http://www.zlib.net) is probably a good choice.
...@@ -4,7 +4,10 @@ layout: page ...@@ -4,7 +4,10 @@ layout: page
# Installation # Installation
## Cap'n Proto IS NOT READY ## Cap'n Proto is not ready yet
<a class="prominent_link" style="color: #fff"
href="https://groups.google.com/group/capnproto-announce">Sign Up for Updates</a>
As of this writing, Cap'n Proto is in the very early stages of development. It is still missing As of this writing, Cap'n Proto is in the very early stages of development. It is still missing
many essential features: many essential features:
...@@ -20,10 +23,13 @@ many essential features: ...@@ -20,10 +23,13 @@ many essential features:
end-to-end benchmarks by, like, 2x-5x. We can do better. end-to-end benchmarks by, like, 2x-5x. We can do better.
* **RPC:** The RPC protocol has not yet been specified, much less implemented. * **RPC:** The RPC protocol has not yet been specified, much less implemented.
* **Support for languages other than C++:** Hasn't been started yet. * **Support for languages other than C++:** Hasn't been started yet.
* Many other little things.
Therefore, these instructions are for those that would like to hack on Cap'n Proto. If that's you,
you should join the [discussion group](https://groups.google.com/group/capnproto)!
Therefore, you should only be installing Cap'n Proto at this time if you just want to play around Or, if you just want to know when it's ready, add yourself to the
with it or help develop it. If so, great! Please report your findings to the [announce list](https://groups.google.com/group/capnproto-announce).
[discussion group](https://groups.google.com/group/capnproto).
## Installing the Cap'n Proto Compiler ## Installing the Cap'n Proto Compiler
...@@ -58,7 +64,8 @@ changes (via inotify) and immediately rebuilds as necessary. Instant feedback i ...@@ -58,7 +64,8 @@ changes (via inotify) and immediately rebuilds as necessary. Instant feedback i
productivity, so I really like using Ekam. productivity, so I really like using Ekam.
Unfortunately it's very much unfinished. It works (for me), but it is quirky and rough around the Unfortunately it's very much unfinished. It works (for me), but it is quirky and rough around the
edges. It only works on Linux, and is best used together with Eclipse. edges. It only works on Linux, and is best used together with Eclipse. If you find it
unacceptable, scroll down to the Automake instructions, below.
The Cap'n Proto repo includes a script which will attempt to set up Ekam for you. The Cap'n Proto repo includes a script which will attempt to set up Ekam for you.
...@@ -74,6 +81,14 @@ Once Ekam is installed, you can do: ...@@ -74,6 +81,14 @@ Once Ekam is installed, you can do:
make -f Makefile.ekam continuous make -f Makefile.ekam continuous
This will build everything it can and run tests. If successful, the benchmarks will be built
and saved in `tmp/capnproto/benchmark`. Try running `tmp/capnproto/benchmark/runner`.
Note that Ekam will fail to build some things and output a bunch of error messages. You should
be able to ignore any errors that originate outside of the `capnproto` directory -- these are just
parts of other packages like Google Test that Ekam doesn't fully know how to build, but aren't
needed by Cap'n Proto anyway.
If you use Eclipse, you should use the Ekam Eclipse plugin to get build results fed back into your If you use Eclipse, you should use the Ekam Eclipse plugin to get build results fed back into your
editor. Build the plugin like so: editor. Build the plugin like so:
...@@ -113,3 +128,6 @@ If setting up Ekam is too much work for you, you can also build with Automake. ...@@ -113,3 +128,6 @@ If setting up Ekam is too much work for you, you can also build with Automake.
autoreconf -i autoreconf -i
./configure ./configure
make check make check
sudo make install
This will install libcapnproto.a in /usr/local/lib and headers in /usr/local/include/capnproto.
...@@ -54,7 +54,7 @@ Some notes: ...@@ -54,7 +54,7 @@ Some notes:
### Comments ### Comments
Comments are indicated by hash signs and extend to the end of the line; Comments are indicated by hash signs and extend to the end of the line:
{% highlight capnp %} {% highlight capnp %}
# This is a comment. # This is a comment.
...@@ -310,13 +310,3 @@ A protocol can be changed in the following ways without breaking backwards-compa ...@@ -310,13 +310,3 @@ A protocol can be changed in the following ways without breaking backwards-compa
Any other change should be assumed NOT to be safe. Also, these rules only apply to the Cap'n Proto Any other change should be assumed NOT to be safe. Also, these rules only apply to the Cap'n Proto
native encoding. It is sometimes useful to transcode Cap'n Proto types to other formats, like native encoding. It is sometimes useful to transcode Cap'n Proto types to other formats, like
JSON, which may have different rules (e.g., field names cannot change in JSON). JSON, which may have different rules (e.g., field names cannot change in JSON).
## Running the Compiler
Simply run:
capnpc person.capnp
This will create `person.capnp.h` and `person.capnp.c++` in the same directory as `person.capnp`.
_TODO: This will become more complicated later as we add support for more languages and such._
...@@ -14,3 +14,12 @@ Here are some misc planned / hoped-for features: ...@@ -14,3 +14,12 @@ Here are some misc planned / hoped-for features:
file descriptor across the socket. Once messages are being allocated in shared memory, RPCs file descriptor across the socket. Once messages are being allocated in shared memory, RPCs
can be initiated by merely signaling a [futex](http://man7.org/linux/man-pages/man2/futex.2.html) can be initiated by merely signaling a [futex](http://man7.org/linux/man-pages/man2/futex.2.html)
(on Linux, at least), which ought to be ridiculously fast. (on Linux, at least), which ought to be ridiculously fast.
* **Promise Pipelining:** When an RPC will return a reference to a new remote object, the client
will be able to initiate calls to the returned object before the initial RPC has actually
completed. Essentially, the client says to the server: "Call method `foo` of the object to be
returned by RPC id N." Obviously, if the original RPC fails, the dependent call also fails.
Otherwise, the server can start executing the dependent call as soon as the original call
completes, without the need for a network round-trip. In the object-capability programming
language <a href="http://en.wikipedia.org/wiki/E_(programming_language)">E</a> this is known as
[promise pipelining](http://en.wikipedia.org/wiki/Futures_and_promises#Promise_pipelining).
...@@ -277,7 +277,7 @@ Full-Width Styles ...@@ -277,7 +277,7 @@ Full-Width Styles
margin: 0 auto; margin: 0 auto;
} }
#forkme_banner { #discuss_banner {
display: block; display: block;
position: absolute; position: absolute;
top:0; top:0;
...@@ -285,6 +285,21 @@ Full-Width Styles ...@@ -285,6 +285,21 @@ Full-Width Styles
z-index: 10; z-index: 10;
padding: 10px 50px 10px 10px; padding: 10px 50px 10px 10px;
color: #fff; color: #fff;
background: url('../images/groups-logo.png') #0090ff no-repeat 95% 50%;
font-weight: 700;
box-shadow: 0 0 10px rgba(0,0,0,.5);
border-bottom-left-radius: 2px;
border-bottom-right-radius: 2px;
}
#forkme_banner {
display: block;
position: absolute;
top:0;
right: 230px;
z-index: 10;
padding: 10px 50px 10px 10px;
color: #fff;
background: url('../images/blacktocat.png') #0090ff no-repeat 95% 50%; background: url('../images/blacktocat.png') #0090ff no-repeat 95% 50%;
font-weight: 700; font-weight: 700;
box-shadow: 0 0 10px rgba(0,0,0,.5); box-shadow: 0 0 10px rgba(0,0,0,.5);
...@@ -292,6 +307,22 @@ Full-Width Styles ...@@ -292,6 +307,22 @@ Full-Width Styles
border-bottom-right-radius: 2px; border-bottom-right-radius: 2px;
} }
.prominent_link {
display: block;
float: right;
z-index: 10;
padding: 10px 50px 10px 10px;
color: #fff;
background: url('../images/groups-logo.png') #0090ff no-repeat 95% 50%;
background-color: #0090ff;
font-weight: 700;
box-shadow: 0 0 10px rgba(0,0,0,.5);
border-top-left-radius: 2px;
border-top-right-radius: 2px;
border-bottom-left-radius: 2px;
border-bottom-right-radius: 2px;
}
#header_wrap { #header_wrap {
background: #212121; background: #212121;
background: -moz-linear-gradient(top, #373737, #212121); background: -moz-linear-gradient(top, #373737, #212121);
......