Commit 274088f7 authored by Kenton Varda

Document new features in 0.5: Generics and canonicalization.

parent 4c7ae535
......@@ -4,7 +4,10 @@ title: Cap'n Proto, FlatBuffers, and SBE
author: kentonv
---
**Update:** I have made [some corrections](https://github.com/kentonv/capnproto/commit/e4e6c9076ae16804c07968cd3bdf6107155df7ee) since the original version of this post.
**Update Jun 18, 2014:** I have made [some corrections](https://github.com/kentonv/capnproto/commit/e4e6c9076ae16804c07968cd3bdf6107155df7ee) since the original version of this post.
**Update Dec 15, 2014:** Updated to reflect that Cap'n Proto 0.5 now supports Visual Studio and that
Java is now well-supported.
Yesterday, some engineers at Google released [FlatBuffers](http://google-opensource.blogspot.com/2014/06/flatbuffers-memory-efficient.html), a new serialization protocol and library with similar design principles to Cap'n Proto. Also, a few months back, Real Logic released [Simple Binary Encoding](http://mechanical-sympathy.blogspot.com/2014/05/simple-binary-encoding.html), another protocol and library of this nature.
......@@ -37,13 +40,16 @@ Note: For features which are properties of the implementation rather than the pr
<tr><td>Padding takes space on wire?</td><td class="pass">no</td><td class="warn">optional</td><td class="fail">yes</td><td class="fail">yes</td></tr>
<tr><td>Unset fields take space on wire?</td><td class="pass">no</td><td class="fail">yes</td><td class="fail">yes</td><td class="pass">no</td></tr>
<tr><td>Pointers take space on wire?</td><td class="pass">no</td><td class="fail">yes</td><td class="pass">no</td><td class="fail">yes</td></tr>
<tr><td>C++</td><td class="pass">yes</td><td class="warn">GCC/Clang<br>(no MSVC)</td><td class="pass">yes</td><td class="pass">yes</td></tr>
<tr><td>Java</td><td class="pass">yes</td><td class="warn">in progress</td><td class="pass">yes</td><td class="pass">yes</td></tr>
<tr><td>Python</td><td class="pass">yes</td><td class="warn">yes (C ext)</td><td class="fail">no</td><td class="fail">no</td></tr>
<tr><td>Other languages</td><td class="pass">lots!</td><td class="warn">6+ others</td><td class="warn">C#</td><td class="fail">no</td></tr>
<tr><td>C++</td><td class="pass">yes</td><td class="pass">yes (C++11)*</td><td class="pass">yes</td><td class="pass">yes</td></tr>
<tr><td>Java</td><td class="pass">yes</td><td class="pass">yes*</td><td class="pass">yes</td><td class="pass">yes</td></tr>
<tr><td>C#</td><td class="pass">yes</td><td class="pass">yes*</td><td class="pass">yes</td><td class="pass">yes*</td></tr>
<tr><td>Go</td><td class="pass">yes</td><td class="pass">yes</td><td class="fail">no</td><td class="pass">yes*</td></tr>
<tr><td>Other languages</td><td class="pass">lots!</td><td class="warn">6+ others*</td><td class="fail">no</td><td class="fail">no</td></tr>
<tr><td>Authors' preferred use case</td><td>distributed<br>computing</td><td><a href="https://sandstorm.io">platforms /<br>sandboxing</a></td><td>financial<br>trading</td><td>games</td></tr>
</table>
\* Updated Dec 15, 2014 (Cap'n Proto 0.5.0).
**Schema Evolution**
All four protocols allow you to add new fields to a schema over time, without breaking backwards-compatibility. New fields will be ignored by old binaries, and new binaries will fill in a default value when reading old data.
......@@ -172,9 +178,19 @@ FlatBuffers also uses pointers, even though most objects are variable-width, pos
**Platform Support**
A really huge weakness of Cap'n Proto today is that it doesn't compile in Visual Studio, and therefore effectively doesn't support Windows (unless you count Cygwin, but most people don't).
As of Dec 15, 2014, Cap'n Proto supports a superset of the languages supported by FlatBuffers and
SBE, but is still far behind Protocol Buffers.
While Cap'n Proto C++ is well-supported on POSIX platforms using GCC or Clang as their compiler,
Cap'n Proto has only limited support for Visual C++: the basic serialization library works, but
reflection and RPC do not yet work. Support will be expanded once Visual Studio's C++ compiler
completes support for C++11.
In comparison, SBE and FlatBuffers have reflection interfaces that work in Visual C++, though
neither one has built-in RPC. Reflection is critical for certain use cases, but the majority of
users won't need it.
The problem initially was that Cap'n Proto makes liberal use of C++11 features, and MSVC has lagged behind in implementing them. We considered, but balked at, forking and backporting. It looks like [VS14 CTP1](http://blogs.msdn.com/b/vcblog/archive/2014/06/03/visual-studio-14-ctp.aspx) may finally be far enough to make a port practical, so now it's just a matter of finding the time. But that's a new problem: all my time is currently consumed by [Sandstorm.io](https://sandstorm.io), and while it is a major use of Cap'n Proto and will thus drive further development, it is entirely Linux-based, making it hard for me to justify spending my time on Windows support. What we need is a volunteer!
(This section has been updated. When originally written, Cap'n Proto did not support MSVC at all.)
### Benchmarks?
......
---
layout: post
title: "Cap'n Proto 0.5: Generics, Visual C++, Java, C#, Sandstorm.io"
author: kentonv
---
Today we're releasing Cap'n Proto 0.5. We've added lots of goodies!
### Finally: Visual Studio
Microsoft Visual Studio 2015 (currently in "preview") finally supports enough C++11 to get Cap'n
Proto working, and we've duly added official support for it!
Not all features are supported yet. The core serialization functionality sufficient for 90% of users
is available, but reflection and RPC APIs are not. We will turn on these APIs as soon as Visual C++
is ready (the main blocker is incomplete `constexpr` support).
As part of this, we now support CMake as a build system, and it can be used on Unix as well.
In related news, for Windows users not interested in C++ but who need the Cap'n Proto tools for
other languages, we now provide precompiled Windows binaries. See
[the installation page]({{site.baseurl}}install.html).
I'd like to thank [Bryan Boreham](https://github.com/bboreham),
[Joshua Warner](https://github.com/joshuawarner32), and [Phillip Quinn](https://github.com/pqu) for
their help in getting this working.
### C#, Java
While not strictly part of this release, our two biggest missing languages recently gained support
for Cap'n Proto:
* [Marc Gravell](https://github.com/mgravell) -- the man responsible for the most popular C#
implementation of Protobufs -- has now implemented
[Cap'n Proto in C#](https://github.com/mgravell/capnproto-net).
* [David Renshaw](https://github.com/dwrensha), author of our existing Rust implementation and
[Sandstorm.io](https://sandstorm.io) core developer, has implemented
[Cap'n Proto in Java](https://github.com/dwrensha/capnproto-java).
### Generics
Cap'n Proto now supports [generics](http://localhost:4000/capnproto/language.html#generic-types),
in the sense of Java generics or C++ templates. While working on
[Sandstorm.io](https://sandstorm.io) we frequently found that we wanted this, and it turned out
to be easy to support.
This is a feature which Protocol Buffers does not support and likely never will. Cap'n Proto has a
much easier time supporting exotic language features because the generated code is so simple. In
C++, nearly all Cap'n Proto generated code is inline accessor methods, which can easily become
templates. Protocol Buffers, in contrast, has generated parse and serialize functions and a host
of other auxiliary stuff, which is too complex to inline and thus would need to be adapted to
generics without using C++ templates. This would get ugly fast.
Generics are not yet supported by all Cap'n Proto language implementations, but where they are not
supported, things degrade gracefully: all type parameters simply become `AnyPointer`. You can still
use generics in your schemas as documentation. Meanwhile, at least our C++, Java, and Python
implementations have already been updated to support generics, and other implementations that
wrap the C++ reflection API are likely to work too.
### Canonicalization
0.5 introduces a (backwards-compatible) change in
[the way struct lists should be encoded](http://localhost:4000/capnproto/encoding.html#lists), in
order to support [canonicalization](http://localhost:4000/capnproto/encoding.html#canonicalization).
We believe this will make Cap'n Proto more appropriate for use in cryptographic protocols. If
you've implemented Cap'n Proto in another language, please update your code!
### Sandstorm and Capability Systems
[Sandstorm.io](https://sandstorm.io) is Cap'n Proto's parent project: a platform for personal
servers that is radically easier and more secure.
Cap'n Proto RPC is the underlying communications layer powering Sandstorm. Sandstorm is a
[capability system](http://www.erights.org/elib/capability/overview.html): applications can send
each other object references and address messages to those objects. Messages can themselves contain
new object references, and the recipient implicitly gains permission to use any object reference
they receive. Essentially, Sandstorm allows the interfaces between two apps, or between an app
and the platform, to be designed using the same vocabulary as interfaces between objects or
libraries in an object-oriented programming language (but
[without the mistakes of CORBA or DCOM](http://localhost:4000/capnproto/rpc.html#distributed-objects)).
Cap'n Proto RPC is at the core of this.
This has powerful implications: Consider the case of service discovery. On Sandstorm, all
applications start out isolated from each other in secure containers. However, applications can
(or, will be able to) publish Cap'n Proto object references to the system representing APIs they
support. Then, another app can make a request to the system, saying "I need an object that
implements interface Foo". At this point, the system can display a picker UI to the user,
presenting all objects the user owns that satisfy the requirement. However, the requesting app only
ever receives a reference to the object the user chooses; all others remain hidden. Thus, security
becomes "automatic". The user does not have to edit an ACL on the providing app, nor copy around
credentials, nor even answer any security question at all; it all derives automatically and
naturally from the user's choices. We call this interface "The Powerbox".
Moreover, because Sandstorm is fully aware of the object references held by every app, it will
be able to display a visualization of these connections, allowing a user to quickly see which of
their apps have access to each other and even revoke connections that are no longer desired with
a mouse click.
Cap'n Proto 0.5 introduces primitives to support "persistent" capabilities -- that is, the ability
to "save" an object reference to disk and then restore it later, on a different connection.
Obviously, the features described above totally depend on this feature.
The next release of Cap'n Proto is likely to include another feature essential for Sandstorm: the
ability to pass capabilities from machine to machine and have Cap'n Proto automatically form direct
connections when you do. This allows servers running on different machines to interact with each
other in a completely object-oriented way. Instead of passing around URLs (which necessitate a
global namespace, lifetime management, firewall traversal, and all sorts of other obstacles), you
can pass around capabilities and not worry about it. This will be central to Sandstorm's strategies
for federation and cluster management.
### Other notes
* The C++ RPC code now uses `epoll` on Linux.
* We now test Cap'n Proto on Android and MinGW, in addition to Linux, Mac OSX, Cygwin, and Visual
Studio. (iOS and FreeBSD are also reported to work, though are not yet part of our testing
process.)
......@@ -371,6 +371,46 @@ implicitly convertible in this way. Unfortunately, this trick doesn't work on G
[Interfaces (RPC) have their own page.](cxxrpc.html)
### Generics
[Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose
name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types
are not, because they inherit the parameters from the outer type. Similarly, template parameters
should refer to outer types, not `Reader` or `Builder` types.
For example, given:
{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}
struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}
You might write code like:
{% highlight c++ %}
void processPeople(People::Reader people) {
  Map<Text, Person>::Reader reader = people.getByName();
  capnp::List<Map<Text, Person>::Entry>::Reader entries =
      reader.getEntries();
  for (auto entry: entries) {
    processPerson(entry);
  }
}
{% endhighlight %}
Note that all template parameters will be specified with a default value of `AnyPointer`.
Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`.
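The defaulting behaves like ordinary C++ default template arguments. The following is a minimal, self-contained sketch of that shape only: `AnyPointer`, `Text`, `Person`, and this `Map` are empty stand-ins defined here for illustration, not the real library or generated types.

```cpp
#include <type_traits>

// Stand-ins for capnp::AnyPointer, capnp::Text and a generated Person type.
struct AnyPointer {};
struct Text {};
struct Person {};

// Shape of a generated generic type: every parameter defaults to AnyPointer.
template <typename Key = AnyPointer, typename Value = AnyPointer>
struct Map {
  struct Entry {};  // nested types inherit Key and Value from the outer type
};

// Omitting the parameters selects the all-AnyPointer parameterization.
static_assert(std::is_same<Map<>, Map<AnyPointer, AnyPointer>>::value,
              "Map<> is Map<AnyPointer, AnyPointer>");
// A specific parameterization is a distinct C++ type.
static_assert(!std::is_same<Map<>, Map<Text, Person>>::value,
              "Map<Text, Person> differs from Map<>");
```

Because the nested `Entry` picks up its parameters from the outer template, `Map<>::Entry` and `Map<Text, Person>::Entry` are different types as well.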
### Constants
Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style
......
......@@ -278,6 +278,9 @@ auto promise3 = promise2.then(
});
{% endhighlight %}
For [generic methods](language.html#generic-methods), the `fooRequest()` method will be a template;
you must explicitly specify type parameters.
### Servers
The generated `Server` type is an abstract interface which may be subclassed to implement a
......@@ -315,6 +318,11 @@ private:
};
{% endhighlight %}
On the server side, [generic methods](language.html#generic-methods) are NOT templates. Instead,
the generated code is exactly as if all of the generic parameters were bound to `AnyPointer`. The
server generally does not get to know exactly what type the client requested; it must be designed
to be correct for any parameterization.
## Initializing RPC
Cap'n Proto makes it easy to start up an RPC client or server using the "EZ RPC" classes,
......
......@@ -84,6 +84,41 @@ The built-in blob types are encoded as follows:
Note that the NUL terminator is included in the size sent on the wire, but the runtime library
should not count it in any size reported to the application.
### Structs
A struct value is encoded as a pointer to its content. The content is split into two sections:
data and pointers, with the pointer section appearing immediately after the data section. This
split allows structs to be traversed (e.g., copied) without knowing their type.
A struct pointer looks like this:
    lsb                      struct pointer                       msb
    +-+-----------------------------+---------------+---------------+
    |A|             B               |       C       |       D       |
    +-+-----------------------------+---------------+---------------+

    A (2 bits) = 0, to indicate that this is a struct pointer.
    B (30 bits) = Offset, in words, from the end of the pointer to the
        start of the struct's data section.  Signed.
    C (16 bits) = Size of the struct's data section, in words.
    D (16 bits) = Size of the struct's pointer section, in words.
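As an illustration of the layout above, here is a small sketch that unpacks the four fields from one pointer word. It assumes the 8 bytes have already been loaded into a `uint64_t` in little-endian order; `StructPointer` and `decodeStructPointer` are hypothetical names for this example, not part of the Cap'n Proto library.

```cpp
#include <cassert>
#include <cstdint>

// Decoded fields of a struct pointer, following the diagram above.
struct StructPointer {
  int32_t offsetWords;    // B: signed offset from the end of the pointer
  uint16_t dataWords;     // C: size of the data section, in words
  uint16_t pointerWords;  // D: size of the pointer section, in words
};

// Hypothetical helper: unpacks a pointer word that was loaded little-endian.
inline StructPointer decodeStructPointer(uint64_t word) {
  assert((word & 3) == 0);  // A must be 0 for a struct pointer

  StructPointer result;
  // B occupies bits 2..31. Sign-extend the 30-bit value by hand so the code
  // does not depend on how the compiler shifts negative numbers.
  int32_t offset = static_cast<int32_t>((word >> 2) & 0x3fffffff);
  if (offset & (1 << 29)) offset -= (1 << 30);
  result.offsetWords = offset;

  result.dataWords = static_cast<uint16_t>((word >> 32) & 0xffff);
  result.pointerWords = static_cast<uint16_t>((word >> 48) & 0xffff);
  return result;
}
```

For example, the word `0x00010002FFFFFFFC` decodes to an offset of -1 with a two-word data section and a one-word pointer section.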
Fields are positioned within the struct according to an algorithm with the following principles:
* The position of each field depends only on its definition and the definitions of lower-numbered
fields, never on the definitions of higher-numbered fields. This ensures backwards-compatibility
when new fields are added.
* Due to alignment requirements, fields in the data section may be separated by padding. However,
later-numbered fields may be positioned into the padding left between earlier-numbered fields.
Because of this, a struct will never contain more than 63 bits of padding. Since objects are
rounded up to a whole number of words anyway, padding never ends up wasting space.
* Unions and groups need not occupy contiguous memory. Indeed, they may have to be split into
multiple slots if new fields are added later on.
Field offsets are computed by the Cap'n Proto compiler. The precise algorithm is too complicated
to describe here, but you need not implement it yourself, as the compiler can produce a compiled
schema format which includes offset information.
### Lists
A list value is encoded as a pointer to a flat array of values.
......@@ -111,13 +146,6 @@ A list value is encoded as a pointer to a flat array of values.
The pointed-to values are tightly-packed. In particular, `Bool`s are packed bit-by-bit in
little-endian order (the first bit is the least-significant bit of the first byte).
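To make the bit order concrete, here is a minimal sketch of reading one element of a packed `Bool` list. The function name and signature are assumptions made for this example; `content` is taken to point at the first byte of the list content.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads element `index` of a bit-packed Bool list. Element 0 is the
// least-significant bit of the first byte, element 8 is the least-significant
// bit of the second byte, and so on.
inline bool getBool(const uint8_t* content, size_t elementCount, size_t index) {
  assert(index < elementCount);
  return ((content[index / 8] >> (index % 8)) & 1) != 0;
}
```

With content bytes `0x05, 0x80`, elements 0, 2, and 15 are true and all others are false.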
Lists of structs use the smallest element size in which the struct can fit. So, a
list of structs that each contain two `UInt8` fields and nothing else could be encoded with C = 3
(2-byte elements). A list of structs that each contain a single `Text` field would be encoded as
C = 6 (pointer elements). A list of structs that each contain a single `Bool` field would be
encoded using C = 1 (1-bit elements). A list of structs which are each more than one word in size
must be encoded using C = 7 (composite).
When C = 7, the elements of the list are fixed-width composite values -- usually, structs. In
this case, the list content is prefixed by a "tag" word that describes each individual element.
The tag has the same layout as a struct pointer, except that the pointer offset (B) instead
......@@ -133,47 +161,18 @@ In the future, we could consider implementing matrixes using the "composite" ele
elements being fixed-size lists rather than structs. In this case, the tag would look like a list
pointer rather than a struct pointer. As of this writing, no such feature has been implemented.
Notice that because a small struct is encoded as if it were a primitive value, this means that
if you have a field of type `List(T)` where `T` is a primitive or blob type, it
is possible to change that field to `List(U)` where `U` is a struct whose `@0` field has type `T`,
without breaking backwards-compatibility. This comes in handy when you discover too late that you
need to associate some extra data with each value in a primitive list -- instead of using parallel
lists (eww), you can just replace it with a struct list.
### Structs
A struct value is encoded as a pointer to its content. The content is split into two sections:
data and pointers, with the pointer section appearing immediately after the data section. This
split allows structs to be traversed (e.g., copied) without knowing their type.
A struct pointer looks like this:
    lsb                      struct pointer                       msb
    +-+-----------------------------+---------------+---------------+
    |A|             B               |       C       |       D       |
    +-+-----------------------------+---------------+---------------+

    A (2 bits) = 0, to indicate that this is a struct pointer.
    B (30 bits) = Offset, in words, from the end of the pointer to the
        start of the struct's data section.  Signed.
    C (16 bits) = Size of the struct's data section, in words.
    D (16 bits) = Size of the struct's pointer section, in words.
Fields are positioned within the struct according to an algorithm with the following principles:
* The position of each field depends only on its definition and the definitions of lower-numbered
fields, never on the definitions of higher-numbered fields. This ensures backwards-compatibility
when new fields are added.
* Due to alignment requirements, fields in the data section may be separated by padding. However,
later-numbered fields may be positioned into the padding left between earlier-numbered fields.
Because of this, a struct will never contain more than 63 bits of padding. Since objects are
rounded up to a whole number of words anyway, padding never ends up wasting space.
* Unions and groups need not occupy contiguous memory. Indeed, they may have to be split into
multiple slots if new fields are added later on.
Field offsets are computed by the Cap'n Proto compiler. The precise algorithm is too complicated
to describe here, but you need not implement it yourself, as the compiler can produce a compiled
schema format which includes offset information.
A struct list must always be written using C = 7. However, a list of any element size (except
C = 1, i.e. 1-bit) may be *decoded* as a struct list, with each element being interpreted as being
a prefix of the struct data. For instance, a list of 2-byte values (C = 3) can be decoded as a
struct list where each struct has 2 bytes in its "data" section (and an empty pointer section). A
list of pointer values (C = 6) can be decoded as a struct list where each struct has a pointer
section with one pointer (and an empty data section). The purpose of this rule is to make it
possible to upgrade a list of primitives to a list of structs, as described under the
[protocol evolution rules](http://localhost:4000/capnproto/language.html#evolving-your-protocol).
(We make a special exception that boolean lists cannot be upgraded in this way due to the
unreasonable implementation burden.) Note that even though struct lists can be decoded from any
element size (except C = 1), it is NOT permitted to encode a struct list using any type other than
C = 7 because doing so would interfere with the [canonicalization algorithm](#canonicalization).
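A reader that supports this upgrade path needs to map the element size code of the list it finds to the shape of the struct each element is treated as. The sketch below is one way to express that mapping. `StructShape` and `shapeForElementSize` are hypothetical names, and the codes 0, 2, 4, and 5 (void, 1-byte, 4-byte, and 8-byte elements) are taken from the list pointer definition rather than from the paragraph above.

```cpp
#include <cassert>
#include <cstdint>

// Shape of the struct that each list element is interpreted as.
struct StructShape {
  uint32_t dataBytes;     // bytes of data section supplied by the element
  uint32_t pointerCount;  // pointers supplied by the element
};

// Hypothetical helper. Returns false when the shape cannot be derived from
// the element size code alone.
inline bool shapeForElementSize(unsigned c, StructShape* shape) {
  assert(c <= 7);
  switch (c) {
    case 0: *shape = StructShape{0, 0}; return true;  // void elements
    case 2: *shape = StructShape{1, 0}; return true;  // 1-byte elements
    case 3: *shape = StructShape{2, 0}; return true;  // 2-byte elements
    case 4: *shape = StructShape{4, 0}; return true;  // 4-byte elements
    case 5: *shape = StructShape{8, 0}; return true;  // 8-byte elements
    case 6: *shape = StructShape{0, 1}; return true;  // pointer elements
    default:
      // C = 1: bit lists cannot be upgraded to struct lists.
      // C = 7: the shape comes from the tag word, not from the code.
      return false;
  }
}
```

Fields of the struct that fall outside the reported shape are simply absent from the element, so a reader would return their default values.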
#### Default Values
......@@ -329,27 +328,71 @@ section.
### Compression
When Cap'n Proto messages may contain repetitive data (especially, large text blobs), it makes sense
to apply a standard compression algorithm in addition to packing. When CPU time is scarce, we
recommend Google's [Snappy](https://code.google.com/p/snappy/). Otherwise,
[zlib](http://www.zlib.net) is slower but will compress more.
## Security Notes
A naive implementation of a Cap'n Proto reader may be vulnerable to DoS attacks based on two types
of malicious input:
* A message containing cyclic (or even just overlapping) pointers can cause the reader to go into
an infinite loop while traversing the content.
* A message with deeply-nested objects can cause a stack overflow in typical code which processes
messages recursively.
To defend against these attacks, every Cap'n Proto implementation should implement the following
restrictions by default:
* As the application traverses the message, each time a pointer is dereferenced, a counter should
be incremented by the size of the data to which it points. If this counter goes over some limit,
an error should be raised, and/or default values should be returned. The C++ implementation
currently defaults to a limit of 64MiB, but allows the caller to set a different limit if desired.
* As the application traverses the message, the pointer depth should be tracked. Again, if it goes
over some limit, an error should be raised. The C++ implementation currently defaults to a limit
of 64 pointers, but allows the caller to set a different limit.
to apply a standard compression algorithm in addition to packing. When CPU time is scarce, we
recommend [LZ4 compression](https://code.google.com/p/lz4/). Otherwise, [zlib](http://www.zlib.net)
is slower but will compress more.
## Canonicalization
Cap'n Proto messages have a well-defined canonical form. Cap'n Proto encoders are NOT required to
output messages in canonical form, and in fact they will almost never do so by default. However,
it is possible to write code which canonicalizes a Cap'n Proto message without knowing its schema.
A canonical Cap'n Proto message must adhere to the following rules:
* The object tree must be encoded in preorder (with respect to the order of the pointers within
each object).
* The message must be encoded as a single segment. (When signing or hashing a canonical Cap'n Proto
message, the segment table shall not be included, because it would be redundant.)
* Trailing zero-valued words in a struct's data or pointer sections must be truncated. Since zero
represents a default value, this does not change the struct's meaning. This rule is important
to ensure that adding a new field to a struct does not affect the canonical encoding of messages
that do not set that field.
* Similarly, for a struct list, if a trailing word in a section of all structs in the list is zero,
then it must be truncated from all structs in the list. (All structs in a struct list must have
equal sizes, hence a trailing zero can only be removed if it is zero in all elements.)
* Canonical messages are not packed. However, packing can still be applied for transmission
purposes; the message must simply be unpacked before checking signatures.
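The two truncation rules can be sketched as follows. This is an illustration under simplifying assumptions, not the library's canonicalizer: the function names are hypothetical, and the struct-list helper takes the same section (data or pointers) of every element concatenated into one vector, whereas a real message interleaves each element's data and pointer sections.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of words that remain in one section after dropping trailing zero
// words. A zero word carries only default values, so removing it keeps the
// struct's meaning.
inline size_t truncatedWordCount(const uint64_t* words, size_t count) {
  while (count > 0 && words[count - 1] == 0) --count;
  return count;
}

// Struct-list variant: every element must keep the same size, so a trailing
// word can only be dropped if it is zero in all elements. The result is the
// largest truncated size found among the elements.
inline size_t truncatedElementWordCount(const std::vector<uint64_t>& sections,
                                        size_t wordsPerElement) {
  if (wordsPerElement == 0) return 0;
  assert(sections.size() % wordsPerElement == 0);
  size_t result = 0;
  for (size_t i = 0; i < sections.size(); i += wordsPerElement) {
    size_t n = truncatedWordCount(&sections[i], wordsPerElement);
    if (n > result) result = n;
  }
  return result;
}
```

For a list of two elements with sections `{1, 0, 0}` and `{0, 7, 0}`, only the third word is zero in both, so each canonical element keeps two words.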
Note that Cap'n Proto 0.5 introduced the rule that struct lists must always be encoded using
C = 7 in the [list pointer](#lists). Prior versions of Cap'n Proto allowed struct lists to be
encoded using any element size, so that small structs could be compacted to take less than a word
per element, and many encoders in fact implemented this. Unfortunately, this "optimization" made
canonicalization impossible without knowing the schema, which is a significant obstacle. Therefore,
the rules have been changed in 0.5, but it may not be possible to canonicalize data written by
previous versions.
## Security Considerations
A naive implementation of a Cap'n Proto reader may be vulnerable to attacks based on various kinds
of malicious input. Implementations MUST guard against these.
### Pointer Validation
Cap'n Proto readers must validate pointers, e.g. to check that the target object is within the
bounds of its segment. To avoid an upfront scan of the message (which would defeat Cap'n Proto's
O(1) parsing performance), validation should occur lazily when the getter method for a pointer is
called, throwing an exception or returning a default value if the pointer is invalid.
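A lazy bounds check of this kind might look like the sketch below, which a getter would call before touching the target. The function name and parameters are assumptions for this example. All quantities are in words, and the offset is measured from the end of the pointer, as in the struct pointer layout.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a target of `targetWords` words, located `offsetWords`
// after the end of the pointer at word index `pointerIndex`, lies entirely
// inside a segment of `segmentWords` words.
inline bool targetInBounds(uint32_t segmentWords, uint32_t pointerIndex,
                           int32_t offsetWords, uint32_t targetWords) {
  assert(pointerIndex < segmentWords);
  // 64-bit arithmetic cannot overflow for 32-bit inputs.
  int64_t start = static_cast<int64_t>(pointerIndex) + 1 + offsetWords;
  if (start < 0 || start > static_cast<int64_t>(segmentWords)) return false;
  return targetWords <= segmentWords - static_cast<uint32_t>(start);
}
```

When the check fails, the getter can throw an exception or fall back to the field's default value, as described above.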
### Amplification attack
A message containing cyclic (or even just overlapping) pointers can cause the reader to go into
an infinite loop while traversing the content.
To defend against this, as the application traverses the message, each time a pointer is
dereferenced, a counter should be incremented by the size of the data to which it points. If this
counter goes over some limit, an error should be raised, and/or default values should be returned.
The C++ implementation currently defaults to a limit of 64MiB, but allows the caller to set a
different limit if desired. Another reasonable strategy is to set the limit to some multiple of
the original message size; however, most applications should place limits on overall message sizes
anyway, so it makes sense to have one check cover both.
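One way to implement such a counter is as a budget that is decremented on every dereference, which is equivalent to counting up toward a limit. The class below is a hypothetical sketch, not the C++ library's actual implementation, and it measures sizes in bytes for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Tracks how much data a reader may still traverse in one message.
class TraversalBudget {
 public:
  explicit TraversalBudget(uint64_t limitBytes) : remaining_(limitBytes) {
    assert(limitBytes > 0);  // a zero limit would reject every message
  }

  // Call each time a pointer is dereferenced, passing the size of its target.
  // Returns false once the budget is exhausted; the caller should then raise
  // an error or return default values.
  bool charge(uint64_t targetBytes) {
    if (targetBytes > remaining_) {
      remaining_ = 0;
      return false;
    }
    remaining_ -= targetBytes;
    return true;
  }

 private:
  uint64_t remaining_;
};
```

With this in place, a reader that follows a cyclic pointer keeps paying for the same target on every pass, so the loop ends once the budget runs out.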
### Stack overflow DoS attack
A message with deeply-nested objects can cause a stack overflow in typical code which processes
messages recursively.
To defend against this, as the application traverses the message, the pointer depth should be
tracked. If it goes over some limit, an error should be raised. The C++ implementation currently
defaults to a limit of 64 pointers, but allows the caller to set a different limit.
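Depth tracking can be as simple as passing a shrinking limit down the recursion. The sketch below uses a hypothetical `Node` chain as a stand-in for nested message objects; `countNodes` and its default limit of 64 are illustrative and do not reflect the library's API.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for an object that holds one pointer to a nested object.
struct Node {
  const Node* child;
};

// Walks a chain of nested objects recursively. Each pointer followed uses up
// one unit of the nesting limit, so recursion depth is bounded even when the
// input is hostile.
inline int countNodes(const Node* node, int nestingLimit = 64) {
  assert(nestingLimit >= 0);
  if (node == nullptr) return 0;
  if (nestingLimit == 0) {
    throw std::runtime_error("message is too deeply nested");
  }
  return 1 + countNodes(node->child, nestingLimit - 1);
}
```

Code that processes messages iteratively rather than recursively does not risk a stack overflow, but a depth limit is still a cheap safeguard.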
......@@ -284,6 +284,8 @@ group" in Cap'n Proto, which was the case that got into the most trouble with Pr
A struct may have a field with type `AnyPointer`. This field's value can be of any pointer type --
i.e. any struct, interface, list, or blob. This is essentially like a `void*` in C.
See also [generics](#generic-types).
### Enums
An enum is a type with a small finite set of symbolic values.
......@@ -352,6 +354,126 @@ it very easy to develop secure protocols with Cap'n Proto -- you almost don't ne
access control at all. This feature is what makes Cap'n Proto a "capability-based" RPC system -- a
reference to an object inherently represents a "capability" to access it.
### Generic Types
A struct or interface type may be parameterized, making it "generic". For example, this is useful
for defining type-safe containers:
{% highlight capnp %}
struct Map(Key, Value) {
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Key;
    value @1 :Value;
  }
}
struct People {
  byName @0 :Map(Text, Person);
  # Maps names to Person instances.
}
{% endhighlight %}
Cap'n Proto generics work very similarly to Java generics or C++ templates. Some notes:
* Only pointer types (structs, lists, blobs, and interfaces) can be used as generic parameters,
much like in Java. This is a pragmatic limitation: allowing parameters to have non-pointer types
would mean that different parameterizations of a struct could have completely different layouts,
which would excessively complicate the Cap'n Proto implementation.
* A type declaration nested inside a generic type may use the type parameters of the outer type,
as you can see in the example above. This differs from Java, but matches C++. If you want to
refer to a nested type from outside the outer type, you must specify the parameters on the outer
type, not the inner. For example, `Map(Text, Person).Entry` is a valid type;
`Map.Entry(Text, Person)` is NOT valid. (Of course, an inner type may declare additional generic
parameters.)
* If you refer to a generic type but omit its parameters (e.g. declare a field of type `Map` rather
than `Map(T, U)`), it is as if you specified `AnyPointer` for each parameter. Note that such
a type is wire-compatible with any specific parameterization, so long as you interpret the
`AnyPointer`s as the correct type at runtime.
* Relatedly, it is safe to cast a generic interface of a specific parameterization to a generic
interface where all parameters are `AnyPointer` and vice versa, as long as the `AnyPointer`s are
treated as the correct type at runtime. This means that e.g. you can implement a server in a
generic way that is correct for all parameterizations but call it from clients using a specific
parameterization.
* The encoding of a generic type is exactly the same as the encoding of a type produced by
substituting the type parameters manually. For example, `Map(Text, Person)` is encoded exactly
the same as:
<div>{% highlight capnp %}
struct PersonMap {
  # Encoded the same as Map(Text, Person).
  entries @0 :List(Entry);
  struct Entry {
    key @0 :Text;
    value @1 :Person;
  }
}
{% endhighlight %}
</div>
Therefore, it is possible to upgrade non-generic types to generic types while retaining
backwards-compatibility.
* Similarly, a generic interface's protocol is exactly the same as the interface obtained by
manually substituting the generic parameters.
### Generic Methods
Interface methods may also have "implicit" generic parameters that apply to a particular method
call. This commonly applies to "factory" methods. For example:
{% highlight capnp %}
interface Assignable(T) {
  # A generic interface, with non-generic methods.
  get @0 () -> (value :T);
  set @1 (value :T) -> ();
}
interface AssignableFactory {
  newAssignable @0 [T] (initialValue :T)
      -> (assignable :Assignable(T));
  # A generic method.
}
{% endhighlight %}
Here, the method `newAssignable()` is generic. The return type of the method depends on the input
type.
Ideally, calls to a generic method should not have to explicitly specify the method's type
parameters, because they should be inferred from the types of the method's regular parameters.
However, this may not always be possible; it depends on the programming language and API details.
Note that if a method's generic parameter is used only in its returns, not its parameters, then
this implies that the returned value is appropriate for any parameterization. For example:
{% highlight capnp %}
newUnsetAssignable @1 [T] () -> (assignable :Assignable(T));
# Create a new assignable. `get()` on the returned object will
# throw an exception until `set()` has been called at least once.
{% endhighlight %}
Because of the way this method is designed, the returned `Assignable` is initially valid for any
`T`. Effectively, it doesn't take on a type until the first time `set()` is called, and then `T`
retroactively becomes the type of value passed to `set()`.
In contrast, if it's the case that the returned type is unknown, then you should NOT declare it
as generic. Instead, use `AnyPointer`, or omit a type's parameters (since they default to
`AnyPointer`). For example:
{% highlight capnp %}
getNamedAssignable @2 (name :Text) -> (assignable :Assignable);
# Get the `Assignable` with the given name. It is the
# responsibility of the caller to keep track of the type of each
# named `Assignable` and cast the returned object appropriately.
{% endhighlight %}
Here, we omitted the parameters to `Assignable` in the return type, because the returned object
has a specific type parameterization but it is not locally knowable.
### Constants
You can define constants in Cap'n Proto. These don't affect what is sent on the wire, but they
......@@ -577,34 +699,82 @@ are much more likely.
## Evolving Your Protocol
A protocol can be changed in the following ways without breaking backwards-compatibility:
A protocol can be changed in the following ways without breaking backwards-compatibility, and
without changing the [canonical](encoding.html#canonicalization) encoding of a message:
* New types, constants, and aliases can be added anywhere, since they obviously don't affect the
encoding of any existing type.
* New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
as long as each new member's number is larger than all previous members. Similarly, new fields
may be added to existing groups and unions.
* New parameters may be added to a method. The new parameters must be added to the end of the
parameter list and must have default values.
* Members can be re-arranged in the source code, so long as their numbers stay the same.
* Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same. Note
that type declarations have an implicit ID generated based on their name and parent's ID, but
you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
declare it explicitly after your rename.
* Types definitions can be moved to different scopes, as long as the type ID is declared
* Type definitions can be moved to different scopes, as long as the type ID is declared
explicitly.
* A field can be moved into a group or a union, as long as the group/union and all other fields
within it are new. In other words, a field can be replaced with a group or union containing an
equivalent field and some new fields.
* A non-generic type can be made [generic](#generic-types), and new generic parameters may be
added to an existing generic type. Other types used inside the body of the newly-generic type can
be replaced with the new generic parameter so long as all existing users of the type are updated
to bind that generic parameter to the type it replaced. For example:
<div>{% highlight capnp %}
struct Map {
entries @0 :List(Entry);
struct Entry {
key @0 :Text;
value @1 :Text;
}
}
{% endhighlight %}
</div>
Can change to:
<div>{% highlight capnp %}
struct Map(Key, Value) {
entries @0 :List(Entry);
struct Entry {
key @0 :Key;
value @1 :Value;
}
}
{% endhighlight %}
</div>
As long as all existing uses of `Map` are replaced with `Map(Text, Text)` (and any uses of
`Map.Entry` are replaced with `Map(Text, Text).Entry`).
(This rule applies analogously to generic methods.)
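As a minimal sketch of the field, parameter, and method rules above, consider a hypothetical `Person` struct and `Directory` interface (all names here are invented for illustration and do not come from any real schema). The original schema:

{% highlight capnp %}
struct Person {
  name @0 :Text;
  email @1 :Text;
}

interface Directory {
  lookup @0 (name :Text) -> (person :Person);
}
{% endhighlight %}

could evolve to:

{% highlight capnp %}
struct Person {
  name @0 :Text;
  email @1 :Text;
  phone @2 :Text;
  # New field: its number is larger than every existing number.
}

interface Directory {
  lookup @0 (name :Text, exactMatch :Bool = true) -> (person :Person);
  # New parameter: appended to the end of the list, with a default value.

  list @1 () -> (people :List(Person));
  # New method: its number is larger than every existing number.
}
{% endhighlight %}

A peer built against the old schema never refers to `phone`, `exactMatch`, or `list`, and a peer built against the new schema sees default values wherever an old peer left them unset.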
The following changes are backwards-compatible but may change the canonical encoding of a message.
Apps that rely on canonicalization (such as some cryptographic protocols) should avoid changes in
this list, but most apps can safely use them:
* A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
`List(U)`, where `U` is a struct type whose `@0` field is of type `T`. This rule is useful when
you realize too late that you need to attach some extra data to each element of your list.
Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
* A field can be moved into a group or a union, as long as the group/union and all other fields
within it are new. In other words, a field can be replaced with a group or union containing an
equivalent field and some new fields.
As a special exception to this rule, `List(Bool)` may **not** be upgraded to a list of structs,
because implementing this for bit lists has proven unreasonably expensive.
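The list upgrade rule can be sketched as follows, assuming a hypothetical `Document` struct that starts out with a plain list of text tags (the names are illustrative only):

{% highlight capnp %}
struct Document {
  tags @0 :List(Text);
}
{% endhighlight %}

If each tag later needs extra data, the element type can become a struct whose `@0` field has the old element type:

{% highlight capnp %}
struct Tag {
  name @0 :Text;
  # Same type as the old list element, so existing data lands here.

  weight @1 :Float32;
  # Extra per-element data that motivated the upgrade.
}

struct Document {
  tags @0 :List(Tag);
}
{% endhighlight %}

The alternative would have been a second `List(Float32)` field kept in step with `tags`, which is the parallel-list pattern this rule is meant to avoid.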
Any other change should be assumed NOT to be safe. In particular:
Any change not listed above should be assumed NOT to be safe. In particular:
* You cannot change a field, method, or enumerant's number.
* You cannot change a field or method parameter's type or default value, except as described above.
* You cannot change a field or method parameter's type or default value.
* You cannot change a type's ID.
* You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
generated based in part on the type name.
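A rename remains possible once the ID is pinned explicitly, as described earlier for symbolic names. The sketch below assumes a hypothetical struct that used to be called `Person`; the ID shown is only a placeholder for whatever value `capnp compile -ocapnp myschema.capnp` reported before the rename:

{% highlight capnp %}
struct Customer @0x98808e9832e8bc18 {
  # Formerly `Person`. The explicit ID above must be the one the compiler
  # derived for the old name, so that the type's ID does not change.
  name @0 :Text;
}
{% endhighlight %}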
......
......@@ -163,12 +163,31 @@ No!
CORBA failed for many reasons, with the usual problems of design-by-committee being a big one.
However, CORBA also had a critical technical flaw: it did not implement promise pipelining. As
shown above, promise pipelining is absolutely critical to making object-oriented interfaces work
in the presence of latency. It is often said that object- and RPC-oriented protocols don't work
because they try to pretend that a network call is equivalent to a local call. In reality, this
is not actually a problem with object protocols in general, but specifically CORBA and
similarly-naive protocols that lack promise pipelining. Promise pipelining is the missing link.
However, the biggest reason for CORBA's failure is that it tried to make remote calls look the
same as local calls. Cap'n Proto does NOT do this -- remote calls have a different kind of API
involving promises, one that accounts for the latency and unreliability that a network
introduces.
As shown above, promise pipelining is absolutely critical to making object-oriented interfaces work
in the presence of latency. If remote calls look the same as local calls, there is no opportunity
to introduce promise pipelining, and latency is inevitable. Any distributed object protocol which
does not support promise pipelining cannot -- and should not -- succeed. Thus the failure of CORBA
(and DCOM, etc.) was inevitable, but Cap'n Proto is different.
### Handling disconnects
Networks are unreliable. Occasionally, connections will be lost. When this happens, all
capabilities (object references) served by the connection will become disconnected. Any further
calls addressed to these capabilities will throw "disconnected" exceptions. When this happens, the
client will need to create a new connection and try again. All Cap'n Proto applications with
long-running connections (and probably short-running ones too) should be prepared to catch
"disconnected" exceptions and respond appropriately.
On the server side, when all references to an object have been "dropped" (either because the
clients explicitly dropped them or because they became disconnected), the object will be closed
(in C++, the destructor is called; in GC'd languages, a `close()` method is called). This allows
servers to easily allocate per-client resources without having to clean up on a timeout or risk
leaking memory.
### Security
......