Document new features in 0.5: Generics and canonicalization.

Commit 274088f7 authored Dec 14, 2014 by Kenton Varda
7 changed files
--- a/doc/_posts/2014-06-17-capnproto-flatbuffers-sbe.md
+++ b/doc/_posts/2014-06-17-capnproto-flatbuffers-sbe.md
@@ -4,7 +4,10 @@ title: Cap'n Proto, FlatBuffers, and SBE
 author: kentonv
 ---

-**Update:** I have made [some corrections](https://github.com/kentonv/capnproto/commit/e4e6c9076ae16804c07968cd3bdf6107155df7ee) since the original version of this post.
+**Update Jun 18, 2014:** I have made [some corrections](https://github.com/kentonv/capnproto/commit/e4e6c9076ae16804c07968cd3bdf6107155df7ee) since the original version of this post.
+
+**Update Dec 15, 2014:** Updated to reflect that Cap'n Proto 0.5 now supports Visual Studio and that
+Java is now well-supported.

 Yesterday, some engineers at Google released [FlatBuffers](http://google-opensource.blogspot.com/2014/06/flatbuffers-memory-efficient.html), a new serialization protocol and library with similar design principles to Cap'n Proto. Also, a few months back, Real Logic released [Simple Binary Encoding](http://mechanical-sympathy.blogspot.com/2014/05/simple-binary-encoding.html), another protocol and library of this nature.

@@ -37,13 +40,16 @@ Note: For features which are properties of the implementation rather than the pr
 <tr><td>Padding takes space on wire?</td><td class="pass">no</td><td class="warn">optional</td><td class="fail">yes</td><td class="fail">yes</td></tr>
 <tr><td>Unset fields take space on wire?</td><td class="pass">no</td><td class="fail">yes</td><td class="fail">yes</td><td class="pass">no</td></tr>
 <tr><td>Pointers take space on wire?</td><td class="pass">no</td><td class="fail">yes</td><td class="pass">no</td><td class="fail">yes</td></tr>
-<tr><td>C++</td><td class="pass">yes</td><td class="warn">GCC/Clang<br>(no MSVC)</td><td class="pass">yes</td><td class="pass">yes</td></tr>
-<tr><td>Java</td><td class="pass">yes</td><td class="warn">in progress</td><td class="pass">yes</td><td class="pass">yes</td></tr>
-<tr><td>Python</td><td class="pass">yes</td><td class="warn">yes (C ext)</td><td class="fail">no</td><td class="fail">no</td></tr>
-<tr><td>Other languages</td><td class="pass">lots!</td><td class="warn">6+ others</td><td class="warn">C#</td><td class="fail">no</td></tr>
+<tr><td>C++</td><td class="pass">yes</td><td class="pass">yes (C++11)*</td><td class="pass">yes</td><td class="pass">yes</td></tr>
+<tr><td>Java</td><td class="pass">yes</td><td class="pass">yes*</td><td class="pass">yes</td><td class="pass">yes</td></tr>
+<tr><td>C#</td><td class="pass">yes</td><td class="pass">yes*</td><td class="pass">yes</td><td class="pass">yes*</td></tr>
+<tr><td>Go</td><td class="pass">yes</td><td class="pass">yes</td><td class="fail">no</td><td class="pass">yes*</td></tr>
+<tr><td>Other languages</td><td class="pass">lots!</td><td class="warn">6+ others*</td><td class="fail">no</td><td class="fail">no</td></tr>
 <tr><td>Authors' preferred use case</td><td>distributed<br>computing</td><td><a href="https://sandstorm.io">platforms /<br>sandboxing</a></td><td>financial<br>trading</td><td>games</td></tr>
 </table>

+\* Updated Dec 15, 2014 (Cap'n Proto 0.5.0).
+
 **Schema Evolution**

 All four protocols allow you to add new fields to a schema over time, without breaking backwards-compatibility. New fields will be ignored by old binaries, and new binaries will fill in a default value when reading old data.
@@ -172,9 +178,19 @@ FlatBuffers also uses pointers, even though most objects are variable-width, pos

 **Platform Support**

-A really huge weakness of Cap'n Proto today is that it doesn't compile in Visual Studio, and therefore effectively doesn't support Windows (unless you count Cygwin, but most people don't).
+As of Dec 15, 2014, Cap'n Proto supports a superset of the languages supported by FlatBuffers and
+SBE, but is still far behind Protocol Buffers.
+
+While Cap'n Proto C++ is well-supported on POSIX platforms using GCC or Clang as their compiler,
+Cap'n Proto has only limited support for Visual C++: the basic serialization library works, but
+reflection and RPC do not yet work. Support will be expanded once Visual Studio's C++ compiler
+completes support for C++11.
+
+In comparison, SBE and FlatBuffers have reflection interfaces that work in Visual C++, though
+neither one has built-in RPC. Reflection is critical for certain use cases, but the majority of
+users won't need it.

-The problem initially was that Cap'n Proto makes liberal use of C++11 features, and MSVC has lagged behind in implementing them. We considered, but balked at, forking and backporting. It looks like [VS14 CTP1](http://blogs.msdn.com/b/vcblog/archive/2014/06/03/visual-studio-14-ctp.aspx) may finally be far enough to make a port practical, so now it's just a matter of finding the time. But that's a new problem: all my time is currently consumed by [Sandstorm.io](https://sandstorm.io), and while it is a major use of Cap'n Proto and will thus drive further development, it is entirely Linux-based, making it hard for me to justify spending my time on Windows support. What we need is a volunteer!
+(This section has been updated. When originally written, Cap'n Proto did not support MSVC at all.)

 ### Benchmarks?


--- a/doc/_posts/2014-12-15-capnproto-0.5-generics-msvc-java-csharp.md
+++ b/doc/_posts/2014-12-15-capnproto-0.5-generics-msvc-java-csharp.md
+---
+layout: post
+title: "Cap'n Proto 0.5: Generics, Visual C++, Java, C#, Sandstorm.io"
+author: kentonv
+---
+
+Today we're releasing Cap'n Proto 0.5. We've added lots of goodies!
+
+### Finally: Visual Studio
+
+Microsoft Visual Studio 2015 (currently in "preview") finally supports enough C++11 to get Cap'n
+Proto working, and we've duly added official support for it!
+
+Not all features are supported yet. The core serialization functionality sufficient for 90% of users
+is available, but reflection and RPC APIs are not. We will turn on these APIs as soon as Visual C++
+is ready (the main blocker is incomplete `constexpr` support).
+
+As part of this, we now support CMake as a build system, and it can be used on Unix as well.
+
+In related news, for Windows users not interested in C++ but who need the Cap'n Proto tools for
+other languages, we now provide precompiled Windows binaries. See
+[the installation page]({{site.baseurl}}install.html).
+
+I'd like to thank [Bryan Boreham](https://github.com/bboreham),
+[Joshua Warner](https://github.com/joshuawarner32), and [Phillip Quinn](https://github.com/pqu) for
+their help in getting this working.
+
+### C#, Java
+
+While not strictly part of this release, our two biggest missing languages recently gained support
+for Cap'n Proto:
+
+* [Marc Gravell](https://github.com/mgravell) -- the man responsible for the most popular C#
+  implementation of Protobufs -- has now implemented
+  [Cap'n Proto in C#](https://github.com/mgravell/capnproto-net).
+* [David Renshaw](https://github.com/dwrensha), author of our existing Rust implementation and
+  [Sandstorm.io](https://sandstorm.io) core developer, has implemented
+  [Cap'n Proto in Java](https://github.com/dwrensha/capnproto-java).
+
+### Generics
+
+Cap'n Proto now supports [generics]({{site.baseurl}}language.html#generic-types),
+in the sense of Java generics or C++ templates. While working on
+[Sandstorm.io](https://sandstorm.io) we frequently found that we wanted this, and it turned out
+to be easy to support.
+
+This is a feature which Protocol Buffers does not support and likely never will. Cap'n Proto has a
+much easier time supporting exotic language features because the generated code is so simple. In
+C++, nearly all Cap'n Proto generated code is inline accessor methods, which can easily become
+templates. Protocol Buffers, in contrast, has generated parse and serialize functions and a host
+of other auxiliary stuff, which is too complex to inline and thus would need to be adapted to
+generics without using C++ templates. This would get ugly fast.
+
+Generics are not yet supported by all Cap'n Proto language implementations, but where they are not
+supported, things degrade gracefully: all type parameters simply become `AnyPointer`. You can still
+use generics in your schemas as documentation. Meanwhile, at least our C++, Java, and Python
+implementations have already been updated to support generics, and other implementations that
+wrap the C++ reflection API are likely to work too.
+
+### Canonicalization
+
+0.5 introduces a (backwards-compatible) change in
+[the way struct lists should be encoded]({{site.baseurl}}encoding.html#lists), in
+order to support [canonicalization]({{site.baseurl}}encoding.html#canonicalization).
+We believe this will make Cap'n Proto more appropriate for use in cryptographic protocols. If
+you've implemented Cap'n Proto in another language, please update your code!
+
+### Sandstorm and Capability Systems
+
+[Sandstorm.io](https://sandstorm.io) is Cap'n Proto's parent project: a platform for personal
+servers that is radically easier and more secure.
+
+Cap'n Proto RPC is the underlying communications layer powering Sandstorm. Sandstorm is a
+[capability system](http://www.erights.org/elib/capability/overview.html): applications can send
+each other object references and address messages to those objects. Messages can themselves contain
+new object references, and the recipient implicitly gains permission to use any object reference
+they receive. Essentially, Sandstorm allows the interfaces between two apps, or between an app
+and the platform, to be designed using the same vocabulary as interfaces between objects or
+libraries in an object-oriented programming language (but
+[without the mistakes of CORBA or DCOM]({{site.baseurl}}rpc.html#distributed-objects)).
+Cap'n Proto RPC is at the core of this.
+
+This has powerful implications: Consider the case of service discovery. On Sandstorm, all
+applications start out isolated from each other in secure containers. However, applications can
+(or, will be able to) publish Cap'n Proto object references to the system representing APIs they
+support. Then, another app can make a request to the system, saying "I need an object that
+implements interface Foo". At this point, the system can display a picker UI to the user,
+presenting all objects the user owns that satisfy the requirement. However, the requesting app only
+ever receives a reference to the object the user chooses; all others remain hidden. Thus, security
+becomes "automatic". The user does not have to edit an ACL on the providing app, nor copy around
+credentials, nor even answer any security question at all; it all derives automatically and
+naturally from the user's choices. We call this interface "The Powerbox".
+
+Moreover, because Sandstorm is fully aware of the object references held by every app, it will
+be able to display a visualization of these connections, allowing a user to quickly see which of
+their apps have access to each other and even revoke connections that are no longer desired with
+a mouse click.
+
+Cap'n Proto 0.5 introduces primitives to support "persistent" capabilities -- that is, the ability
+to "save" an object reference to disk and then restore it later, on a different connection.
+Obviously, the features described above totally depend on this feature.
+
+The next release of Cap'n Proto is likely to include another feature essential for Sandstorm: the
+ability to pass capabilities from machine to machine and have Cap'n Proto automatically form direct
+connections when you do. This allows servers running on different machines to interact with each
+other in a completely object-oriented way. Instead of passing around URLs (which necessitate a
+global namespace, lifetime management, firewall traversal, and all sorts of other obstacles), you
+can pass around capabilities and not worry about it. This will be central to Sandstorm's strategies
+for federation and cluster management.
+
+### Other notes
+
+* The C++ RPC code now uses `epoll` on Linux.
+* We now test Cap'n Proto on Android and MinGW, in addition to Linux, Mac OSX, Cygwin, and Visual
+  Studio. (iOS and FreeBSD are also reported to work, though are not yet part of our testing
+  process.)
--- a/doc/cxx.md
+++ b/doc/cxx.md
@@ -371,6 +371,46 @@ implicitly convertible in this way.  Unfortunately, this trick doesn't work on G

 [Interfaces (RPC) have their own page.](cxxrpc.html)

+### Generics
+
+[Generic types](language.html#generic-types) become templates in C++. The outer type (the one whose
+name matches the schema declaration's name) is templatized; the inner `Reader` and `Builder` types
+are not, because they inherit the parameters from the outer type. Similarly, template parameters
+should refer to outer types, not `Reader` or `Builder` types.
+
+For example, given:
+
+{% highlight capnp %}
+struct Map(Key, Value) {
+  entries @0 :List(Entry);
+  struct Entry {
+    key @0 :Key;
+    value @1 :Value;
+  }
+}
+
+struct People {
+  byName @0 :Map(Text, Person);
+  # Maps names to Person instances.
+}
+{% endhighlight %}
+
+You might write code like:
+
+{% highlight c++ %}
+void processPeople(People::Reader people) {
+  Map<Text, Person>::Reader reader = people.getByName();
+  capnp::List<Map<Text, Person>::Entry>::Reader entries =
+      reader.getEntries();
+  for (auto entry: entries) {
+    processPerson(entry.getValue());
+  }
+}
+{% endhighlight %}
+
+Note that all template parameters will be specified with a default value of `AnyPointer`.
+Therefore, the type `Map<>` is equivalent to `Map<capnp::AnyPointer, capnp::AnyPointer>`.
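
Review note on the hunk above: the default-parameter rule can be illustrated with a hand-written stand-in. This is only a sketch and not the code that `capnp` actually generates; `AnyPointer`, `Text`, `Person`, and `Map` below are simplified mock types declared just for this example, and `isDefaultParameterization()` is a hypothetical helper.

```cpp
#include <type_traits>

// Hypothetical stand-ins for this sketch only; the real types live in the
// capnp namespace and in generated headers.
struct AnyPointer {};
struct Text {};
struct Person {};

// The outer type is the template. Reader and Builder are plain nested types
// that inherit Key and Value from the enclosing template.
template <typename Key = AnyPointer, typename Value = AnyPointer>
struct Map {
  struct Reader {};
  struct Builder {};
};

// Omitting the parameters selects the AnyPointer defaults.
static_assert(std::is_same<Map<>, Map<AnyPointer, AnyPointer>>::value,
              "Map<> should mean Map<AnyPointer, AnyPointer>");

// A specific parameterization is a distinct C++ type from the default one.
inline bool isDefaultParameterization() {
  return std::is_same<Map<Text, Person>, Map<>>::value;
}
```

The `static_assert` checks the equivalence at compile time, so the sketch fails to build if the defaults are ever changed.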
+
 ### Constants

 Constants are exposed with their names converted to UPPERCASE_WITH_UNDERSCORES naming style

--- a/doc/cxxrpc.md
+++ b/doc/cxxrpc.md
@@ -278,6 +278,9 @@ auto promise3 = promise2.then(
 });
 {% endhighlight %}

+For [generic methods](language.html#generic-methods), the `fooRequest()` method will be a template;
+you must explicitly specify type parameters.
+
 ### Servers

 The generated `Server` type is an abstract interface which may be subclassed to implement a
@@ -315,6 +318,11 @@ private:
 };
 {% endhighlight %}

+On the server side, [generic methods](language.html#generic-methods) are NOT templates. Instead,
+the generated code is exactly as if all of the generic parameters were bound to `AnyPointer`. The
+server generally does not get to know exactly what type the client requested; it must be designed
+to be correct for any parameterization.
+
 ## Initializing RPC

 Cap'n Proto makes it easy to start up an RPC client or server using the  "EZ RPC" classes,

--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -84,6 +84,41 @@ The built-in blob types are encoded as follows:
  Note that the NUL terminator is included in the size sent on the wire, but the runtime library
  should not count it in any size reported to the application.

+### Structs
+
+A struct value is encoded as a pointer to its content.  The content is split into two sections:
+data and pointers, with the pointer section appearing immediately after the data section.  This
+split allows structs to be traversed (e.g., copied) without knowing their type.
+
+A struct pointer looks like this:
+
+    lsb                      struct pointer                       msb
+    +-+-----------------------------+---------------+---------------+
+    |A|             B               |       C       |       D       |
+    +-+-----------------------------+---------------+---------------+
+
+    A (2 bits) = 0, to indicate that this is a struct pointer.
+    B (30 bits) = Offset, in words, from the end of the pointer to the
+        start of the struct's data section.  Signed.
+    C (16 bits) = Size of the struct's data section, in words.
+    D (16 bits) = Size of the struct's pointer section, in words.
+
+Fields are positioned within the struct according to an algorithm with the following principles:
+
+* The position of each field depends only on its definition and the definitions of lower-numbered
+  fields, never on the definitions of higher-numbered fields.  This ensures backwards-compatibility
+  when new fields are added.
+* Due to alignment requirements, fields in the data section may be separated by padding.  However,
+  later-numbered fields may be positioned into the padding left between earlier-numbered fields.
+  Because of this, a struct will never contain more than 63 bits of padding.  Since objects are
+  rounded up to a whole number of words anyway, padding never ends up wasting space.
+* Unions and groups need not occupy contiguous memory.  Indeed, they may have to be split into
+  multiple slots if new fields are added later on.
+
+Field offsets are computed by the Cap'n Proto compiler.  The precise algorithm is too complicated
+to describe here, but you need not implement it yourself, as the compiler can produce a compiled
+schema format which includes offset information.
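
Review note: to make the struct pointer layout above concrete, here is a minimal decoding sketch. It assumes the 8 pointer bytes have already been loaded into a `uint64_t` as a little-endian integer; `StructPointer` and `decodeStructPointer` are hypothetical names for this example, not part of the Cap'n Proto library.

```cpp
#include <cstdint>

// Decoded fields of a struct pointer (names are illustrative only).
struct StructPointer {
  std::int32_t offsetWords;    // B: signed offset from the end of the pointer
  std::uint16_t dataWords;     // C: size of the data section, in words
  std::uint16_t pointerWords;  // D: size of the pointer section, in words
};

// Returns false if the low two bits (A) do not mark a struct pointer.
inline bool decodeStructPointer(std::uint64_t word, StructPointer& out) {
  if ((word & 0x3u) != 0) return false;

  // B occupies bits 2..31. Sign-extend the 30-bit value by hand so the
  // result does not depend on implementation-defined shifts.
  std::uint32_t raw = static_cast<std::uint32_t>((word >> 2) & 0x3fffffffu);
  std::int32_t offset = static_cast<std::int32_t>(raw);
  if (raw & 0x20000000u) offset -= 0x40000000;

  out.offsetWords = offset;
  out.dataWords = static_cast<std::uint16_t>((word >> 32) & 0xffffu);
  out.pointerWords = static_cast<std::uint16_t>((word >> 48) & 0xffffu);
  return true;
}
```

A real reader would additionally bounds-check the target against its segment before following the offset.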
+
 ### Lists

 A list value is encoded as a pointer to a flat array of values.
@@ -111,13 +146,6 @@ A list value is encoded as a pointer to a flat array of values.
 The pointed-to values are tightly-packed.  In particular, `Bool`s are packed bit-by-bit in
 little-endian order (the first bit is the least-significant bit of the first byte).

-Lists of structs use the smallest element size in which the struct can fit.  So, a
-list of structs that each contain two `UInt8` fields and nothing else could be encoded with C = 3
-(2-byte elements).  A list of structs that each contain a single `Text` field would be encoded as
-C = 6 (pointer elements).  A list of structs that each contain a single `Bool` field would be
-encoded using C = 1 (1-bit elements).  A list of structs which are each more than one word in size
-must be encoded using C = 7 (composite).
-
 When C = 7, the elements of the list are fixed-width composite values -- usually, structs.  In
 this case, the list content is prefixed by a "tag" word that describes each individual element.
 The tag has the same layout as a struct pointer, except that the pointer offset (B) instead
@@ -133,47 +161,18 @@ In the future, we could consider implementing matrixes using the "composite" ele
 elements being fixed-size lists rather than structs.  In this case, the tag would look like a list
 pointer rather than a struct pointer.  As of this writing, no such feature has been implemented.

-Notice that because a small struct is encoded as if it were a primitive value, this means that
-if you have a field of type `List(T)` where `T` is a primitive or blob type, it
-is possible to change that field to `List(U)` where `U` is a struct whose `@0` field has type `T`,
-without breaking backwards-compatibility.  This comes in handy when you discover too late that you
-need to associate some extra data with each value in a primitive list -- instead of using parallel
-lists (eww), you can just replace it with a struct list.
-
-### Structs
-
-A struct value is encoded as a pointer to its content.  The content is split into two sections:
-data and pointers, with the pointer section appearing immediately after the data section.  This
-split allows structs to be traversed (e.g., copied) without knowing their type.
-
-A struct pointer looks like this:
-
-    lsb                      struct pointer                       msb
-    +-+-----------------------------+---------------+---------------+
-    |A|             B               |       C       |       D       |
-    +-+-----------------------------+---------------+---------------+
-
-    A (2 bits) = 0, to indicate that this is a struct pointer.
-    B (30 bits) = Offset, in words, from the end of the pointer to the
-        start of the struct's data section.  Signed.
-    C (16 bits) = Size of the struct's data section, in words.
-    D (16 bits) = Size of the struct's pointer section, in words.
-
-Fields are positioned within the struct according to an algorithm with the following principles:
-
-* The position of each field depends only on its definition and the definitions of lower-numbered
-  fields, never on the definitions of higher-numbered fields.  This ensures backwards-compatibility
-  when new fields are added.
-* Due to alignment reqirements, fields in the data section may be separated by padding.  However,
-  later-numbered fields may be positioned into the padding left between earlier-numbered fields.
-  Because of this, a struct will never contain more than 63 bits of padding.  Since objects are
-  rounded up to a whole number of words anyway, padding never ends up wasting space.
-* Unions and groups need not occupy contiguous memory.  Indeed, they may have to be split into
-  multiple slots if new fields are added later on.
-
-Field offsets are computed by the Cap'n Proto compiler.  The precise algorithm is too complicated
-to describe here, but you need not implement it yourself, as the compiler can produce a compiled
-schema format which includes offset information.
+A struct list must always be written using C = 7. However, a list of any element size (except
+C = 1, i.e. 1-bit) may be *decoded* as a struct list, with each element being interpreted as being
+a prefix of the struct data. For instance, a list of 2-byte values (C = 3) can be decoded as a
+struct list where each struct has 2 bytes in its "data" section (and an empty pointer section). A
+list of pointer values (C = 6) can be decoded as a struct list where each struct has a pointer
+section with one pointer (and an empty data section). The purpose of this rule is to make it
+possible to upgrade a list of primitives to a list of structs, as described under the
+[protocol evolution rules](language.html#evolving-your-protocol).
+(We make a special exception that boolean lists cannot be upgraded in this way due to the
+unreasonable implementation burden.) Note that even though struct lists can be decoded from any
+element size (except C = 1), it is NOT permitted to encode a struct list using any type other than
+C = 7 because doing so would interfere with the [canonicalization algorithm](#canonicalization).
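
Review note: the decoding rule above amounts to a small lookup from the list pointer's element size code to the shape of each element when it is viewed as a struct. The sketch below is one possible reading of that rule. The codes for 2-byte (3), pointer (6), 1-bit (1), and composite (7) elements come from this text; the remaining codes (0 for void, 2 for 1 byte, 4 for 4 bytes, 5 for 8 bytes) are taken from the list pointer table elsewhere in the encoding spec. `ElementShape` and `shapeAsStructElement` are hypothetical names.

```cpp
#include <cstdint>

// How one list element looks when reinterpreted as a struct.
struct ElementShape {
  std::uint32_t dataBytes;     // bytes available in the data section
  std::uint32_t pointerCount;  // pointers available in the pointer section
};

// Returns true and fills `out` if a list with this element size code may be
// decoded as a struct list without consulting a tag word.
inline bool shapeAsStructElement(unsigned elementSize, ElementShape& out) {
  switch (elementSize) {
    case 0: out = {0, 0}; return true;  // void: empty struct
    case 2: out = {1, 0}; return true;  // 1-byte elements
    case 3: out = {2, 0}; return true;  // 2-byte elements
    case 4: out = {4, 0}; return true;  // 4-byte elements
    case 5: out = {8, 0}; return true;  // 8-byte non-pointer elements
    case 6: out = {0, 1}; return true;  // pointer elements
    default:
      // 1 = bit lists, which cannot be upgraded; 7 = composite, whose
      // section sizes come from the tag word instead of this table.
      return false;
  }
}
```

Fields of the struct that fall outside `dataBytes` or `pointerCount` would then read as their default values.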

 #### Default Values

@@ -329,27 +328,71 @@ section.
 ### Compression

 When Cap'n Proto messages may contain repetitive data (especially, large text blobs), it makes sense
-to apply a standard compression algorithm in addition to packing.  When CPU time is scarce, we
-recommend Google's [Snappy](https://code.google.com/p/snappy/).  Otherwise,
-[zlib](http://www.zlib.net) is slower but will compress more.
-
-## Security Notes
-
-A naive implementation of a Cap'n Proto reader may be vulnerable to DoS attacks based on two types
-of malicious input:
-
-* A message containing cyclic (or even just overlapping) pointers can cause the reader to go into
-  an infinite loop while traversing the content.
-* A message with deeply-nested objects can cause a stack overflow in typical code which processes
-  messages recursively.
-
-To defend against these attacks, every Cap'n Proto implementation should implement the following
-restrictions by default:
-
-* As the application traverses the message, each time a pointer is dereferenced, a counter should
-  be incremented by the size of the data to which it points.  If this counter goes over some limit,
-  an error should be raised, and/or default values should be returned.  The C++ implementation
-  currently defaults to a limit of 64MiB, but allows the caller to set a different limit if desired.
-* As the application traverses the message, the pointer depth should be tracked.  Again, if it goes
-  over some limit, an error should be raised.  The C++ implementation currently defaults to a limit
-  of 64 pointers, but allows the caller to set a different limit.
+to apply a standard compression algorithm in addition to packing. When CPU time is scarce, we
+recommend [LZ4 compression](https://code.google.com/p/lz4/). Otherwise, [zlib](http://www.zlib.net)
+is slower but will compress more.
+
+## Canonicalization
+
+Cap'n Proto messages have a well-defined canonical form. Cap'n Proto encoders are NOT required to
+output messages in canonical form, and in fact they will almost never do so by default. However,
+it is possible to write code which canonicalizes a Cap'n Proto message without knowing its schema.
+
+A canonical Cap'n Proto message must adhere to the following rules:
+
+* The object tree must be encoded in preorder (with respect to the order of the pointers within
+  each object).
+* The message must be encoded as a single segment. (When signing or hashing a canonical Cap'n Proto
+  message, the segment table shall not be included, because it would be redundant.)
+* Trailing zero-valued words in a struct's data or pointer sections must be truncated. Since zero
+  represents a default value, this does not change the struct's meaning. This rule is important
+  to ensure that adding a new field to a struct does not affect the canonical encoding of messages
+  that do not set that field.
+* Similarly, for a struct list, if a trailing word in a section of all structs in the list is zero,
+  then it must be truncated from all structs in the list. (All structs in a struct list must have
+  equal sizes, hence a trailing zero can only be removed if it is zero in all elements.)
+* Canonical messages are not packed. However, packing can still be applied for transmission
+  purposes; the message must simply be unpacked before checking signatures.
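
Review note: the two truncation rules in the list above can be sketched as follows. This is an illustration under simplifying assumptions, not a full canonicalizer: each section (data or pointer) is modeled as a vector of 64-bit words, a zero word in a pointer section stands for a null pointer, and the names `Section`, `canonicalWordCount`, and `canonicalListWordCount` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One section (data or pointers) of a struct, as raw 64-bit words.
using Section = std::vector<std::uint64_t>;

// Number of words to keep after dropping trailing zero-valued words.
inline std::size_t canonicalWordCount(const Section& section) {
  std::size_t n = section.size();
  while (n > 0 && section[n - 1] == 0) --n;
  return n;
}

// For a struct list, every element must keep the same size, so a trailing
// word can be dropped only if it is zero in all elements. The shared size is
// therefore the largest per-element canonical size.
inline std::size_t canonicalListWordCount(const std::vector<Section>& elements) {
  std::size_t keep = 0;
  for (const Section& s : elements) {
    keep = std::max(keep, canonicalWordCount(s));
  }
  return keep;
}
```

The same functions would be applied once to the data sections and once to the pointer sections.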
+
+Note that Cap'n Proto 0.5 introduced the rule that struct lists must always be encoded using
+C = 7 in the [list pointer](#lists). Prior versions of Cap'n Proto allowed struct lists to be
+encoded using any element size, so that small structs could be compacted to take less than a word
+per element, and many encoders in fact implemented this. Unfortunately, this "optimization" made
+canonicalization impossible without knowing the schema, which is a significant obstacle. Therefore,
+the rules have been changed in 0.5, but data written by previous versions may not be possible to
+canonicalize.
+
+## Security Considerations
+
+A naive implementation of a Cap'n Proto reader may be vulnerable to attacks based on various kinds
+of malicious input. Implementations MUST guard against these.
+
+### Pointer Validation
+
+Cap'n Proto readers must validate pointers, e.g. to check that the target object is within the
+bounds of its segment. To avoid an upfront scan of the message (which would defeat Cap'n Proto's
+O(1) parsing performance), validation should occur lazily when the getter method for a pointer is
+called, throwing an exception or returning a default value if the pointer is invalid.
+
+### Amplification attack
+
+A message containing cyclic (or even just overlapping) pointers can cause the reader to go into
+an infinite loop while traversing the content.
+
+To defend against this, as the application traverses the message, each time a pointer is
+dereferenced, a counter should be incremented by the size of the data to which it points.  If this
+counter goes over some limit, an error should be raised, and/or default values should be returned.
+
+The C++ implementation currently defaults to a limit of 64MiB, but allows the caller to set a
+different limit if desired. Another reasonable strategy is to set the limit to some multiple of
+the original message size; however, most applications should place limits on overall message sizes
+anyway, so it makes sense to have one check cover both.
+
+### Stack overflow DoS attack
+
+A message with deeply-nested objects can cause a stack overflow in typical code which processes
+messages recursively.
+
+To defend against this, as the application traverses the message, the pointer depth should be
+tracked. If it goes over some limit, an error should be raised.  The C++ implementation currently
+defaults to a limit of 64 pointers, but allows the caller to set a different limit.
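
Review note: the two limits described in the security section could be tracked with a small helper like the one below. This is a sketch of the idea only and does not reproduce the C++ library's actual interface; `ReadLimiter`, `charge`, and `depthAllowed` are hypothetical names. The default constants follow the figures quoted above (64 MiB, expressed in 8-byte words, and a depth of 64).

```cpp
#include <cstdint>

// Defaults taken from the text: 64 MiB of traversal, 64 levels of nesting.
constexpr std::uint64_t kDefaultWordLimit = 64ull * 1024 * 1024 / 8;
constexpr int kDefaultDepthLimit = 64;

class ReadLimiter {
 public:
  explicit ReadLimiter(std::uint64_t wordLimit = kDefaultWordLimit,
                       int depthLimit = kDefaultDepthLimit)
      : remainingWords_(wordLimit), depthLimit_(depthLimit) {}

  // Call on every pointer dereference with the size of the target in words.
  // Cyclic or overlapping pointers are charged again each time they are
  // followed, so traversal eventually stops instead of looping forever.
  bool charge(std::uint64_t words) {
    if (words > remainingWords_) {
      remainingWords_ = 0;  // once exceeded, every later read fails too
      return false;
    }
    remainingWords_ -= words;
    return true;
  }

  // Call before descending one more pointer level.
  bool depthAllowed(int depth) const { return depth <= depthLimit_; }

 private:
  std::uint64_t remainingWords_;
  int depthLimit_;
};
```

When either check fails, the caller would raise an error or fall back to default values, as the text recommends.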
--- a/doc/language.md
+++ b/doc/language.md
@@ -284,6 +284,8 @@ group" in Cap'n Proto, which was the case that got into the most trouble with Pr
 A struct may have a field with type `AnyPointer`.  This field's value can be of any pointer type --
 i.e. any struct, interface, list, or blob.  This is essentially like a `void*` in C.

+See also [generics](#generic-types).
+
 ### Enums

 An enum is a type with a small finite set of symbolic values.
@@ -352,6 +354,126 @@ it very easy to develop secure protocols with Cap'n Proto -- you almost don't ne
 access control at all. This feature is what makes Cap'n Proto a "capability-based" RPC system -- a
 reference to an object inherently represents a "capability" to access it.

+### Generic Types
+
+A struct or interface type may be parameterized, making it "generic". For example, this is useful
+for defining type-safe containers:
+
+{% highlight capnp %}
+struct Map(Key, Value) {
+  entries @0 :List(Entry);
+  struct Entry {
+    key @0 :Key;
+    value @1 :Value;
+  }
+}
+
+struct People {
+  byName @0 :Map(Text, Person);
+  # Maps names to Person instances.
+}
+{% endhighlight %}
+
+Cap'n Proto generics work very similarly to Java generics or C++ templates. Some notes:
+
+* Only pointer types (structs, lists, blobs, and interfaces) can be used as generic parameters,
+  much like in Java. This is a pragmatic limitation: allowing parameters to have non-pointer types
+  would mean that different parameterizations of a struct could have completely different layouts,
+  which would excessively complicate the Cap'n Proto implementation.
+
+* A type declaration nested inside a generic type may use the type parameters of the outer type,
+  as you can see in the example above. This differs from Java, but matches C++. If you want to
+  refer to a nested type from outside the outer type, you must specify the parameters on the outer
+  type, not the inner. For example, `Map(Text, Person).Entry` is a valid type;
+  `Map.Entry(Text, Person)` is NOT valid. (Of course, an inner type may declare additional generic
+  parameters.)
+
+* If you refer to a generic type but omit its parameters (e.g. declare a field of type `Map` rather
+  than `Map(T, U)`), it is as if you specified `AnyPointer` for each parameter. Note that such
+  a type is wire-compatible with any specific parameterization, so long as you interpret the
+  `AnyPointer`s as the correct type at runtime.
+
+* Relatedly, it is safe to cast a generic interface of a specific parameterization to a generic
+  interface where all parameters are `AnyPointer` and vice versa, as long as the `AnyPointer`s are
+  treated as the correct type at runtime. This means that e.g. you can implement a server in a
+  generic way that is correct for all parameterizations but call it from clients using a specific
+  parameterization.
+
+* The encoding of a generic type is exactly the same as the encoding of a type produced by
+  substituting the type parameters manually. For example, `Map(Text, Person)` is encoded exactly
+  the same as:
+
+  <div>{% highlight capnp %}
+  struct PersonMap {
+    # Encoded the same as Map(Text, Person).
+    entries @0 :List(Entry);
+    struct Entry {
+      key @0 :Text;
+      value @1 :Person;
+    }
+  }
+  {% endhighlight %}
+  </div>
+
+  Therefore, it is possible to upgrade non-generic types to generic types while retaining
+  backwards-compatibility.
+
+* Similarly, a generic interface's protocol is exactly the same as the interface obtained by
+  manually substituting the generic parameters.
+
+### Generic Methods
+
+Interface methods may also have "implicit" generic parameters that apply to a particular method
+call. This commonly applies to "factory" methods. For example:
+
+{% highlight capnp %}
+interface Assignable(T) {
+  # A generic interface, with non-generic methods.
+  get @0 () -> (value :T);
+  set @1 (value :T) -> ();
+}
+
+interface AssignableFactory {
+  newAssignable @0 [T] (initialValue :T)
+      -> (assignable :Assignable(T));
+  # A generic method.
+}
+{% endhighlight %}
+
+Here, the method `newAssignable()` is generic. The return type of the method depends on the input
+type.
+
+Ideally, calls to a generic method should not have to explicitly specify the method's type
+parameters, because they should be inferred from the types of the method's regular parameters.
+However, this may not always be possible; it depends on the programming language and API details.
+
+Note that if a method's generic parameter is used only in its returns, not its parameters, then
+this implies that the returned value is appropriate for any parameterization. For example:
+
+{% highlight capnp %}
+newUnsetAssignable @1 [T] () -> (assignable :Assignable(T));
+# Create a new assignable. `get()` on the returned object will
+# throw an exception until `set()` has been called at least once.
+{% endhighlight %}
+
+Because of the way this method is designed, the returned `Assignable` is initially valid for any
+`T`. Effectively, it doesn't take on a type until the first time `set()` is called, and then `T`
+retroactively becomes the type of value passed to `set()`.
+
+In contrast, if it's the case that the returned type is unknown, then you should NOT declare it
+as generic. Instead, use `AnyPointer`, or omit a type's parameters (since they default to
+`AnyPointer`). For example:
+
+{% highlight capnp %}
+getNamedAssignable @2 (name :Text) -> (assignable :Assignable);
+# Get the `Assignable` with the given name. It is the
+# responsibility of the caller to keep track of the type of each
+# named `Assignable` and cast the returned object appropriately.
+{% endhighlight %}
+
+Here, we omitted the parameters to `Assignable` in the return type, because the returned object
+has a specific type parameterization but it is not locally knowable.
+
 ### Constants

 You can define constants in Cap'n Proto.  These don't affect what is sent on the wire, but they
@@ -577,34 +699,82 @@ are much more likely.

 ## Evolving Your Protocol

-A protocol can be changed in the following ways without breaking backwards-compatibility:
+A protocol can be changed in the following ways without breaking backwards-compatibility, and
+without changing the [canonical](encoding.html#canonicalization) encoding of a message:

 * New types, constants, and aliases can be added anywhere, since they obviously don't affect the
  encoding of any existing type.
+
 * New fields, enumerants, and methods may be added to structs, enums, and interfaces, respectively,
  as long as each new member's number is larger than all previous members.  Similarly, new fields
  may be added to existing groups and unions.
+
 * New parameters may be added to a method.  The new parameters must be added to the end of the
  parameter list and must have default values.
+
 * Members can be re-arranged in the source code, so long as their numbers stay the same.
+
 * Any symbolic name can be changed, as long as the type ID / ordinal numbers stay the same.  Note
  that type declarations have an implicit ID generated based on their name and parent's ID, but
  you can use `capnp compile -ocapnp myschema.capnp` to find out what that number is, and then
  declare it explicitly after your rename.
-* Types definitions can be moved to different scopes, as long as the type ID is declared
+
+* Type definitions can be moved to different scopes, as long as the type ID is declared
  explicitly.
+
+* A field can be moved into a group or a union, as long as the group/union and all other fields
+  within it are new.  In other words, a field can be replaced with a group or union containing an
+  equivalent field and some new fields.
+
+* A non-generic type can be made [generic](#generic-types), and new generic parameters may be
+  added to an existing generic type. Other types used inside the body of the newly-generic type can
+  be replaced with the new generic parameter so long as all existing users of the type are updated
+  to bind that generic parameter to the type it replaced. For example:
+
+  <div>{% highlight capnp %}
+  struct Map {
+    entries @0 :List(Entry);
+    struct Entry {
+      key @0 :Text;
+      value @1 :Text;
+    }
+  }
+  {% endhighlight %}
+  </div>
+
+  Can change to:
+
+  <div>{% highlight capnp %}
+  struct Map(Key, Value) {
+    entries @0 :List(Entry);
+    struct Entry {
+      key @0 :Key;
+      value @1 :Value;
+    }
+  }
+  {% endhighlight %}
+  </div>
+
+  As long as all existing uses of `Map` are replaced with `Map(Text, Text)` (and any uses of
+  `Map.Entry` are replaced with `Map(Text, Text).Entry`).
+
+  (This rule applies analogously to generic methods.)
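+
+Two of the simpler rules above, adding a field and adding a method parameter, can be sketched as
+follows. The `Person` fields and the `Directory` interface are hypothetical and serve only as
+illustration:
+
+{% highlight capnp %}
+struct Person {
+  name @0 :Text;
+  email @1 :Text;
+  phone @2 :Text;
+  # New field: its number is larger than every existing number.
+}
+
+interface Directory {
+  search @0 (query :Text, limit :UInt32 = 10)
+      -> (results :List(Person));
+  # `limit` is a new parameter, added at the end of the list with
+  # a default value.
+}
+{% endhighlight %}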
+
+The following changes are backwards-compatible but may change the canonical encoding of a message.
+Apps that rely on canonicalization (such as some cryptographic protocols) should avoid changes in
+this list, but most apps can safely use them:
+
 * A field of type `List(T)`, where `T` is a primitive type, blob, or list, may be changed to type
  `List(U)`, where `U` is a struct type whose `@0` field is of type `T`.  This rule is useful when
  you realize too late that you need to attach some extra data to each element of your list.
  Without this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
-* A field can be moved into a group or a union, as long as the group/union and all other fields
-  within it are new.  In other words, a field can be replaced with a group or union containing an
-  equivalent field and some new fields.
+  As a special exception to this rule, `List(Bool)` may **not** be upgraded to a list of structs,
+  because implementing this for bit lists has proven unreasonably expensive.
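+
+A minimal sketch of the list upgrade described above. The two structs stand for the before and
+after versions of one struct edited in place; they carry different names here only so both can be
+shown side by side, and all names are hypothetical:
+
+{% highlight capnp %}
+struct ContactsBefore {
+  phoneNumbers @0 :List(Text);
+}
+
+struct ContactsAfter {
+  phoneNumbers @0 :List(PhoneNumber);
+  # Each old Text element is read as the @0 field of a PhoneNumber.
+
+  struct PhoneNumber {
+    number @0 :Text;  # Same type as the old list element.
+    label @1 :Text;   # Extra data attached to each element.
+  }
+}
+{% endhighlight %}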

-Any other change should be assumed NOT to be safe.  In particular:
+Any change not listed above should be assumed NOT to be safe.  In particular:

 * You cannot change a field, method, or enumerant's number.
-* You cannot change a field or method parameter's type or default value, except as described above.
+* You cannot change a field or method parameter's type or default value.
 * You cannot change a type's ID.
 * You cannot change the name of a type that doesn't have an explicit ID, as the implicit ID is
  generated based in part on the type name.

--- a/doc/rpc.md
+++ b/doc/rpc.md
@@ -163,12 +163,31 @@ No!

 CORBA failed for many reasons, with the usual problems of design-by-committee being a big one.

-However, CORBA also had a critical technical flaw:  it did not implement promise pipelining.  As
-shown above, promise pipelining is absolutely critical to making object-oriented interfaces work
-in the presence of latency.  It is often said that object- and RPC-oriented protocols don't work
-because they try to pretend that a network call is equivalent to a local call.  In reality, this
-is not actually a problem with object protocols in general, but specifically CORBA and
-similarly-naive protocols that lack promise pipelining.  Promise pipelining is the missing link.
+However, the biggest reason for CORBA's failure is that it tried to make remote calls look the
+same as local calls. Cap'n Proto does NOT do this -- remote calls have a different kind of API
+involving promises, one that accounts for the latency and unreliability that a network
+introduces.
+
+As shown above, promise pipelining is absolutely critical to making object-oriented interfaces work
+in the presence of latency. If remote calls look the same as local calls, there is no opportunity
+to introduce promise pipelining, and latency is inevitable. Any distributed object protocol which
+does not support promise pipelining cannot -- and should not -- succeed. Thus the failure of CORBA
+(and DCOM, etc.) was inevitable, but Cap'n Proto is different.
+
+### Handling disconnects
+
+Networks are unreliable. Occasionally, connections will be lost. When this happens, all
+capabilities (object references) served by the connection will become disconnected. Any further
+calls addressed to these capabilities will throw "disconnected" exceptions. When this happens, the
+client will need to create a new connection and try again. All Cap'n Proto applications with
+long-running connections (and probably short-running ones too) should be prepared to catch
+"disconnected" exceptions and respond appropriately.
+
+On the server side, when all references to an object have been "dropped" (either because the
+clients explicitly dropped them or because they became disconnected), the object will be closed
+(in C++, the destructor is called; in GC'd languages, a `close()` method is called). This allows
+servers to easily allocate per-client resources without having to clean up on a timeout or risk
+leaking memory.

 ### Security