Document inline types, new struct list encoding rules, etc.

Commit 268a2235 authored Apr 29, 2013 by Kenton Varda

Showing with 106 additions and 5 deletions

capnp_lexer.py doc/_plugins/capnp_lexer.py +1 -1

encoding.md doc/encoding.md +29 -0

language.md doc/language.md +67 -4

otherlang.md doc/otherlang.md +9 -0

--- a/doc/_plugins/capnp_lexer.py
+++ b/doc/_plugins/capnp_lexer.py
@@ -15,7 +15,7 @@ class CapnpLexer(RegexLexer):
            (r'=', Literal, 'expression'),
            (r':', Name.Class, 'type'),
            (r'\$', Name.Attribute, 'annotation'),
-            (r'(struct|enum|interface|union|import|using|const|annotation|in|of|on|as|with|from)\b',
+            (r'(struct|enum|interface|union|import|using|const|annotation|in|of|on|as|with|from|fixed)\b',
                Token.Keyword),
            (r'[a-zA-Z0-9_.]+', Token.Name),
            (r'[^#@=:$a-zA-Z0-9_]+', Text),
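
Note on the hunk above: the only change is that `fixed` joins the keyword alternation. The following is a minimal sketch, using only the standard `re` module rather than Pygments, of how the updated pattern treats a few words. The helper name `is_keyword` is hypothetical and is not part of the lexer.

```python
import re

# Keyword alternation copied from the updated rule in capnp_lexer.py.
KEYWORD_RE = re.compile(
    r'(struct|enum|interface|union|import|using|const|annotation'
    r'|in|of|on|as|with|from|fixed)\b')


def is_keyword(word):
    """Hypothetical helper: True if the keyword rule matches the whole word."""
    m = KEYWORD_RE.match(word)
    return m is not None and m.end() == len(word)


if __name__ == "__main__":
    for w in ("fixed", "struct", "fixedWidth"):
        print(w, is_keyword(w))
```

The trailing `\b` is what keeps an identifier such as `fixedWidth` from being highlighted as a keyword.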

--- a/doc/encoding.md
+++ b/doc/encoding.md
@@ -114,6 +114,12 @@ A list value is encoded as a pointer to a flat array of values.
 The pointed-to values are tightly-packed.  In particular, `Bool`s are packed bit-by-bit in
 little-endian order (the first bit is the least-significant bit of the first byte).
+Lists of structs use the smallest element size in which the struct can fit, except that single-bit
+structs are not allowed.  So, a list of structs that each contain two `UInt8` fields and nothing
+else could be encoded with C = 3 (2-byte elements).  A list of structs that each contain a single
+`Text` field would be encoded as C = 6 (pointer elements).  A list of structs that are each more
+than one word in size must be encoded using C = 7 (composite).
 When C = 7, the elements of the list are fixed-width composite values -- usually, structs.  In
 this case, the list content is prefixed by a "tag" word that describes each individual element.
 The tag has the same layout as a struct pointer, except that the pointer offset (B) instead
@@ -129,6 +135,13 @@ In the future, we could consider implementing matrixes using the "composite" ele
 elements being fixed-size lists rather than structs.  In this case, the tag would look like a list
 pointer rather than a struct pointer.  As of this writing, no such feature has been implemented.
+Notice that because a small struct is encoded as if it were a primitive value, this means that
+if you have a field of type `List(T)` where `T` is a primitive or blob type (other than `Bool`), it
+is possible to change that field to `List(U)` where `U` is a struct whose `@0` field has type `T`,
+without breaking backwards-compatibility.  This comes in handy when you discover too late that you
+need to associate some extra data with each value in a primitive list -- instead of using parallel
+lists (eww), you can just replace it with a struct list.
 ### Structs
 A struct value is encoded as a pointer to its content.  The content is split into two sections:
@@ -150,6 +163,12 @@ A struct pointer looks like this:
 #### Field Positioning
+_WARNING:  You should not attempt to implement the following algorithm.  The compiled schemas
+produced by the Cap'n Proto compiler are already populated with offset information, so code
+generators and other consumers of compiled schemas should never need to compute them manually.
+The algorithm is complicated and easy to get wrong, so it is best to rely on the canonical
+implementation._
 Ignoring unions, the layout of fields within the struct is determined by the following algorithm:
    For each field of the struct, ordered by field number {
@@ -198,6 +217,16 @@ effect of the desire to pack fields in the smallest space where they will fit an
 maintain backwards-compatibility as fields are added.  The worst case should be rare in practice,
 and can be avoided entirely by always declaring a union's largest member first.
+Inline fields add yet more complication.  An inline field may contain some data and some pointers,
+which are positioned independently.  If the data part is non-empty but is less than one word, it is
+rounded up to the nearest of 1, 2, or 4 bytes and treated the same as a field of that size.
+Otherwise, it is added to the end of the data section.  Any pointers are added to the end of the
+pointer section.  When an inline field appears inside a union, it will attempt to overlap with a
+previous union member just like any other field would -- but note that because inline fields can
+have non-power-of-two sizes, such unions can get arbitrarily large, and care should be taken not
+to interleave union field numbers with non-union field numbers due to the problems described in
+the previous paragraph.
 #### Default Values
 A default struct is always all-zeros.  To achieve this, fields in the data section are stored xor'd
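
Note on the struct list rule added in the first encoding.md hunk: the choice of element size can be sketched as below. This is an illustration under stated assumptions, not the canonical implementation. The function name is hypothetical, the codes 2, 4 and 5 (1, 4 and 8 bytes) are taken from the general list encoding rather than from this diff, and treating an empty struct as C = 0 is an assumption.

```python
def struct_list_element_code(data_bits, pointer_count):
    """Hypothetical helper: pick the list element size code C for a struct.

    data_bits     -- size of the struct's data section, in bits
    pointer_count -- number of pointers in the struct's pointer section
    """
    if pointer_count == 0:
        if data_bits == 0:
            return 0  # assumption: an empty struct needs no space per element
        # C = 1 (single bit) is skipped: single-bit structs are not allowed.
        for code, bits in ((2, 8), (3, 16), (4, 32), (5, 64)):
            if data_bits <= bits:
                return code
        return 7  # more than one word of data: composite
    if pointer_count == 1 and data_bits == 0:
        return 6  # exactly one pointer and no data: pointer elements
    return 7  # anything else is larger than one word: composite


if __name__ == "__main__":
    print(struct_list_element_code(16, 0))   # two UInt8 fields
    print(struct_list_element_code(0, 1))    # a single Text field
    print(struct_list_element_code(128, 0))  # two words of data
```

The three calls correspond to the three cases named in the hunk: two `UInt8` fields, a single `Text` field, and a struct larger than one word.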

--- a/doc/language.md
+++ b/doc/language.md
@@ -363,6 +363,64 @@ annotation corge(file) :MyStruct;
 $corge(string = "hello", number = 123);
 {% endhighlight %}
+## Advanced Topics
+### Inlining Structs
+Say you have a small struct which you know will never add new fields.  For efficiency, you may want
+instances of this struct to be "inlined" into larger structs where it is used.  This saves eight
+bytes of space per usage (the size of a pointer) and may improve cache locality.
+To inline a struct, you must first declare that it is fixed-width, and specify the sizes of its
+data and pointer sections:
+{% highlight capnp %}
+struct Point16 fixed(4 bytes) {
+  x @0 :UInt16;
+  y @1 :UInt16;
+}
+struct Name fixed(2 pointers) {
+  first @0 :Text;
+  last @1 :Text;
+}
+struct TextWithHash fixed(8 bytes, 1 pointers) {
+  hash @0 :UInt64;
+  text @1 :Text;
+}
+{% endhighlight %}
+The compiler will produce an error if the specified size is too small to hold the defined fields,
+so if you are unsure how much space you need, simply declare your struct `fixed()` and the compiler
+will tell you.
+Once you have a fixed-width struct, you must explicitly declare it `Inline` at the usage site:
+{% highlight capnp %}
+struct Foo {
+  a @0 :Point16;          # NOT inlined
+  b @1 :Inline(Point16);  # inlined!
+}
+{% endhighlight %}
+### Inlining Lists and Data
+You may also inline fixed-length lists and data.
+{% highlight capnp %}
+struct Foo {
+  sha1Hash @0 :InlineData(20);  # 160-bit fixed-width.
+  vertex3 @1 :InlineList(Float32, 3);  # x, y, and z coordinates.
+  vertexList @2 :List(InlineList(Float32, 3));
+  # Much more efficient than List(List(Float32))!
+}
+{% endhighlight %}
+At this time, there is no `InlineText` because text almost always has variable length.
 ## Evolving Your Protocol
 A protocol can be changed in the following ways without breaking backwards-compatibility:
@@ -375,10 +433,15 @@ A protocol can be changed in the following ways without breaking backwards-compa
  parameter list and must have default values.
 * Any symbolic name can be changed, as long as the ordinal numbers stay the same.
 * Type definitions can be moved to different scopes.
-* A field of type `List(T)`, where `T` is NOT a struct type, may be changed to type `List(U)`,
-  where `U` is a struct type whose `@0` field is of type `T`.  This rule is useful when you
-  realize too late that you need to attach some extra data to each element of your list.  Without
-  this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
+* A field of type `List(T)`, where `T` is a primitive type (except `Bool`), non-inline blob, or
+  non-inline list, may be changed to type `List(U)`, where `U` is a struct type whose `@0` field is
+  of type `T`.  This rule is useful when you realize too late that you need to attach some extra
+  data to each element of your list.  Without this rule, you would be stuck defining parallel
+  lists, which are ugly and error-prone.  (`List(Bool)` does not support this transformation
+  because it would be difficult to implement given that booleans are packed 8 to the byte.)
+* A struct that is not already `fixed` can be made `fixed`.  However, once a struct is declared
+  `fixed`, the declaration cannot be removed or changed, as this would change the layout of `Inline`
+  uses of the struct.
 Any other change should be assumed NOT to be safe.  Also, these rules only apply to the Cap'n Proto
 native encoding.  It is sometimes useful to transcode Cap'n Proto types to other formats, like

--- a/doc/otherlang.md
+++ b/doc/otherlang.md
@@ -9,3 +9,12 @@ the future!
 If you'd like to own the implementation of Cap'n Proto in some particular language,
 [let us know](https://groups.google.com/group/capnproto)!
+**You should e-mail the list _before_ you start hacking.**  We don't bite, and we'll probably have
+useful tips that will save you time.  :)
+**Do not implement your own schema parser.**  The schema language is more complicated than it
+looks, and the algorithm to determine offsets of fields is subtle.  If you reuse `capnpc`'s parser,
+you won't risk getting these wrong, and you won't have to spend time keeping your parser up-to-date.
+It will soon be possible to write code generator "plugins" for `capnpc` in any language, not just
+Haskell.  (A guide will be added here when this is implemented.)