Commit 268a2235 authored by Kenton Varda

Document inline types, new struct list encoding rules, etc.

parent 1a2d78ea
@@ -15,7 +15,7 @@ class CapnpLexer(RegexLexer):
(r'=', Literal, 'expression'),
(r':', Name.Class, 'type'),
(r'\$', Name.Attribute, 'annotation'),
- (r'(struct|enum|interface|union|import|using|const|annotation|in|of|on|as|with|from)\b',
+ (r'(struct|enum|interface|union|import|using|const|annotation|in|of|on|as|with|from|fixed)\b',
Token.Keyword),
(r'[a-zA-Z0-9_.]+', Token.Name),
(r'[^#@=:$a-zA-Z0-9_]+', Text),
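To see what the lexer change does, here is a minimal sketch that exercises the updated keyword pattern with Python's `re` module alone. The pattern string is copied from the new rule; the helper name `leading_keyword` is made up for this illustration and is not part of `CapnpLexer`.

```python
import re

# Keyword pattern copied from the updated lexer rule, including the new
# `fixed` keyword. The trailing \b keeps identifiers such as `fixedWidth`
# from being treated as keywords.
KEYWORD_PATTERN = re.compile(
    r'(struct|enum|interface|union|import|using|const|annotation'
    r'|in|of|on|as|with|from|fixed)\b'
)


def leading_keyword(text):
    """Return the keyword at the start of `text`, or None if there is none.

    Hypothetical helper for illustration only.
    """
    match = KEYWORD_PATTERN.match(text)
    return match.group(1) if match else None
```

With this pattern, `fixed(4 bytes)` starts with a keyword, while an identifier that merely begins with `fixed` does not.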
......
@@ -114,6 +114,12 @@ A list value is encoded as a pointer to a flat array of values.
The pointed-to values are tightly-packed. In particular, `Bool`s are packed bit-by-bit in
little-endian order (the first bit is the least-significant bit of the first byte).
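The bit order described above can be sketched in a few lines of Python. This is only an illustration of the packing rule: the function names are hypothetical, and the sketch ignores the list pointer and any padding of the list content.

```python
def pack_bools(values):
    """Pack booleans bit-by-bit: element 0 is the least-significant bit of byte 0."""
    packed = bytearray((len(values) + 7) // 8)
    for index, value in enumerate(values):
        if value:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)


def unpack_bools(packed, count):
    """Inverse of pack_bools for the first `count` elements."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]
```

For example, a list whose only `true` element is the ninth one packs to a zero byte followed by a byte with its lowest bit set.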
Lists of structs use the smallest element size in which the struct can fit, except that single-bit
structs are not allowed. So, a list of structs that each contain two `UInt8` fields and nothing
else could be encoded with C = 3 (2-byte elements). A list of structs that each contain a single
`Text` field would be encoded as C = 6 (pointer elements). A list of structs that are each more than
one word in size must be encoded using C = 7 (composite).
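The selection rule can be sketched as a small function. This is a rough illustration, not the canonical implementation: the function name is hypothetical, the C codes other than 3, 6, and 7 are taken from the element-size table elsewhere in the encoding spec (2 = 1 byte, 4 = 4 bytes, 5 = 8 bytes), and mapping an empty struct to C = 0 is an assumption.

```python
def struct_list_element_size(data_bits, pointer_count):
    """Pick the element-size code C for a list of structs.

    `data_bits` is the size of the struct's data section in bits and
    `pointer_count` the number of pointers it holds.
    """
    if pointer_count == 0:
        if data_bits == 0:
            return 0  # assumption: an empty struct needs zero-size elements
        # C = 1 (single-bit elements) is skipped: single-bit structs are not allowed.
        for code, bits in ((2, 8), (3, 16), (4, 32), (5, 64)):
            if data_bits <= bits:
                return code
        return 7  # more than one word of data: composite
    if pointer_count == 1 and data_bits == 0:
        return 6  # a single pointer and nothing else
    return 7  # anything larger than one word: composite
```

A struct holding a single `Bool` therefore lands on 1-byte elements rather than 1-bit elements.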
When C = 7, the elements of the list are fixed-width composite values -- usually, structs. In
this case, the list content is prefixed by a "tag" word that describes each individual element.
The tag has the same layout as a struct pointer, except that the pointer offset (B) instead
@@ -129,6 +135,13 @@ In the future, we could consider implementing matrixes using the "composite" ele
elements being fixed-size lists rather than structs. In this case, the tag would look like a list
pointer rather than a struct pointer. As of this writing, no such feature has been implemented.
Notice that, because a small struct is encoded as if it were a primitive value,
if you have a field of type `List(T)` where `T` is a primitive or blob type (other than `Bool`), it
is possible to change that field to `List(U)` where `U` is a struct whose `@0` field has type `T`,
without breaking backwards-compatibility. This comes in handy when you discover too late that you
need to associate some extra data with each value in a primitive list -- instead of using parallel
lists (eww), you can just replace it with a struct list.
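As a rough illustration of why this upgrade is safe, the sketch below encodes the content of a `List(UInt16)` and then reads the same bytes as a list of 2-byte struct elements whose first field is a `UInt16`. All names here are hypothetical, the list pointer and padding are ignored, and the `label` field stands in for any field the old writer never set.

```python
import struct


def encode_uint16_list(values):
    """Content of an old-style List(UInt16): tightly packed little-endian values."""
    return struct.pack('<%dH' % len(values), *values)


def read_as_struct_list(payload, element_bytes=2):
    """Read the same bytes as List(U), where U's @0 field is a UInt16.

    Fields that the old writer never wrote fall back to their defaults;
    `label` is a made-up example of such a field.
    """
    elements = []
    for offset in range(0, len(payload), element_bytes):
        (value,) = struct.unpack_from('<H', payload, offset)
        elements.append({'value': value, 'label': ''})
    return elements
```

The new reader sees every old value in the `@0` field, with the extra fields left at their defaults.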
### Structs
A struct value is encoded as a pointer to its content. The content is split into two sections:
@@ -150,6 +163,12 @@ A struct pointer looks like this:
#### Field Positioning
_WARNING: You should not attempt to implement the following algorithm. The compiled schemas
produced by the Cap'n Proto compiler are already populated with offset information, so code
generators and other consumers of compiled schemas should never need to compute them manually.
The algorithm is complicated and easy to get wrong, so it is best to rely on the canonical
implementation._
Ignoring unions, the layout of fields within the struct is determined by the following algorithm:
For each field of the struct, ordered by field number {
@@ -198,6 +217,16 @@ effect of the desire to pack fields in the smallest space where they will fit an
maintain backwards-compatibility as fields are added. The worst case should be rare in practice,
and can be avoided entirely by always declaring a union's largest member first.
Inline fields add yet more complication. An inline field may contain some data and some pointers,
which are positioned independently. If the data part is non-empty but is less than one word, it is
rounded up to the nearest of 1, 2, or 4 bytes and treated the same as a field of that size.
Otherwise, it is added to the end of the data section. Any pointers are added to the end of the
pointer section. When an inline field appears inside a union, it will attempt to overlap with a
previous union member just like any other field would -- but note that because inline fields can
have non-power-of-two sizes, such unions can get arbitrarily large, and care should be taken not
to interleave union field numbers with non-union field numbers due to the problems described in
the previous paragraph.
#### Default Values
A default struct is always all-zeros. To achieve this, fields in the data section are stored xor'd
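The sentence above is cut off by the diff context. As a minimal sketch, assuming each data field is XOR'd with its own declared default (which is what makes a default struct all-zeros), the round trip looks like this. The function names are hypothetical.

```python
def encode_field(value, default, bits):
    """Value as stored on the wire: XOR'd with the field's default (assumed rule)."""
    mask = (1 << bits) - 1
    return (value ^ default) & mask


def decode_field(stored, default, bits):
    """Recover the logical value; XOR with the default is its own inverse."""
    mask = (1 << bits) - 1
    return (stored ^ default) & mask
```

A field left at its default is stored as zero, and a zeroed field reads back as the default.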
......
@@ -363,6 +363,64 @@ annotation corge(file) :MyStruct;
$corge(string = "hello", number = 123);
{% endhighlight %}
## Advanced Topics
### Inlining Structs
Say you have a small struct which you know will never add new fields. For efficiency, you may want
instances of this struct to be "inlined" into larger structs where it is used. This saves eight
bytes of space per usage (the size of a pointer) and may improve cache locality.
To inline a struct, you must first declare that it is fixed-width, and specify the sizes of its
data and pointer sections:
{% highlight capnp %}
struct Point16 fixed(4 bytes) {
x @0 :UInt16;
y @1 :UInt16;
}
struct Name fixed(2 pointers) {
first @0 :Text;
last @1 :Text;
}
struct TextWithHash fixed(8 bytes, 1 pointers) {
hash @0 :UInt64;
text @1 :Text;
}
{% endhighlight %}
The compiler will produce an error if the specified size is too small to hold the defined fields,
so if you are unsure how much space you need, simply declare your struct `fixed()` and the compiler
will tell you.
Once you have a fixed-width struct, you must explicitly declare it `Inline` at the usage site:
{% highlight capnp %}
struct Foo {
a @0 :Point16; # NOT inlined
b @1 :Inline(Point16); # inlined!
}
{% endhighlight %}
### Inlining Lists and Data
You may also inline fixed-length lists and data.
{% highlight capnp %}
struct Foo {
sha1Hash @0 :InlineData(20); # 160-bit fixed-width.
vertex3 @1 :InlineList(Float32, 3); # x, y, and z coordinates.
vertexList @2 :List(InlineList(Float32, 3));
# Much more efficient than List(List(Float32))!
}
{% endhighlight %}
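To put a rough number on "much more efficient", the sketch below compares the bytes needed for the content of the two list forms. It rests on stated assumptions rather than on the actual encoder: pointers are one 8-byte word, each separately allocated inner list is padded to a whole word, and the inline form is stored as composite elements padded to whole words plus one tag word. The function names are hypothetical.

```python
WORD = 8  # bytes per word; a pointer occupies one word


def round_up_to_word(byte_count):
    return (byte_count + WORD - 1) // WORD * WORD


def nested_list_bytes(vertex_count, floats_per_vertex=3, float_bytes=4):
    """List(List(Float32)): one pointer per vertex plus a padded inner list each."""
    pointers = vertex_count * WORD
    inner = vertex_count * round_up_to_word(floats_per_vertex * float_bytes)
    return pointers + inner


def inline_list_bytes(vertex_count, floats_per_vertex=3, float_bytes=4):
    """List(InlineList(Float32, 3)), assuming composite elements plus one tag word."""
    element = round_up_to_word(floats_per_vertex * float_bytes)
    return WORD + vertex_count * element
```

Under these assumptions, 1000 vertices take 24000 bytes in the nested form and 16008 bytes in the inline form, and the inline form also avoids one pointer dereference per vertex.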
At this time, there is no `InlineText` because text almost always has variable length.
## Evolving Your Protocol
A protocol can be changed in the following ways without breaking backwards-compatibility:
@@ -375,10 +433,15 @@ A protocol can be changed in the following ways without breaking backwards-compa
parameter list and must have default values.
* Any symbolic name can be changed, as long as the ordinal numbers stay the same.
* Type definitions can be moved to different scopes.
- * A field of type `List(T)`, where `T` is NOT a struct type, may be changed to type `List(U)`,
- where `U` is a struct type whose `@0` field is of type `T`. This rule is useful when you
- realize too late that you need to attach some extra data to each element of your list. Without
- this rule, you would be stuck defining parallel lists, which are ugly and error-prone.
+ * A field of type `List(T)`, where `T` is a primitive type (except `Bool`), non-inline blob, or
+ non-inline list, may be changed to type `List(U)`, where `U` is a struct type whose `@0` field is
+ of type `T`. This rule is useful when you realize too late that you need to attach some extra
+ data to each element of your list. Without this rule, you would be stuck defining parallel
+ lists, which are ugly and error-prone. (`List(Bool)` does not support this transformation
+ because it would be difficult to implement given that booleans are packed 8 to the byte.)
* A struct that is not already `fixed` can be made `fixed`. However, once a struct is declared
`fixed`, the declaration cannot be removed or changed, as this would change the layout of `Inline`
uses of the struct.
Any other change should be assumed NOT to be safe. Also, these rules only apply to the Cap'n Proto
native encoding. It is sometimes useful to transcode Cap'n Proto types to other formats, like
......
@@ -9,3 +9,12 @@ the future!
If you'd like to own the implementation of Cap'n Proto in some particular language,
[let us know](https://groups.google.com/group/capnproto)!
**You should e-mail the list _before_ you start hacking.** We don't bite, and we'll probably have
useful tips that will save you time. :)
**Do not implement your own schema parser.** The schema language is more complicated than it
looks, and the algorithm to determine offsets of fields is subtle. If you reuse `capnpc`'s parser,
you won't risk getting these wrong, and you won't have to spend time keeping your parser up-to-date.
It will soon be possible to write code generator "plugins" for `capnpc` in any language, not just
Haskell. (A guide will be added here when this is implemented.)