Commit 4f4a9c44 authored by Kenton Varda's avatar Kenton Varda

Document encoding of zero-size struct pointers. Fixes #625.

parent 70fcf7f5
...@@ -119,6 +119,44 @@ Field offsets are computed by the Cap'n Proto compiler. The precise algorithm i ...@@ -119,6 +119,44 @@ Field offsets are computed by the Cap'n Proto compiler. The precise algorithm i
to describe here, but you need not implement it yourself, as the compiler can produce a compiled to describe here, but you need not implement it yourself, as the compiler can produce a compiled
schema format which includes offset information. schema format which includes offset information.
#### Default Values
A default struct is always all-zeros. To achieve this, fields in the data section are stored xor'd
with their defined default values. An all-zero pointer is considered "null"; accessor methods
for pointer fields check for null and return a pointer to their default value in this case.
There are several reasons why this is desirable:
* Cap'n Proto messages are often "packed" with a simple compression algorithm that deflates
zero-value bytes.
* Newly-allocated structs only need to be zero-initialized, which is fast and requires no knowledge
of the struct type except its size.
* If a newly-added field is placed in space that was previously padding, messages written by old
binaries that do not know about this field will still have its default value set correctly --
because it is always zero.
#### Zero-sized structs.
As stated above, a pointer whose bits are all zero is considered a null pointer, *not* a struct of
zero size. To encode a struct of zero size, set A, C, and D to zero, and set B (the offset) to -1.
**Historical explanation:** A null pointer is intended to be treated as equivalent to the field's
default value. Early on, it was thought that a zero-sized struct was a suitable synonym for
null, since interpreting an empty struct as any struct type results in a struct whose fields are
all default-valued. So, the pointer encoding was designed such that a zero-sized struct's pointer
would be all-zero, so that it could conveniently be overloaded to mean "null".
However, it turns out there are two important differences between a zero-sized struct and a null
pointer. First, applications often check for null explicitly when implementing optional fields.
Second, an empty struct is technically equivalent to the default value for the struct *type*,
whereas a null pointer is equivalent to the default value for the particular *field*. These are
not necessarily the same.
It therefore became necessary to find a different encoding for zero-sized structs. Since the
struct has zero size, the pointer's offset can validly point to any location so long as it is
in-bounds. Since an offset of -1 points to the beginning of the pointer itself, it is known to
be in-bounds. So, we use an offset of -1 when the struct has zero size.
### Lists ### Lists
A list value is encoded as a pointer to a flat array of values. A list value is encoded as a pointer to a flat array of values.
...@@ -174,23 +212,6 @@ unreasonable implementation burden.) Note that even though struct lists can be d ...@@ -174,23 +212,6 @@ unreasonable implementation burden.) Note that even though struct lists can be d
element size (except C = 1), it is NOT permitted to encode a struct list using any type other than element size (except C = 1), it is NOT permitted to encode a struct list using any type other than
C = 7 because doing so would interfere with the [canonicalization algorithm](#canonicalization). C = 7 because doing so would interfere with the [canonicalization algorithm](#canonicalization).
#### Default Values
A default struct is always all-zeros. To achieve this, fields in the data section are stored xor'd
with their defined default values. An all-zero pointer is considered "null" (such a pointer would
otherwise point to a zero-size struct, which might as well be considered null); accessor methods
for pointer fields check for null and return a pointer to their default value in this case.
There are several reasons why this is desirable:
* Cap'n Proto messages are often "packed" with a simple compression algorithm that deflates
zero-value bytes.
* Newly-allocated structs only need to be zero-initialized, which is fast and requires no knowledge
of the struct type except its size.
* If a newly-added field is placed in space that was previously padding, messages written by old
binaries that do not know about this field will still have its default value set correctly --
because it is always zero.
### Inter-Segment Pointers ### Inter-Segment Pointers
When a pointer needs to point to a different segment, offsets no longer work. We instead encode When a pointer needs to point to a different segment, offsets no longer work. We instead encode
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment