Commit b582a757 authored by Kenton Varda

More doc updates and news post for v0.2 release.

parent b2f49c7d
......@@ -15,6 +15,9 @@
#include <capnp/serialize-packed.h>
#include <iostream>
using addressbook::Person;
using addressbook::AddressBook;
void writeAddressBook(int fd) {
::capnp::MallocMessageBuilder message;
......
@0x9eb32e19f86ee174;
using Cxx = import "/capnp/c++.capnp";
$Cxx.namespace("addressbook");
struct Person {
id @0 :UInt32;
name @1 :Text;
......
......@@ -6,7 +6,8 @@
<div id="footer_wrap" class="outer">
<footer class="inner">
<p class="copyright">Cap'n Proto maintained by <a href="https://github.com/kentonv">kentonv</a>
<span class="gplus-followbutton"><span class="g-follow" data-annotation="bubble" data-height="15" data-href="//plus.google.com/118187272963262049674" data-rel="author"></span></span></p>
<span class="gplus-followbutton"><span class="g-follow" data-annotation="bubble" data-height="15" data-href="//plus.google.com/118187272963262049674" data-rel="author"></span></span>
<a href="https://www.gittip.com/kentonv/"><img class="gittip15" src="{{ site.baseurl }}images/gittip15.png" alt="Gittip"></a></p>
<p>Design by <a href="http://www.starfruit-cafe.net/blog">sailorhg</a> ∙ Published with <a href="http://pages.github.com">GitHub Pages</a></p>
</footer>
</div>
......
---
layout: post
title: "Cap'n Proto v0.2: Compiler rewritten Haskell -> C++"
author: kentonv
---
Today I am releasing version 0.2 of Cap'n Proto. The most notable change: the compiler / code
generator, which was previously written in Haskell, has been rewritten in C++11. There are a few
other changes as well, but before I talk about those, let me try to calm the angry mob that is
no doubt reaching for their pitchforks as we speak. There are a few reasons for this change,
some practical, some ideological. I'll start with the practical.
**The practical: Supporting dynamic languages**
Say you are trying to implement Cap'n Proto in an interpreted language like Python. One of the big
draws of such a language is that you can edit your code and then run it without an intervening
compile step, allowing you to iterate faster. But if the Python Cap'n Proto implementation worked
like the C++ one (or like Protobufs), you lose some of that: whenever you change your Cap'n Proto
schema files, you must run a command to regenerate the Python code from them. That sucks.
What you really want to do is parse the schemas at start-up -- the same time that the Python code
itself is parsed. But writing a proper schema parser is harder than it looks; you really should
reuse the existing implementation. If it is written in Haskell, that's going to be problematic.
You either need to invoke the schema parser as a sub-process or you need to call Haskell code from
Python via an FFI. Either approach is going to be a huge hack with lots of problems, not the least
of which is having a runtime dependency on an entire platform that your end users may not otherwise
want.
But with the schema parser written in C++, things become much simpler. Python code calls into
C/C++ all the time. Everyone already has the necessary libraries installed. There's no need to
generate code, even; the parsed schema can be fed into the Cap'n Proto C++ runtime's dynamic API,
and Python bindings can trivially be implemented on top of that in just a few hundred lines of
code. Everyone wins.
**The ideological: I'm an object-oriented programmer**
I really wanted to like Haskell. I used to be a strong proponent of functional programming, and
I actually once wrote a complete web server and CMS in a purely-functional toy language of my own
creation. I love strong static typing, and I find a lot of the constructs in Haskell really
powerful and beautiful. Even monads. _Especially_ monads.
But when it comes down to it, I am an object-oriented programmer, and Haskell is not an
object-oriented language. Yes, you can do object-oriented style if you want to, just like you
can do objects in C. But it's just too painful. I want to write `object.methodName`, not
`ModuleName.objectTypeMethodName object`. I want to be able to write lots of small classes that
encapsulate complex functionality in simple interfaces -- _without_ having to place each one in
a whole separate module and ending up with thousands of source files. I want to be able to build
a list of objects of varying types that implement the same interface without having to re-invent
virtual tables every time I do it (type classes don't quite solve the problem).
And as it turns out, even aside from the lack of object-orientation, I don't actually like
functional programming as much as I thought. Yes, writing my parser was super-easy (my first
commit message was
"[Day 1: Learn Haskell, write a parser](https://github.com/kentonv/capnproto/commit/6bb49ca775501a9b2c7306992fd0de53c5ee4e95)").
But everything beyond that seemed to require increasing amounts of brain bending. For instance, to
actually encode a Cap'n Proto message, I couldn't just allocate a buffer of zeros and then go
through each field and set its value. Instead, I had to compute all the field values first, sort
them by position, then concatenate the results.
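For readers who want to see the imperative approach spelled out, here is a minimal sketch of "allocate a buffer of zeros and set each field". The names `FieldValue` and `encodeInPlace` are hypothetical, and treating every field as a 32-bit little-endian integer at a byte offset is a simplifying assumption, not the compiler's real encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one field: where it lives and what it holds.
struct FieldValue {
  size_t offset;   // byte offset of the field within the struct's data section
  uint32_t value;  // value to store (every field is 32 bits in this sketch)
};

// Allocates a zeroed buffer, then writes each field at its offset.
// Fields may be supplied in any order; no sorting or concatenation is needed.
std::vector<uint8_t> encodeInPlace(size_t size,
                                   const std::vector<FieldValue>& fields) {
  std::vector<uint8_t> buffer(size, 0);
  for (const FieldValue& field : fields) {
    assert(field.offset + sizeof(field.value) <= size);
    // Store the value little-endian, one byte at a time.
    for (size_t b = 0; b < sizeof(field.value); ++b) {
      buffer[field.offset + b] = static_cast<uint8_t>(field.value >> (8 * b));
    }
  }
  return buffer;
}
```

Because the buffer starts out zeroed, any field that is never set simply keeps its zero value.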
Of course, I'm sure it's the case that if I spent years writing Haskell code, I'd eventually become
as proficient with it as I am with C++. Perhaps I could un-learn object-oriented style and learn
something else that works just as well or better. Basically, though, I decided that this was
going to take a lot longer than it at first appeared, and that this wasn't a good use of my
limited resources. So, I'm cutting my losses.
I still think Haskell is a very interesting language, and if it works for you, by all means, use it.
I would love to see someone write an actual Cap'n Proto runtime implementation in Haskell. But
the compiler is now C++.
**Parser Combinators in C++**
A side effect (so to speak) of the compiler rewrite is that Cap'n Proto's companion utility
library, KJ, now includes a parser combinator framework based on C++11 templates and lambdas.
Here's a sample:
{% highlight c++ %}
// Construct a parser that parses a number.
auto number = transform(
sequence(
oneOrMore(charRange('0', '9')),
optional(sequence(
exactChar<'.'>(),
many(charRange('0', '9'))))),
[](Array<char> whole, Maybe<Array<char>> maybeFraction)
-> Number* {
KJ_IF_MAYBE(fraction, maybeFraction) {
return new RealNumber(whole, *fraction);
} else {
return new WholeNumber(whole);
}
});
{% endhighlight %}
An interesting fact about the above code is that constructing the parser itself does not allocate
anything on the heap. The variable `number` in this case ends up being one 96-byte flat object,
most of which is composed of tables for character matching. The whole thing could even be
declared `constexpr`... if the C++ standard allowed empty-capture lambdas to be `constexpr`, which
unfortunately it doesn't (yet).
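To make the "no heap allocation" point concrete, here is a stripped-down sketch of the same idea using only the standard library. It is not KJ's API: every name below (`Input`, `CharRange`, `makeNumberParser`, `matchesAll`, and so on) is a hypothetical stand-in, and the parsers only recognize text or hand the matched text to a callback, rather than building typed results as KJ's do. Each combinator is a plain struct that stores its sub-parsers by value, so composing them produces one flat object.

```cpp
#include <cassert>
#include <string>

// Cursor over the text being parsed.
struct Input {
  const char* pos;
  const char* end;
};

// Matches a single character in the inclusive range [lo, hi].
struct CharRange {
  char lo, hi;
  bool operator()(Input& in) const {
    if (in.pos == in.end || *in.pos < lo || *in.pos > hi) return false;
    ++in.pos;
    return true;
  }
};

// Matches zero or more repetitions of `sub`. `sub` must consume input
// whenever it succeeds, otherwise the loop would never end.
template <typename P>
struct Many {
  P sub;
  bool operator()(Input& in) const {
    while (sub(in)) {}
    return true;
  }
};

// Matches one or more repetitions of `sub`.
template <typename P>
struct OneOrMore {
  P sub;
  bool operator()(Input& in) const {
    if (!sub(in)) return false;
    while (sub(in)) {}
    return true;
  }
};

// Matches `first` followed by `second`, rewinding the cursor on failure.
template <typename A, typename B>
struct Sequence {
  A first;
  B second;
  bool operator()(Input& in) const {
    Input saved = in;
    if (first(in) && second(in)) return true;
    in = saved;
    return false;
  }
};

// Matches `sub` if possible; succeeds either way.
template <typename P>
struct Optional {
  P sub;
  bool operator()(Input& in) const {
    Input saved = in;
    if (!sub(in)) in = saved;
    return true;
  }
};

// Runs `sub` and, on success, passes the matched text to `func`.
template <typename P, typename F>
struct Transform {
  P sub;
  F func;
  bool operator()(Input& in) const {
    const char* start = in.pos;
    if (!sub(in)) return false;
    func(std::string(start, in.pos));
    return true;
  }
};

inline CharRange charRange(char lo, char hi) { return CharRange{lo, hi}; }
template <typename P> Many<P> many(P p) { return Many<P>{p}; }
template <typename P> OneOrMore<P> oneOrMore(P p) { return OneOrMore<P>{p}; }
template <typename P> Optional<P> optional(P p) { return Optional<P>{p}; }
template <typename A, typename B>
Sequence<A, B> sequence(A a, B b) { return Sequence<A, B>{a, b}; }
template <typename P, typename F>
Transform<P, F> transform(P p, F f) { return Transform<P, F>{p, f}; }

// digits, optionally followed by '.' and more digits
typedef Sequence<OneOrMore<CharRange>,
                 Optional<Sequence<CharRange, Many<CharRange>>>> NumberParser;

inline NumberParser makeNumberParser() {
  return sequence(
      oneOrMore(charRange('0', '9')),
      optional(sequence(charRange('.', '.'), many(charRange('0', '9')))));
}

// Returns true if `parser` accepts the whole of `text`.
template <typename P>
bool matchesAll(const P& parser, const std::string& text) {
  Input in{text.data(), text.data() + text.size()};
  return parser(in) && in.pos == in.end;
}
```

With these definitions, `matchesAll(makeNumberParser(), "123.45")` accepts the input, while `"12a"` is rejected because the trailing `a` is left unconsumed. Wrapping the parser in `transform` with a lambda lets the caller capture the matched text.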
Unfortunately, KJ is largely undocumented at the moment, since people who just want to use
Cap'n Proto generally don't need to know about it.
**Other New Features**
There are a couple of other notable changes in this release, aside from the compiler:
* Cygwin has been added as a supported platform, meaning you can now use Cap'n Proto on Windows.
I am considering supporting MinGW as well. Unfortunately, MSVC is unlikely to be supported any
time soon as its C++11 support is
[woefully lacking](http://blogs.msdn.com/b/somasegar/archive/2013/06/28/cpp-conformance-roadmap.aspx).
* The new compiler binary -- now called `capnp` rather than `capnpc` -- is more of a multi-tool.
It includes the ability to decode binary messages to text as a debugging aid. Type
`capnp help decode` for more information.
* The new "Orphan" API lets you detach objects from a message tree and re-attach them elsewhere.
* Various contributors have declared their intentions to implement
[Ruby](https://github.com/cstrahan/capnp-ruby),
[Rust](https://github.com/dwrensha/capnproto-rust), C#, Java, Erlang, and Delphi bindings. These
are still works in progress, but exciting nonetheless!
**Backwards-compatibility Note**
Cap'n Proto v0.2 contains an obscure wire format incompatibility with v0.1. If you are using
unions containing multiple primitive-type fields of varying sizes, it's possible that the new
compiler will position those fields differently. A work-around to get back to the old layout
exists; if you believe you could be affected, please [send me](mailto:temporal@gmail.com) your
schema and I'll tell you what to do. [Gory details.](https://groups.google.com/d/msg/capnproto/NIYbD0haP38/pH5LildInwIJ)
**Road Map**
v0.3 will come in a couple of weeks and will include several new features and clean-ups that can now
be implemented more easily given the new compiler. This will also hopefully be the first release
that officially supports a language other than C++.
The following release, v0.4, will hopefully be the first release implementing RPC.
_PS. If you are wondering, compared to the Haskell version, the new compiler is about 50% more
lines of code and about 4x faster. The speed increase should be taken with a grain of salt,
though, as my Haskell code did all kinds of horribly slow things. The code size is, I think, not
bad, considering that Haskell specializes in concision -- but, again, I'm sure a Haskell expert
could have written shorter code._
......@@ -231,13 +231,13 @@ MyStruct::Reader getMyStructField();
::capnp::List<double> getMyListField();
{% endhighlight %}
`Foo::Builder`, meanwhile, has two or three methods for each field `bar`:
`Foo::Builder`, meanwhile, has several methods for each field `bar`:
* `getBar()`: For primitives, returns the value. For composites, returns a Builder for the
composite. If a composite field has not been initialized (i.e. this is the first time it has
been accessed), it will be initialized to a copy of the field's default value before returning.
* `setBar(x)`: For primitives, sets the value to X. For composites, sets the value to a copy of
x, which must be a Reader for the type.
* `setBar(x)`: For primitives, sets the value to x. For composites, sets the value to a deep copy
of x, which must be a Reader for the type.
* `initBar(n)`: Only for lists and blobs. Sets the field to a newly-allocated list or blob
of size n and returns a Builder for it. The elements of the list are initialized to their empty
state (zero for numbers, default values for structs).
......@@ -245,6 +245,12 @@ MyStruct::Reader getMyStructField();
Builder for it. Note that the newly-allocated struct is initialized to the default value for
the struct's _type_ (i.e., all-zero) rather than the default value for the field `bar` (if it
has one).
* `hasBar()`: Only for pointer fields (e.g. structs, lists, blobs). Returns true if the pointer
has been initialized (non-null). (This method is also available on readers.)
* `adoptBar(x)`: Only for pointer fields. Adopts the orphaned object x, linking it into the field
`bar` without copying. See the section on orphans.
* `disownBar()`: Disowns the value pointed to by `bar`, setting the pointer to null and returning
its previous value as an orphan. See the section on orphans.
{% highlight c++ %}
// Example Builder methods:
......@@ -556,6 +562,45 @@ Notes about the dynamic API:
dynamic API or the schema API, you do not even need to link their implementations into your
executable.
## Orphans
An "orphan" is a Cap'n Proto object that is disconnected from the message structure. That is,
it is not the root of a message, and there is no other Cap'n Proto object holding a pointer to it.
Thus, it has no parents. Orphans are an advanced feature that can help avoid copies and make it
easier to use Cap'n Proto objects as part of your application's internal state. Typical
applications probably won't use orphans.
The class `capnp::Orphan<T>` (defined in `<capnp/orphan.h>`) represents a pointer to an orphaned
object of type `T`. `T` can be any struct type, `List<T>`, `Text`, or `Data`. E.g.
`capnp::Orphan<Person>` would be an orphaned `Person` structure. `Orphan<T>` is a move-only class,
similar to `std::unique_ptr<T>`. This prevents two different objects from adopting the same
orphan, which would result in an invalid message.
An orphan can be "adopted" by another object to link it into the message structure. Conversely,
an object can "disown" one of its pointers, causing the pointed-to object to become an orphan.
Every pointer-typed field `foo` provides builder methods `adoptFoo()` and `disownFoo()` for these
purposes. Again, these methods use C++11 move semantics. To use them, you will need to be
familiar with `std::move()` (or the equivalent but shorter-named `kj::mv()`).
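If move semantics are unfamiliar, the adopt/disown pattern can be illustrated with `std::unique_ptr` alone. The sketch below is an analogy, not Cap'n Proto code: `ToyPerson`, `ToyOrphan`, and `ToyField` are hypothetical stand-ins for a struct type, `Orphan<T>`, and a pointer field on a builder.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a message struct (not a Cap'n Proto type).
struct ToyPerson {
  std::string name;
};

// Toy stand-in for Orphan<T>: move-only, exactly one owner at a time.
template <typename T>
using ToyOrphan = std::unique_ptr<T>;

// Toy stand-in for a pointer field on a builder.
class ToyField {
public:
  // True if the field currently points at an object.
  bool has() const { return value_ != nullptr; }

  // Links the orphan into this field without copying.
  // The caller's orphan is left empty afterwards.
  void adopt(ToyOrphan<ToyPerson>&& orphan) { value_ = std::move(orphan); }

  // Unlinks the object, leaving the field null, and returns it as an orphan.
  ToyOrphan<ToyPerson> disown() { return std::move(value_); }

  const ToyPerson* get() const { return value_.get(); }

private:
  std::unique_ptr<ToyPerson> value_;
};
```

A caller would write `field.adopt(std::move(orphan))`, after which `orphan` is empty. Passing the orphan without `std::move` does not compile, which is how a move-only type rules out two fields adopting the same object.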
Even though an orphan is unlinked from the message tree, it still resides inside memory allocated
for a particular message (i.e. a particular `MessageBuilder`). An orphan can only be adopted by
objects that live in the same message. To move objects between messages, you must perform a copy.
If the message is serialized while an `Orphan<T>` living within it still exists, the orphan's
content will be part of the serialized message, but the only way the receiver could find it is by
investigating the raw message; the Cap'n Proto API provides no way to detect or read it.
To construct an orphan from scratch (without having some other object disown it), you need an
`Orphanage`, which is essentially an orphan factory associated with some message. You can get one
by calling the `MessageBuilder`'s `getOrphanage()` method, or by calling the static method
`Orphanage::getForMessageContaining(builder)` and passing it any struct or list builder.
Note that when an `Orphan<T>` goes out-of-scope without being adopted, the underlying memory that
it occupied is overwritten with zeros. If you use packed serialization, these zeros will take very
little bandwidth on the wire, but will still waste memory on the sending and receiving ends.
Generally, you should avoid allocating message objects that won't be used, or if you cannot avoid
it, arrange to copy the entire message over to a new `MessageBuilder` before serializing, since
only the reachable objects will be copied.
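To see why zeroed regions cost so little in packed form, consider a simplified packer for 8-byte words. This is a sketch written from the packing scheme described in the Cap'n Proto encoding documentation, not the library's implementation. `packWords` and `isZeroWord` are hypothetical names, and for brevity the sketch never batches runs of unpacked words after a `0xff` tag (it always writes a count of zero there).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if all 8 bytes of the word are zero.
inline bool isZeroWord(const uint8_t* word) {
  for (int b = 0; b < 8; ++b) {
    if (word[b] != 0) return false;
  }
  return true;
}

// Packs a buffer whose size is a multiple of 8 bytes.
// Each word becomes a tag byte (bit b set if byte b is non-zero)
// followed by only the non-zero bytes.
std::vector<uint8_t> packWords(const std::vector<uint8_t>& bytes) {
  assert(bytes.size() % 8 == 0);
  std::vector<uint8_t> out;
  const size_t wordCount = bytes.size() / 8;
  size_t i = 0;
  while (i < wordCount) {
    const uint8_t* word = &bytes[i * 8];
    uint8_t tag = 0;
    for (int b = 0; b < 8; ++b) {
      if (word[b] != 0) tag = static_cast<uint8_t>(tag | (1u << b));
    }
    out.push_back(tag);
    for (int b = 0; b < 8; ++b) {
      if (word[b] != 0) out.push_back(word[b]);
    }
    ++i;
    if (tag == 0x00) {
      // An all-zero word is followed by a count of additional zero words
      // (at most 255), so a long run of zeros collapses to two bytes.
      size_t run = 0;
      while (i < wordCount && run < 255 && isZeroWord(&bytes[i * 8])) {
        ++run;
        ++i;
      }
      out.push_back(static_cast<uint8_t>(run));
    } else if (tag == 0xff) {
      // A fully non-zero word is followed by a count of words copied
      // verbatim; this sketch always writes 0 and packs them normally.
      out.push_back(0);
    }
  }
  return out;
}
```

Under this scheme each run of up to 256 zero words shrinks to two bytes on the wire. The unpacked buffer on either end is still full size, which is the memory cost the paragraph above warns about.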
## Reference
The runtime library contains lots of useful features not described on this page. For now, the
......
......@@ -47,6 +47,9 @@ need to set the environment variable `CXX=g++-4.7` before following the instruct
If you are using Clang, you must use at least version 3.2. To use Clang, set the environment
variable `CXX=clang++` before following any instructions below, otherwise `g++` is used by default.
This package is officially tested on Linux (GCC 4.7, Clang 3.2), Mac OSX (Clang 3.2), and Cygwin
(Windows; GCC 4.7), in 32-bit and 64-bit modes.
##### Clang 3.2 on Mac OSX
As of this writing, Mac OSX 10.8 with Xcode 4.6 command-line tools is not quite good enough to
......@@ -56,8 +59,8 @@ between versions 3.1 and 3.2; it is not sufficient to build Cap'n Proto.
There are two options:
1. Use [Macports](http://www.macports.org/) or [Fink](http://www.finkproject.org/) to get an
up-to-date GCC.
1. Use [Macports](http://www.macports.org/), [Fink](http://www.finkproject.org/), or
[Homebrew](http://brew.sh/) to get an up-to-date GCC.
2. Obtain Clang 3.2
[directly from the LLVM project](http://llvm.org/releases/download.html). (Unfortunately,
Clang 3.3 apparently does NOT work, because the libc++ headers shipped with XCode contain
......@@ -66,7 +69,7 @@ There are two options:
Option 2 is the one preferred by Cap'n Proto's developers. Here are step-by-step instructions
for setting this up:
1. Get the Xcode command-line tools. Download Xcode from the app store. Then, open Xcode,
1. Get the Xcode command-line tools: Download Xcode from the app store. Then, open Xcode,
go to Xcode menu > Preferences > Downloads, and choose to install "Command Line Tools".
2. Download the Clang 3.2 binaries and put them somewhere easy to remember:
......
......@@ -15,6 +15,7 @@ href="https://groups.google.com/group/capnproto-announce">Stay Updated</a></div>
<a href="https://github.com/{{ post.author }}">{{ post.author }}</a>
{% if post.author == 'kentonv' %}
<span class="gplus-followbutton"><span class="g-follow" data-annotation="none" data-height="15" data-href="//plus.google.com/118187272963262049674" data-rel="author"></span></span>
<a href="https://www.gittip.com/kentonv/"><img class="gittip" src="{{ site.baseurl }}images/gittip.png" alt="Gittip"></a>
{% endif %}
on <span class="date">{{ post.date | date_to_string }}</span>
</p>
......
......@@ -20,6 +20,7 @@ maintained by respective authors and have not been reviewed by me
* [Python (via C extensions)](https://github.com/jparyani/capnpc-python-cpp) by
[@jparyani](https://github.com/jparyani)
* [Rust](https://github.com/dwrensha/capnproto-rust) by [@dwrensha](https://github.com/dwrensha)
* [Ruby](https://github.com/cstrahan/capnp-ruby) by [@cstrahan](https://github.com/cstrahan)
## Contribute Your Own!
......
......@@ -154,6 +154,16 @@ img {
max-width: 739px;
}
img.gittip {
width: 51px;
height: 10px;
}
img.gittip15 {
width: 77px;
height: 15px;
}
pre, code {
width: 100%;
color: #222;
......