Commit 1cdcc24b authored by Kenton Varda

Fix two subtle bugs with embargos, and improve docs.

Bug 1
-----

If a Resolve message indicated that the promise had been rejected, the code would see the error cap as a local cap and erroneously believe that the promise had resolved back to a local capability, thereby requiring a Disembargo to be sent. The peer, on receiving the nonsensical Disembargo, would throw an exception and close the connection.

The error message seen was: "expected target->getBrand() == this; 'Disembargo' of type 'senderLoopback' sent to an object that does not point back to the sender."
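
To make the fix concrete, here is a condensed sketch of the new handleResolve() logic (taken from the diff below, with the other switch cases and surrounding scaffolding elided): instead of fulfilling the promise with a broken capability, we record the exception and reject the promise outright, so PromiseClient never mistakes the rejection for a resolution to a local capability.

    kj::Own<ClientHook> replacement;
    kj::Maybe<kj::Exception> exception;

    switch (resolve.which()) {
      // ... other cases elided ...
      case rpc::Resolve::EXCEPTION:
        // Fulfilling with newBrokenCap() here would look like a resolution
        // to a *local* capability and trigger a bogus Disembargo.
        exception = toException(resolve.getException());
        break;
    }

    KJ_IF_MAYBE(e, exception) {
      fulfiller->get()->reject(kj::mv(*e));
    } else {
      fulfiller->get()->fulfill(kj::mv(replacement));
    }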

Bug 2
-----

Disembargos are sent not only in response to Resolves but also in response to Returns, since capabilities in a returned message were previously accessible as PromisedAnswers. This means the same rule must apply in both cases: once a promise has resolved, it must from then on forward all messages it receives strictly to the object to which it resolved, even if that object is itself a promise which later resolves to somewhere else. The code which sends Resolve messages already followed this rule, but the code sending Return messages did not. Both now behave correctly. I've also added more explanation to the documentation in rpc.capnp.

The error message seen was: "expected redirect == nullptr; 'Disembargo' of type 'senderLoopback' sent to an object that does not appear to have been the object of a previous 'Resolve' message."
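
Correspondingly, a condensed sketch of the Return-path fix (from the rpc.c++ hunk below; error handling and export bookkeeping are elided): before sending a Return, every capability in the outgoing cap table is pinned to its innermost client, so that a promise in the table which later resolves elsewhere cannot redirect pipelined calls on this answer.

    auto capTable = message->getCapTable();
    for (auto& slot: capTable) {
      KJ_IF_MAYBE(cap, slot) {
        // Once resolved, always forward to the same place: replace any
        // promise with the object it currently points at.
        slot = connectionState.getInnermostClient(**cap);
      }
    }
    message->send();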
parent c5d67deb
@@ -821,6 +821,103 @@ TEST(Rpc, Embargo) {
  EXPECT_EQ(5, call5.wait(context.waitScope).getN());
}

TEST(Rpc, EmbargoError) {
  TestContext context;

  auto client = context.connect(test::TestSturdyRefObjectId::Tag::TEST_MORE_STUFF)
      .castAs<test::TestMoreStuff>();

  auto paf = kj::newPromiseAndFulfiller<test::TestCallOrder::Client>();
  auto cap = test::TestCallOrder::Client(kj::mv(paf.promise));

  auto earlyCall = client.getCallSequenceRequest().send();

  auto echoRequest = client.echoRequest();
  echoRequest.setCap(cap);
  auto echo = echoRequest.send();

  auto pipeline = echo.getCap();

  auto call0 = getCallSequence(pipeline, 0);
  auto call1 = getCallSequence(pipeline, 1);

  earlyCall.wait(context.waitScope);

  auto call2 = getCallSequence(pipeline, 2);

  auto resolved = echo.wait(context.waitScope).getCap();

  auto call3 = getCallSequence(pipeline, 3);
  auto call4 = getCallSequence(pipeline, 4);
  auto call5 = getCallSequence(pipeline, 5);

  paf.fulfiller->rejectIfThrows([]() { KJ_FAIL_ASSERT("foo"); });

  EXPECT_ANY_THROW(call0.wait(context.waitScope));
  EXPECT_ANY_THROW(call1.wait(context.waitScope));
  EXPECT_ANY_THROW(call2.wait(context.waitScope));
  EXPECT_ANY_THROW(call3.wait(context.waitScope));
  EXPECT_ANY_THROW(call4.wait(context.waitScope));
  EXPECT_ANY_THROW(call5.wait(context.waitScope));

  // Verify that we're still connected (there were no protocol errors).
  getCallSequence(client, 1).wait(context.waitScope);
}

TEST(Rpc, CallBrokenPromise) {
  // Tell the server to call back to a promise client, then resolve the promise to an error.

  TestContext context;

  auto client = context.connect(test::TestSturdyRefObjectId::Tag::TEST_MORE_STUFF)
      .castAs<test::TestMoreStuff>();

  auto paf = kj::newPromiseAndFulfiller<test::TestInterface::Client>();

  {
    auto req = client.holdRequest();
    req.setCap(kj::mv(paf.promise));
    req.send().wait(context.waitScope);
  }

  bool returned = false;
  auto req = client.callHeldRequest().send()
      .then([&](capnp::Response<test::TestMoreStuff::CallHeldResults>&&) {
        returned = true;
      }, [&](kj::Exception&& e) {
        returned = true;
        kj::throwRecoverableException(kj::mv(e));
      }).eagerlyEvaluate(nullptr);

  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);

  EXPECT_FALSE(returned);

  paf.fulfiller->rejectIfThrows([]() { KJ_FAIL_ASSERT("foo"); });

  EXPECT_ANY_THROW(req.wait(context.waitScope));
  EXPECT_TRUE(returned);

  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);
  kj::evalLater([]() {}).wait(context.waitScope);

  // Verify that we're still connected (there were no protocol errors).
  getCallSequence(client, 1).wait(context.waitScope);
}
} // namespace
} // namespace _ (private)
} // namespace capnp
@@ -1501,6 +1501,17 @@ private:
auto capTable = message->getCapTable();
auto exports = connectionState.writeDescriptors(capTable, payload);

// Capabilities that we are returning are subject to embargos. See `Disembargo` in rpc.capnp.
// As explained there, in order to deal with the Tribble 4-way race condition, we need to
// make sure that if we're returning any remote promises, we ignore any subsequent
// resolution of those promises for the purpose of pipelined requests on this answer. Luckily,
// we can modify the cap table in-place.
for (auto& slot: capTable) {
  KJ_IF_MAYBE(cap, slot) {
    slot = connectionState.getInnermostClient(**cap);
  }
}

message->send();

if (capTable.size() == 0) {
  return nullptr;
@@ -2208,6 +2219,7 @@ private:
void handleResolve(const rpc::Resolve::Reader& resolve) {
  kj::Own<ClientHook> replacement;
  kj::Maybe<kj::Exception> exception;

  // Extract the replacement capability.
  switch (resolve.which()) {
@@ -2220,7 +2232,10 @@ private:
      break;

    case rpc::Resolve::EXCEPTION:
-     replacement = newBrokenCap(toException(resolve.getException()));
+     // We can't set `replacement` to a new broken cap here because this will confuse
+     // PromiseClient::Resolve() into thinking that the remote promise resolved to a local
+     // capability and therefore a Disembargo is needed. We must actually reject the promise.
+     exception = toException(resolve.getException());
      break;

    default:
@@ -2231,7 +2246,11 @@ private:
    KJ_IF_MAYBE(import, imports.find(resolve.getPromiseId())) {
      KJ_IF_MAYBE(fulfiller, import->promiseFulfiller) {
        // OK, this is in fact an unfulfilled promise!
-       fulfiller->get()->fulfill(kj::mv(replacement));
+       KJ_IF_MAYBE(e, exception) {
+         fulfiller->get()->reject(kj::mv(*e));
+       } else {
+         fulfiller->get()->fulfill(kj::mv(replacement));
+       }
      } else if (import->importClient != nullptr) {
        // It appears this is a valid entry on the import table, but was not expected to be a
        // promise.
@@ -2314,13 +2333,14 @@ private:
  {
    auto redirect = downcasted.writeTarget(builder.initTarget());

-   // Disembargoes should only be sent to capabilities that were previously the object of
+   // Disembargoes should only be sent to capabilities that were previously the subject of
    // a `Resolve` message. But `writeTarget` only ever returns non-null when called on
-   // a PromiseClient. The code which sends `Resolve` should have replaced any promise
-   // with a direct node in order to solve the Tribble 4-way race condition.
+   // a PromiseClient. The code which sends `Resolve` and `Return` should have replaced
+   // any promise with a direct node in order to solve the Tribble 4-way race condition.
+   // See the documentation of Disembargo in rpc.capnp for more.
    KJ_REQUIRE(redirect == nullptr,
        "'Disembargo' of type 'senderLoopback' sent to an object that does not "
-       "appear to have been the object of a previous 'Resolve' message.") {
+       "appear to have been the subject of a previous 'Resolve' message.") {
      return;
    }
  }
@@ -458,6 +458,10 @@ struct Resolve {
# a follow-up `Resolve` will be sent regardless of this release. The level 0 receiver will reply
# with an `unimplemented` message. The sender (of the `Resolve`) can respond to this as if the
# receiver had immediately released any capability to which the promise resolved.
#
# When implementing promise resolution, it's important to understand how embargos work and the
# tricky case of the Tribble 4-way race condition. See the comments for the Disembargo message,
# below.
promiseId @0 :ExportId;
# The ID of the promise to be resolved.
@@ -544,6 +548,9 @@ struct Disembargo {
# already pointed at), no embargo is needed, because the pipelined calls are delivered over the
# same path as the later direct calls.
#
# Keep in mind that promise resolution happens both in the form of Resolve messages as well as
# Return messages (which resolve PromisedAnswers). Embargos apply in both cases.
#
# An alternative strategy for enforcing E-order over promise resolution could be for Vat A to
# implement the embargo internally. When Vat A is notified of promise resolution, it could
# send a dummy no-op call to promise P and wait for it to complete. Until that call completes,
@@ -552,6 +559,31 @@
# being delivered directly from Vat A to Vat C. The `Disembargo` message allows latency to be
# reduced. (In the two-party loopback case, the `Disembargo` message is just a more explicit way
# of accomplishing the same thing as a no-op call, but isn't any faster.)
#
# *The Tribble 4-way Race Condition*
#
# Any implementation of promise resolution and embargos must be aware of what we call the
# "Tribble 4-way race condition", after Dean Tribble, who explained the problem in a lively
# Friam meeting.
#
# Embargos are designed to work in the case where a two-hop path is being shortened to one hop.
# But sometimes there are more hops. Imagine that Alice has a reference to a remote promise P1
# which eventually resolves to _another_ remote promise P2 (in a third vat), which _at the same
# time_ happens to resolve to Bob (in a fourth vat). In this case, we're shortening from a 3-hop
# path (with four parties) to a 1-hop path (Alice -> Bob).
#
# Extending the embargo/disembargo protocol to be able to shorten multiple hops at once seems
# difficult. Instead, we make a rule that prevents this case from coming up:
#
# Once a promise P has been resolved to a remote object reference R, all further messages
# arriving for P will be forwarded strictly to R. Even if it turns out later that R is
# itself a promise, and has resolved to some other object Q, messages sent to P will still be
# forwarded to R, not directly to Q (R will of course further forward the messages to Q).
#
# This rule does not cause a significant performance burden because once P has resolved to R, it
# is expected that people sending messages to P will shortly start sending them to R instead and
# drop P. P is at end-of-life anyway, so it doesn't matter if it ignores chances to further
# optimize its path.
target @0 :MessageTarget;
# What is to be disembargoed.
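For readers implementing the rule above from scratch, here is a minimal, self-contained sketch of the "forward strictly to R" behavior (the names are entirely hypothetical; none of this is code from the commit or from the Cap'n Proto codebase):

    #include <functional>
    #include <iostream>
    #include <string>

    // A stand-in for a promise P that resolves to some target R exactly once.
    class ForwardingPromise {
    public:
      using Target = std::function<void(const std::string&)>;

      // First resolution wins; later resolutions (e.g. R itself resolving to
      // some Q) must not rebind P, per the Tribble 4-way race rule.
      void resolve(Target r) {
        if (!target) target = std::move(r);
      }

      // Messages addressed to P are forwarded strictly to R.
      void send(const std::string& msg) {
        if (target) target(msg);
      }

    private:
      Target target;
    };

    int main() {
      ForwardingPromise p;
      p.resolve([](const std::string& m) { std::cout << "R got: " << m << "\n"; });
      p.resolve([](const std::string& m) { std::cout << "Q got: " << m << "\n"; });
      p.send("hello");  // prints "R got: hello" -- still forwarded via R
    }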