[中文版](../cn/client.md)

# Example

See the [client-side code](https://github.com/brpc/brpc/blob/master/example/echo_c++/client.cpp) of the echo example.

# Quick facts

- Channel.Init() is not thread-safe.
- Channel.CallMethod() is thread-safe, so a Channel can be used by multiple threads simultaneously.
- A Channel can be put on the stack.
- A Channel can be destructed right after sending an asynchronous request.
- There is no class named brpc::Client.

# Channel
The client side of an RPC sends requests. It's called [Channel](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h) rather than "Client" in brpc. A channel represents a communication line to one server or multiple servers, which can be used for calling services.

A Channel can be **shared by all threads** in the process. You don't need to create separate Channels for each thread, and you don't need to synchronize Channel.CallMethod with a lock. However, creation and destruction of a Channel are **not** thread-safe, so make sure the channel is initialized and destroyed by only one thread.

Some RPC implementations have a so-called "ClientManager" that holds client-side configuration and manages resources. brpc does not need one: parameters such as "thread-num" and "connection-type" are either in brpc::ChannelOptions or in global gflags. Advantages of doing so:

1. Convenience. You don't have to pass a "ClientManager" when the Channel is created, and you don't have to store the "ClientManager". Otherwise code has to pass the "ClientManager" layer by layer, which is troublesome. gflags make configuring global behaviors easier.
2. Shared resources. For example, servers and channels in brpc share background workers (of bthread).
3. Better lifetime management. Destructing a "ClientManager" is very error-prone; in brpc this is managed by the framework.

Like most classes, a Channel must be **Init()**-ed before use. Parameters take default values when `options` is NULL. To use non-default values, write code like this:
```c++
brpc::ChannelOptions options;  // including default values
options.xxx = yyy;
...
channel.Init(..., &options);
```
Note that Channel neither modifies `options` nor accesses it after Init() completes, so `options` can safely be put on the stack as in the code above. Channel.options() returns the options being used by the Channel.

Init() can connect to a single server or to a cluster (multiple servers).

# Connect a server

```c++
// Take default values when options is NULL.
int Init(EndPoint server_addr_and_port, const ChannelOptions* options);
int Init(const char* server_addr_and_port, const ChannelOptions* options);
int Init(const char* server_addr, int port, const ChannelOptions* options);
```
Servers connected by these Init() overloads generally have fixed addresses. Creating such a Channel does not need a NamingService or LoadBalancer, so it is relatively lightweight. The address may be a hostname, but don't frequently create Channels that connect to a hostname, because each creation requires a DNS lookup that may take up to 10 seconds (the default timeout of DNS lookup). Reuse them instead.

Valid "server_addr_and_port":
- 127.0.0.1:80
- www.foo.com:8765
- localhost:9000

Invalid "server_addr_and_port":
- 127.0.0.1:90000     # too large port
- 10.39.2.300:8000   # invalid IP
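
The two invalid examples fail for different reasons: the port exceeds 65535, and one IPv4 octet exceeds 255. A minimal sketch of such a check is given here. `LooksLikeValidAddr` is a hypothetical helper written for illustration, not a brpc function, and it assumes that any host containing characters other than digits and dots is a hostname that is accepted without being resolved.

```c++
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper for illustration only; brpc does its own parsing in Init().
// Checks "host:port": the port must be in [0, 65535]; a host made only of
// digits and dots is treated as IPv4 and must have 4 octets in [0, 255].
bool LooksLikeValidAddr(const std::string& addr) {
    const size_t colon = addr.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 >= addr.size()) {
        return false;
    }
    const std::string host = addr.substr(0, colon);
    const std::string port = addr.substr(colon + 1);
    if (port.size() > 5) return false;
    for (char c : port) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    if (std::atoi(port.c_str()) > 65535) return false;

    bool numeric = true;
    for (char c : host) {
        if (!std::isdigit(static_cast<unsigned char>(c)) && c != '.') numeric = false;
    }
    if (!numeric) return true;  // a hostname; resolving it is out of scope here

    int octets = 0;
    size_t begin = 0;
    while (begin <= host.size()) {
        size_t end = host.find('.', begin);
        if (end == std::string::npos) end = host.size();
        const std::string part = host.substr(begin, end - begin);
        if (part.empty() || part.size() > 3 || std::atoi(part.c_str()) > 255) {
            return false;
        }
        ++octets;
        begin = end + 1;
    }
    return octets == 4;
}
```

Checking addresses read from configuration this way before creating Channels can surface typos early, but the return value of Init() remains the authoritative result.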

# Connect a cluster

```c++
int Init(const char* naming_service_url,
         const char* load_balancer_name,
         const ChannelOptions* options);
```
Channels created by the above Init() get the server list from the NamingService specified by `naming_service_url`, either periodically or driven by events, and send each request to one server chosen from the list according to the algorithm specified by `load_balancer_name`.

You **should not** create such channels ad hoc before each RPC, because creating and destroying them involves many resources. For example, the NamingService must be accessed once at creation, otherwise the server candidates are unknown. Since channels can be shared safely by multiple threads, there is no need to create them frequently.

If `load_balancer_name` is NULL or empty, this Init() behaves like the one for connecting to a single server, and `naming_service_url` should be the "ip:port" or "host:port" of the server. Thus you can unify initialization of all channels with this Init(). For example, you can put the values of `naming_service_url` and `load_balancer_name` in a configuration file, setting `load_balancer_name` to empty for a single server and to a valid algorithm for a cluster.

## Naming Service

A naming service maps a name to a modifiable list of servers. It's positioned as follows at the client side:

![img](../images/ns.png)

With the help of a naming service, the client remembers a name instead of every concrete server. When servers are added or removed, only the mapping in the naming service is changed, rather than notifying every client that may access the cluster. This is called "decoupling upstreams and downstreams". In terms of implementation, the client does remember every server, and either accesses the NamingService periodically or is pushed the latest server list. The implementation has minimal impact on RPC latency and puts very little pressure on the system providing the naming service.

General form of `naming_service_url`  is "**protocol://service_name**".

### bns://\<bns-name\>

BNS is the most common naming service inside Baidu. In "bns://rdev.matrix.all", "bns" is the protocol and "rdev.matrix.all" is the service name. A related gflag is -ns_access_interval: ![img](../images/ns_access_interval.png)

If the list in BNS is non-empty but the Channel says "no servers", the status bit of the machine in BNS is probably non-zero, which means the machine is unavailable and is therefore not added to the server candidates of the Channel. Status bits can be checked with:

`get_instance_by_service [bns_node_name] -s`

### file://\<path\>

Servers are listed in the file specified by `path`. In "file://conf/local_machine_list", "conf/local_machine_list" is the file, and each line in the file is the address of a server. brpc reloads the file when it's updated.

### list://\<addr1\>,\<addr2\>...

Servers are written directly after list://, separated by commas. For example, "list://db-bce-81-3-186.db01:7000,m1-bce-44-67-72.m1:7000,cp01-rd-cos-006.cp01:7000" has 3 addresses.
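
To make the "protocol://service_name" form and the comma-separated list concrete, here is a small sketch that splits such a URL. `NamingUrl`, `SplitNamingUrl` and `SplitAddresses` are hypothetical names introduced for illustration; they are not part of brpc, which does its own parsing inside Init().

```c++
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of brpc.
struct NamingUrl {
    std::string protocol;      // e.g. "list"
    std::string service_name;  // everything after "://"
};

// Splits "protocol://service_name". Returns false if "://" is missing
// or the protocol part is empty.
bool SplitNamingUrl(const std::string& url, NamingUrl* out) {
    const std::string sep = "://";
    const size_t pos = url.find(sep);
    if (pos == std::string::npos || pos == 0) return false;
    out->protocol = url.substr(0, pos);
    out->service_name = url.substr(pos + sep.size());
    return true;
}

// Splits the service_name of a list:// URL into individual addresses,
// skipping empty entries.
std::vector<std::string> SplitAddresses(const std::string& service_name) {
    std::vector<std::string> addrs;
    size_t begin = 0;
    while (begin < service_name.size()) {
        size_t end = service_name.find(',', begin);
        if (end == std::string::npos) end = service_name.size();
        if (end > begin) addrs.push_back(service_name.substr(begin, end - begin));
        begin = end + 1;
    }
    return addrs;
}
```

A plain "ip:port" string has no "://" and is rejected by `SplitNamingUrl`, which mirrors the distinction between the single-server and cluster forms of Init().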

### http://\<url\>

Connects to all servers under the domain, for example: http://www.baidu.com:80. Note: although the Init() for connecting to a single server (2 parameters) accepts a hostname as well, it only connects to one server under the domain.

### Naming Service Filter

Users can filter the servers obtained from the NamingService before they are pushed to the LoadBalancer.

![img](../images/ns_filter.jpg)

Interface of the filter:
```c++
// naming_service_filter.h
class NamingServiceFilter {
public:
    // Return true to take this `server' as a candidate to issue RPC
    // Return false to filter it out
    virtual bool Accept(const ServerNode& server) const = 0;
};

// naming_service.h
struct ServerNode {
    butil::EndPoint addr;
    std::string tag;
};
```
The most common usage is filtering by server tags.

A customized filter takes effect once it is set in ChannelOptions. The default is NULL, which means no filtering.

```c++
class MyNamingServiceFilter : public brpc::NamingServiceFilter {
public:
    bool Accept(const brpc::ServerNode& server) const {
        return server.tag == "main";
    }
};

int main() {
    ...
    MyNamingServiceFilter my_filter;
    ...
    brpc::ChannelOptions options;
    options.ns_filter = &my_filter;
    ...
}
```

## Load Balancer

When there is more than one server to access, we need to divide the traffic. This process is called load balancing, and it is positioned as follows at the client side.

![img](../images/lb.png)

The ideal algorithm gets every request processed in time and minimizes the impact of any server crash. However, clients cannot know in real time about delays or congestion happening at servers, and load balancing algorithms should generally be lightweight, so users need to choose the proper algorithm for their use case. Algorithms provided by brpc (specified by `load_balancer_name`):

### rr

Round robin. Always chooses the next server in the list; the one after the last server is the first one. No other settings. For example, if there are 3 servers a, b and c, brpc sends requests to a, b, c, a, b, c, … and so on. Note that this algorithm presumes that machine specs, network latencies and server loads are similar.
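
The selection order described above can be sketched in a few lines. `RoundRobinPicker` is an illustrative class written for this document, not brpc's "rr" implementation, which additionally has to cope with servers being added and removed at runtime.

```c++
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative round-robin picker; brpc's own "rr" balancer is more involved.
class RoundRobinPicker {
public:
    explicit RoundRobinPicker(std::vector<std::string> servers)
        : servers_(std::move(servers)), next_(0) {
        assert(!servers_.empty());
    }

    // Returns servers in list order and wraps around after the last one.
    // The atomic counter lets multiple threads call Pick() concurrently.
    const std::string& Pick() {
        const size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return servers_[i % servers_.size()];
    }

private:
    const std::vector<std::string> servers_;
    std::atomic<size_t> next_;
};
```

Because every server receives the same share of requests regardless of how fast it answers, a single slow machine can hold back a disproportionate number of in-flight requests, which is why the algorithm presumes similar servers.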

### random

Randomly chooses one server from the list. No other settings. Like round robin, this algorithm assumes that the servers to access are similar.
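
For comparison, a uniform random choice can be sketched as follows. `PickRandom` is a hypothetical helper for illustration and says nothing about how brpc's "random" balancer generates its random numbers.

```c++
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Illustrative random picker; not brpc's implementation.
// `rng` is any standard random engine, e.g. std::mt19937.
template <typename Rng>
const std::string& PickRandom(const std::vector<std::string>& servers, Rng& rng) {
    assert(!servers.empty());
    std::uniform_int_distribution<size_t> dist(0, servers.size() - 1);
    return servers[dist(rng)];
}
```

Unlike round robin, no shared counter is needed, so each thread can keep its own engine.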

### la

Locality-aware. Prefers servers with lower latencies until their latency becomes higher than that of others. No other settings. Check out [Locality-aware load balancing](lalb.md) for more details.

### c_murmurhash or c_md5

Consistent hashing. Adding or removing servers does not change the destinations of requests as dramatically as in simple hashing. It's especially suitable for caching services.
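
The property that removing a server only moves the requests that were going to that server can be seen in a toy hash ring. This is a sketch under assumptions: `HashRing` is a hypothetical class, `std::hash` stands in for murmurhash or md5, and the count of 100 virtual nodes per server is arbitrary. brpc's c_murmurhash and c_md5 balancers differ in detail.

```c++
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Illustrative hash ring; not brpc's implementation.
class HashRing {
public:
    // Places `replicas` virtual nodes for `server` on the ring.
    // emplace() keeps the existing owner if two points collide.
    void Add(const std::string& server, int replicas = 100) {
        for (int i = 0; i < replicas; ++i) {
            ring_.emplace(Point(server, i), server);
        }
    }

    // Removes only the points that are owned by `server`.
    void Remove(const std::string& server, int replicas = 100) {
        for (int i = 0; i < replicas; ++i) {
            auto it = ring_.find(Point(server, i));
            if (it != ring_.end() && it->second == server) ring_.erase(it);
        }
    }

    // Returns the owner of the first point at or after `request_code`,
    // wrapping around to the beginning of the ring.
    const std::string& Pick(uint32_t request_code) const {
        assert(!ring_.empty());
        auto it = ring_.lower_bound(request_code);
        if (it == ring_.end()) it = ring_.begin();
        return it->second;
    }

private:
    // std::hash stands in for murmurhash/md5 here.
    static uint32_t Point(const std::string& server, int i) {
        return static_cast<uint32_t>(
            std::hash<std::string>()(server + "#" + std::to_string(i)));
    }

    std::map<uint32_t, std::string> ring_;
};
```

With simple hashing (`request_code % number_of_servers`), removing one of three servers would remap most keys; on the ring, keys owned by the remaining servers keep their destination.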

Controller.set_request_code() must be called before the RPC, otherwise the RPC will fail. request_code is often a 32-bit hash code of the "key part" of the request, and the hashing algorithm does not need to be the same as the one used by the load balancer. For example, `c_murmurhash` can work with a request_code computed by md5 as well.

[src/brpc/policy/hasher.h](https://github.com/brpc/brpc/blob/master/src/brpc/policy/hasher.h) includes common hash functions. If `std::string key` stands for the key part of the request, `controller.set_request_code(brpc::policy::MurmurHash32(key.data(), key.size()))` sets request_code correctly.

Do distinguish the "key" from the "attributes" of the request. Don't compute request_code from the full content of the request just for convenience: a minor change in attributes may result in a totally different hash code and change the destination dramatically. Another pitfall is padding. For example, `struct Foo { int32_t a; int64_t b; }` has a 4-byte undefined gap between `a` and `b` on 64-bit machines, so the result of `hash(&foo, sizeof(foo))` is undefined. Fields need to be packed or serialized before hashing.
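
One way to keep padding out of the hash is to copy the key fields one by one into a contiguous buffer first. The sketch below is illustrative: `PackKey` is a hypothetical helper, and `Fnv1a32` is a simple FNV-1a hash used only as a stand-in so that the example has no dependency on brpc.

```c++
#include <cstdint>
#include <cstring>
#include <string>

struct Foo {
    int32_t a;
    int64_t b;
};

// Copies the fields one by one into a contiguous buffer, so the padding
// bytes inside Foo never reach the hash function.
std::string PackKey(const Foo& foo) {
    char buf[sizeof(foo.a) + sizeof(foo.b)];
    std::memcpy(buf, &foo.a, sizeof(foo.a));
    std::memcpy(buf + sizeof(foo.a), &foo.b, sizeof(foo.b));
    return std::string(buf, sizeof(buf));
}

// FNV-1a, used here only as a stand-in for a real hasher such as
// brpc::policy::MurmurHash32 from hasher.h.
uint32_t Fnv1a32(const std::string& data) {
    uint32_t h = 2166136261u;
    for (unsigned char c : data) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}
```

In a brpc client the packed string would be passed to the hasher of your choice and the result to set_request_code(). Note that the packed bytes are in host byte order, so this sketch assumes all clients run on machines with the same endianness.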

Check out [Consistent Hashing](consistent_hashing.md) for more details.

## Health checking

Servers whose connections are lost are isolated temporarily to prevent them from being selected by the LoadBalancer. brpc periodically tries to connect to isolated servers to test whether they're healthy again. The interval is controlled by the gflag -health_check_interval:

| Name                      | Value | Description                              | Defined At              |
| ------------------------- | ----- | ---------------------------------------- | ----------------------- |
| health_check_interval (R) | 3     | seconds between consecutive health-checkings | src/brpc/socket_map.cpp |

Once a server is connected again, it resumes being a server candidate in the LoadBalancer. If a server is removed from the NamingService during health checking, brpc removes it from health checking as well.

# Launch RPC

Generally, we don't call Channel.CallMethod directly. Instead we call the XXX_Stub generated by protobuf, which feels more like a "method call". The stub has few member fields, so it is suitable (and recommended) to put it on the stack instead of new-ing it. The stub can of course be saved and re-used as well. Channel.CallMethod and the stub are both **thread-safe** and can be accessed by multiple threads simultaneously. For example:
```c++
XXX_Stub stub(&channel);
stub.some_method(controller, request, response, done);
```
Or even:
```c++
XXX_Stub(&channel).some_method(controller, request, response, done);
```
An exception is the http client, which has little to do with protobuf. Call CallMethod directly to make an http call, setting all parameters to NULL except `Controller` and `done`. Check [Access HTTP](http_client.md) for details.

## Synchronous call

CallMethod blocks until the response from the server is received or an error occurs (including timeout).

In a synchronous call, response/controller will not be used by brpc again after CallMethod returns, so they can safely be put on the stack. Note: if request/response has many fields and is large, it is better to allocate them on the heap.
```c++
MyRequest request;
MyResponse response;
brpc::Controller cntl;
XXX_Stub stub(&channel);

request.set_foo(...);
cntl.set_timeout_ms(...);
stub.some_method(&cntl, &request, &response, NULL);
if (cntl.Failed()) {
    // RPC failed. fields in response are undefined, don't use.
} else {
    // RPC succeeded, response has what we want.
}
```

## Asynchronous call

Pass a callback `done` to CallMethod, which then returns right after sending the request rather than after completion of the RPC. When the response from the server is received or an error occurs (including timeout), done->Run() is called. Post-processing code of the RPC should be put in done->Run() instead of after CallMethod.

Because the end of CallMethod does not mean completion of the RPC, response/controller may still be used by brpc or done->Run(). Generally they should be allocated on the heap and deleted in done->Run(). If they're deleted too early, done->Run() may access invalid memory.

You can new these objects individually and create done with [NewCallback](#use-newcallback), or make response/controller members of done and [new them together](#Inherit-google::protobuf::Closure). The former is recommended.

**Request and Channel can be destroyed immediately after asynchronous CallMethod**, which is different from response/controller. Note that "immediately" means destruction of request/Channel can happen **after** CallMethod returns, not during CallMethod. Deleting a Channel that is being used by another thread results in undefined behavior (crash at best).

### Use NewCallback
```c++
static void OnRPCDone(MyResponse* response, brpc::Controller* cntl) {
    // unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version.
    std::unique_ptr<MyResponse> response_guard(response);
    std::unique_ptr<brpc::Controller> cntl_guard(cntl);
    if (cntl->Failed()) {
        // RPC failed. fields in response are undefined, don't use.
    } else {
        // RPC succeeded, response has what we want. Continue the post-processing.
    }
    // Closure created by NewCallback deletes itself at the end of Run.
}

MyResponse* response = new MyResponse;
brpc::Controller* cntl = new brpc::Controller;
MyService_Stub stub(&channel);

MyRequest request;  // you don't have to new request, even in an asynchronous call.
request.set_foo(...);
cntl->set_timeout_ms(...);
stub.some_method(cntl, &request, response, google::protobuf::NewCallback(OnRPCDone, response, cntl));
```
Since protobuf 3 makes NewCallback private, brpc puts NewCallback in [src/brpc/callback.h](https://github.com/brpc/brpc/blob/master/src/brpc/callback.h) after r32035 (and adds more overloads). If your program has compilation issues with NewCallback, replace google::protobuf::NewCallback with brpc::NewCallback.

### Inherit google::protobuf::Closure

The drawback of using NewCallback is that you have to allocate memory on the heap at least 3 times: response, controller and done. If a profiler shows that memory allocation is a hotspot, consider inheriting Closure yourself and enclosing response/controller as member fields. Doing so combines the 3 allocations into one, but the code becomes harder to read. Don't do this if memory allocation is not an issue.
```c++
class OnRPCDone: public google::protobuf::Closure {
public:
    void Run() {
        // unique_ptr deletes this object (and thus response/cntl) automatically. unique_ptr in gcc 3.4 is an emulated version.
        std::unique_ptr<OnRPCDone> self_guard(this);

        if (cntl.Failed()) {
            // RPC failed. fields in response are undefined, don't use.
        } else {
            // RPC succeeded, response has what we want. Continue the post-processing.
        }
    }

    MyResponse response;
    brpc::Controller cntl;
};

OnRPCDone* done = new OnRPCDone;
MyService_Stub stub(&channel);

MyRequest request;  // you don't have to new request, even in an asynchronous call.
request.set_foo(...);
done->cntl.set_timeout_ms(...);
stub.some_method(&done->cntl, &request, &done->response, done);
```

### What will happen when the callback is very complicated?

No special impact. The callback runs in a separate bthread without blocking other sessions. You can do all sorts of things in the callback.

### Does the callback run in the same thread that CallMethod runs?

The callback runs in a different bthread, even if the RPC fails right after entering CallMethod. This avoids deadlock when the RPC is issued inside a lock (not recommended).

## Wait for completion of RPC
NOTE: [ParallelChannel](combo_channel.md#parallelchannel) is probably more convenient for launching multiple RPCs in parallel.

The following code starts 2 asynchronous RPCs and waits for them to complete.
```c++
const brpc::CallId cid1 = controller1->call_id();
const brpc::CallId cid2 = controller2->call_id();
...
stub.method1(controller1, request1, response1, done1);
stub.method2(controller2, request2, response2, done2);
...
brpc::Join(cid1);
brpc::Join(cid2);
```
Call `Controller.call_id()` to get an id **before launching the RPC**, and join the id after the RPC.

Join() blocks until completion of the RPC **and the end of done->Run()**. Properties of Join:

- If the RPC is already complete, Join() returns immediately.
- Multiple threads can Join() one id; all of them will be woken up.
- A synchronous RPC can be Join()-ed in another thread, although we rarely do this.

Join() was previously called JoinResponse(). If you get deprecation warnings during compilation, rename it to Join().

Calling `Join(controller->call_id())` after completion of the RPC is **wrong**. Do save the call_id before the RPC, otherwise the controller may be deleted by done at any time. The Join in the following code is **wrong**.

```c++
static void on_rpc_done(Controller* controller, MyResponse* response) {
    ... Handle response ...
    delete controller;
    delete response;
}

Controller* controller1 = new Controller;
Controller* controller2 = new Controller;
MyResponse* response1 = new MyResponse;
MyResponse* response2 = new MyResponse;
...
stub.method1(controller1, &request1, response1, google::protobuf::NewCallback(on_rpc_done, controller1, response1));
stub.method2(controller2, &request2, response2, google::protobuf::NewCallback(on_rpc_done, controller2, response2));
...
brpc::Join(controller1->call_id());   // WRONG, controller1 may be deleted by on_rpc_done
brpc::Join(controller2->call_id());   // WRONG, controller2 may be deleted by on_rpc_done
```

## Semi-synchronous call

Join can be used to implement a "semi-synchronous" call: block until multiple asynchronous calls complete. Since the call site blocks until completion of all RPCs, controller/response can safely be put on the stack.
```c++
brpc::Controller cntl1;
brpc::Controller cntl2;
MyResponse response1;
MyResponse response2;
...
stub1.method1(&cntl1, &request1, &response1, brpc::DoNothing());
stub2.method2(&cntl2, &request2, &response2, brpc::DoNothing());
...
brpc::Join(cntl1.call_id());
brpc::Join(cntl2.call_id());
```
brpc::DoNothing() returns a closure that does nothing, specifically for semi-synchronous calls. Its lifetime is managed by brpc.

Note that in the above example, we access `controller.call_id()` after completion of the RPC, which is safe here because DoNothing does not delete the controller as `on_rpc_done` does in the previous example.

## Cancel RPC

`brpc::StartCancel(call_id)` cancels the corresponding RPC. call_id must be obtained from Controller.call_id() **before launching the RPC**; race conditions may occur at any other time.

NOTE: it is `brpc::StartCancel(call_id)`, not `controller->StartCancel()`, which is forbidden and has no effect. The latter is provided by protobuf by default and has serious race conditions on the lifetime of the controller.

As the name implies, the RPC may not have completed yet after StartCancel is called. You should not touch any field in the Controller or delete any associated resources; they should be handled inside done->Run(). If you have to wait for completion of the RPC in place (not recommended), call Join(call_id).

Facts about StartCancel:

- A call_id can be cancelled before CallMethod; the RPC will then end immediately (and done will be called).
- A call_id can be cancelled in another thread.
- Cancelling an already-cancelled call_id has no effect. Corollary: one call_id can be cancelled by multiple threads simultaneously, but only one of them takes effect.
- Cancel here is a client-only feature; **the server side does not necessarily cancel the operation**. Server cancellation is a separate feature.

## Get server-side address and port

remote_side() tells where the request was sent to. The return type is [butil::EndPoint](https://github.com/brpc/brpc/blob/master/src/butil/endpoint.h), which includes an ipv4 address and a port. Calling this method before completion of the RPC is undefined.

How to print:
```c++
LOG(INFO) << "remote_side=" << cntl->remote_side();
printf("remote_side=%s\n", butil::endpoint2str(cntl->remote_side()).c_str());
```
## Get client-side address and port

local_side() gets the address and port of the client side sending the RPC (available after r31384).

How to print:
```c++
LOG(INFO) << "local_side=" << cntl->local_side();
printf("local_side=%s\n", butil::endpoint2str(cntl->local_side()).c_str());
```
## Should brpc::Controller be reused?

It is not necessary to reuse it deliberately.

Controller has miscellaneous fields, some of which are buffers that can be re-used by calling Reset().

In most use cases, constructing a Controller (snippet1) and re-using a Controller (snippet2) perform similarly.
```c++
// snippet1
for (int i = 0; i < n; ++i) {
    brpc::Controller controller;
    ...
    stub.CallSomething(..., &controller);
}

// snippet2
brpc::Controller controller;
for (int i = 0; i < n; ++i) {
    controller.Reset();
    ...
    stub.CallSomething(..., &controller);
}
```
If the Controller in snippet1 is new-ed on the heap, snippet1 has the extra cost of a heap allocation and may be a little slower in some cases.

# Settings

Client-side settings have 3 parts:

- brpc::ChannelOptions: defined in [src/brpc/channel.h](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h), for initializing a Channel. It becomes immutable once the initialization is done.
- brpc::Controller: defined in [src/brpc/controller.h](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h), for overriding fields in brpc::ChannelOptions for a particular RPC according to context.
- global gflags: for tuning global behaviors, generally left unchanged. Read the comments in [/flags](flags.md) before setting them.

Controller contains data and options that the request may not have. Server and client share the same Controller class, but they may set different fields. Read the comments in Controller carefully before using it.

A Controller corresponds to one RPC. A Controller can be re-used by another RPC after Reset(), but a Controller can't be used by multiple RPCs simultaneously, no matter whether the RPCs are started from one thread or not.

Properties of Controller:
1.  A Controller can only have one user. Unless explicitly stated, methods in Controller are **not** thread-safe by default.
2.  Because a Controller is generally not shared, there's no need to manage it with shared_ptr. If you do, something might go wrong.
3.  A Controller is constructed before the RPC and destructed after the RPC. Some common patterns:
   - Put the Controller on the stack before a synchronous RPC, to be destructed when it goes out of scope. Note that the Controller of an asynchronous RPC **must not** be put on the stack, otherwise the RPC may still be running when the Controller is destructed, resulting in undefined behavior.
   - new the Controller before an asynchronous RPC and delete it in done.

## Number of worker pthreads

There's **no** independent thread pool for clients in brpc. All Channels and Servers share the same backing threads via [bthread](bthread.md). Setting the number of worker pthreads in Server works for the client as well if a Server is in use. Or just specify the [gflag](flags.md) [-bthread_concurrency](brpc.baidu.com:8765/flags/bthread_concurrency) to set the global number of worker pthreads.

## Timeout

**ChannelOptions.timeout_ms** is the timeout in milliseconds for all RPCs via the Channel; Controller.set_timeout_ms() overrides the value for one RPC. Default value is 500 milliseconds, maximum value is 2^31 - 1 (about 24 days), and -1 means waiting indefinitely for the response or a connection error.

**ChannelOptions.connect_timeout_ms** is the timeout in milliseconds for the connecting part of all RPCs via the Channel. Default value is 200 milliseconds, and -1 means no timeout for connecting. This value is limited to be never greater than timeout_ms. Note that this timeout is different from the connection timeout in TCP. Generally this timeout is smaller; otherwise establishment of the connection may fail due to the timeout in the TCP layer before this timeout is reached.

NOTE1: timeout_ms in brpc is a **deadline**, which means that once it's reached, the RPC ends and no retries are made after the timeout. Other implementations may have both session timeout and deadline timeout; do distinguish them before porting to brpc.

NOTE2: the error code of RPC timeout is **ERPCTIMEDOUT (1008)**. ETIMEDOUT is connection timeout and is retriable.
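
How the per-RPC override and the connect-timeout limit combine can be modeled as below. This is a minimal standalone sketch, not brpc code: `TimeoutOptions`, `kUnset` and `EffectiveTimeouts` are hypothetical names, and treating an unlimited connect timeout as bounded by the RPC deadline is an assumption drawn from the description above.

```c++
#include <cstdint>
#include <limits>

// Hypothetical mirror of the two timeout fields described above.
struct TimeoutOptions {
    int32_t timeout_ms;          // deadline of the whole RPC, -1 = wait indefinitely
    int32_t connect_timeout_ms;  // limit of the connecting part, -1 = no limit
};

// Hypothetical sentinel: the per-RPC timeout was not overridden.
const int32_t kUnset = std::numeric_limits<int32_t>::min();

// Returns the timeouts that would apply to one RPC in this model.
TimeoutOptions EffectiveTimeouts(const TimeoutOptions& channel, int32_t per_rpc_timeout_ms) {
    TimeoutOptions result = channel;
    if (per_rpc_timeout_ms != kUnset) {
        result.timeout_ms = per_rpc_timeout_ms;  // per-RPC value wins over the Channel's
    }
    const bool has_deadline = result.timeout_ms >= 0;
    const bool connect_unbounded = result.connect_timeout_ms < 0;
    // The connect timeout is never greater than the deadline of the RPC.
    if (has_deadline && (connect_unbounded || result.connect_timeout_ms > result.timeout_ms)) {
        result.connect_timeout_ms = result.timeout_ms;
    }
    return result;
}
```

For example, with a Channel configured as `{500, 200}`, an RPC that overrides the timeout to 100 ms ends up with a connect limit of 100 ms as well in this model.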

## Retry

ChannelOptions.max_retry is the maximum number of retries for all RPCs via the channel; Controller.set_max_retry() overrides the value for one RPC. Default value is 3. 0 means no retries.

Controller.retried_count() returns the number of retries.

Controller.has_backup_request() tells if a backup request was sent.

**Servers that were tried before are avoided in retries on a best-effort basis.**

Conditions for retrying (AND relations):
- Broken connection.
- Timeout is not reached.
- Has retrying quota. Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.
- The retry makes sense. If the RPC fails due to the request (EREQUEST), no retry will be done because the server is very likely to reject the request again, so retrying makes no sense here.
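
The AND relation of the four conditions can be written down as a small predicate. This is only an illustrative model under stated assumptions: `AttemptState` and `ShouldRetry` are hypothetical names that do not exist in brpc, and the real decision is made inside the framework.

```c++
#include <cstdint>

// Hypothetical snapshot of one failed attempt; all field names are illustrative.
struct AttemptState {
    bool connection_broken;  // the attempt failed because the connection broke
    int64_t elapsed_ms;      // time spent since the RPC started
    int64_t timeout_ms;      // deadline of the RPC, -1 = no deadline
    int retried_count;       // retries already made
    int max_retry;           // retrying quota
    bool request_rejected;   // failure caused by the request itself (e.g. EREQUEST)
};

// All four conditions listed above must hold at the same time.
bool ShouldRetry(const AttemptState& s) {
    const bool before_deadline = s.timeout_ms < 0 || s.elapsed_ms < s.timeout_ms;
    const bool has_quota = s.retried_count < s.max_retry;
    const bool makes_sense = !s.request_rejected;
    return s.connection_broken && before_deadline && has_quota && makes_sense;
}
```

Note that a server that is merely slow never satisfies the first condition, which is why the backup request exists.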

### Broken connection

If the server does not respond but the connection is good, retry is not triggered. If you need to send another request after some timeout, use backup request.

How it works: if the response does not return within the timeout specified by backup_request_ms, another request is sent and whichever response returns first is taken. The new request is sent to a different server that was never tried before, on a best-effort basis. NOTE: if backup_request_ms is greater than timeout_ms, the backup request will never be sent. A backup request consumes one retry. A backup request does not imply a server-side cancellation.

ChannelOptions.backup_request_ms affects all RPCs via the Channel. The unit is milliseconds and the default value is -1 (disabled). Controller.set_backup_request_ms() overrides the value for one RPC.
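
The timing of a backup request can be illustrated with a toy model. The function below is a hypothetical sketch, not part of brpc: it assumes a non-negative timeout_ms, fixed latencies for the two servers, and that "consumes one retry" means no backup request is sent when max_retry is 0.

```c++
#include <algorithm>
#include <cstdint>

// Returns the time (in ms since the RPC started) at which the first response arrives,
// or -1 if the deadline is reached first. All parameters are inputs of this toy model.
int64_t FirstResponseTimeMs(int64_t primary_latency_ms, int64_t backup_latency_ms,
                            int64_t backup_request_ms, int64_t timeout_ms, int max_retry) {
    int64_t finish = primary_latency_ms;
    const bool backup_enabled = backup_request_ms >= 0 && max_retry > 0;
    // The backup is sent only if the primary has not answered by backup_request_ms
    // and the deadline has not been reached by then.
    if (backup_enabled && backup_request_ms < timeout_ms &&
        primary_latency_ms > backup_request_ms) {
        finish = std::min(finish, backup_request_ms + backup_latency_ms);
    }
    return finish < timeout_ms ? finish : -1;
}
```

With a primary server answering in 80 ms, a backup server answering in 10 ms and backup_request_ms = 20, the RPC completes at 30 ms instead of 80 ms in this model.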

### Timeout is not reached

The RPC is ended soon after the timeout is reached.

### Has retrying quota

Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.

### The retry makes sense

If the RPC fails due to the request (EREQUEST), no retry will be done because the server is very likely to reject the request again, so retrying makes no sense here.

Users can inherit [brpc::RetryPolicy](https://github.com/brpc/brpc/blob/master/src/brpc/retry_policy.h) to customize conditions of retrying. For example, brpc does not retry for HTTP-related errors by default. If you want to retry on HTTP_STATUS_FORBIDDEN (403) in your app, you can do as follows:

```c++
#include <brpc/channel.h>
#include <brpc/controller.h>
#include <brpc/retry_policy.h>

class MyRetryPolicy : public brpc::RetryPolicy {
public:
    bool DoRetry(const brpc::Controller* cntl) const {
        if (cntl->ErrorCode() == brpc::EHTTP && // HTTP error
            cntl->http_response().status_code() == brpc::HTTP_STATUS_FORBIDDEN) {
            return true;
        }
        // Leave other cases to brpc.
        return brpc::DefaultRetryPolicy()->DoRetry(cntl);
    }
};
...

// Assign the instance to ChannelOptions.retry_policy.
// NOTE: retry_policy must be kept valid during the lifetime of the Channel, and the
// Channel does not delete retry_policy, so in most cases the RetryPolicy should be a singleton.
brpc::ChannelOptions options;
static MyRetryPolicy g_my_retry_policy;
options.retry_policy = &g_my_retry_policy;
...
```

Some tips:

- Get the response of the RPC by cntl->response().
- An RPC deadline represented by ERPCTIMEDOUT is never retried, even if it's allowed by your derived RetryPolicy.

### Retrying should be conservative

Due to maintenance costs, even very large-scale clusters are deployed with "just enough" instances to survive major defects, namely one IDC going offline, which is at most 1/2 of all machines. However, aggressive retries may easily double or even triple the pressure from all clients against servers and bring the whole cluster down: more and more requests get stuck in buffers because servers can't process them in time. All requests have to wait a very long time to be processed and finally time out, as if the whole cluster had crashed. The default retrying policy is generally safe: unless the connection is broken, retries are rarely sent. However, users are able to customize the starting conditions for retries by inheriting RetryPolicy, which may turn retries into "a storm". When you customize RetryPolicy, you need to carefully consider how clients and servers interact, and design corresponding tests to verify that retries work as expected.
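
To get a feeling for how much extra load an aggressive policy can generate, the expected number of attempts per request can be estimated as below. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes every failed attempt is retried and that attempts fail independently with the same probability, which is optimistic for an overloaded cluster where failures are correlated.

```c++
// Expected number of attempts sent to servers for one request:
// 1 + p + p^2 + ... + p^max_retry, where p is the failure probability of one attempt.
double ExpectedAttempts(double failure_probability, int max_retry) {
    double attempts = 0.0;
    double reach_probability = 1.0;  // probability that the k-th attempt is made at all
    for (int k = 0; k <= max_retry; ++k) {
        attempts += reach_probability;
        reach_probability *= failure_probability;
    }
    return attempts;
}
```

With max_retry = 3 and servers that fail every attempt, each request turns into 4 attempts, so the pressure on an already struggling cluster is quadrupled.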

## Protocols

The default protocol used by Channel is baidu_std, which is changeable by setting ChannelOptions.protocol. The field accepts both enum and string.

Supported protocols:

- PROTOCOL_BAIDU_STD or "baidu_std", which is [the standard binary protocol inside Baidu](baidu_std.md), using single connection by default.
- PROTOCOL_HULU_PBRPC or "hulu_pbrpc", which is the protocol of hulu-pbrpc, using single connection by default.
- PROTOCOL_NOVA_PBRPC or "nova_pbrpc", which is the protocol of Baidu ads union, using pooled connection by default.
- PROTOCOL_HTTP or "http", which is http 1.0 or 1.1, using pooled connection by default (Keep-Alive). Check out [Access HTTP service](http_client.md) for details.
- PROTOCOL_SOFA_PBRPC or "sofa_pbrpc", which is the protocol of sofa-pbrpc, using single connection by default.
- PROTOCOL_PUBLIC_PBRPC or "public_pbrpc", which is the protocol of public_pbrpc, using pooled connection by default.
- PROTOCOL_UBRPC_COMPACK or "ubrpc_compack", which is the protocol of public/ubrpc, packing with compack, using pooled connection by default. Check out [ubrpc (by protobuf)](ub_client.md) for details. A related protocol is PROTOCOL_UBRPC_MCPACK2 or "ubrpc_mcpack2", packing with mcpack2.
- PROTOCOL_NSHEAD_CLIENT or "nshead_client", which is required by UBXXXRequest in baidu-rpc-ub, using pooled connection by default. Check out [Access UB](ub_client.md) for details.
- PROTOCOL_NSHEAD or "nshead", which is required for sending NsheadMessage, using pooled connection by default. Check out [nshead+blob](ub_client.md#nshead-blob) for details.
- PROTOCOL_MEMCACHE or "memcache", which is the binary protocol of memcached, using **single connection** by default. Check out [access memcached](memcache_client.md) for details.
- PROTOCOL_REDIS or "redis", which is the protocol of redis 1.2+ (the one supported by hiredis), using **single connection** by default. Check out [Access Redis](redis_client.md) for details.
- PROTOCOL_NSHEAD_MCPACK or "nshead_mcpack", which is, as the name implies, nshead + mcpack (parsed by protobuf via mcpack2pb), using pooled connection by default.
- PROTOCOL_ESP or "esp", for accessing services with the esp protocol, using pooled connection by default.

## Connection Type

brpc supports the following connection types:

- short connection: Established before each RPC and closed after completion. Since each RPC has to pay the overhead of establishing a connection, this type is used for occasionally launched RPCs, not frequently launched ones. No protocol uses this type by default. Connections in http 1.0 are handled similarly to short connections.
- pooled connection: Pick an unused connection from a pool before each RPC and return it after completion. One connection carries at most one request at a time. One client may have multiple connections to one server. http and the protocols using nshead use this type by default.
- single connection: All clients in one process have at most one connection to one server, and one connection may carry multiple requests at the same time. Responses do not need to be received in the same order as the requests were sent. This type is used by baidu_std, hulu_pbrpc and sofa_pbrpc by default.

|                                          | short connection                         | pooled connection                       | single connection                        |
| ---------------------------------------- | ---------------------------------------- | --------------------------------------- | ---------------------------------------- |
| long connection                          | no                                       | yes                                     | yes                                      |
| \#connection at server-side (from a client) | qps*latency ([little's law](https://en.wikipedia.org/wiki/Little%27s_law)) | qps*latency                             | 1                                        |
| peak qps                                 | bad, and limited by max number of ports  | medium                                  | high                                     |
| latency                                  | 1.5RTT(connect) + 1RTT + processing time | 1RTT + processing time                  | 1RTT + processing time                   |
| cpu usage                                | high, tcp connect for each RPC           | medium, every request needs a sys write | low, writes can be combined to reduce overhead. |
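
The "qps*latency" entries in the table can be turned into numbers as follows. This is a rough estimate under stated assumptions: the function names are hypothetical, the load is assumed to be steady, and connection setup time is ignored.

```c++
#include <cmath>
#include <cstdint>

// Average number of requests in flight, by Little's law: L = arrival rate * time in system.
double RequestsInFlight(double qps, double latency_ms) {
    return qps * latency_ms / 1000.0;
}

// Rough number of connections one client holds to one server.
// Short and pooled connections carry one request at a time; a single connection carries all.
int64_t EstimatedConnections(double qps, double latency_ms, bool single_connection) {
    if (single_connection) {
        return 1;
    }
    return static_cast<int64_t>(std::ceil(RequestsInFlight(qps, latency_ms)));
}
```

A client sending 2000 qps to a server that answers in 5 ms keeps about 10 pooled connections busy, while the same traffic over a single connection needs only 1.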

brpc chooses the best connection type for the protocol by default; users generally have no need to change it. If you do, set ChannelOptions.connection_type to:

- CONNECTION_TYPE_SINGLE or "single" : single connection

- CONNECTION_TYPE_POOLED or "pooled": pooled connection. The max number of connections from one client to one server is limited by -max_connection_pool_size:

  | Name                         | Value | Description                              | Defined At          |
  | ---------------------------- | ----- | ---------------------------------------- | ------------------- |
  | max_connection_pool_size (R) | 100   | maximum pooled connection count to a single endpoint | src/brpc/socket.cpp |

- CONNECTION_TYPE_SHORT or "short" : short connection

- "" (empty string) makes brpc choose the default one.

brpc also supports [Streaming RPC](streaming_rpc.md), which is an application-level connection for transferring streaming data.

## Close idle connections in pools

If a connection has no read or write within the seconds specified by -idle_timeout_second, it's tagged as "idle" and will be closed automatically. Default value is 10 seconds. This feature is only effective for pooled connections. If -log_idle_connection_close is true, a log is printed before closing.

| Name                      | Value | Description                              | Defined At              |
| ------------------------- | ----- | ---------------------------------------- | ----------------------- |
| idle_timeout_second       | 10    | Pooled connections without data transmission for so many seconds will be closed. No effect for non-positive values | src/brpc/socket_map.cpp |
| log_idle_connection_close | false | Print log when an idle connection is closed | src/brpc/socket.cpp     |

## Defer connection close

Multiple channels may share a connection via reference counting. When a channel releases the last reference of the connection, the connection will be closed. But in some scenarios, channels are created just before sending RPC and destroyed after completion, in which case connections are probably closed and re-opened frequently, which is as costly as short connections.

One solution is to cache channels commonly used by the user, which avoids frequent creation and destruction of channels. However, brpc does not offer a utility for doing this right now, and it's not trivial for users to implement it correctly.

Another solution is setting gflag -defer_close_second:

| Name               | Value | Description                              | Defined At              |
| ------------------ | ----- | ---------------------------------------- | ----------------------- |
| defer_close_second | 0     | Defer close of connections for so many seconds even if the connection is not used by anyone. Close immediately for non-positive values | src/brpc/socket_map.cpp |

After setting, a connection is not closed immediately after its last reference is released; instead it will be closed after so many seconds. If a channel references the connection again during the wait, the connection resumes to normal. No matter how frequently channels are created, this flag limits the frequency of closing connections. A side effect of the flag is that file descriptors are not closed immediately after channels are destroyed; if the flag is wrongly set to be large, the number of active file descriptors in the process may be large as well.

## Buffer size of connections

-socket_recv_buffer_size sets the receiving buffer size of all connections, -1 by default (not modified).

-socket_send_buffer_size sets the sending buffer size of all connections, -1 by default (not modified).

| Name                    | Value | Description                              | Defined At          |
| ----------------------- | ----- | ---------------------------------------- | ------------------- |
| socket_recv_buffer_size | -1    | Set the recv buffer size of socket if this value is positive | src/brpc/socket.cpp |
| socket_send_buffer_size | -1    | Set send buffer size of sockets if this value is positive | src/brpc/socket.cpp |

## log_id

set_log_id() sets a 64-bit integral log_id, which is sent to the server-side along with the request and is often printed in server logs to associate different services accessed in a session. A string-type log-id must be converted to a 64-bit integer before setting.
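
One possible way to convert a string-type log-id is to hash it. The sketch below uses the 64-bit FNV-1a hash; the choice of hash and the name `LogIdFromString` are assumptions of this example, not something brpc prescribes, and different strings may collide on the same integer.

```c++
#include <cstdint>
#include <string>

// Maps a string-type log-id to a 64-bit integer with the FNV-1a hash.
uint64_t LogIdFromString(const std::string& text) {
    uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char byte : text) {
        hash ^= byte;
        hash *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return hash;
}
```

The result can then be passed to set_log_id(). Server-side logs will show the integer, so the original string should be logged somewhere if it needs to be recovered.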

## Attachment

baidu_std and hulu_pbrpc support attachments, which are sent along with messages and set by users to bypass serialization of protobuf. As a client, data set in Controller::request_attachment() will be received by the server, and response_attachment() contains the attachment sent back by the server.

Attachment is not compressed by the framework.

In http, attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html), namely the data to post to the server is stored in request_attachment().

## Authentication
Generally there are 2 ways of authentication at the client side:

1. Request-based authentication: Each request carries authentication information. It's more flexible since the authentication information can contain fields based on this particular request. However, this leads to a performance loss due to the extra payload in each request.
2. Connection-based authentication: Once a TCP connection has been established, the client sends an authentication packet. After it has been verified by the server, subsequent requests on this connection no longer need authentication. Compared with the former, this method can only carry some static information such as local IP in the authentication packet. However, it has better performance, especially under single connection / connection pool scenarios.

It's very simple to implement the first method: just add the authentication data format to the request proto definition and send it as a normal RPC in each request. To achieve the second one, brpc provides an interface for users to implement:

```c++
class Authenticator {
public:
    virtual ~Authenticator() {}

    // Implement this method to generate credential information
    // into `auth_str' which will be sent to `VerifyCredential'
    // at server side. This method will be called on client side.
    // Returns 0 on success, error code otherwise
    virtual int GenerateCredential(std::string* auth_str) const = 0;
};
```

When the user calls the RPC interface with a single connection to the same server, the framework guarantees that once the TCP connection has been established, the first request on the connection will contain the authentication string generated by `GenerateCredential`. Subsequent requests will not carry that string. The entire sending process is still highly concurrent since it won't wait for the authentication result. If the verification succeeds, all requests return without error. Otherwise, if the verification fails, generally the server will close the connection and those requests will receive the corresponding error.
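
A client-side implementation of the interface could look like the sketch below. To keep the example compilable on its own, it repeats the simplified `Authenticator` declaration shown above; real code would include the brpc header that declares brpc::Authenticator, which may declare further methods (such as the server-side `VerifyCredential` mentioned in the comment). `StaticTokenAuthenticator` and the fixed token are hypothetical.

```c++
#include <string>
#include <utility>

// Copy of the simplified interface shown above, repeated only for self-containment.
class Authenticator {
public:
    virtual ~Authenticator() {}
    virtual int GenerateCredential(std::string* auth_str) const = 0;
};

// Hypothetical authenticator that sends a fixed, pre-shared token.
class StaticTokenAuthenticator : public Authenticator {
public:
    explicit StaticTokenAuthenticator(std::string token) : token_(std::move(token)) {}

    // Called on the client side; returns 0 on success, non-zero otherwise.
    int GenerateCredential(std::string* auth_str) const override {
        if (auth_str == nullptr || token_.empty()) {
            return -1;
        }
        *auth_str = token_;
        return 0;
    }

private:
    std::string token_;
};
```

The instance has to outlive the Channel that uses it. In brpc the authenticator is assumed to be attached through the `auth` field of ChannelOptions; check channel.h before relying on this.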

Currently only the following protocols support client authentication: [baidu_std](../cn/baidu_std.md) (default protocol), HTTP, hulu_pbrpc, ESP. For customized protocols, generally speaking, users could call the `Authenticator`'s interface to generate the authentication string during the request packing process in order to support authentication.

## Reset

This method restores the Controller to the state as if it were just created.

Don't call Reset() during an RPC; the behavior is undefined.

## Compression

set_request_compress_type() sets the compress-type of the request; no compression by default.

NOTE: Attachment is not compressed by brpc.

Check out [compress request body](http_client#压缩request-body) to compress http body.

Supported compressions:

- brpc::CompressTypeSnappy : [snappy](http://google.github.io/snappy/), compression and decompression are very fast, but compression ratio is low.
- brpc::CompressTypeGzip : [gzip](http://en.wikipedia.org/wiki/Gzip), significantly slower than snappy, with a higher compression ratio.
- brpc::CompressTypeZlib : [zlib](http://en.wikipedia.org/wiki/Zlib), 10%~20% faster than gzip but still significantly slower than snappy, with slightly better compression ratio than gzip.

The following table lists the performance of different methods compressing and decompressing **data with a lot of duplications**, just for reference.

| Compress method | Compress size(B) | Compress time(us) | Decompress time(us) | Compress throughput(MB/s) | Decompress throughput(MB/s) | Compress ratio |
| --------------- | ---------------- | ----------------- | ------------------- | ------------------------- | --------------------------- | -------------- |
| Snappy          | 128              | 0.753114          | 0.890815            | 162.0875                  | 137.0322                    | 37.50%         |
| Gzip            | 128              | 10.85185          | 1.849199            | 11.2488                   | 66.01252                    | 47.66%         |
| Zlib            | 128              | 10.71955          | 1.66522             | 11.38763                  | 73.30581                    | 38.28%         |
| Snappy          | 1024             | 1.404812          | 1.374915            | 695.1555                  | 710.2713                    | 8.79%          |
| Gzip            | 1024             | 16.97748          | 3.950946            | 57.52106                  | 247.1718                    | 6.64%          |
| Zlib            | 1024             | 15.98913          | 3.06195             | 61.07665                  | 318.9348                    | 5.47%          |
| Snappy          | 16384            | 8.822967          | 9.865008            | 1770.946                  | 1583.881                    | 4.96%          |
| Gzip            | 16384            | 160.8642          | 43.85911            | 97.13162                  | 356.2544                    | 0.78%          |
| Zlib            | 16384            | 147.6828          | 29.06039            | 105.8011                  | 537.6734                    | 0.71%          |
| Snappy          | 32768            | 16.16362          | 19.43596            | 1933.354                  | 1607.844                    | 4.82%          |
| Gzip            | 32768            | 229.7803          | 82.71903            | 135.9995                  | 377.7849                    | 0.54%          |
| Zlib            | 32768            | 240.7464          | 54.44099            | 129.8046                  | 574.0161                    | 0.50%          |

The following table lists the performance of different methods compressing and decompressing **data with very few duplications**, just for reference.

| Compress method | Compress size(B) | Compress time(us) | Decompress time(us) | Compress throughput(MB/s) | Decompress throughput(MB/s) | Compress ratio |
| --------------- | ---------------- | ----------------- | ------------------- | ------------------------- | --------------------------- | -------------- |
| Snappy          | 128              | 0.866002          | 0.718052            | 140.9584                  | 170.0021                    | 105.47%        |
| Gzip            | 128              | 15.89855          | 4.936242            | 7.678077                  | 24.7294                     | 116.41%        |
| Zlib            | 128              | 15.88757          | 4.793953            | 7.683384                  | 25.46339                    | 107.03%        |
| Snappy          | 1024             | 2.087972          | 1.06572             | 467.7087                  | 916.3403                    | 100.78%        |
| Gzip            | 1024             | 32.54279          | 12.27744            | 30.00857                  | 79.5412                     | 79.79%         |
| Zlib            | 1024             | 31.51397          | 11.2374             | 30.98824                  | 86.90288                    | 78.61%         |
| Snappy          | 16384            | 12.598            | 6.306592            | 1240.276                  | 2477.566                    | 100.06%        |
| Gzip            | 16384            | 537.1803          | 129.7558            | 29.08707                  | 120.4185                    | 75.32%         |
| Zlib            | 16384            | 519.5705          | 115.1463            | 30.07291                  | 135.697                     | 75.24%         |
| Snappy          | 32768            | 22.68531          | 12.39793            | 1377.543                  | 2520.582                    | 100.03%        |
| Gzip            | 32768            | 1403.974          | 258.9239            | 22.25825                  | 120.6919                    | 75.25%         |
| Zlib            | 32768            | 1370.201          | 230.3683            | 22.80687                  | 135.6524                    | 75.21%         |

# FAQ

### Q: Does brpc support unix domain socket?

No. Local TCP sockets perform only a little slower than unix domain sockets, since traffic over local TCP sockets bypasses the network. Some scenarios where TCP sockets can't be used may require unix domain sockets. We may consider adding this capability in the future.

### Q: Fail to connect to xx.xx.xx.xx:xxxx, Connection refused

The remote server is no longer serving (it has probably crashed).

### Q: Often met Connection timedout to another IDC

![img](../images/connection_timedout.png)

The TCP connection is not established within connect_timeout_ms; you have to tweak the options:

```c++
struct ChannelOptions {
    ...
    // Issue error when a connection is not established after so many
    // milliseconds. -1 means wait indefinitely.
    // Default: 200 (milliseconds)
    // Maximum: 0x7fffffff (roughly 30 days)
    int32_t connect_timeout_ms;

    // Max duration of RPC over this Channel. -1 means wait indefinitely.
    // Overridable by Controller.set_timeout_ms().
    // Default: 500 (milliseconds)
    // Maximum: 0x7fffffff (roughly 30 days)
    int32_t timeout_ms;
    ...
};
```

NOTE: Connection timeout is not RPC timeout, which is printed as "Reached timeout=...".

### Q: synchronous call is good, asynchronous call crashes

Check the lifetime of Controller, Response and done. In an asynchronous call, the return of CallMethod is not the completion of the RPC, which is the entering of done->Run(). So the objects should not be deleted just after CallMethod; instead they should be deleted in done->Run(). Generally you should allocate the objects on heap instead of putting them on stack. Check out [Asynchronous call](client.md#asynchronous-call) for details.
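
The lifetime rule can be demonstrated without brpc. In the toy model below, `CallState` stands in for Controller and Response, and `FakeAsyncTransport` for an asynchronous CallMethod; all names are hypothetical and only the ownership pattern carries over to real code.

```c++
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-call state, standing in for Controller + Response.
struct CallState {
    std::string response;
};

// Hypothetical transport that completes calls later, when RunPending() is invoked.
class FakeAsyncTransport {
public:
    // Returns immediately; `done` runs only when the "RPC" completes.
    void CallAsync(CallState* state, std::function<void()> done) {
        pending_.push_back([state, done] {
            state->response = "pong";
            done();
        });
    }

    void RunPending() {
        std::vector<std::function<void()>> batch;
        batch.swap(pending_);
        for (auto& task : batch) {
            task();
        }
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Starts a call and returns before it completes. The state lives on the heap and is
// released inside the completion callback, never right after CallAsync().
void StartCall(FakeAsyncTransport& transport, std::vector<std::string>* results) {
    CallState* state = new CallState;
    transport.CallAsync(state, [state, results] {
        std::unique_ptr<CallState> guard(state);  // freed when the callback finishes
        results->push_back(state->response);
    });
}
```

Had `state` been a local variable of StartCall(), the callback would have written to a destroyed object, which is the crash described in this question.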

### Q: How to make requests be processed once and only once

This issue is not solved on the RPC layer. When the response returns and is successful, we know the RPC was processed at server-side. When the response returns and is rejected, we know the RPC was not processed at server-side. But when no response is returned, the server may or may not have processed the RPC. If we retry, the same request may be processed twice at server-side. Generally RPC services with side effects must consider [idempotence](http://en.wikipedia.org/wiki/Idempotence) of the service, otherwise retries may make side effects happen more than once and result in unexpected behavior. Search services with only reads often have no side effects (during a search), being idempotent naturally. But storage services that need to write have to design versioning or serial-number mechanisms to reject side effects that already happened, in order to stay idempotent.
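
A serial-number mechanism can be as simple as remembering which request ids were already applied. The class below is a hypothetical in-memory sketch of that idea; a real storage service would also have to persist the ids and bound their number.

```c++
#include <cstdint>
#include <unordered_set>

// Hypothetical storage service that applies each request at most once.
class Account {
public:
    // Returns true if the deposit was applied, false if request_id was seen before.
    bool Deposit(uint64_t request_id, int64_t amount) {
        if (!applied_.insert(request_id).second) {
            return false;  // duplicate caused by a retry: the side effect is rejected
        }
        balance_ += amount;
        return true;
    }

    int64_t balance() const { return balance_; }

private:
    std::unordered_set<uint64_t> applied_;  // serial numbers of applied requests
    int64_t balance_ = 0;
};
```

The client generates the request id once and reuses it on every retry, so that a retried write is recognized by the server and applied only once.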

### Q: Invalid address=`bns://group.user-persona.dumi.nj03'
```
FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers.
```
Accessing servers under a naming service needs the Init() with 3 parameters (the second parameter is `load_balancer_name`). The Init() here is called with 2 parameters and treated by brpc as accessing a single server, which produces the error.

### Q: Both sides use protobuf, why can't they communicate with each other

**protocol != protobuf**. protobuf serializes one package, while a message of a protocol may contain multiple packages along with extra lengths, checksums and magic numbers. The capability offered by brpc to "write code once and serve multiple protocols" is implemented by converting data from different protocols to a unified API, not on the protobuf layer.

### Q: Why C++ client/server may fail to talk to client/server in other languages

Check if the C++ version turns on compression (Controller::set_request_compress_type). Currently RPC implementations in other languages do not support compression yet.

# PS: Workflow at Client-side

![img](../images/client_side.png)

Steps:

1. Create a [bthread_id](https://github.com/brpc/brpc/blob/master/src/bthread/id.h) as the correlation_id of the current RPC.
2. According to how the Channel is initialized, choose a server from the global [SocketMap](https://github.com/brpc/brpc/blob/master/src/brpc/socket_map.h) or [LoadBalancer](https://github.com/brpc/brpc/blob/master/src/brpc/load_balancer.h) as the destination of the request.
3. Choose a [Socket](https://github.com/brpc/brpc/blob/master/src/brpc/socket.h) according to the connection type (single, pooled, short).
4. If authentication is turned on and the Socket is not authenticated yet, the first request enters the authenticating branch, and other requests block until the branch writes authenticating information into the Socket. Server-side only verifies the first request.
5. According to the protocol of the Channel, choose the corresponding serialization callback to serialize the request into [IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h).
6. If timeout is set, set up a timer. From this point on, avoid using the Controller, since the timer may be triggered at any time and call the user's callback for timeout, which may delete the Controller.
7. Sending phase is completed. If an error occurs at any step, Channel::HandleSendFailed is called.
8. Write the IOBuf with serialized data into the Socket and add Channel::HandleSocketFailed into id_wait_list of the Socket. The callback will be called when the write fails or the connection is broken before completion of the RPC.
9. In a synchronous call, Join correlation_id; otherwise CallMethod() returns.
10. Send/receive messages to/from network.
11. After receiving the response, get the correlation_id inside it and find the associated Controller in O(1) time. The lookup does not need to lock a global hashmap, and scales well.
12. Parse the response according to the protocol.
13. Call Controller::OnRPCReturned, which may retry the erroneous RPC or complete the RPC. Call the user's done in an asynchronous call. Destroy correlation_id and wake up joining threads.