Commit c9039fd4 authored by gejun

Complete translation of client.md

parent 16fbb258
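...@@ -530,13 +530,13 @@ brpc supports the following connection types: ...@@ -530,13 +530,13 @@ brpc supports the following connection types:
- Pooled connection: a free connection is taken from the pool before each RPC and returned afterwards. A connection carries at most one request at a time, and one client may hold multiple connections to one server. http 1.1 and the protocols using nshead work this way. - Pooled connection: a free connection is taken from the pool before each RPC and returned afterwards. A connection carries at most one request at a time, and one client may hold multiple connections to one server. http 1.1 and the protocols using nshead work this way.
- Single connection: all clients in the process share at most one connection to a server. The connection may carry multiple requests at the same time, and responses need not return in request order. This is the default for baidu_std, hulu_pbrpc and sofa_pbrpc. - Single connection: all clients in the process share at most one connection to a server. The connection may carry multiple requests at the same time, and responses need not return in request order. This is the default for baidu_std, hulu_pbrpc and sofa_pbrpc.
| | short connection | pooled connection | single connection | | | short connection | pooled connection | single connection |
| ---------- | ---------------------------------------- | --------------------- | ------------------- | | ------------------- | ---------------------------------------- | --------------------- | ------------------- |
| long connection | no | yes | yes | | long connection | no | yes | yes |
| connections at server-side | qps*latency (see [Little's law](https://en.wikipedia.org/wiki/Little%27s_law)) | qps*latency | 1 | | connections at server-side (from one client) | qps*latency (see [Little's law](https://en.wikipedia.org/wiki/Little%27s_law)) | qps*latency | 1 |
| peak qps | poor, and limited by the number of ports on one machine | medium | high | | peak qps | poor, and limited by the number of ports on one machine | medium | high |
| latency | 1.5RTT(connect) + 1RTT + processing time | 1RTT + processing time | 1RTT + processing time | | latency | 1.5RTT(connect) + 1RTT + processing time | 1RTT + processing time | 1RTT + processing time |
| cpu usage | high, a tcp connect for each RPC | medium, one sys write for each request | low, writes are batched, which reduces cpu usage under heavy traffic | | cpu usage | high, a tcp connect for each RPC | medium, one sys write for each request | low, writes are batched, which reduces cpu usage under heavy traffic |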
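As a rough worked example of the qps*latency row, the sketch below applies Little's law with integer arithmetic. `EstimateConnections` and the numbers in the usage note are illustrative assumptions, not part of brpc.

```c++
#include <cstdint>

// Little's law: average number of in-flight requests = arrival rate * time in system.
// With short or pooled connections every in-flight request occupies one connection,
// so the result approximates how many connections one client holds to a server.
// EstimateConnections is a hypothetical helper, not a brpc API.
inline int64_t EstimateConnections(int64_t qps, int64_t latency_us) {
    const int64_t kMicrosPerSecond = 1000000;
    // Round up: a fractional connection still needs a whole socket.
    return (qps * latency_us + kMicrosPerSecond - 1) / kMicrosPerSecond;
}
```

For instance, a client sending 10000 qps with 5ms latency needs about `EstimateConnections(10000, 5000)` = 50 connections when short or pooled connections are used, while a single connection always needs 1.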
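The framework chooses a default connection type for each protocol, and users **generally have no need to change it**. If needed, set ChannelOptions.connection_type to: The framework chooses a default connection type for each protocol, and users **generally have no need to change it**. If needed, set ChannelOptions.connection_type to:
...@@ -590,7 +590,7 @@ brpc supports [Streaming RPC](streaming_rpc.md), which is an application-level connection, ...@@ -590,7 +590,7 @@ brpc supports [Streaming RPC](streaming_rpc.md), which is an application-level connection,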
## log_id ## log_id
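set_log_id() sets the log_id. The id is sent to the server and generally printed in logs, so that all services touched by one search can be linked together. Different product lines may call it differently. Some product lines have a string-format "s value" whose content is also a 64-bit hexadecimal number, which can be converted to an integer and then set as log_id. set_log_id() sets a 64-bit integral log_id. The id is sent to the server along with the request and generally printed in logs, so that all services touched by one search can be linked together. An id in string format must be converted to a 64-bit integer before being set as log_id.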
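A minimal sketch of the string-to-integer conversion mentioned above, assuming the string holds a hexadecimal number of at most 64 bits. `ParseHexLogId` is a hypothetical helper written for illustration, not a brpc API.

```c++
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Parse a hexadecimal string into a 64-bit integer that can be passed to set_log_id().
// Returns false on empty input, a leading sign or space, trailing garbage, or overflow.
// An optional "0x" prefix is accepted because strtoull accepts it in base 16.
inline bool ParseHexLogId(const char* s, uint64_t* log_id) {
    if (s == nullptr || !std::isxdigit(static_cast<unsigned char>(*s))) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long long v = std::strtoull(s, &end, 16);
    if (errno == ERANGE || end == s || *end != '\0') {
        return false;
    }
    *log_id = static_cast<uint64_t>(v);
    return true;
}
```

On success the parsed value would be passed to `set_log_id()` on the Controller before launching the RPC.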
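## Attachment ## Attachment
...@@ -598,23 +598,8 @@ baidu_std and hulu_pbrpc support attachments; the data is defined by users, not ...@@ -598,23 +598,8 @@ baidu_std and hulu_pbrpc support attachments; the data is defined by users, not
In http, the attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html); for example, data to be POSTed is set in request_attachment(). In http, the attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html); for example, data to be POSTed is set in request_attachment().
## giano authentication ## Authentication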
``` TODO: Describe how authentication methods are extended.
// Create a baas::CredentialGenerator using Giano's API
baas::CredentialGenerator generator = CREATE_MOCK_PERSONAL_GENERATOR(
"mock_user", "mock_roles", "mock_group", baas::sdk::BAAS_OK);
// Create a brpc::policy::GianoAuthenticator using the generator we just created
// and then pass it into brpc::ChannelOptions
brpc::policy::GianoAuthenticator auth(&generator, NULL);
brpc::ChannelOptions option;
option.auth = &auth;
```
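First create a baas::CredentialGenerator by calling the Giano API, see [Giano快速上手手册.pdf](http://wiki.baidu.com/download/attachments/37774685/Giano%E5%BF%A
B%E9%80%9F%E4%B8%8A%E6%89%8B%E6%89%8B%E5%86%8C.pdf?version=1&modificationDate=1421990746000&api=v2) for details. Then set it into brpc::ChannelOptions step by step as in the code above.
Once authentication is set on the client, every newly established connection must first send a piece of authentication data (generated by the Giano authenticator) before sending further requests. After the authentication succeeds, subsequent requests on the connection no longer carry authentication data.
## Reset ## Reset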
...@@ -624,7 +609,11 @@ B%E9%80%9F%E4%B8%8A%E6%89%8B%E6%89%8B%E5%86%8C.pdf?version=1&modificationDate=14 ...@@ -624,7 +609,11 @@ B%E9%80%9F%E4%B8%8A%E6%89%8B%E6%89%8B%E5%86%8C.pdf?version=1&modificationDate=14
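## Compression ## Compression
set_request_compress_type() sets the compression method of the request; no compression by default. Note: attachments are not compressed. For compressing HTTP body, see [compress request body at client](http_client#压缩request-body) set_request_compress_type() sets the compression method of the request; no compression by default.
Note: attachments are not compressed.
For compressing HTTP body, see [compress request body at client](http_client#压缩request-body)
Supported compression methods: Supported compression methods:
...@@ -670,13 +659,13 @@ set_request_compress_type() sets the compression method of the request; no compression by default. Note ...@@ -670,13 +659,13 @@ set_request_compress_type() sets the compression method of the request; no compression by default. Note
### Q: Can brpc use unix domain sockets ### Q: Can brpc use unix domain sockets
No. Sockets on the same machine do not go through the network and are only slightly slower than domain sockets, so replacing them with domain sockets makes little sense. Support may be added in the future. No. TCP sockets on the same machine do not go through the network and are only slightly slower than unix domain sockets. Some special scenarios where TCP sockets cannot be used may need them, and support may be added in the future.
### Q: Fail to connect to xx.xx.xx.xx:xxxx, Connection refused, what does it mean ### Q: Fail to connect to xx.xx.xx.xx:xxxx, Connection refused
Generally the remote server has not opened the port (it has probably crashed). Generally the remote server has not opened the port (it has probably crashed).
### Q: Often meet Connection timedout (not in the same datacenter) ### Q: Often meet Connection timedout to another datacenter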
![img](../images/connection_timedout.png) ![img](../images/connection_timedout.png)
...@@ -700,33 +689,29 @@ struct ChannelOptions { ...@@ -700,33 +689,29 @@ struct ChannelOptions {
}; };
``` ```
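Note that connection timeout is not RPC timeout; the log printed on RPC timeout is "Reached timeout=...". Note: connection timeout is not RPC timeout; the log printed on RPC timeout is "Reached timeout=...".
### Q: Why does the synchronous call work while the asynchronous one crashes ### Q: Why does the synchronous call work while the asynchronous one crashes
Check the lifetime of Controller, Response and done first. In asynchronous calls, the end of the RPC call does not mean the end of the whole RPC, which ends only after done is called. So these objects should not be released right after calling the RPC but inside done. So generally you cannot allocate them on stack and should allocate them on heap by means such as NewCallback. See [Asynchronous call](client.md#异步访问) for details. Check the lifetime of Controller, Response and done first. In asynchronous calls, the end of the RPC call does not mean the end of the whole RPC, which ends only when done->Run() is entered. So these objects should not be released right after calling the RPC but inside done->Run(). Generally you cannot allocate them on stack and should allocate them on heap. See [Asynchronous call](client.md#异步访问) for details.
### Q: How do I confirm that the server has processed my request
Not always possible. When the response returns and succeeds, we are sure the process succeeded. When the response returns and fails, we are sure the process failed. But when the response does not return, the process may have failed or succeeded. If we choose to retry, a succeeded process may be executed again. So generally RPC services should consider [idempotence](http://en.wikipedia.org/wiki/Idempotence), otherwise retries may stack side effects multiple times and produce unexpected results. For example, search services that mostly read have few side effects and are naturally idempotent, needing no special handling. Storage services with many writes need mechanisms such as version numbers or sequence numbers in their design to reject processes that have already happened, ensuring idempotence.
### Q: The machine list is configured in BNS, but RPC reports "Fail to select server, No data available" ### Q: How to make sure a request is processed only once
Use get_instance_by_service -s your_bns_name to check the status of all machines. Only machines whose status is 0 can be accessed by clients. This is not something at the RPC level. When the response returns and succeeds, we are sure the process succeeded. When the response returns and fails, we are sure the process failed. But when the response does not return, the process may have failed or succeeded. If we choose to retry, a succeeded process may be executed again. Generally RPC services with side effects should consider [idempotence](http://en.wikipedia.org/wiki/Idempotence), otherwise retries may stack side effects multiple times and produce unexpected results. Read-only search services mostly have no side effects and are naturally idempotent, needing no special handling. Storage services with writes need mechanisms such as version numbers or sequence numbers in their design to reject processes that have already happened, ensuring idempotence.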
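To make the sequence-number idea concrete, here is a minimal sketch of a store that rejects writes it has already applied. `IdempotentStore` and its interface are hypothetical and only illustrate the technique; a real storage service would also need persistence and concurrency control.

```c++
#include <cstdint>
#include <string>
#include <unordered_map>

// A toy in-memory store whose writes carry a per-key sequence number (starting from 1).
// A write is applied only if its sequence number is newer than the last applied one,
// so a retried write is rejected instead of stacking its side effect twice.
class IdempotentStore {
public:
    // Returns true if the write was applied, false if it was a duplicate or stale.
    bool Add(const std::string& key, uint64_t seq, int64_t delta) {
        Entry& e = entries_[key];
        if (seq <= e.last_seq) {
            return false;
        }
        e.last_seq = seq;
        e.value += delta;
        return true;
    }

    // Returns the current value of the key, or 0 if the key was never written.
    int64_t Get(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? 0 : it->second.value;
    }

private:
    struct Entry {
        uint64_t last_seq = 0;
        int64_t value = 0;
    };
    std::unordered_map<std::string, Entry> entries_;
};
```

If a client retries `Add("k", 1, 10)` because the first response was lost, the second call returns false and the value stays at 10.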
### Q: Invalid address=`bns://group.user-persona.dumi.nj03', what does it mean ### Q: Invalid address=`bns://group.user-persona.dumi.nj03'
``` ```
FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers. FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers.
``` ```
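Accessing bns requires the Init with three parameters, whose second parameter is load_balancer_name. You are using the Init with two parameters here, so the framework assumes you are accessing a single server and reports this error. Accessing a naming service requires the Init with three parameters, of which the second is load_balancer_name. The Init with two parameters is used here, so the framework treats it as accessing a single server and reports this error.
### Q: Both product lines use protobuf, why can't they access each other ### Q: Both sides use protobuf, why can't they access each other
protocol != protobuf. protobuf does the packing and the protocol defines the fields. The same packing format does not mean the fields are interoperable. A protocol may contain multiple protobuf packages, plus extra lengths, checksums, magic numbers and so on. Interoperability between protocols is achieved by converting them to a unified programming interface inside the RPC framework, not at the protobuf level. Broadly speaking, protobuf can also be used as a packing framework to generate packages in other serialization formats; for example [idl<=>protobuf](mcpack2pb.md) generates the code parsing idl through protobuf **protocol != protobuf**. protobuf serializes one package, while a message in a protocol may contain multiple protobuf packages, plus extra lengths, checksums, magic numbers and so on. The same packing format does not mean the protocols are interoperable. The ability in brpc to serve multiple protocols with one piece of code is achieved by converting data of different protocols to a unified programming interface, not at the protobuf level
### Q: Why can C++ client/server talk to each other, while talking to client/server in other languages reports serialization failures ### Q: Why can C++ client/server talk to each other, while talking to client/server in other languages reports serialization failures
Check whether compression is enabled in the C++ version (Controller::set_compress_type). Currently the python/JAVA rpc frameworks have not implemented compression, so talking to each other runs into problems. Check whether compression is enabled in the C++ version (Controller::set_compress_type). Currently the rpc frameworks in other languages have not implemented compression, so talking to each other runs into problems.
# Appendix: Basic workflow at client-side # Appendix: Basic workflow at client-side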
...@@ -737,13 +722,13 @@ FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group ...@@ -737,13 +722,13 @@ FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group
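1. Create a [bthread_id](https://github.com/brpc/brpc/blob/master/src/bthread/id.h) as the correlation_id of this RPC. 1. Create a [bthread_id](https://github.com/brpc/brpc/blob/master/src/bthread/id.h) as the correlation_id of this RPC.
2. According to how the Channel was created, choose a downstream server from the process-level [SocketMap](https://github.com/brpc/brpc/blob/master/src/brpc/socket_map.h) or from the [LoadBalancer](https://github.com/brpc/brpc/blob/master/src/brpc/load_balancer.h) as the destination of this RPC. 2. According to how the Channel was created, choose a downstream server from the process-level [SocketMap](https://github.com/brpc/brpc/blob/master/src/brpc/socket_map.h) or from the [LoadBalancer](https://github.com/brpc/brpc/blob/master/src/brpc/load_balancer.h) as the destination of this RPC.
3. Choose a [Socket](https://github.com/brpc/brpc/blob/master/src/brpc/socket.h) according to the connection type (single, pooled, short) 3. Choose a [Socket](https://github.com/brpc/brpc/blob/master/src/brpc/socket.h) according to the connection type (single, pooled, short)
4. If authentication is enabled and the current Socket has not been authenticated, the first request enters the authentication branch, and other requests block until the first request carrying authentication data is written into the Socket. This is because the server only authenticates the first request. 4. If authentication is enabled and the current Socket has not been authenticated, the first request enters the authentication branch, and other requests block until the first request carrying authentication data is written into the Socket. The server only authenticates the first request.
5. According to the protocol of the Channel, choose the corresponding serialization function to serialize the request into [IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h) 5. According to the protocol of the Channel, choose the corresponding serialization function to serialize the request into [IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h)
6. If timeout is configured, set up a timer. From this point on, avoid using the Controller object, because after the timer is set -> the timeout may be triggered -> the user's asynchronous callback is called -> the user destructs the Controller in the callback. 6. If timeout is configured, set up a timer. From this point on, avoid using the Controller object, because after the timer is set the timeout may be triggered at any time -> the user's timeout callback is called -> the user destructs the Controller in the callback.
7. The preparation phase of sending ends. If any step above fails, Channel::HandleSendFailed is called. 7. The preparation phase of sending ends. If any step above fails, Channel::HandleSendFailed is called.
8. Write the serialized IOBuf into the Socket and pass in the callback Channel::HandleSocketFailed, which is called on errors such as a broken connection or a failed write. 8. Write the serialized IOBuf into the Socket and pass in the callback Channel::HandleSocketFailed, which is called on errors such as a broken connection or a failed write.
9. If the call is synchronous, Join the correlation_id; if asynchronous, the client returns here 9. If the call is synchronous, Join the correlation_id; otherwise CallMethod ends here
10. Send and receive messages on the network. 10. Send and receive messages on the network.
11. After receiving the response, extract its correlation_id and find the corresponding Controller in O(1) time. This step does not look up a global hash table and scales well on multi-core machines. 11. After receiving the response, extract its correlation_id and find the corresponding Controller in O(1) time. This step does not look up a global hash table and scales well on multi-core machines.
12. Deserialize the response according to the protocol. 12. Deserialize the response according to the protocol.
13. Call Controller::OnRPCReturned, which decides whether to retry according to the error code. If the call is asynchronous, call the user's callback. Finally destroy the correlation_id to wake up the joining threads. 13. Call Controller::OnRPCReturned, which may decide to retry according to the error code, or end the RPC. If the call is asynchronous, call the user's callback. Finally destroy the correlation_id to wake up the joining threads.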
...@@ -11,7 +11,7 @@ ...@@ -11,7 +11,7 @@
- No class named brpc::Client. - No class named brpc::Client.
# Channel # Channel
Client-side sends requests. It's called [Channel](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h) rather than "Client" in brpc. A channel represents a communication line to one server or multiple servers, which can be used for calling services. Client-side sends requests. It's called [Channel](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h) rather than "Client" in brpc. A channel represents a communication line to one server or multiple servers, which can be used for calling services.
A Channel can be **shared by all threads** in the process. You don't need to create separate Channels for each thread, and you don't need to synchronize Channel.CallMethod with a lock. However, creation and destruction of a Channel are **not** thread-safe; make sure the channel is initialized and destroyed by only one thread. A Channel can be **shared by all threads** in the process. You don't need to create separate Channels for each thread, and you don't need to synchronize Channel.CallMethod with a lock. However, creation and destruction of a Channel are **not** thread-safe; make sure the channel is initialized and destroyed by only one thread.
...@@ -47,7 +47,7 @@ Valid "server_addr_and_port": ...@@ -47,7 +47,7 @@ Valid "server_addr_and_port":
- www.foo.com:8765 - www.foo.com:8765
- localhost:9000 - localhost:9000
Invalid "server_addr_and_port": Invalid "server_addr_and_port":
- 127.0.0.1:90000 # too large port - 127.0.0.1:90000 # too large port
- 10.39.2.300:8000 # invalid IP - 10.39.2.300:8000 # invalid IP
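The two invalid examples fail for different reasons: the port exceeds 65535, or an IP octet exceeds 255. The sketch below checks both for numeric IPv4 addresses only. `IsValidIpv4AndPort` is a hypothetical helper, not how brpc parses addresses, and it deliberately rejects hostnames such as `localhost:9000`, which would need name resolution.

```c++
#include <arpa/inet.h>  // inet_pton (POSIX)
#include <string>

// Check that `addr` has the form "a.b.c.d:port" with a well-formed IPv4 address
// and a port in [0, 65535]. Hostnames are NOT handled by this sketch.
inline bool IsValidIpv4AndPort(const std::string& addr) {
    const std::string::size_type colon = addr.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 >= addr.size()) {
        return false;
    }
    // Parse the port digit by digit so that overflow is caught early.
    long port = 0;
    for (std::string::size_type i = colon + 1; i < addr.size(); ++i) {
        const char c = addr[i];
        if (c < '0' || c > '9') {
            return false;
        }
        port = port * 10 + (c - '0');
        if (port > 65535) {
            return false;  // e.g. 127.0.0.1:90000
        }
    }
    // inet_pton rejects octets above 255, e.g. 10.39.2.300
    const std::string ip = addr.substr(0, colon);
    in_addr parsed;
    return inet_pton(AF_INET, ip.c_str(), &parsed) == 1;
}
```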
...@@ -58,7 +58,7 @@ int Init(const char* naming_service_url, ...@@ -58,7 +58,7 @@ int Init(const char* naming_service_url,
const char* load_balancer_name, const char* load_balancer_name,
const ChannelOptions* options); const ChannelOptions* options);
``` ```
Channels created by above Init() get server list from the NamingService specified by `naming_service_url` periodically or driven-by-events, and send request to one server chosen from the list according to the algorithm specified by `load_balancer_name` . Channels created by above Init() get server list from the NamingService specified by `naming_service_url` periodically or driven-by-events, and send request to one server chosen from the list according to the algorithm specified by `load_balancer_name` .
You **should not** create such channels ad hoc before each RPC, because creating and destroying them involves many resources; for instance, the NamingService has to be accessed once at creation, otherwise server candidates are unknown. On the other hand, channels can be shared by multiple threads safely and do not need to be created frequently. You **should not** create such channels ad hoc before each RPC, because creating and destroying them involves many resources; for instance, the NamingService has to be accessed once at creation, otherwise server candidates are unknown. On the other hand, channels can be shared by multiple threads safely and do not need to be created frequently.
...@@ -66,7 +66,7 @@ If `load_balancer_name` is NULL or empty, this Init() is just the one for connec ...@@ -66,7 +66,7 @@ If `load_balancer_name` is NULL or empty, this Init() is just the one for connec
## Naming Service ## Naming Service
Naming service maps a name to a modifiable list of servers. It's positioned as follows at client-side: Naming service maps a name to a modifiable list of servers. It's positioned as follows at client-side:
![img](../images/ns.png) ![img](../images/ns.png)
...@@ -78,7 +78,7 @@ General form of `naming_service_url` is "**protocol://service_name**". ...@@ -78,7 +78,7 @@ General form of `naming_service_url` is "**protocol://service_name**".
BNS is the most common naming service inside Baidu. In "bns://rdev.matrix.all", "bns" is protocol and "rdev.matrix.all" is service-name. A related gflag is -ns_access_interval: ![img](../images/ns_access_interval.png) BNS is the most common naming service inside Baidu. In "bns://rdev.matrix.all", "bns" is protocol and "rdev.matrix.all" is service-name. A related gflag is -ns_access_interval: ![img](../images/ns_access_interval.png)
If the list in BNS is non-empty, but Channel says "no servers", the status bit of the machine in BNS is probably non-zero, which means the machine is unavailable and as a correspondence not added as server candidates of the Channel. Status bits can be checked by: If the list in BNS is non-empty, but Channel says "no servers", the status bit of the machine in BNS is probably non-zero, which means the machine is unavailable and as a correspondence not added as server candidates of the Channel. Status bits can be checked by:
`get_instance_by_service [bns_node_name] -s` `get_instance_by_service [bns_node_name] -s`
...@@ -100,7 +100,7 @@ Users can filter servers got from the NamingService before pushing to LoadBalanc ...@@ -100,7 +100,7 @@ Users can filter servers got from the NamingService before pushing to LoadBalanc
![img](../images/ns_filter.jpg) ![img](../images/ns_filter.jpg)
Interface of the filter: Interface of the filter:
```c++ ```c++
// naming_service_filter.h // naming_service_filter.h
class NamingServiceFilter { class NamingServiceFilter {
...@@ -109,7 +109,7 @@ public: ...@@ -109,7 +109,7 @@ public:
// Return false to filter it out // Return false to filter it out
virtual bool Accept(const ServerNode& server) const = 0; virtual bool Accept(const ServerNode& server) const = 0;
}; };
// naming_service.h // naming_service.h
struct ServerNode { struct ServerNode {
butil::EndPoint addr; butil::EndPoint addr;
...@@ -127,7 +127,7 @@ public: ...@@ -127,7 +127,7 @@ public:
return server.tag == "main"; return server.tag == "main";
} }
}; };
int main() { int main() {
... ...
MyNamingServiceFilter my_filter; MyNamingServiceFilter my_filter;
...@@ -144,11 +144,11 @@ When there're more than one server to access, we need to divide the traffic. The ...@@ -144,11 +144,11 @@ When there're more than one server to access, we need to divide the traffic. The
![img](../images/lb.png) ![img](../images/lb.png)
The ideal algorithm is to make every request being processed in-time, and crash of any server makes minimal impact. However clients are not able to know delays or congestions happened at servers in realtime, and load balancing algorithms should be light-weight generally, users need to choose proper algorithms for their use cases. Algorithms provided by brpc (specified by `load_balancer_name`): The ideal algorithm is to make every request being processed in-time, and crash of any server makes minimal impact. However clients are not able to know delays or congestions happened at servers in realtime, and load balancing algorithms should be light-weight generally, users need to choose proper algorithms for their use cases. Algorithms provided by brpc (specified by `load_balancer_name`):
### rr ### rr
which is round robin. Always choose next server inside the list, next of the last server is the first one. No other settings. For example there're 3 servers: a,b,c, brpc will send requests to a, b, c, a, b, c, … and so on. Note that presumption of using this algorithm is the machine specs, network latencies, server loads are similar. which is round robin. Always choose next server inside the list, next of the last server is the first one. No other settings. For example there're 3 servers: a,b,c, brpc will send requests to a, b, c, a, b, c, … and so on. Note that presumption of using this algorithm is the machine specs, network latencies, server loads are similar.
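The selection order described for rr can be sketched in a few lines. `RoundRobinSelector` is a hypothetical class for illustration and is not the implementation behind brpc's LoadBalancer; it assumes a fixed, non-empty server list.

```c++
#include <atomic>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Picks servers in list order and wraps around after the last one.
class RoundRobinSelector {
public:
    // Precondition: `servers` is non-empty.
    explicit RoundRobinSelector(std::vector<std::string> servers)
        : servers_(std::move(servers)), next_(0) {}

    // Safe to call from multiple threads: each call atomically claims one index.
    const std::string& Select() {
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return servers_[i % servers_.size()];
    }

private:
    const std::vector<std::string> servers_;
    std::atomic<std::size_t> next_;
};
```

With servers a, b, c, successive calls to `Select()` return a, b, c, a, b, c and so on, which is only a fair split when the servers are similar in capacity and latency.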
### random ### random
...@@ -164,11 +164,11 @@ which is consistent hashing. Adding or removing servers does not make destinatio ...@@ -164,11 +164,11 @@ which is consistent hashing. Adding or removing servers does not make destinatio
Need to set Controller.set_request_code() before RPC otherwise the RPC will fail. request_code is often a 32-bit hash code of "key part" of the request, and the hashing algorithm does not need to be same with the one used by load balancer. Say `c_murmurhash` can use md5 to compute request_code of the request as well. Need to set Controller.set_request_code() before RPC otherwise the RPC will fail. request_code is often a 32-bit hash code of "key part" of the request, and the hashing algorithm does not need to be same with the one used by load balancer. Say `c_murmurhash` can use md5 to compute request_code of the request as well.
[src/brpc/policy/hasher.h](https://github.com/brpc/brpc/blob/master/src/brpc/policy/hasher.h) includes common hash functions. If `std::string key` stands for key part of the request, controller.set_request_code(brpc::policy::MurmurHash32(key.data(), key.size())) sets request_code correctly. [src/brpc/policy/hasher.h](https://github.com/brpc/brpc/blob/master/src/brpc/policy/hasher.h) includes common hash functions. If `std::string key` stands for key part of the request, controller.set_request_code(brpc::policy::MurmurHash32(key.data(), key.size())) sets request_code correctly.
Do distinguish the "key" and the "attributes" of a request. Don't compute request_code from the full content of the request just for convenience: a minor change in attributes may produce a totally different hash code and change the destination dramatically. Another pitfall is padding. For example, `struct Foo { int32_t a; int64_t b; }` has a 4-byte undefined gap between `a` and `b` on 64-bit machines, so the result of `hash(&foo, sizeof(foo))` is undefined. Fields need to be packed or serialized before hashing. Do distinguish the "key" and the "attributes" of a request. Don't compute request_code from the full content of the request just for convenience: a minor change in attributes may produce a totally different hash code and change the destination dramatically. Another pitfall is padding. For example, `struct Foo { int32_t a; int64_t b; }` has a 4-byte undefined gap between `a` and `b` on 64-bit machines, so the result of `hash(&foo, sizeof(foo))` is undefined. Fields need to be packed or serialized before hashing.
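The padding pitfall can be avoided by copying the fields into a packed buffer before hashing, as sketched below. To keep the example self-contained it uses FNV-1a as a stand-in 32-bit hash; `Fnv1a32` and `RequestCodeOf` are hypothetical names, and in a brpc program one of the functions in hasher.h would take the place of `Fnv1a32`.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>

// FNV-1a, a simple 32-bit hash used here only as a stand-in.
inline uint32_t Fnv1a32(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint32_t h = 2166136261u;   // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;         // FNV prime
    }
    return h;
}

struct Foo {
    int32_t a;
    int64_t b;  // typically preceded by 4 bytes of padding on 64-bit machines
};

// Pack the fields back to back so that padding bytes never reach the hash.
inline uint32_t RequestCodeOf(const Foo& foo) {
    unsigned char buf[sizeof(foo.a) + sizeof(foo.b)];
    std::memcpy(buf, &foo.a, sizeof(foo.a));
    std::memcpy(buf + sizeof(foo.a), &foo.b, sizeof(foo.b));
    return Fnv1a32(buf, sizeof(buf));
}
```

Two `Foo` objects with equal `a` and `b` now always get the same code, whatever garbage their padding holds. The code depends on the byte order of the machine, so all clients computing it should share the same endianness or serialize the fields explicitly.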
Check out [Consistent Hashing](consistent_hashing.md) for more details. Check out [Consistent Hashing](consistent_hashing.md) for more details.
## Health checking ## Health checking
...@@ -176,13 +176,13 @@ Servers whose connections are lost are isolated temporarily to prevent them from ...@@ -176,13 +176,13 @@ Servers whose connections are lost are isolated temporarily to prevent them from
| Name | Value | Description | Defined At | | Name | Value | Description | Defined At |
| ------------------------- | ----- | ---------------------------------------- | ----------------------- | | ------------------------- | ----- | ---------------------------------------- | ----------------------- |
| health_check_interval (R) | 3 | seconds between consecutive health-checkings | src/brpc/socket_map.cpp | | health_check_interval (R) | 3 | seconds between consecutive health-checkings | src/brpc/socket_map.cpp |
Once a server is connected, it resumes as a server candidate inside LoadBalancer. If a server is removed from NamingService during health-checking, brpc removes it from health-checking as well. Once a server is connected, it resumes as a server candidate inside LoadBalancer. If a server is removed from NamingService during health-checking, brpc removes it from health-checking as well.
# Launch RPC # Launch RPC
Generally, we don't use Channel.CallMethod directly, instead we call XXX_Stub generated by protobuf, which feels more like a "method call". The stub has few member fields, being suitable(and recommended) to be put on stack instead of new(). Surely the stub can be saved and re-used as well. Channel.CallMethod and stub are both **thread-safe** and accessible by multiple threads simultaneously. For example: Generally, we don't use Channel.CallMethod directly, instead we call XXX_Stub generated by protobuf, which feels more like a "method call". The stub has few member fields, being suitable(and recommended) to be put on stack instead of new(). Surely the stub can be saved and re-used as well. Channel.CallMethod and stub are both **thread-safe** and accessible by multiple threads simultaneously. For example:
```c++ ```c++
XXX_Stub stub(&channel); XXX_Stub stub(&channel);
stub.some_method(controller, request, response, done); stub.some_method(controller, request, response, done);
...@@ -191,11 +191,11 @@ Or even: ...@@ -191,11 +191,11 @@ Or even:
```c++ ```c++
XXX_Stub(&channel).some_method(controller, request, response, done); XXX_Stub(&channel).some_method(controller, request, response, done);
``` ```
An exception is the http client, which is not much related to protobuf. Call CallMethod directly to make an http call, setting all parameters to NULL except for `Controller` and `done`. Check [Access HTTP](http_client.md) for details. An exception is the http client, which is not much related to protobuf. Call CallMethod directly to make an http call, setting all parameters to NULL except for `Controller` and `done`. Check [Access HTTP](http_client.md) for details.
## Synchronous call ## Synchronous call
CallMethod blocks until response from server is received or error occurred (including timedout). CallMethod blocks until response from server is received or error occurred (including timedout).
response/controller in a synchronous call are not used by brpc again after CallMethod, so they can be put on stack safely. Note: if request/response have many fields and are large in size, they are better allocated on heap. response/controller in a synchronous call are not used by brpc again after CallMethod, so they can be put on stack safely. Note: if request/response have many fields and are large in size, they are better allocated on heap.
```c++ ```c++
...@@ -203,51 +203,51 @@ MyRequest request; ...@@ -203,51 +203,51 @@ MyRequest request;
MyResponse response; MyResponse response;
brpc::Controller cntl; brpc::Controller cntl;
XXX_Stub stub(&channel); XXX_Stub stub(&channel);
request.set_foo(...); request.set_foo(...);
cntl.set_timeout_ms(...); cntl.set_timeout_ms(...);
stub.some_method(&cntl, &request, &response, NULL); stub.some_method(&cntl, &request, &response, NULL);
if (cntl.Failed()) { if (cntl.Failed()) {
// RPC failed. fields in response are undefined, don't use. // RPC failed. fields in response are undefined, don't use.
} else { } else {
// RPC succeeded, response has what we want. // RPC succeeded, response has what we want.
} }
``` ```
## Asynchronous call ## Asynchronous call
Pass a callback `done` to CallMethod, which resumes after sending request, rather than completion of RPC. When the response from server is received or error occurred(including timedout), done->Run() is called. Post-processing code of the RPC should be put in done->Run() instead of after CallMethod. Pass a callback `done` to CallMethod, which resumes after sending request, rather than completion of RPC. When the response from server is received or error occurred(including timedout), done->Run() is called. Post-processing code of the RPC should be put in done->Run() instead of after CallMethod.
Because end of CallMethod does not mean completion of RPC, response/controller may still be used by brpc or done->Run(). Generally they should be allocated on heap and deleted in done->Run(). If they're deleted too early, done->Run() may access invalid memory. Because end of CallMethod does not mean completion of RPC, response/controller may still be used by brpc or done->Run(). Generally they should be allocated on heap and deleted in done->Run(). If they're deleted too early, done->Run() may access invalid memory.
You can new these objects individually and create done by [NewCallback](#use-newcallback), or make response/controller be member of done and [new them together](#Inherit-google::protobuf::Closure). Former one is recommended. You can new these objects individually and create done by [NewCallback](#use-newcallback), or make response/controller be member of done and [new them together](#Inherit-google::protobuf::Closure). Former one is recommended.
**Request and Channel can be destroyed immediately after asynchronous CallMethod**, which is different from response/controller. Note that "immediately" means destruction of request/Channel can happen **after** CallMethod, not during CallMethod. Deleting a Channel just being used by another thread results in undefined behavior (crash at best). **Request and Channel can be destroyed immediately after asynchronous CallMethod**, which is different from response/controller. Note that "immediately" means destruction of request/Channel can happen **after** CallMethod, not during CallMethod. Deleting a Channel just being used by another thread results in undefined behavior (crash at best).
### Use NewCallback ### Use NewCallback
```c++ ```c++
static void OnRPCDone(MyResponse* response, brpc::Controller* cntl) { static void OnRPCDone(MyResponse* response, brpc::Controller* cntl) {
// unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version. // unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version.
std::unique_ptr<MyResponse> response_guard(response); std::unique_ptr<MyResponse> response_guard(response);
std::unique_ptr<brpc::Controller> cntl_guard(cntl); std::unique_ptr<brpc::Controller> cntl_guard(cntl);
if (cntl->Failed()) { if (cntl->Failed()) {
// RPC failed. fields in response are undefined, don't use. // RPC failed. fields in response are undefined, don't use.
} else { } else {
// RPC succeeded, response has what we want. Continue the post-processing. // RPC succeeded, response has what we want. Continue the post-processing.
} }
// Closure created by NewCallback deletes itself at the end of Run. // Closure created by NewCallback deletes itself at the end of Run.
} }
MyResponse* response = new MyResponse; MyResponse* response = new MyResponse;
brpc::Controller* cntl = new brpc::Controller; brpc::Controller* cntl = new brpc::Controller;
MyService_Stub stub(&channel); MyService_Stub stub(&channel);
MyRequest request; // you don't have to new request, even in an asynchronous call. MyRequest request; // you don't have to new request, even in an asynchronous call.
request.set_foo(...); request.set_foo(...);
cntl->set_timeout_ms(...); cntl->set_timeout_ms(...);
stub.some_method(cntl, &request, response, google::protobuf::NewCallback(OnRPCDone, response, cntl)); stub.some_method(cntl, &request, response, google::protobuf::NewCallback(OnRPCDone, response, cntl));
``` ```
Since protobuf 3 changes NewCallback to private, brpc puts NewCallback in [src/brpc/callback.h](https://github.com/brpc/brpc/blob/master/src/brpc/callback.h) after r32035 (and adds more overloads). If your program has compilation issues with NewCallback, replace google::protobuf::NewCallback with brpc::NewCallback. Since protobuf 3 changes NewCallback to private, brpc puts NewCallback in [src/brpc/callback.h](https://github.com/brpc/brpc/blob/master/src/brpc/callback.h) after r32035 (and adds more overloads). If your program has compilation issues with NewCallback, replace google::protobuf::NewCallback with brpc::NewCallback.
### Inherit google::protobuf::Closure ### Inherit google::protobuf::Closure
...@@ -258,21 +258,21 @@ public: ...@@ -258,21 +258,21 @@ public:
void Run() { void Run() {
// unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version. // unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version.
std::unique_ptr<OnRPCDone> self_guard(this); std::unique_ptr<OnRPCDone> self_guard(this);
if (cntl.Failed()) { if (cntl.Failed()) {
// RPC failed. fields in response are undefined, don't use. // RPC failed. fields in response are undefined, don't use.
} else { } else {
// RPC succeeded, response has what we want. Continue the post-processing. // RPC succeeded, response has what we want. Continue the post-processing.
} }
} }
MyResponse response; MyResponse response;
brpc::Controller cntl; brpc::Controller cntl;
}; };
OnRPCDone* done = new OnRPCDone; OnRPCDone* done = new OnRPCDone;
MyService_Stub stub(&channel); MyService_Stub stub(&channel);
MyRequest request; // you don't have to new request, even in an asynchronous call. MyRequest request; // you don't have to new request, even in an asynchronous call.
request.set_foo(...); request.set_foo(...);
done->cntl.set_timeout_ms(...); done->cntl.set_timeout_ms(...);
...@@ -301,17 +301,17 @@ stub.method2(controller2, request2, response2, done2); ...@@ -301,17 +301,17 @@ stub.method2(controller2, request2, response2, done2);
brpc::Join(cid1); brpc::Join(cid1);
brpc::Join(cid2); brpc::Join(cid2);
``` ```
Call `Controller.call_id()` to get an id **before launching RPC**, join the id after the RPC. Call `Controller.call_id()` to get an id **before launching RPC**, join the id after the RPC.
Join() blocks until completion of RPC **and end of done->Run()**, properties of Join: Join() blocks until completion of RPC **and end of done->Run()**, properties of Join:
- If the RPC is complete, Join() returns immediately. - If the RPC is complete, Join() returns immediately.
- Multiple threads can Join() one id, all of them will be woken up. - Multiple threads can Join() one id, all of them will be woken up.
- Synchronous RPC can be Join()-ed in another thread, although we rarely do this. - Synchronous RPC can be Join()-ed in another thread, although we rarely do this.
Join() was called JoinResponse() before, if you meet deprecated issues during compilation, rename to Join(). Join() was called JoinResponse() before, if you meet deprecated issues during compilation, rename to Join().
Calling `Join(controller->call_id())` after completion of RPC is **wrong**, do save call_id before RPC, otherwise the controller may be deleted by done at any time. The Join in following code is **wrong**. Calling `Join(controller->call_id())` after completion of RPC is **wrong**, do save call_id before RPC, otherwise the controller may be deleted by done at any time. The Join in following code is **wrong**.
```c++
static void on_rpc_done(Controller* controller, MyResponse* response) {
...@@ -319,7 +319,7 @@ static void on_rpc_done(Controller* controller, MyResponse* response) {
    delete controller;
    delete response;
}
Controller* controller1 = new Controller;
Controller* controller2 = new Controller;
MyResponse* response1 = new MyResponse;
```
...@@ -334,7 +334,7 @@ brpc::Join(controller2->call_id());   // WRONG, controller2 may be deleted by on

## Semi-synchronous

Join can be used to implement a "semi-synchronous" call: block until multiple asynchronous calls complete. Since the call site blocks until all RPCs complete, controller/response can be put on the stack safely.

```c++
brpc::Controller cntl1;
brpc::Controller cntl2;
...@@ -347,30 +347,30 @@ stub2.method2(&cntl2, &request2, &response2, brpc::DoNothing());
brpc::Join(cntl1.call_id());
brpc::Join(cntl2.call_id());
```

brpc::DoNothing() returns a closure that does nothing, specifically for semi-synchronous calls. Its lifetime is managed by brpc.

Note that in the above example we access `controller.call_id()` after completion of the RPC, which is safe here because DoNothing does not delete the controller as `on_rpc_done` does in the previous example.

## Cancel RPC

`brpc::StartCancel(call_id)` cancels the corresponding RPC. call_id must be obtained from Controller.call_id() **before launching the RPC**; obtaining it at any other time may cause race conditions.

NOTE: it is `brpc::StartCancel(call_id)`, not `controller->StartCancel()`, which is forbidden and useless. The latter is provided by protobuf by default and has serious race conditions on the lifetime of the controller.

As the name implies, the RPC may not be complete yet after calling StartCancel. You should not touch any field in the Controller or delete any associated resources; they should be handled inside done->Run(). If you have to wait for completion of the RPC in place (not recommended), call Join(call_id).

Facts about StartCancel:

- call_id can be cancelled before CallMethod; the RPC will end immediately (and done will be called).
- call_id can be cancelled in another thread.
- Cancelling an already-cancelled call_id has no effect. Inference: one call_id can be cancelled by multiple threads simultaneously, but only one of them takes effect.
- Cancel here is a client-only feature; **the server side does not necessarily cancel the operation**. Server cancellation is a separate feature.
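
The "only one of them takes effect" property in the list above can be illustrated with a small standalone model. This is a sketch only: `CallState` and `start_cancel` are hypothetical names invented for illustration and are not brpc types or functions; the sketch assumes the cancelled state can be modelled as a single atomic flag.

```c++
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the per-RPC state that a call_id refers to.
// It is not brpc's data structure; it only models "first cancel wins".
struct CallState {
    std::atomic<bool> cancelled{false};
};

// Returns true only for the caller whose cancel actually took effect.
// Later (or concurrent) callers see the flag already set and get false.
inline bool start_cancel(CallState& call) {
    return !call.cancelled.exchange(true);
}
```

Because `exchange` is atomic, any number of threads may call `start_cancel` on the same state at once and exactly one of them observes `true`.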

## Get server-side address and port

remote_side() tells where the request was sent. The return type is [butil::EndPoint](https://github.com/brpc/brpc/blob/master/src/butil/endpoint.h), which includes an ipv4 address and a port. Calling this method before completion of the RPC is undefined.

How to print:

```c++
LOG(INFO) << "remote_side=" << cntl->remote_side();
printf("remote_side=%s\n", butil::endpoint2str(cntl->remote_side()).c_str());
```
...@@ -379,16 +379,16 @@ printf("remote_side=%s\n", butil::endpoint2str(cntl->remote_side()).c_str());

local_side() returns the address and port of the client side that sent the RPC (available after r31384).

How to print:

```c++
LOG(INFO) << "local_side=" << cntl->local_side();
printf("local_side=%s\n", butil::endpoint2str(cntl->local_side()).c_str());
```

## Should brpc::Controller be reused?

There is no need to reuse it deliberately.

Controller has miscellaneous fields, some of which are buffers that can be re-used by calling Reset().

In most use cases, constructing a Controller (snippet1) and re-using a Controller (snippet2) perform similarly.

```c++
...@@ -398,7 +398,7 @@ for (int i = 0; i < n; ++i) {
    ...
    stub.CallSomething(..., &controller);
}
// snippet2
brpc::Controller controller;
for (int i = 0; i < n; ++i) {
...@@ -407,30 +407,30 @@ for (int i = 0; i < n; ++i) {
    stub.CallSomething(..., &controller);
}
```

If the Controller in snippet1 is new-ed on the heap, snippet1 has the extra cost of heap allocation and may be a little slower in some cases.

# Settings

Client-side settings have 3 parts:

- brpc::ChannelOptions: defined in [src/brpc/channel.h](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h), used for initializing a Channel and immutable once the initialization is done.
- brpc::Controller: defined in [src/brpc/controller.h](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h), used for overriding fields in brpc::ChannelOptions for individual RPCs according to context.
- global gflags: used for tuning global behaviors, generally left unchanged. Read the comments in [/flags](flags.md) before setting.

Controller contains data and options that the request may not have. Server and client share the same Controller class, but they may set different fields. Read the comments in Controller carefully before using it.

A Controller corresponds to one RPC. A Controller can be re-used by another RPC after Reset(), but a Controller can't be used by multiple RPCs simultaneously, no matter whether the RPCs are started from one thread or not.

Properties of Controller:

1. A Controller can only have one user. Unless explicitly stated otherwise, methods in Controller are **not** thread-safe.
2. Since a Controller is generally not shared, there's no need to manage it with shared_ptr. If you do, something might go wrong.
3. A Controller is constructed before the RPC and destructed after the RPC. Some common patterns:
   - Put the Controller on the stack before a synchronous RPC and let it be destructed when it goes out of scope. Note that the Controller of an asynchronous RPC **must not** be put on the stack, otherwise the RPC may still be running when the Controller is destructed, resulting in undefined behavior.
   - new the Controller before an asynchronous RPC and delete it in done.

## Timeout

**ChannelOptions.timeout_ms** is the timeout in milliseconds for all RPCs via the Channel; Controller.set_timeout_ms() overrides the value for one RPC. The default value is 1 second, the maximum value is 2^31 (about 24 days), and -1 means waiting indefinitely for a response or a connection error.

**ChannelOptions.connect_timeout_ms** is the timeout in milliseconds for the connecting part of all RPCs via the Channel. The default value is 1 second, and -1 means no timeout for connecting. This value is limited to be never greater than timeout_ms. Note that this timeout is different from the connection timeout in TCP. Generally this timeout is the smaller of the two; otherwise establishment of the connection may fail before this timeout is reached, due to the timeout in the TCP layer.

...@@ -440,7 +440,7 @@ NOTE2: error code of RPC timeout is **ERPCTIMEDOUT (1008) **, ETIMEDOUT is conne
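
The relationship between the two timeouts described in this section can be sketched as a small standalone function. This is not brpc's code: `effective_connect_timeout_ms` is a hypothetical helper, and the handling of -1 (treated as "no limit" on either side) is an assumption made for illustration; only the rule that the connect timeout never exceeds timeout_ms comes from the text.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of brpc: applies the rule that the connect
// timeout is never greater than the overall RPC timeout.
// Assumption: a negative value means "no timeout" for either argument.
inline int32_t effective_connect_timeout_ms(int32_t connect_timeout_ms,
                                            int32_t timeout_ms) {
    if (timeout_ms < 0) {
        // The RPC waits indefinitely, so there is nothing to clamp to.
        return connect_timeout_ms;
    }
    if (connect_timeout_ms < 0) {
        // An unlimited connect is still bounded by the RPC timeout.
        return timeout_ms;
    }
    return connect_timeout_ms < timeout_ms ? connect_timeout_ms : timeout_ms;
}
```

For example, with timeout_ms = 500 and connect_timeout_ms = 1000, the sketch yields 500, since connecting cannot usefully outlast the whole RPC.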

## Retry

ChannelOptions.max_retry is the maximum number of retries for all RPCs via the channel; Controller.set_max_retry() overrides the value for one RPC. The default value is 3, and 0 means no retries.

Controller.retried_count() returns the number of retries.

...@@ -448,37 +448,37 @@ Controller.has_backup_request() tells if backup_request was sent.

**Servers tried before are not retried, on a best-effort basis.**

Conditions for retrying (AND relations):

- Broken connection.
- Timeout is not reached.
- Has retrying quota. Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.
- The retry makes sense. If the RPC fails due to the request (EREQUEST), no retry is done, because the server is very likely to reject the request again.
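
The four conditions above combine with AND, which can be written down as a single predicate. The following is a minimal sketch under stated assumptions: `AttemptState` and `should_retry` are hypothetical names for illustration, not brpc's internal representation, and the real decision is made inside the framework (or a custom RetryPolicy).

```c++
#include <cassert>

// Hypothetical summary of one failed attempt; field names are illustrative.
struct AttemptState {
    bool connection_broken;      // the connection to the server was broken
    bool deadline_reached;       // the RPC timeout has already expired
    int  retries_left;           // remaining quota derived from max_retry
    bool failed_due_to_request;  // e.g. the error was EREQUEST
};

// All four conditions must hold for a retry to happen.
inline bool should_retry(const AttemptState& s) {
    return s.connection_broken
        && !s.deadline_reached
        && s.retries_left > 0
        && !s.failed_due_to_request;
}
```

Setting the quota to 0 makes the predicate false regardless of the other fields, which matches the statement that max_retry = 0 disables retries.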

### Broken connection

If the server does not respond and the connection is good, retry is not triggered. If you need to send another request after some timeout, use backup request.

How it works: if the response does not return within the time specified by backup_request_ms, another request is sent, and whichever response returns first is taken. The new request is sent to a different server that has not been tried before, on a best-effort basis. NOTE: if backup_request_ms is greater than timeout_ms, the backup request will never be sent. A backup request consumes one retry. A backup request does not imply a server-side cancellation.

ChannelOptions.backup_request_ms affects all RPCs via the Channel. The unit is milliseconds and the default value is -1 (disabled). Controller.set_backup_request_ms() overrides the value for one RPC.
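
The interplay between backup_request_ms, timeout_ms and the retry quota can be sketched as a predicate. This is an illustration, not brpc's implementation: `backup_request_can_fire` is a hypothetical function, and two details are assumptions not stated in the text, namely that a backup request needs at least one remaining retry (inferred from "consumes one retry") and that backup_request_ms equal to timeout_ms counts as never sent.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical predicate, not part of brpc.
// Assumptions: negative backup_request_ms means disabled (the default is -1);
// negative timeout_ms means the RPC waits indefinitely.
inline bool backup_request_can_fire(int32_t backup_request_ms,
                                    int32_t timeout_ms,
                                    int retries_left) {
    if (backup_request_ms < 0) {
        return false;  // backup request is disabled
    }
    if (retries_left <= 0) {
        return false;  // assumed: a backup request needs one retry to consume
    }
    if (timeout_ms < 0) {
        return true;   // no deadline, so the backup timer always gets its turn
    }
    // The RPC ends at timeout_ms, so a later backup timer never fires.
    return backup_request_ms < timeout_ms;
}
```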

### Timeout is not reached

The RPC is ended soon after the timeout.

### Has retrying quota

Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.

### The retry makes sense

If the RPC fails due to the request (EREQUEST), no retry is done, because the server is very likely to reject the request again.

Users can inherit [brpc::RetryPolicy](https://github.com/brpc/brpc/blob/master/src/brpc/retry_policy.h) to customize the conditions for retrying. For example, brpc does not retry on HTTP-related errors by default. If you want to retry on HTTP_STATUS_FORBIDDEN (403) in your app, you can do as follows:

```c++
#include <brpc/retry_policy.h>

class MyRetryPolicy : public brpc::RetryPolicy {
public:
    bool DoRetry(const brpc::Controller* cntl) const {
...@@ -491,9 +491,9 @@ public:
    }
};
...
// Assign the instance to ChannelOptions.retry_policy.
// NOTE: retry_policy must be kept valid during the lifetime of the Channel, and the Channel
// does not delete retry_policy, so in most cases RetryPolicy should be created as a singleton.
brpc::ChannelOptions options;
static MyRetryPolicy g_my_retry_policy;
options.retry_policy = &g_my_retry_policy;
```
...@@ -515,41 +515,41 @@ The default protocol used by Channel is baidu_std, which is changeable by settin

Supported protocols:

- PROTOCOL_BAIDU_STD or "baidu_std", which is [the standard binary protocol inside Baidu](baidu_std.md), using single connection by default.
- PROTOCOL_HULU_PBRPC or "hulu_pbrpc", which is the protocol of hulu-pbrpc, using single connection by default.
- PROTOCOL_NOVA_PBRPC or "nova_pbrpc", which is the protocol of Baidu ads union, using pooled connection by default.
- PROTOCOL_HTTP or "http", which is http 1.0 or 1.1, using pooled connection by default (Keep-Alive). Check out [Access HTTP service](http_client.md) for details.
- PROTOCOL_SOFA_PBRPC or "sofa_pbrpc", which is the protocol of sofa-pbrpc, using single connection by default.
- PROTOCOL_PUBLIC_PBRPC or "public_pbrpc", which is the protocol of public_pbrpc, using pooled connection by default.
- PROTOCOL_UBRPC_COMPACK or "ubrpc_compack", which is the protocol of public/ubrpc, packing with compack, using pooled connection by default. Check out [ubrpc (by protobuf)](ub_client.md) for details. A related protocol is PROTOCOL_UBRPC_MCPACK2 or "ubrpc_mcpack2", packing with mcpack2.
- PROTOCOL_NSHEAD_CLIENT or "nshead_client", which is required by UBXXXRequest in baidu-rpc-ub, using pooled connection by default. Check out [Access UB](ub_client.md) for details.
- PROTOCOL_NSHEAD or "nshead", which is required for sending NsheadMessage, using pooled connection by default. Check out [nshead+blob](ub_client.md#nshead-blob) for details.
- PROTOCOL_MEMCACHE or "memcache", which is the binary protocol of memcached, using **single connection** by default. Check out [access memcached](memcache_client.md) for details.
- PROTOCOL_REDIS or "redis", which is the protocol of redis 1.2+ (the one supported by hiredis), using **single connection** by default. Check out [Access Redis](redis_client.md) for details.
- PROTOCOL_NSHEAD_MCPACK or "nshead_mcpack", which is, as the name implies, nshead + mcpack (parsed by protobuf via mcpack2pb), using pooled connection by default.
- PROTOCOL_ESP or "esp", for accessing services with the esp protocol, using pooled connection by default.

## Connection Type

brpc supports the following connection types:

- short connection: Established before each RPC and closed after completion. Since each RPC has to pay the overhead of establishing a connection, this type is used for occasionally launched RPCs, not frequently launched ones. No protocol uses this type by default. Connections in http 1.0 are handled similarly to short connections.
- pooled connection: An unused connection is picked from a pool before each RPC and returned after completion. One connection carries at most one request at a time. One client may have multiple connections to one server. http and the protocols using nshead use this type by default.
- single connection: All clients in one process have at most one connection to one server, and one connection may carry multiple requests at the same time. Responses do not need to arrive in the same order as the requests were sent. This type is used by baidu_std, hulu_pbrpc and sofa_pbrpc by default.

|                                          | short connection                         | pooled connection                       | single connection                        |
| ---------------------------------------- | ---------------------------------------- | --------------------------------------- | ---------------------------------------- |
| long connection                          | no                                       | yes                                     | yes                                      |
| \#connection at server-side (from a client) | qps*latency ([little's law](https://en.wikipedia.org/wiki/Little%27s_law)) | qps*latency                             | 1                                        |
| peak qps                                 | bad, and limited by max number of ports  | medium                                  | high                                     |
| latency                                  | 1.5RTT(connect) + 1RTT + processing time | 1RTT + processing time                  | 1RTT + processing time                   |
| cpu usage                                | high, tcp connect for each RPC           | medium, every request needs a sys write | low, writes can be combined to reduce overhead. |
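
The "qps*latency" and latency rows of the table can be turned into a small worked calculation. The sketch below is only arithmetic on the formulas in the table; the function names are hypothetical, and integer microseconds are used (an illustrative choice) to keep the results exact.

```c++
#include <cassert>
#include <cstdint>

// Little's law: average in-flight requests = arrival rate * time in system.
// With short or pooled connections each in-flight request occupies one
// connection, so this estimates connections from one client. Rounds up.
inline int64_t estimated_connections(int64_t qps, int64_t latency_us) {
    const int64_t kMicrosPerSecond = 1000000;
    return (qps * latency_us + kMicrosPerSecond - 1) / kMicrosPerSecond;
}

// Latency row for short connections: 1.5 RTT (connect) + 1 RTT + processing.
inline int64_t short_connection_latency_us(int64_t rtt_us, int64_t processing_us) {
    return rtt_us * 3 / 2 + rtt_us + processing_us;
}

// Latency row for pooled and single connections: 1 RTT + processing.
inline int64_t long_connection_latency_us(int64_t rtt_us, int64_t processing_us) {
    return rtt_us + processing_us;
}
```

For instance, a client sending 1000 qps with 20 ms latency keeps about 1000 * 0.02 = 20 requests in flight, hence roughly 20 pooled connections, while a single connection stays at 1 regardless of load.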

brpc chooses the best connection type for the protocol by default; users generally have no need to change it. If you do, set ChannelOptions.connection_type to:

- CONNECTION_TYPE_SINGLE or "single": single connection.
- CONNECTION_TYPE_POOLED or "pooled": pooled connection. The max number of connections from one client to one server is limited by -max_connection_pool_size:

| Name                         | Value | Description                              | Defined At          |
| ---------------------------- | ----- | ---------------------------------------- | ------------------- |
...@@ -559,7 +559,7 @@ brpc chooses best connection type for the protocol by default, users generally h
- "" (empty string) makes brpc choose the default one.

brpc also supports [Streaming RPC](streaming_rpc.md), which is an application-level connection for transferring streaming data.

## Close idle connections in pools

...@@ -572,7 +572,7 @@ If a connection has no read or write within the seconds specified by -idle_timeo
## Defer connection close

Multiple channels may share a connection via reference counting. When a channel releases the last reference to the connection, the connection is closed. But in some scenarios channels are created just before sending an RPC and destroyed after completion, in which case connections are probably closed and re-opened frequently, which is as costly as short connections.

One solution is to cache commonly used channels in user code, which avoids frequent creation and destruction of channels. However, brpc does not offer a utility for doing this right now, and it is not trivial for users to implement it correctly.

...@@ -582,13 +582,13 @@ Another solution is setting gflag -defer_close_second
| ------------------ | ----- | ---------------------------------------- | ----------------------- |
| defer_close_second | 0     | Defer close of connections for so many seconds even if the connection is not used by anyone. Close immediately for non-positive values | src/brpc/socket_map.cpp |

After setting, the connection is not closed immediately when the last reference is released; instead it is closed after the given number of seconds. If a channel references the connection again during the wait, the connection returns to normal. No matter how frequently channels are created, this flag limits the frequency of closing connections. A side effect of the flag is that file descriptors are not closed immediately after channels are destroyed; if the flag is wrongly set to a large value, the number of active file descriptors in the process may be large as well.

## Buffer size of connections

-socket_recv_buffer_size sets the receive buffer size of all connections, -1 by default (not modified).

-socket_send_buffer_size sets the send buffer size of all connections, -1 by default (not modified).

| Name                    | Value | Description                              | Defined At          |
| ----------------------- | ----- | ---------------------------------------- | ------------------- |
...@@ -597,49 +597,38 @@ After setting, connection is not closed immediately after last referential  coun

## log_id

set_log_id() sets a 64-bit integer log_id, which is sent to the server side along with the request and is often printed in server logs to associate the different services accessed in one session. A string-type log id must be converted to a 64-bit integer before setting.
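
Converting a string-type log id to a 64-bit integer can be done with the standard library. The following is a minimal sketch: `parse_hex_log_id` is a hypothetical helper, and it assumes the string holds a hexadecimal number; adjust the base if your ids use a different encoding.

```c++
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of brpc.
// Assumption: the string id is a hexadecimal number that fits in 64 bits.
// Returns false on empty input, non-hex characters, or overflow.
inline bool parse_hex_log_id(const std::string& text, uint64_t* out) {
    // Reject empty input and leading signs or spaces that strtoull would accept.
    if (text.empty() || !std::isxdigit(static_cast<unsigned char>(text[0]))) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(text.c_str(), &end, 16);
    if (errno == ERANGE || end != text.c_str() + text.size()) {
        return false;  // overflow, or trailing characters that are not hex
    }
    *out = static_cast<uint64_t>(value);
    return true;
}
```

On success, the resulting integer is what would be passed to set_log_id().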

## Attachment

baidu_std and hulu_pbrpc support attachments, which are set by the user and bypass protobuf serialization. From the client's point of view, the data in Controller::request_attachment() is received by the server, and response_attachment() contains the attachment sent back by the server. Attachments are not compressed by brpc.

In http, the attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html); for example, the data to POST is stored in request_attachment().
## Authentication
TODO: Describe how authentication methods are extended.
```
// Create a baas::CredentialGenerator using Giano's API
baas::CredentialGenerator generator = CREATE_MOCK_PERSONAL_GENERATOR(
    "mock_user", "mock_roles", "mock_group", baas::sdk::BAAS_OK);
// Create a brpc::policy::GianoAuthenticator using the generator we just created
// and then pass it into brpc::ChannelOptions
brpc::policy::GianoAuthenticator auth(&generator, NULL);
brpc::ChannelOptions option;
option.auth = &auth;
```
For Giano authentication, first call the Giano API to create a baas::CredentialGenerator (see [Giano快速上手手册.pdf](http://wiki.baidu.com/download/attachments/37774685/Giano%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B%E6%89%8B%E5%86%8C.pdf?version=1&modificationDate=1421990746000&api=v2)), then set it into brpc::ChannelOptions step by step as in the code above.
Once authentication is set on the client, every newly established connection must first send authentication data (generated by the Giano authenticator) before it can send subsequent requests. After authentication succeeds, subsequent requests on that connection no longer carry authentication data.
## Reset
Reset() returns the Controller to the state it had when it was just created.
Don't call Reset() before the RPC ends; the behavior is undefined.
## Compression
set_request_compress_type() sets the compression type of the request; no compression by default.
NOTE: Attachments are not compressed by brpc.
Check out [compress request body](http_client#压缩request-body) to compress the http body.
Supported compressions:
- brpc::CompressTypeSnappy : [snappy](http://google.github.io/snappy/), compression and decompression are much faster than the other methods, but the compression ratio is the lowest.
- brpc::CompressTypeGzip : [gzip](http://en.wikipedia.org/wiki/Gzip), significantly slower than snappy, with a higher compression ratio.
- brpc::CompressTypeZlib : [zlib](http://en.wikipedia.org/wiki/Zlib), 10%~20% faster than gzip but still significantly slower than snappy, with a slightly better compression ratio than gzip.
The following table lists the performance of different methods when compressing and decompressing **data with a lot of duplications**, for reference only.
| Compress method | Compress size(B) | Compress time(us) | Decompress time(us) | Compress throughput(MB/s) | Decompress throughput(MB/s) | Compress ratio |
| --------------- | ---------------- | ----------------- | ------------------- | ------------------------- | --------------------------- | -------------- |
...@@ -656,7 +645,7 @@ set_request_compress_type()设置request的压缩方式, 默认不压缩. 注意 ...@@ -656,7 +645,7 @@ set_request_compress_type()设置request的压缩方式, 默认不压缩. 注意
| Gzip | | 229.7803 | 82.71903 | 135.9995 | 377.7849 | 0.54% |
| Zlib | | 240.7464 | 54.44099 | 129.8046 | 574.0161 | 0.50% |
The following table lists the performance of different methods when compressing and decompressing **data with very few duplications**, for reference only.
| Compress method | Compress size(B) | Compress time(us) | Decompress time(us) | Compress throughput(MB/s) | Decompress throughput(MB/s) | Compress ratio |
| --------------- | ---------------- | ----------------- | ------------------- | ------------------------- | --------------------------- | -------------- |
...@@ -675,19 +664,19 @@ set_request_compress_type()设置request的压缩方式, 默认不压缩. 注意 ...@@ -675,19 +664,19 @@ set_request_compress_type()设置request的压缩方式, 默认不压缩. 注意
# FAQ
### Q: Does brpc support unix domain socket?
No. Traffic over local TCP sockets bypasses the network, so local TCP sockets perform only slightly slower than unix domain sockets. Scenarios where TCP sockets cannot be used may still require unix domain sockets; the capability may be supported in the future.
### Q: Fail to connect to xx.xx.xx.xx:xxxx, Connection refused
The remote server is generally not listening on the port (it has probably crashed).
### Q: Connection timedout occurs often when accessing another IDC
![img](../images/connection_timedout.png)
The TCP connection was not established within connect_timeout_ms. Increase the connection and RPC timeouts:
```c++
struct ChannelOptions {
...@@ -697,7 +686,7 @@ struct ChannelOptions {
    // Default: 200 (milliseconds)
    // Maximum: 0x7fffffff (roughly 30 days)
    int32_t connect_timeout_ms;
    // Max duration of RPC over this Channel. -1 means wait indefinitely.
    // Overridable by Controller.set_timeout_ms().
    // Default: 500 (milliseconds)
...@@ -707,50 +696,46 @@ struct ChannelOptions {
};
```
NOTE: A connection timeout is not an RPC timeout. An RPC timeout is logged as "Reached timeout=...".
### Q: synchronous call is good, asynchronous call crashes
Check the lifetime of Controller, Response and done. In an asynchronous call, the return of CallMethod is not the completion of the RPC; the RPC completes when done->Run() is entered. So these objects should not be deleted right after CallMethod; they should be deleted in done->Run() instead. Generally you should allocate the objects on the heap (for example with NewCallback) rather than on the stack. Check out [Asynchronous call](client.md#asynchronous-call) for details.
### Q: The machine list is configured in BNS, but RPC reports "Fail to select server, No data available"
Use get_instance_by_service -s your_bns_name to check the status of all machines. Only machines whose status is 0 can be accessed by the client.
### Q: How to make requests be processed once and only once
This issue is not solved at the RPC layer. When the response returns and is successful, we know the RPC was processed at the server side. When the response returns and is rejected, we know the RPC was not processed at the server side. But when no response is returned, the server may or may not have processed the RPC. If we retry, the same request may be processed twice at the server side. Generally, RPC services with side effects must consider [idempotence](http://en.wikipedia.org/wiki/Idempotence), otherwise retries may apply side effects more than once and result in unexpected behavior. Search services that mostly read usually have no side effects (during a search) and are naturally idempotent. Storage services that also write have to design versioning or serial-number mechanisms to reject side effects that have already happened, in order to stay idempotent.
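
The serial-number mechanism mentioned for storage services can be sketched as follows. This is an illustration only: `VersionedStore` is a hypothetical in-memory class, and the rule "accept a write only if its sequence number is newer than the last one applied to the key" is one assumed way of rejecting retried writes, not something brpc provides.

```c++
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical storage service that stays idempotent under retries by
// remembering the highest sequence number applied to each key.
class VersionedStore {
public:
    VersionedStore() : applied_writes_(0) {}

    // Applies the write only if `seq` is newer than the last sequence number
    // applied to `key`. Returns false for duplicated or stale requests, so a
    // retried RPC does not apply its side effect twice.
    bool Put(const std::string& key, uint64_t seq, const std::string& value) {
        Entry& entry = entries_[key];
        if (entry.has_value && seq <= entry.seq) {
            return false;
        }
        entry.seq = seq;
        entry.value = value;
        entry.has_value = true;
        ++applied_writes_;
        return true;
    }

    // Number of writes that actually took effect.
    int applied_writes() const { return applied_writes_; }

private:
    struct Entry {
        uint64_t seq;
        std::string value;
        bool has_value;
        Entry() : seq(0), has_value(false) {}
    };
    std::unordered_map<std::string, Entry> entries_;
    int applied_writes_;
};
```

A client that retries a write reuses the same sequence number, so whichever copy arrives second is rejected instead of being applied again.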
### Q: Invalid address=`bns://group.user-persona.dumi.nj03'
```
FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers.
```
Accessing servers under a naming service requires the Init() with 3 parameters (the second one is `load_balancer_name`). The Init() called here takes 2 parameters, so brpc treats it as accessing a single server and reports this error.
### Q: Both sides use protobuf, why can't they communicate with each other
**protocol != protobuf**. protobuf serializes packages, while the protocol defines the fields; sharing a serialization format does not make the fields interoperable. A message of a protocol may contain multiple protobuf packages along with extra lengths, checksums, magic numbers and so on. The capability offered by brpc, "write code once and serve multiple protocols", is implemented by converting data from different protocols into a unified API inside the RPC framework, not at the protobuf layer. Broadly speaking, protobuf can also be used as a packaging framework that generates other serialization formats; [idl<=>protobuf](mcpack2pb.md), for example, uses protobuf to generate code that parses idl.
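
To make the distinction concrete, here is a sketch of a made-up wire format that wraps an opaque payload (which could be serialized protobuf bytes) with a magic number, a length and a checksum. The layout, the `DEMO` magic and every function name are assumptions for illustration; this is not the format of baidu_std or of any other real protocol.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical frame layout (NOT a real brpc protocol):
//   4-byte magic "DEMO" | 4-byte big-endian body length | 4-byte checksum | body
static const char kMagic[4] = {'D', 'E', 'M', 'O'};
static const size_t kHeaderSize = 12;

// Toy checksum: sum of all body bytes.
uint32_t Checksum(const std::string& body) {
    uint32_t sum = 0;
    for (size_t i = 0; i < body.size(); ++i) {
        sum += static_cast<unsigned char>(body[i]);
    }
    return sum;
}

// Appends `v` to `out` in big-endian byte order.
void AppendU32(std::string* out, uint32_t v) {
    out->push_back(static_cast<char>((v >> 24) & 0xFF));
    out->push_back(static_cast<char>((v >> 16) & 0xFF));
    out->push_back(static_cast<char>((v >> 8) & 0xFF));
    out->push_back(static_cast<char>(v & 0xFF));
}

// Reads a big-endian 32-bit value starting at `pos`.
uint32_t ReadU32(const std::string& in, size_t pos) {
    return (static_cast<uint32_t>(static_cast<unsigned char>(in[pos])) << 24) |
           (static_cast<uint32_t>(static_cast<unsigned char>(in[pos + 1])) << 16) |
           (static_cast<uint32_t>(static_cast<unsigned char>(in[pos + 2])) << 8) |
           static_cast<uint32_t>(static_cast<unsigned char>(in[pos + 3]));
}

// Wraps an already-serialized body into one frame.
std::string PackFrame(const std::string& body) {
    std::string frame(kMagic, 4);
    AppendU32(&frame, static_cast<uint32_t>(body.size()));
    AppendU32(&frame, Checksum(body));
    frame += body;
    return frame;
}

// Extracts the body, verifying magic, length and checksum.
bool UnpackFrame(const std::string& frame, std::string* body) {
    if (frame.size() < kHeaderSize) return false;
    if (frame.compare(0, 4, kMagic, 4) != 0) return false;
    if (frame.size() - kHeaderSize != ReadU32(frame, 4)) return false;
    const std::string payload = frame.substr(kHeaderSize);
    if (Checksum(payload) != ReadU32(frame, 8)) return false;
    *body = payload;
    return true;
}
```

A peer that sends the bare payload without this header fails `UnpackFrame` even though the payload bytes are identical, which is the situation the question describes.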
### Q: Why C++ client/server may fail to talk to client/server in other languages
Check whether the C++ side turns on compression (Controller::set_compress_type). RPC implementations in other languages (such as python and Java) do not support compression yet, so messages exchanged with them fail to be parsed.
# PS: Workflow at Client-side
![img](../images/client_side.png)
Steps:
1. Create a [bthread_id](https://github.com/brpc/brpc/blob/master/src/bthread/id.h) as the correlation_id of the current RPC.
2. Depending on how the Channel was initialized, choose a server from the process-level [SocketMap](https://github.com/brpc/brpc/blob/master/src/brpc/socket_map.h) or from the [LoadBalancer](https://github.com/brpc/brpc/blob/master/src/brpc/load_balancer.h) as the destination of the request.
3. Choose a [Socket](https://github.com/brpc/brpc/blob/master/src/brpc/socket.h) according to the connection type (single, pooled, short).
4. If authentication is turned on and the Socket is not authenticated yet, the first request enters the authenticating branch; other requests block until the first request, which carries the authentication information, is written into the Socket. The server side only verifies the first request.
5. According to the protocol of the Channel, choose the corresponding serialization callback to serialize the request into [IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h).
6. If a timeout is set, set up a timer. From this point on, avoid using the Controller, since the timer may fire at any time and call the user's callback for timeout, which may delete the Controller.
7. The preparation phase of sending is completed. If an error occurs at any of the steps above, Channel::HandleSendFailed is called.
8. Write the IOBuf with the serialized data into the Socket and add Channel::HandleSocketFailed into id_wait_list of the Socket. The callback is called when the write fails or the connection is broken before completion of the RPC.
9. In a synchronous call, Join the correlation_id; otherwise CallMethod() returns.
10. Send/receive messages to/from the network.
11. After receiving the response, get the correlation_id inside it and find the associated Controller in O(1) time. The lookup does not need to search a global hashmap and scales well on multiple cores.
12. Parse the response according to the protocol.
13. Call Controller::OnRPCReturned, which decides by error code whether to retry the RPC or complete it. In an asynchronous call, call the user's done. Finally destroy the correlation_id to wake up joining threads.
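
One way to get the O(1) lookup of step 11 without a global hashmap is to encode a slot index and a version number in the 64-bit id. The sketch below is a simplified, single-threaded illustration under that assumption; `CallTable` is a hypothetical class, and the real bthread_id is thread-safe and more involved.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table mapping 64-bit ids to per-call data in O(1).
// id layout: high 32 bits = slot version, low 32 bits = slot index.
class CallTable {
public:
    // Registers `data` and returns the id to be sent along with the request.
    uint64_t Create(void* data) {
        uint32_t index;
        if (!free_.empty()) {
            index = free_.back();
            free_.pop_back();
        } else {
            index = static_cast<uint32_t>(slots_.size());
            slots_.push_back(Slot());
        }
        Slot& slot = slots_[index];
        slot.data = data;
        slot.in_use = true;
        return (static_cast<uint64_t>(slot.version) << 32) | index;
    }

    // Returns the data registered under `id`, or NULL if the id was
    // destroyed (for example a response arriving after the RPC ended).
    void* Find(uint64_t id) const {
        const Slot* slot = Lookup(id);
        return slot != NULL ? slot->data : NULL;
    }

    // Invalidates `id`. Bumping the version makes stale ids unmatchable
    // even after the slot is reused by a later call.
    bool Destroy(uint64_t id) {
        if (Lookup(id) == NULL) return false;
        const uint32_t index = static_cast<uint32_t>(id & 0xFFFFFFFFu);
        Slot& slot = slots_[index];
        slot.in_use = false;
        slot.data = NULL;
        ++slot.version;
        free_.push_back(index);
        return true;
    }

private:
    struct Slot {
        uint32_t version;
        void* data;
        bool in_use;
        Slot() : version(0), data(NULL), in_use(false) {}
    };

    // Direct array indexing plus a version check: no hashing, no search.
    const Slot* Lookup(uint64_t id) const {
        const uint32_t index = static_cast<uint32_t>(id & 0xFFFFFFFFu);
        const uint32_t version = static_cast<uint32_t>(id >> 32);
        if (index >= slots_.size()) return NULL;
        const Slot& slot = slots_[index];
        if (!slot.in_use || slot.version != version) return NULL;
        return &slot;
    }

    std::vector<Slot> slots_;
    std::vector<uint32_t> free_;
};
```

The version check is what keeps a late response for an already finished RPC from reaching a Controller that now belongs to a different call.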