FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers.

The client side sends requests. In brpc, the client is called [Channel](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h) rather than "Client". A Channel represents a communication line to one or more servers and is used to call services.

A Channel can be **shared by all threads** in the process. You don't need to create a separate Channel for each thread, and you don't need to guard Channel.CallMethod with a lock. However, creating and destroying a Channel is **not** thread-safe: make sure a Channel is initialized and destroyed by only one thread.

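The initialize-once, share-everywhere, destroy-after-join discipline can be sketched in plain C++. `FakeChannel` and `RunSharedChannel` are hypothetical stand-ins, not brpc's API; the sketch only assumes that Init() must not run concurrently with anything, while the call path is safe to share between threads.

```c++
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a channel (not brpc::Channel): Init() must run on
// one thread before any concurrent use, while Call() may run on many threads.
class FakeChannel {
public:
    // Not thread-safe: call once, before other threads start using the channel.
    int Init(const std::string& server_addr) {
        server_addr_ = server_addr;
        return server_addr_.empty() ? -1 : 0;
    }

    // Thread-safe: reads immutable state and bumps an atomic counter.
    bool Call() const {
        calls_.fetch_add(1, std::memory_order_relaxed);
        return !server_addr_.empty();
    }

    long calls() const { return calls_.load(); }

private:
    std::string server_addr_;
    mutable std::atomic<long> calls_{0};
};

// Initializes the channel on the calling thread, shares it with `nthreads`
// workers, and joins them all before the channel is destroyed.
long RunSharedChannel(int nthreads, int calls_per_thread) {
    FakeChannel channel;
    if (channel.Init("127.0.0.1:8000") != 0) {
        return -1;
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < nthreads; ++i) {
        workers.emplace_back([&channel, calls_per_thread] {
            for (int j = 0; j < calls_per_thread; ++j) {
                channel.Call();
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();  // no worker touches the channel after this point
    }
    return channel.calls();
}
```

The ordering is what matters: Init() happens before any worker starts, and every worker is joined before the channel goes out of scope.
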
...

Valid "server_addr_and_port":
...
- www.foo.com:8765
- localhost:9000

Invalid "server_addr_and_port":
- 127.0.0.1:90000 # port too large
- 10.39.2.300:8000 # invalid IP

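The rules behind the two lists can be checked with a small, syntax-only validator. This is a sketch under stated assumptions: `IsValidServerAddrAndPort` is a hypothetical helper, not part of brpc. It requires a port in [1, 65535], treats a host made only of digits and dots as an IPv4 literal whose octets must not exceed 255, and accepts anything else made of letters, digits, `-` and `.` as a hostname without resolving it.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Parses 1..max_digits decimal digits; returns -1 if `s` is not such a string.
static long ParseDecimal(const std::string& s, std::size_t max_digits) {
    if (s.empty() || s.size() > max_digits) return -1;
    long value = 0;
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return -1;
        value = value * 10 + (c - '0');
    }
    return value;
}

// True if `host` is a dotted-quad IPv4 literal whose octets are all in [0, 255].
static bool IsValidIPv4(const std::string& host) {
    int octets = 0;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = host.find('.', start);
        const std::string part =
            host.substr(start, dot == std::string::npos ? std::string::npos : dot - start);
        const long v = ParseDecimal(part, 3);
        if (v < 0 || v > 255) return false;
        ++octets;
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return octets == 4;
}

// Checks "host:port" syntax only; hostnames are not resolved here.
bool IsValidServerAddrAndPort(const std::string& s) {
    const std::size_t colon = s.rfind(':');
    if (colon == std::string::npos || colon == 0) return false;
    const std::string host = s.substr(0, colon);
    const long port = ParseDecimal(s.substr(colon + 1), 5);
    if (port < 1 || port > 65535) return false;
    if (host.find_first_not_of("0123456789.") == std::string::npos) {
        return IsValidIPv4(host);  // digits and dots only: must be real IPv4
    }
    for (char c : host) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '-' && c != '.') return false;
    }
    return true;
}
```

With this sketch, `127.0.0.1:90000` is rejected for its port and `10.39.2.300:8000` for its last octet, while both entries of the valid list pass.
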
...

```c++
int Init(const char* naming_service_url,
         const char* load_balancer_name,
         const ChannelOptions* options);
```

Channels created by the above Init() get the server list from the NamingService specified by `naming_service_url`, either periodically or driven by events, and send each request to one server chosen from the list by the algorithm specified by `load_balancer_name`.

You **should not** create such channels ad hoc before each RPC, because creating and destroying them involves many resources. For example, the NamingService has to be accessed once at creation, otherwise the server candidates are unknown. Besides, channels can be shared safely by multiple threads, so there is no need to create them frequently.

...

## Naming Service

A naming service maps a name to a modifiable list of servers. It is positioned as follows on the client side:

![img](../images/ns.png)

...

General form of `naming_service_url` is "**protocol://service_name**".

...

BNS is the most common naming service inside Baidu. In "bns://rdev.matrix.all", "bns" is the protocol and "rdev.matrix.all" is the service name. A related gflag is -ns_access_interval: ![img](../images/ns_access_interval.png)

If the list in BNS is non-empty but the Channel says "no servers", the status bit of the machines in BNS is probably non-zero, which means the machines are unavailable and are therefore not added as server candidates of the Channel. Status bits can be checked with:

`get_instance_by_service [bns_node_name] -s`

...

![img](../images/ns_filter.jpg)

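Conceptually, a filter is a predicate applied to every server returned by the naming service before the list reaches the load balancer. The standalone sketch below models that idea with hypothetical `Server`, `ServerFilter`, `TagFilter` and `ApplyFilter` types; they are illustrations, not brpc's classes, and use a plain string instead of `butil::EndPoint` for the address.

```c++
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified server entry (brpc uses butil::EndPoint for addr).
struct Server {
    std::string addr;
    std::string tag;
};

// Predicate in the spirit of a naming-service filter: return true to keep
// the server as a candidate, false to filter it out.
class ServerFilter {
public:
    virtual ~ServerFilter() = default;
    virtual bool Accept(const Server& server) const = 0;
};

// Keeps only servers carrying a given tag.
class TagFilter : public ServerFilter {
public:
    explicit TagFilter(std::string tag) : tag_(std::move(tag)) {}
    bool Accept(const Server& server) const override { return server.tag == tag_; }

private:
    std::string tag_;
};

// Runs the filter over the list fetched from the naming service; only the
// survivors would be handed to the load balancer. A null filter keeps all.
std::vector<Server> ApplyFilter(const std::vector<Server>& from_naming_service,
                                const ServerFilter* filter) {
    if (filter == nullptr) return from_naming_service;
    std::vector<Server> kept;
    for (const Server& s : from_naming_service) {
        if (filter->Accept(s)) kept.push_back(s);
    }
    return kept;
}
```
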
Interface of the filter:

```c++
// naming_service_filter.h
class NamingServiceFilter {
public:
    ...
    // Return false to filter it out
    virtual bool Accept(const ServerNode& server) const = 0;
};

// naming_service.h
struct ServerNode {
    butil::EndPoint addr;
    ...
        return server.tag == "main";
    }
};

int main() {
    ...
    MyNamingServiceFilter my_filter;
    ...
}
```

...

![img](../images/lb.png)

The ideal algorithm gets every request processed in time and minimizes the impact of any server crashing. However, clients cannot know about delays or congestion at servers in real time, and load balancing algorithms should generally be lightweight, so users need to choose the algorithm that suits their use case. Algorithms provided by brpc (specified by `load_balancer_name`):

### rr

which is round robin. It always chooses the next server in the list, and the server after the last one is the first one. There are no other settings. For example, with 3 servers a, b and c, brpc sends requests to a, b, c, a, b, c, … and so on. Note that this algorithm presumes that the machine specs, network latencies and server loads are similar.

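A minimal sketch of the round-robin rule, assuming a fixed server list. `RoundRobin` is a hypothetical class for illustration; a real load balancer also has to cope with servers being added and removed, which this sketch ignores.

```c++
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Round-robin selection over a fixed server list. The atomic counter lets
// several threads select without a lock.
class RoundRobin {
public:
    explicit RoundRobin(std::vector<std::string> servers) : servers_(std::move(servers)) {}

    // Returns the next server, wrapping from the last back to the first,
    // or nullptr when the list is empty.
    const std::string* Select() {
        if (servers_.empty()) return nullptr;
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return &servers_[i % servers_.size()];
    }

private:
    const std::vector<std::string> servers_;
    std::atomic<std::size_t> next_{0};
};
```

Selecting six times from a, b and c yields a, b, c, a, b, c, matching the example above.
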
### random

...

You need to call Controller.set_request_code() before the RPC, otherwise the RPC will fail. request_code is often a 32-bit hash code of the "key part" of the request, and the hashing algorithm does not need to be the same as the one used by the load balancer. For example, even with `c_murmurhash`, request_code can be computed with md5.

[src/brpc/policy/hasher.h](https://github.com/brpc/brpc/blob/master/src/brpc/policy/hasher.h) includes common hash functions. If `std::string key` stands for the key part of the request, `controller.set_request_code(brpc::policy::MurmurHash32(key.data(), key.size()))` sets request_code correctly.

Do distinguish the "key" and the "attributes" of the request. Don't compute request_code from the full content of the request just because it's quick: a minor change in attributes may result in a totally different hash code and change the destination dramatically. Padding is another pitfall: for example, `struct Foo { int32_t a; int64_t b; }` has a 4-byte undefined gap between `a` and `b` on 64-bit machines, so the result of `hash(&foo, sizeof(foo))` is undefined. Fields need to be packed or serialized before hashing.

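The padding advice can be illustrated by serializing the key fields into a byte buffer and hashing that buffer. The sketch uses 32-bit FNV-1a only because it fits in a few lines; it is a swapped-in stand-in, not the MurmurHash32 from src/brpc/policy/hasher.h, and `RequestCodeOf` is a hypothetical helper. Its result is the kind of value one would pass to Controller.set_request_code().

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32-bit FNV-1a, a dependency-free stand-in for the hasher.h functions.
std::uint32_t Fnv1a32(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

struct Foo {
    std::int32_t a;  // typically followed by 4 padding bytes on 64-bit ABIs
    std::int64_t b;
};

// Appends the low `nbytes` bytes of `v` in little-endian order, so the hashed
// bytes depend neither on padding nor on the host's byte order.
static void AppendLittleEndian(std::string* out, std::uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; ++i) {
        out->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
}

// Hashes the key fields after serializing them explicitly, instead of
// hashing &foo with sizeof(foo), which would include the padding bytes.
std::uint32_t RequestCodeOf(const Foo& foo) {
    std::string key;
    AppendLittleEndian(&key, static_cast<std::uint32_t>(foo.a), 4);
    AppendLittleEndian(&key, static_cast<std::uint64_t>(foo.b), 8);
    return Fnv1a32(key.data(), key.size());
}
```

Two `Foo` objects with equal fields always produce the same code here, whatever their padding bytes contain, because the padding is never hashed.
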
Check out [Consistent Hashing](consistent_hashing.md) for more details.

## Health checking

...

Once a server is reconnected, it becomes a server candidate in the LoadBalancer again. If a server is removed from the NamingService during health checking, brpc stops health-checking it as well.

# Launch RPC

Generally, we don't call Channel.CallMethod directly. Instead we call the XXX_Stub generated by protobuf, which feels more like a "method call". The stub has few member fields, so it is suitable (and recommended) to put it on the stack instead of creating it with new. The stub can of course be saved and reused as well. Channel.CallMethod and the stub are both **thread-safe** and can be accessed by multiple threads simultaneously. For example:

...

An exception is the HTTP client, which has little to do with protobuf. Call CallMethod directly to make an HTTP call, setting all parameters to NULL except for `Controller` and `done`. Check [Access HTTP](http_client.md) for details.

## Synchronous call

CallMethod blocks until the response from the server is received or an error occurs (including a timeout).

In a synchronous call, response/controller will not be used by brpc again after CallMethod returns, so they can safely be put on the stack. Note: if request/response has many fields and is large in size, it is better allocated on the heap.

```c++
...
MyRequest request;
MyResponse response;
brpc::Controller cntl;
XXX_Stub stub(&channel);

request.set_foo(...);
cntl.set_timeout_ms(...);
stub.some_method(&cntl, &request, &response, NULL);
if (cntl.Failed()) {
    // RPC failed. fields in response are undefined, don't use.
} else {
    // RPC succeeded, response has what we want.
}
```

## Asynchronous call

Pass a callback `done` to CallMethod, which then returns after sending the request rather than after the RPC completes. When the response from the server is received or an error occurs (including a timeout), done->Run() is called. Post-processing code for the RPC should be put in done->Run() instead of after CallMethod.

Because the return of CallMethod does not mean that the RPC is complete, response/controller may still be used by brpc or done->Run(). Generally they should be allocated on the heap and deleted in done->Run(). If they're deleted too early, done->Run() may access invalid memory.

You can new these objects individually and create done with [NewCallback](#use-newcallback), or make response/controller members of done and [new them together](#Inherit-google::protobuf::Closure). The former is recommended.

**Request and Channel can be destroyed immediately after an asynchronous CallMethod**, unlike response/controller. Note that "immediately" means that request/Channel can be destroyed **after** CallMethod returns, not during CallMethod. Deleting a Channel that is being used by another thread results in undefined behavior (a crash at best).

Since protobuf 3 made NewCallback private, brpc has provided NewCallback in [src/brpc/callback.h](https://github.com/brpc/brpc/blob/master/src/brpc/callback.h) since r32035 (with more overloads). If your program has compilation issues with NewCallback, replace google::protobuf::NewCallback with brpc::NewCallback.

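The lifetime rules above can be modelled without brpc. In this sketch `Request`, `Response`, `Controller`, `Closure`, `OnDone` and `FakeAsyncCall` are all hypothetical stand-ins: the callback owns the response and the controller and deletes itself at the end of Run(), while the fake call copies what it needs from the request, so the request may be destroyed as soon as the call returns.

```c++
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-ins for protobuf messages, brpc::Controller and
// google::protobuf::Closure.
struct Request { std::string name; };
struct Response { std::string greeting; };
struct Controller { bool failed = false; };
struct Closure {
    virtual ~Closure() = default;
    virtual void Run() = 0;
};

std::atomic<int> g_succeeded{0};

// The callback owns response/controller, so they stay alive until Run()
// and are deleted together with the callback when Run() returns.
struct OnDone : Closure {
    Response response;
    Controller cntl;
    void Run() override {
        std::unique_ptr<OnDone> self_guard(this);  // deletes *this on return
        if (!cntl.failed) {
            g_succeeded.fetch_add(1);  // post-processing would read `response` here
        }
    }
};

// Fake asynchronous call: copies the request's data before returning, then
// fills the response and invokes done->Run() on another thread. The thread
// is returned only so that a caller or test can join it.
std::thread FakeAsyncCall(const Request& req, Response* resp, Controller* cntl, Closure* done) {
    return std::thread([name = req.name, resp, cntl, done] {
        if (name.empty()) {
            cntl->failed = true;
        } else {
            resp->greeting = "hello " + name;
        }
        done->Run();  // resp/cntl must not be touched after this line
    });
}
```

Once the call is launched, the caller must not touch `done->response` or `done->cntl` again; only Run() may use them.
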
### Inherit google::protobuf::Closure

```c++
...
public:
    void Run() {
        // unique_ptr helps us to delete response/cntl automatically. unique_ptr in gcc 3.4 is an emulated version.
        std::unique_ptr<OnRPCDone> self_guard(this);

        if (cntl.Failed()) {
            // RPC failed. fields in response are undefined, don't use.
        } else {
            // RPC succeeded, response has what we want. Continue the post-processing.
        }
    }

    MyResponse response;
    brpc::Controller cntl;
};

OnRPCDone* done = new OnRPCDone;
MyService_Stub stub(&channel);
MyRequest request;  // you don't have to new request, even in an asynchronous call.
...
```

...

Call `Controller.call_id()` to get an id **before launching the RPC**, and call Join() on that id after launching the RPC.

Join() blocks until the RPC completes **and done->Run() has returned**. Properties of Join():
- If the RPC is already complete, Join() returns immediately.
- Multiple threads can Join() the same id, and all of them will be woken up.
- A synchronous RPC can be Join()-ed from another thread, although this is rarely done.

Join() was previously named JoinResponse(). If you get deprecation warnings during compilation, rename the calls to Join().

Calling `Join(controller->call_id())` after launching the RPC is **wrong**: save the call_id before the RPC, otherwise the controller may be deleted by done at any time. The Join in the following code is **wrong**.

...

## Semi-synchronous

Join can be used to implement a "semi-synchronous" call: the caller blocks until multiple asynchronous calls complete. Since the call site blocks until all the RPCs are complete, controllers and responses can safely be put on the stack.

brpc::DoNothing() returns a closure that does nothing, intended specifically for semi-synchronous calls. Its lifetime is managed by brpc.

Note that in the example above, we access `controller.call_id()` after launching the RPC, when the RPC may already have completed. This is safe here because DoNothing does not delete the controller, unlike `on_rpc_done` in the previous example.

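The same shape can be sketched with the standard library: start several asynchronous operations, then wait for all of them before returning, which is what makes caller-owned responses safe. `std::async` and `std::future::wait()` stand in for the asynchronous CallMethod and Join(); `FakeCall` and `CallAllAndWait` are hypothetical.

```c++
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical operation that fills a caller-owned response.
static void FakeCall(int id, std::string* response) {
    *response = "reply-" + std::to_string(id);
}

// Launches n operations asynchronously, then blocks until all of them are
// done. Because nothing returns before every operation has finished, the
// responses can be owned by this frame.
std::vector<std::string> CallAllAndWait(int n) {
    std::vector<std::string> responses(n);  // sized once, so element addresses stay valid
    std::vector<std::future<void>> pending;
    for (int i = 0; i < n; ++i) {
        pending.push_back(std::async(std::launch::async, FakeCall, i, &responses[i]));
    }
    for (std::future<void>& f : pending) {
        f.wait();  // plays the role of Join() on each call id
    }
    return responses;
}
```
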
## Cancel RPC

`brpc::StartCancel(call_id)` cancels the corresponding RPC. The call_id must be obtained from Controller.call_id() **before launching the RPC**; obtaining it at any other time may cause race conditions.

NOTE: use `brpc::StartCancel(call_id)`, not `controller->StartCancel()`, which is forbidden and has no effect. The latter is provided by protobuf by default and has serious race conditions on the lifetime of the controller.

As the name implies, the RPC may not have completed yet when StartCancel returns. You should not touch any field in the Controller or delete any associated resources; they should be handled inside done->Run(). If you have to wait in place for the RPC to complete (not recommended), call Join(call_id).

Facts about StartCancel:
- A call_id can be cancelled before CallMethod; the RPC then ends immediately (and done is called).
- A call_id can be cancelled from another thread.
- Cancelling an already-cancelled call_id has no effect. Consequently, one call_id can be cancelled by multiple threads simultaneously, but only one of the cancellations takes effect.
- Cancellation here is a client-side feature: **the server side does not necessarily cancel the operation**. Server-side cancellation is a separate feature.

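The "only one cancellation takes effect" property is typically obtained with an atomic flag. The sketch below is a hypothetical model, not brpc's implementation: `CallSlot::StartCancel` returns whether this particular call flipped the flag, which is a convenience of the sketch, and `RaceCancels` lets several threads race on one slot.

```c++
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical model of one call id's cancellation state.
class CallSlot {
public:
    // exchange() returns the previous value, so exactly one caller observes
    // the false -> true transition, however many threads race here.
    bool StartCancel() { return !canceled_.exchange(true); }
    bool canceled() const { return canceled_.load(); }

private:
    std::atomic<bool> canceled_{false};
};

// Lets `nthreads` threads cancel the same slot and counts the effective ones.
int RaceCancels(int nthreads) {
    CallSlot slot;
    std::atomic<int> effective{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i) {
        threads.emplace_back([&slot, &effective] {
            if (slot.StartCancel()) effective.fetch_add(1);
        });
    }
    for (std::thread& t : threads) t.join();
    return effective.load();
}
```
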
## Get server-side address and port

remote_side() tells where the request was sent to. The return type is [butil::EndPoint](https://github.com/brpc/brpc/blob/master/src/butil/endpoint.h), which includes an IPv4 address and a port. Calling this method before the RPC completes is undefined.

...

Controller has miscellaneous fields, some of which are buffers that can be reused by calling Reset().

In most use cases, constructing a new Controller (snippet1) and reusing one Controller (snippet2) perform similarly.

```c++
...
for (int i = 0; i < n; ++i) {
    ...
    stub.CallSomething(..., &controller);
}

// snippet2
brpc::Controller controller;
for (int i = 0; i < n; ++i) {
    ...  // controller.Reset() is needed before each reuse
    stub.CallSomething(..., &controller);
}
```

If the Controller in snippet1 is allocated on the heap with new, snippet1 pays the extra cost of heap allocation and may be a little slower in some cases.

# Settings

Client-side settings have 3 parts:
- brpc::ChannelOptions: defined in [src/brpc/channel.h](https://github.com/brpc/brpc/blob/master/src/brpc/channel.h), used to initialize a Channel; it becomes immutable once initialization is done.
- brpc::Controller: defined in [src/brpc/controller.h](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h), used to override fields of brpc::ChannelOptions for an individual RPC according to its context.
- Global gflags: tune global behaviors and are generally left unchanged. Read the comments in [/flags](flags.md) before setting them.

A Controller contains data and options that the request may not have. The server and the client share the same Controller class, but they may set different fields. Read the comments in Controller carefully before using it.

A Controller corresponds to one RPC. A Controller can be reused by another RPC after Reset(), but it cannot be used by multiple RPCs simultaneously, whether or not those RPCs are started from the same thread.

Properties of Controller:
1. A Controller can only have one user. Unless stated explicitly, methods in Controller are **not** thread-safe.
2. Since a Controller is generally not shared, there's no need to manage it with shared_ptr. If you do, something might go wrong.
3. A Controller is constructed before an RPC and destructed after it. Some common patterns:
   - Put the Controller on the stack before a synchronous RPC, so that it is destructed when it goes out of scope. Note that the Controller of an asynchronous RPC **must not** be put on the stack, otherwise the RPC may still be running while the Controller is being destructed, resulting in undefined behavior.
   - new the Controller before an asynchronous RPC and delete it in done.

## Timeout

**ChannelOptions.timeout_ms** is the timeout in milliseconds for all RPCs via the Channel; Controller.set_timeout_ms() overrides the value for one RPC. The default value is 1 second, the maximum value is 2^31 (about 24 days), and -1 means waiting indefinitely for the response or a connection error.

**ChannelOptions.connect_timeout_ms** is the timeout in milliseconds for the connecting phase of all RPCs via the Channel. The default value is 1 second, and -1 means no timeout for connecting. This value is capped so that it is never greater than timeout_ms. Note that this timeout is different from the connection timeout in the TCP layer; generally this timeout should be smaller, otherwise establishing the connection may fail because of the TCP-layer timeout before this timeout is reached.

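One way to read the capping rule, sketched as a function where -1 means "no limit": the connect budget is connect_timeout_ms, bounded by timeout_ms. `EffectiveConnectTimeoutMs` is a hypothetical helper that models the documented behavior; it is not code taken from brpc.

```c++
#include <cassert>
#include <cstdint>

// Returns the connect budget in milliseconds, with -1 meaning "no limit":
// connect_timeout_ms capped by timeout_ms.
std::int64_t EffectiveConnectTimeoutMs(std::int64_t connect_timeout_ms,
                                       std::int64_t timeout_ms) {
    if (connect_timeout_ms < 0) {
        return timeout_ms < 0 ? -1 : timeout_ms;  // only the RPC deadline applies
    }
    if (timeout_ms < 0) {
        return connect_timeout_ms;  // RPC waits forever; connect limit still applies
    }
    return connect_timeout_ms < timeout_ms ? connect_timeout_ms : timeout_ms;
}
```
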
...

## Retry

ChannelOptions.max_retry is the maximum number of retries for all RPCs via the Channel; Controller.set_max_retry() overrides the value for one RPC. The default value is 3, and 0 means no retries.

Controller.retried_count() returns the number of retries.

...

Controller.has_backup_request() tells if backup_request was sent.

...

**brpc tries its best not to retry servers that have already been tried.**

Conditions for retrying (all of them must hold):
- The connection is broken.
- The timeout has not been reached.
- Retry quota is left. Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.
- The retry makes sense. If the RPC fails due to the request (EREQUEST), no retry is done, because the server is very likely to reject the request again.

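The four conditions combine with AND, which a predicate over a hypothetical `Attempt` record makes explicit. The struct, its field names and `ShouldRetry` are illustrative only and are not brpc's API.

```c++
#include <cassert>

// Hypothetical summary of a failed attempt; field names are illustrative.
struct Attempt {
    bool connection_broken;  // the attempt failed because the connection broke
    bool deadline_reached;   // the RPC's overall timeout has expired
    int retried_count;       // retries already performed
    int max_retry;           // retry quota (0 disables retries)
    bool request_error;      // the request itself was rejected (EREQUEST)
};

// Retry only when all four conditions hold.
bool ShouldRetry(const Attempt& a) {
    return a.connection_broken &&
           !a.deadline_reached &&
           a.retried_count < a.max_retry &&
           !a.request_error;
}
```
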
### Broken connection

If the server does not respond but the connection is good, retry is not triggered. If you need to send another request after a timeout, use a backup request.

How it works: if the response does not return within backup_request_ms, another request is sent, and whichever response returns first is taken. The new request is sent, on a best-effort basis, to a server that has not been tried before. NOTE: If backup_request_ms is greater than timeout_ms, the backup request will never be sent. A backup request consumes one retry. A backup request does not imply a server-side cancellation of the original request.

ChannelOptions.backup_request_ms affects all RPCs via the Channel. Its unit is milliseconds and the default value is -1 (disabled). Controller.set_backup_request_ms() overrides the value for a single RPC.
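The backup-request rules above can be modeled as a pure function of the timeouts. This is a hypothetical sketch: `PlanBackupRequest` and `BackupPlan` are illustrative names, not brpc APIs. It also assumes that no backup is issued when no retry quota is left, which is our reading of "a backup request consumes one retry".

```cpp
#include <cstdint>

// Hypothetical result type; not part of brpc.
struct BackupPlan {
    bool send_backup;    // whether a second request is issued
    int64_t send_at_ms;  // when it is issued, relative to the first send (-1 if never)
    int retries_left;    // retry quota remaining afterwards
};

// first_response_ms: when the first server would answer (illustrative input).
BackupPlan PlanBackupRequest(int64_t backup_request_ms, int64_t timeout_ms,
                             int64_t first_response_ms, int max_retry) {
    BackupPlan none{false, -1, max_retry};
    if (backup_request_ms < 0) return none;  // -1 (the default) disables it
    if (timeout_ms >= 0 && backup_request_ms > timeout_ms) {
        return none;  // would fire after the RPC deadline, so never sent
    }
    if (first_response_ms <= backup_request_ms) {
        return none;  // the first response came back in time
    }
    if (max_retry <= 0) return none;  // assumption: a backup needs one retry of quota
    return BackupPlan{true, backup_request_ms, max_retry - 1};
}
```

For example, with backup_request_ms = 2, timeout_ms = 500 and a first server that needs 10 ms, the sketch sends the backup at 2 ms and leaves 2 of 3 retries.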
### Timeout is not reached

The RPC ends soon after the timeout is reached.

### Has retry quota

Controller.set_max_retry(0) or ChannelOptions.max_retry = 0 disables retries.

### The retry makes sense

If the RPC fails due to a bad request (EREQUEST), no retry is done, because the server is very likely to reject the request again.

Users can inherit [brpc::RetryPolicy](https://github.com/brpc/brpc/blob/master/src/brpc/retry_policy.h) to customize the conditions for retrying. For example, brpc does not retry on HTTP-related errors by default. If you want to retry on HTTP_STATUS_FORBIDDEN (403) in your app, you can do as follows:
```c++
#include <brpc/retry_policy.h>

class MyRetryPolicy : public brpc::RetryPolicy {
public:
    bool DoRetry(const brpc::Controller* cntl) const {
        ...
    }
};
...

// Assign the instance to ChannelOptions.retry_policy.
// NOTE: retry_policy must be kept valid during the lifetime of the Channel, and
// the Channel does not own retry_policy, so in most cases the RetryPolicy
// should be created as a singleton.
brpc::ChannelOptions options;
static MyRetryPolicy g_my_retry_policy;
options.retry_policy = &g_my_retry_policy;
...
```

The default protocol used by Channel is baidu_std.
Supported protocols:

- PROTOCOL_BAIDU_STD or "baidu_std", which is [the standard binary protocol inside Baidu](baidu_std.md), using single connection by default.
- PROTOCOL_HULU_PBRPC or "hulu_pbrpc", which is the protocol of hulu-pbrpc, using single connection by default.
- PROTOCOL_NOVA_PBRPC or "nova_pbrpc", which is the protocol of Baidu ads union, using pooled connection by default.
- PROTOCOL_HTTP or "http", which is HTTP/1.0 or HTTP/1.1, using pooled connection (Keep-Alive) by default. Check out [Access HTTP service](http_client.md) for details.
- PROTOCOL_SOFA_PBRPC or "sofa_pbrpc", which is the protocol of sofa-pbrpc, using single connection by default.
- PROTOCOL_PUBLIC_PBRPC or "public_pbrpc", which is the protocol of public_pbrpc, using pooled connection by default.
- PROTOCOL_UBRPC_COMPACK or "ubrpc_compack", which is the protocol of public/ubrpc packed with compack, using pooled connection by default. Check out [ubrpc (by protobuf)](ub_client.md) for details. A related protocol is PROTOCOL_UBRPC_MCPACK2 or "ubrpc_mcpack2", packed with mcpack2.
- PROTOCOL_NSHEAD_CLIENT or "nshead_client", which is required by UBXXXRequest in baidu-rpc-ub, using pooled connection by default. Check out [Access UB](ub_client.md) for details.
- PROTOCOL_NSHEAD or "nshead", which is required for sending NsheadMessage, using pooled connection by default. Check out [nshead+blob](ub_client.md#nshead-blob) for details.
- PROTOCOL_MEMCACHE or "memcache", which is the binary protocol of memcached, using **single connection** by default. Check out [access memcached](memcache_client.md) for details.
- PROTOCOL_REDIS or "redis", which is the protocol of redis 1.2+ (the one supported by hiredis), using **single connection** by default. Check out [Access Redis](redis_client.md) for details.
- PROTOCOL_NSHEAD_MCPACK or "nshead_mcpack", which is, as the name implies, nshead + mcpack (parsed by protobuf via mcpack2pb), using pooled connection by default.
- PROTOCOL_ESP or "esp", for accessing services with the esp protocol, using pooled connection by default.
## Connection Type

brpc supports the following connection types:

- short connection: established before each RPC and closed after completion. Since each RPC pays the overhead of establishing a connection, this type suits occasionally launched RPCs, not frequently launched ones. No protocol uses this type by default. Connections in HTTP/1.0 are handled similarly to short connections.
- pooled connection: pick an unused connection from a pool before each RPC and return it after completion. One connection carries at most one request at a time. One client may have multiple connections to one server. HTTP and the protocols using nshead use this type by default.
- single connection: all clients in one process have at most one connection to one server, and one connection may carry multiple requests at the same time. Responses may be received in a different order from the one in which requests were sent. This type is used by baidu_std, hulu_pbrpc and sofa_pbrpc by default.
| | short connection | pooled connection | single connection |
| --- | --- | --- | --- |
| \#connections at server side (from one client) | qps*latency ([Little's law](https://en.wikipedia.org/wiki/Little%27s_law)) | qps*latency | 1 |
| peak qps | bad, and limited by max number of ports | medium | high |
| latency | 1.5RTT (connect) + 1RTT + processing time | 1RTT + processing time | 1RTT + processing time |
| cpu usage | high, tcp connect for each RPC | medium, every request needs a sys write | low, writes can be combined to reduce overhead |
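The qps*latency entries in the table come from Little's law (L = λW): the average number of requests in flight equals the arrival rate times the average time each request spends in the system. `ExpectedConnections` below is a hypothetical helper that just does this arithmetic.

```cpp
// Little's law: L = lambda * W. With short or pooled connections each in-flight
// request occupies one connection, so the average number of connections one
// client holds to one server is about qps * latency.
double ExpectedConnections(double qps, double latency_ms) {
    return qps * latency_ms / 1000.0;  // convert latency from ms to seconds
}
```

For example, a client sending 2000 qps with 5 ms average latency keeps about 2000 × 0.005 = 10 requests in flight, i.e. roughly 10 pooled connections to that server, versus 1 with a single connection.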
brpc chooses the best connection type for the protocol by default, and users generally don't need to change it. If you do, set ChannelOptions.connection_type to:

- CONNECTION_TYPE_SINGLE or "single": single connection
- CONNECTION_TYPE_POOLED or "pooled": pooled connection. The max number of connections from one client to one server is limited by -max_connection_pool_size.
- "" (empty string) makes brpc choose the default one.
brpc also supports [Streaming RPC](streaming_rpc.md), which is an application-level connection for transferring streaming data.
## Close idle connections in pools
## Defer connection close

Multiple channels may share a connection via reference counting. When a channel releases the last reference to the connection, the connection is closed. But in some scenarios, channels are created just before sending an RPC and destroyed after completion, in which case connections are likely closed and reopened frequently, which is as costly as short connections.

One solution is to cache commonly used channels, which avoids frequent creation and destruction of channels. However, brpc does not offer a utility for doing this right now, and it's not trivial for users to implement it correctly.
Another solution is setting the gflag -defer_close_second.

| Name | Value | Description | Defined At |
| ---- | ----- | ----------- | ---------- |
| defer_close_second | 0 | Defer close of connections for so many seconds even if the connection is not used by anyone. Close immediately for non-positive values | src/brpc/socket_map.cpp |
After setting it, a connection is not closed immediately when its last reference is released; instead it is closed after so many seconds. If a channel references the connection again during the wait, the connection goes back to normal. No matter how frequently channels are created, this flag limits the frequency of closing connections. A side effect of the flag is that file descriptors are not closed immediately after channels are destroyed: if the flag is wrongly set to a large value, the number of active file descriptors in the process may be large as well.
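The defer-close behavior can be sketched as a reference-counted table with a close deadline. `ConnectionTable` is hypothetical and single-threaded, takes the current time in seconds from the caller, and only models the counting described above; it is not brpc's SocketMap.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of connections shared by channels through reference
// counting, with the close deferred by defer_close_second.
class ConnectionTable {
public:
    explicit ConnectionTable(int64_t defer_close_second)
        : defer_close_second_(defer_close_second) {}

    // A channel starts referencing the connection to `addr`.
    void Acquire(const std::string& addr) {
        auto it = entries_.find(addr);
        if (it == entries_.end()) {
            it = entries_.emplace(addr, Entry{}).first;
            ++opened_;  // no connection yet: open one
        }
        ++it->second.refs;
        it->second.close_at = -1;  // referenced again: cancel any pending close
    }

    // A channel drops its reference at time `now` (seconds).
    void Release(const std::string& addr, int64_t now) {
        auto it = entries_.find(addr);
        if (it == entries_.end() || --it->second.refs > 0) return;
        if (defer_close_second_ <= 0) {  // non-positive: close immediately
            entries_.erase(it);
            ++closed_;
        } else {
            it->second.close_at = now + defer_close_second_;
        }
    }

    // Closes unreferenced connections whose deferral has expired.
    void Sweep(int64_t now) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.refs == 0 && it->second.close_at >= 0 &&
                now >= it->second.close_at) {
                it = entries_.erase(it);
                ++closed_;
            } else {
                ++it;
            }
        }
    }

    int opened() const { return opened_; }
    int closed() const { return closed_; }

private:
    struct Entry {
        int refs = 0;
        int64_t close_at = -1;  // -1: no close pending
    };
    std::map<std::string, Entry> entries_;
    int64_t defer_close_second_;
    int opened_ = 0;
    int closed_ = 0;
};
```

A channel that is recreated within the deferral window reuses the existing connection, so `opened()` stays at 1 however often channels come and go.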
## Buffer size of connections

-socket_recv_buffer_size sets the receive buffer size of all connections; it is -1 by default (not modified).

-socket_send_buffer_size sets the send buffer size of all connections; it is -1 by default (not modified).
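Socket buffer sizes are normally applied per socket with the POSIX `setsockopt` call and the `SO_RCVBUF`/`SO_SNDBUF` options. The sketch below assumes this is roughly what the flags do; `ApplyBufferSizes` is a hypothetical helper, and treating every non-positive value as "not modified" is an assumption of the sketch. Linux doubles the requested value to leave room for bookkeeping, so reading the option back may return more than was set.

```cpp
#include <sys/socket.h>

// Hypothetical helper: applies process-wide buffer sizes to one socket.
// Non-positive values leave the OS default untouched (assumption of this sketch).
bool ApplyBufferSizes(int fd, int recv_size, int send_size) {
    if (recv_size > 0 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &recv_size, sizeof(recv_size)) != 0) {
        return false;
    }
    if (send_size > 0 &&
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &send_size, sizeof(send_size)) != 0) {
        return false;
    }
    return true;
}
```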
set_log_id() sets a 64-bit integer log_id, which is sent to the server side along with the request and is often printed in server logs to associate the different services accessed in one session. A string-type log id must be converted to a 64-bit integer before being set.
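One way to turn a string log id into the 64-bit integer that set_log_id() expects is to parse purely numeric strings and hash everything else. `LogIdFromString` is a hypothetical helper using 64-bit FNV-1a; brpc does not prescribe any particular conversion.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: numeric strings (up to 19 digits, so they fit in
// uint64_t) are parsed; anything else is hashed with 64-bit FNV-1a, so the
// same string always maps to the same id.
uint64_t LogIdFromString(const std::string& s) {
    bool numeric = !s.empty() && s.size() <= 19;
    for (char c : s) {
        if (c < '0' || c > '9') {
            numeric = false;
            break;
        }
    }
    if (numeric) return std::stoull(s);
    uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV-1a prime
    }
    return h;
}
```

Usage would then be along the lines of `cntl.set_log_id(LogIdFromString(request_id));`, where `request_id` is your own string id.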
baidu_std and hulu_pbrpc support attachments, which are set by users to bypass protobuf serialization. As a client, the data in Controller::request_attachment() will be received by the server, and response_attachment() contains the attachment sent back by the server. Attachments are not compressed by brpc.

In HTTP, the attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html), i.e. the data to POST is stored in request_attachment().
## Authentication
TODO: Describe how authentication methods are extended.
- brpc::CompressTypeZlib : [zlib](http://en.wikipedia.org/wiki/Zlib), 10%~20% faster than gzip but still significantly slower than snappy, with slightly better compression ratio than gzip.
The following table lists the performance of different methods when compressing and decompressing **highly repetitive data**, for reference only.
No. Local TCP sockets are only slightly slower than unix domain sockets, since traffic over local TCP sockets bypasses the network. Some scenarios where TCP sockets can't be used may require unix domain sockets; we may consider supporting them in the future.
### Q: What does "Fail to connect to xx.xx.xx.xx:xxxx, Connection refused" mean?
Generally, the remote server is not listening on the port (it has probably crashed).
### Q: Frequently getting "Connection timedout" when accessing another IDC
![img](../images/connection_timedout.png)
This is a connection timeout: the TCP connection was not established within connect_timeout_ms. Increase the connection and RPC timeouts:
```c++
struct ChannelOptions {
    ...
    // Default: 200 (milliseconds)
    // Maximum: 0x7fffffff (roughly 30 days)
    int32_t connect_timeout_ms;

    // Max duration of RPC over this Channel. -1 means wait indefinitely.
    // Overridable by Controller.set_timeout_ms().
    // Default: 500 (milliseconds)
    ...
};
```
NOTE: A connection timeout is not an RPC timeout; an RPC timeout is logged as "Reached timeout=...".
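To see the difference between "Connection refused" and a connect timeout at the socket level, here is a POSIX sketch of a non-blocking connect bounded by a timeout, the same idea behind a setting like connect_timeout_ms. `ConnectWithTimeout` is hypothetical and is not how brpc implements connecting; it returns 0 on success or an errno value.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: connects to ip:port with a non-blocking socket and
// waits at most timeout_ms for the TCP handshake. Returns 0 on success,
// ECONNREFUSED when nothing listens on the port, ETIMEDOUT when the handshake
// does not finish in time, or another errno value.
int ConnectWithTimeout(const char* ip, uint16_t port, int timeout_ms) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return EINVAL;

    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    int err = 0;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        err = errno;
        if (err == EINPROGRESS) {  // handshake started: wait until writable
            pollfd pfd;
            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;
            const int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;  // connect timeout
            } else if (n < 0) {
                err = errno;
            } else {
                socklen_t len = sizeof(err);
                getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);  // 0 or e.g. ECONNREFUSED
            }
        }
    }
    close(fd);
    return err;
}
```

A refused connection fails fast because the peer answers with a reset, while a timeout means no answer arrived before the deadline, which is typical of slow or lossy cross-IDC links.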
Check the lifetime of the Controller, the response and done. In an asynchronous call, the return of CallMethod is not the completion of the RPC, which happens when done->Run() is entered. So these objects should not be deleted right after CallMethod; delete them in done->Run() instead. Generally you should allocate the objects on the heap instead of on the stack. Check out [Asynchronous call](client.md#asynchronous-call) for details.
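The lifetime rule can be demonstrated without brpc by simulating an asynchronous call with a queue of pending completions. `FakeResponse`, `FakeAsyncCall`, `RunPendingCalls` and `IssueCall` are hypothetical; the point is that the call returns before the response is filled, so the response is allocated on the heap and freed inside the callback.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for an asynchronous RPC (not brpc types).
struct FakeResponse { std::string message; };

std::vector<std::function<void()>>& PendingCalls() {
    static std::vector<std::function<void()>> pending;
    return pending;
}

// Like an asynchronous CallMethod: returns at once; the "RPC" completes later,
// when RunPendingCalls() fills the response and then runs `done`.
void FakeAsyncCall(FakeResponse* response, std::function<void()> done) {
    PendingCalls().push_back([response, done]() {
        response->message = "pong";
        done();
    });
}

void RunPendingCalls() {
    std::vector<std::function<void()>> calls;
    calls.swap(PendingCalls());
    for (auto& call : calls) call();
}

// Correct pattern: the response lives on the heap and is deleted in `done`,
// because it is still needed after IssueCall() has returned.
void IssueCall(int* completed) {
    FakeResponse* response = new FakeResponse;
    FakeAsyncCall(response, [response, completed]() {
        if (response->message == "pong") ++*completed;
        delete response;  // safe: the call has finished
    });
    // A stack-allocated FakeResponse would already be destroyed here, and the
    // pending completion would write to freed memory.
}
```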
### Q: The machine list is configured in BNS, but the RPC reports "Fail to select server, No data available"
### Q: How to make requests be processed once and only once

This issue is not solved at the RPC layer. When a response returns successfully, we know the RPC was processed on the server side. When a response returns with a rejection, we know the RPC was not processed on the server side. But when no response is returned, the server may or may not have processed the RPC. If we retry, the same request may be processed twice on the server side. Generally, RPC services with side effects must consider the [idempotence](http://en.wikipedia.org/wiki/Idempotence) of the service; otherwise retries may cause side effects to happen more than once and result in unexpected behavior. Read-only search services often have no side effects (during a search) and are naturally idempotent. But storage services that need to write have to design versioning or serial-number mechanisms to reject side effects that have already happened, in order to stay idempotent.
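As an example of a serial-number mechanism, a server can remember which client-generated request ids it has already applied and acknowledge duplicates without re-applying them. `IdempotentCounter` is a hypothetical, single-threaded sketch; a real service would also persist and expire the set of ids.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical server-side counter that tolerates client retries: every write
// carries a client-generated request id, and a request id that was already
// applied is acknowledged without being applied again.
class IdempotentCounter {
public:
    // Returns true if this call changed the state, false for a duplicate.
    bool Add(uint64_t request_id, int64_t delta) {
        if (!applied_.insert(request_id).second) return false;  // seen before
        value_ += delta;
        return true;
    }
    int64_t value() const { return value_; }

private:
    std::unordered_set<uint64_t> applied_;  // a real service would persist and expire these
    int64_t value_ = 0;
};
```

The client must reuse the same request id when it retries; otherwise the server cannot tell a retry from a new write.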
FATAL 04-07 20:00:03 7778 src/brpc/channel.cpp:123] Invalid address=`bns://group.user-persona.dumi.nj03'. You should use Init(naming_service_name, load_balancer_name, options) to access multiple servers.
Accessing servers under a naming service needs the Init() with 3 parameters (the second one is `load_balancer_name`). The Init() used here has 2 parameters and is treated by brpc as accessing a single server, producing the error.
### Q: Both sides use protobuf, why can't they communicate with each other?
**protocol != protobuf**. protobuf serializes one package, while a message of a protocol may contain multiple packages along with extra lengths, checksums and magic numbers. The capability offered by brpc to "write code once and serve multiple protocols" is implemented by converting data from different protocols into a unified API, not at the protobuf layer.
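A toy framing illustrates why two services that both use protobuf still cannot talk when their protocols differ: the same serialized bytes are wrapped in different headers. The layout below (4-byte magic plus 4-byte big-endian body length) and the magics "XRPC"/"YRPC" are hypothetical and much simpler than any real brpc protocol.

```cpp
#include <cstdint>
#include <string>

// Hypothetical framing: 4-byte magic + 4-byte big-endian body length + body.
std::string Frame(const std::string& magic4, const std::string& body) {
    std::string out = magic4.substr(0, 4);
    const uint32_t n = static_cast<uint32_t>(body.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    }
    out += body;
    return out;
}

// Succeeds only if the frame uses the expected magic and a consistent length.
bool Unframe(const std::string& magic4, const std::string& frame, std::string* body) {
    if (frame.size() < 8 || frame.compare(0, 4, magic4) != 0) return false;
    uint32_t n = 0;
    for (int i = 4; i < 8; ++i) {
        n = (n << 8) | static_cast<unsigned char>(frame[i]);
    }
    if (frame.size() - 8 != n) return false;
    body->assign(frame, 8, n);
    return true;
}
```

The payload (standing in for protobuf bytes) is identical in both cases, yet a receiver expecting "YRPC" rejects a frame built with "XRPC".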
2. According to how the Channel is initialized, choose a server from the global [SocketMap](https://github.com/brpc/brpc/blob/master/src/brpc/socket_map.h) or the [LoadBalancer](https://github.com/brpc/brpc/blob/master/src/brpc/load_balancer.h) as the destination of the request.
4. If authentication is turned on and the Socket is not authenticated yet, the first request enters the authenticating branch, and other requests block until that branch writes the authenticating information into the Socket. The server side only verifies the first request.
5. According to the protocol of the Channel, choose the corresponding serialization callback to serialize the request into [IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h).
6. If a timeout is set, set up a timer. From this point on, avoid using the Controller, since the timer may be triggered at any time and call the user's callback for timeout, which may delete the Controller.
8. Write the IOBuf with serialized data into the Socket and add Channel::HandleSocketFailed into the id_wait_list of the Socket. The callback is called when the write fails or the connection is broken before the RPC completes.
11. After receiving the response, get the correlation_id inside and find the associated Controller in O(1) time. The lookup does not need to lock a global hashmap, and scales well.
13. Call Controller::OnRPCReturned, which may retry the erroneous RPC or complete it. Call the user's done in asynchronous calls. Destroy the correlation_id and wake up joining threads.
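Step 11 can be illustrated with an id that packs a slot index and a version, so finding the pending call is a direct array access rather than a hash-map lookup, and a stale id (for a call whose slot was reused) is rejected by the version check. `CallTable` and `PendingCall` are hypothetical; this single-threaded sketch does not show how brpc makes the lookup safe under concurrency.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pending-call record; `tag` stands in for the Controller.
struct PendingCall {
    uint32_t version = 0;
    bool in_use = false;
    int tag = 0;
};

// Id layout (assumption of this sketch): high 32 bits = version, low 32 bits = slot.
class CallTable {
public:
    uint64_t Register(int tag) {
        uint32_t index = 0;
        if (!free_.empty()) {
            index = free_.back();
            free_.pop_back();
        } else {
            index = static_cast<uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        PendingCall& c = slots_[index];
        ++c.version;  // a reused slot gets a new version
        c.in_use = true;
        c.tag = tag;
        return (static_cast<uint64_t>(c.version) << 32) | index;
    }

    // O(1): index the array directly, then check the version.
    PendingCall* Find(uint64_t id) {
        const uint32_t index = static_cast<uint32_t>(id);
        const uint32_t version = static_cast<uint32_t>(id >> 32);
        if (index >= slots_.size()) return nullptr;
        PendingCall& c = slots_[index];
        return (c.in_use && c.version == version) ? &c : nullptr;
    }

    void Finish(uint64_t id) {
        if (PendingCall* c = Find(id)) {
            c->in_use = false;
            free_.push_back(static_cast<uint32_t>(id));
        }
    }

private:
    std::vector<PendingCall> slots_;
    std::vector<uint32_t> free_;
};
```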