Interfaces of requests, responses, services are defined in proto files.
```protobuf
// Tell protoc to generate base classes for C++ Service. For java or python, use java_generic_services or py_generic_services instead.
option cc_generic_services = true;
message EchoRequest {
...
...
service EchoService {
};
```
Read [official documents on protobuf](https://developers.google.com/protocol-buffers/docs/proto#options) for more details about protobuf.
# Implement generated interface
...
...
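The generated code and the full example implementation are elided here. Below is a minimal sketch of what such an implementation might look like, assuming the proto above declares `package example;` and that `EchoRequest`/`EchoResponse` each have a `message` string field:

```c++
#include <brpc/server.h>
#include <brpc/closure_guard.h>
#include "echo.pb.h"  // generated from the proto above

class EchoServiceImpl : public example::EchoService {
public:
    void Echo(google::protobuf::RpcController* cntl_base,
              const example::EchoRequest* request,
              example::EchoResponse* response,
              google::protobuf::Closure* done) override {
        // Make sure done->Run() is called no matter how this scope is left.
        brpc::ClosureGuard done_guard(done);
        brpc::Controller* cntl = static_cast<brpc::Controller*>(cntl_base);
        // Fill the response; unset required fields would fail the RPC.
        response->set_message(request->message());
    }
};
```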
The service is not available until it is inserted into a [brpc.Server](https://github.com/brpc/brpc/blob/master/src/brpc/server.h).
When a client sends a request, Echo() is called.
Explanation of the parameters:
**controller**
Statically convertible to brpc::Controller (provided that the code runs inside a brpc.Server). Contains parameters that can't be carried by the request and response; check out [src/brpc/controller.h](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h) for details.
**request**
read-only message from a client.
**response**
Filled by user. If any **required** field is not set, the RPC will fail.
**done**
Created by brpc and passed to service's CallMethod(), including all actions after leaving CallMethod(): validating response, serialization, sending back to client etc.
**No matter the RPC is successful or not, done->Run() must be called by user once and only once when the RPC is done.**
Why doesn't brpc call done->Run() automatically? Because users may store done somewhere and call done->Run() from an event handler after leaving CallMethod(), which makes the service an **asynchronous service**.
We strongly recommend using **ClosureGuard** to make sure done->Run() is always called. It is the first statement in the code snippet above:
```c++
brpc::ClosureGuard done_guard(done);
```
No matter whether the callback exits from the middle or the end, done_guard is destructed and done->Run() is called inside its destructor. The mechanism is called [RAII](https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization). Without done_guard, you have to remember to add done->Run() before every `return`, **which is very error-prone**.
In an asynchronous service, processing of the request is not complete when CallMethod() returns, so done->Run() should not be called there; instead, done should be preserved for later use. At first glance, ClosureGuard seems unnecessary here. However, in real applications an asynchronous service may still fail in the middle of CallMethod() and exit for many reasons. Without ClosureGuard, some error branches may forget to call done->Run() before returning. Thus we still recommend using done_guard in asynchronous services. Unlike in synchronous services, to prevent done->Run() from being called on a successful return, you should call done_guard.release() to detach the enclosed done.
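A minimal sketch of this pattern, reusing the assumed `example::EchoService` from the sketch above; `StartBackgroundJob()` is a hypothetical function that finishes the RPC later by calling done->Run():

```c++
class AsyncEchoServiceImpl : public example::EchoService {
public:
    void Echo(google::protobuf::RpcController* cntl_base,
              const example::EchoRequest* request,
              example::EchoResponse* response,
              google::protobuf::Closure* done) override {
        brpc::ClosureGuard done_guard(done);
        brpc::Controller* cntl = static_cast<brpc::Controller*>(cntl_base);
        if (request->message().empty()) {
            cntl->SetFailed("empty message");
            return;  // done_guard calls done->Run() on this error path.
        }
        // Hand the work off to another thread; release the guard so that
        // done->Run() is NOT called here, but by the background job later.
        StartBackgroundJob(cntl, request, response, done_guard.release());
    }
};
```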
...
...
Services can be added or removed after Join(), and the server can be started again with Start().
Services using protobuf can generally be accessed via http+json. The json string stored in the http body is convertible to/from the corresponding protobuf message. Take the [echo server](https://github.com/brpc/brpc/blob/master/example/echo_c%2B%2B/server.cpp) as an example: it's accessible from [curl](https://curl.haxx.se/).
Json fields correspond to pb fields by matching names and message structures. The json must contain the required fields of the pb, otherwise the conversion fails and the request is rejected. The json may include fields undefined in the pb, but they are dropped rather than stored as unknown fields. Check out [json <=> protobuf](json2pb.md) for conversion rules.
When -pb_enum_as_number is turned on, enums in pb are converted to numeric values instead of names. For example, with `enum MyEnum { Foo = 1; Bar = 2; };`, fields typed `MyEnum` are converted to "Foo" or "Bar" when the flag is off, and to 1 or 2 when it is on. This flag affects both requests sent by clients and responses returned by the server. Since conversion to names has better forward and backward compatibility, this flag should only be turned on to adapt legacy code that is unable to parse enums from names.
Early versions of brpc allowed a pb service to be accessed via http without setting the pb request, even if the request has required fields. Such services often parse the http request and set the http response by themselves and never touch the pb request. However, this behavior is still very dangerous: the service runs with an undefined request.
These services may run into issues after upgrading to the latest brpc, which deprecated the behavior a long time ago. To help them upgrade, brpc allows bypassing the conversion from http body to pb request (so that users can parse http requests by themselves); the setting is as follows:
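The original setting code is elided here; a minimal sketch, assuming the `allow_http_body_to_pb` field of brpc::ServiceOptions mentioned below and an already-constructed `server` and `service`:

```c++
brpc::ServiceOptions svc_opt;
svc_opt.allow_http_body_to_pb = false;  // don't convert the http body to the pb request
if (server.AddService(&service, svc_opt) != 0) {
    LOG(ERROR) << "Fail to add service";
    return -1;
}
```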
With this setting, the service no longer converts the http body into the pb request after receiving an http request, which also leaves the pb request undefined. Users have to parse the http body by themselves when `cntl->request_protocol() == brpc::PROTOCOL_HTTP` is true, which indicates that the request came from http.
Correspondingly, if cntl->response_attachment() is not empty and the pb response is set as well, brpc no longer reports an error; instead cntl->response_attachment() is used as the body of the http response. This behavior is independent of whether allow_http_body_to_pb is set. If this relaxation causes more user errors, we may restrict it in the future.
# Protocols
The server detects supported protocols automatically, without assignment from users. `cntl->protocol()` gets the protocol being used. The server is able to accept connections with different protocols on one port, so users don't need to assign different ports to different protocols. Even one connection may transport messages in multiple protocols, although this is rarely done. Supported protocols:
- [The standard protocol used in Baidu](baidu_std.md), shown as "baidu_std", enabled by default.
If [-log_hostname](http://brpc.baidu.com:8765/flags/log_hostname) is turned on, each log line contains the hostname, so that users know which machine each line comes from in aggregated logs.
This feature only affects logging macros in [butil/logging.h](https://github.com/brpc/brpc/blob/master/src/butil/logging.h); glog crashes on FATAL logs by default.
If [-crash_on_fatal_log](http://brpc.baidu.com:8765/flags/crash_on_fatal_log) is turned on, the program crashes after printing LOG(FATAL) or failing a CHECK() assertion, and generates a coredump (with proper environment settings). The default value is false. This flag can be turned on during tests to make sure the program never hits critical errors.
To protect the server and the client, when a request received by the server or a response received by the client is too large, the server or client rejects the message and closes the connection. The limit is controlled by [-max_body_size](http://brpc.baidu.com:8765/flags/max_body_size), in bytes.
FATAL: 05-10 14:40:05: * 0 src/brpc/input_messenger.cpp:89] A message from 127.0.0.1:35217(protocol=baidu_std) is bigger than 67108864 bytes, the connection will be closed. Set max_body_size to allow bigger messages
protobuf has [similar limits](https://github.com/google/protobuf/blob/master/src/google/protobuf/io/coded_stream.h#L364) and the error log is as follows:
```
FATAL: 05-10 13:35:02: * 0 google/protobuf/io/coded_stream.cc:156] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
```
brpc removes the restriction from protobuf and controls the limit solely by -max_body_size: as long as the flag is large enough, messages are not rejected and no error logs are printed. This works for all versions of protobuf.
## Compression
set_response_compress_type() sets the compression method for the response; no compression by default.
Attachment is not compressed. Check out [here](http_service.md#compress-response-body) for compression of HTTP body.
- brpc::CompressTypeSnappy : [snappy](http://google.github.io/snappy/), compression and decompression are very fast, but the compression ratio is low.
- brpc::CompressTypeGzip : [gzip](http://en.wikipedia.org/wiki/Gzip), significantly slower than snappy, with a higher compression ratio.
- brpc::CompressTypeZlib : [zlib](http://en.wikipedia.org/wiki/Zlib), 10%~20% faster than gzip but still significantly slower than snappy, with slightly better compression ratio than gzip.
Read [Client-Compression](client.md#compression) for more comparisons.
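For example, a sketch of enabling gzip compression for a response inside a service method (COMPRESS_TYPE_GZIP is assumed to be the gzip value of brpc's CompressType enum, and `cntl` is the brpc::Controller of the call, as in the Echo sketch above):

```c++
// Compress the response body with gzip before sending it back.
cntl->set_response_compress_type(brpc::COMPRESS_TYPE_GZIP);
```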
## Attachment
baidu_std and hulu_pbrpc support attachments, which are sent along with messages and set by users to bypass protobuf serialization. From a server's perspective, data set into Controller.response_attachment() will be received by the client, while Controller.request_attachment() contains the attachment sent by the client.
In http, the attachment corresponds to the [message body](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html); namely the data to be sent back to the client should be stored in response_attachment().
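For example, a sketch of echoing the received attachment back inside a service method (`cntl` is the brpc::Controller of the call, as in the Echo sketch above):

```c++
// Forward the received attachment into the response without protobuf serialization.
cntl->response_attachment().append(cntl->request_attachment());
```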
The authentication is connection-specific. When the server receives the first request on a connection, it tries to parse the related information inside (such as the auth field in baidu_std, or the Authorization header in HTTP), and calls `VerifyCredential` together with the address of the client. If the method returns 0, which indicates success, the user can put verified information into `AuthContext` and access it later via `controller->auth_context()`, whose lifetime is managed by the framework. Otherwise the authentication fails and the connection is closed, which makes the client-side call fail as well.
Subsequent requests are treated as verified without authenticating overhead.
Assigning an instance of an implemented `Authenticator` to `ServerOptions.auth` enables authentication. The instance must be valid during the lifetime of the server.
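A minimal sketch, assuming the `GenerateCredential`/`VerifyCredential` interface declared in [src/brpc/authenticator.h](https://github.com/brpc/brpc/blob/master/src/brpc/authenticator.h); the token check is purely illustrative:

```c++
class MyAuthenticator : public brpc::Authenticator {
public:
    // Client side: generate the credential attached to the first request.
    int GenerateCredential(std::string* auth_str) const override {
        *auth_str = "my-secret-token";
        return 0;
    }
    // Server side: verify the credential of the first request on a connection.
    int VerifyCredential(const std::string& auth_str,
                         const butil::EndPoint& client_addr,
                         brpc::AuthContext* out_ctx) const override {
        if (auth_str != "my-secret-token") {
            return -1;  // authentication fails, the connection will be closed
        }
        out_ctx->set_user("trusted-client");  // readable via controller->auth_context()
        return 0;
    }
};

// ...
static MyAuthenticator auth;   // must outlive the server
brpc::ServerOptions options;
options.auth = &auth;
```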
NOTE: ServerOptions.num_threads is just a **hint**.
Don't assume that the Server uses exactly this many workers, because all servers and channels in the process share worker pthreads. The total number of threads is the maximum of all ServerOptions.num_threads and bthread_concurrency. For example, a program has 2 servers with num_threads=24 and 36 respectively, and bthread_concurrency is 16. Then the number of worker pthreads is max(24, 36, 16) = 36. This is different from other RPC implementations, which generally sum them up.
Channel does not have a corresponding option, but users can change the number of worker pthreads at the client side by setting gflag -bthread_concurrency.
In addition, brpc **does not separate "IO" and "processing" threads**. brpc knows how to assemble IO and processing code together to achieve better concurrency and efficiency.
"Concurrency" may have 2 meanings: one is number of connections, another is number of requests processed simultaneously. Here we're talking about the latter one.
In traditional synchronous servers, max concurrency is limited by the number of worker pthreads, so setting the number of workers also limits the concurrency. But brpc processes new requests in bthreads, and M bthreads are mapped onto N workers (generally M > N), so a synchronous server may have a concurrency higher than the number of workers. On the other hand, although the concurrency of an asynchronous server is not limited by the number of workers in principle, we sometimes need to limit it by other factors.
brpc can limit concurrency at the server level and at the method level. When the number of requests processed simultaneously by the server or the method would exceed the limit, the server responds to the client with ELIMIT directly instead of invoking the service. A client seeing ELIMIT should retry another server (by best effort). This option avoids over-queuing of requests at the server side, and can also be used to limit related resources.
### Why return an error to the client immediately when max concurrency is exceeded, instead of queuing the request?
A server reaching max concurrency does not mean other servers in the same cluster have reached their limits as well. Letting the client see the error and retry another server is the better strategy from a cluster-wide perspective.
QPS is measured over seconds, so it is not good at limiting sudden bursts of requests. Max concurrency is closely related to the available critical resources: the number of "workers" or "slots" etc., and is therefore better at preventing over-queuing.
In addition, when the server has stable latencies, limiting concurrency has a similar effect to limiting QPS due to Little's Law. But the former is much easier to implement: simple increments and decrements of a counter representing the concurrency. This is also the reason why most flow control is implemented by limiting concurrency rather than QPS. For example, the window in TCP is a kind of concurrency.
PeakQPS and AverageLatency are the queries-per-second and the latency measured when the server is pushed to its limit while requests are not delayed severely (i.e. with acceptable latencies). Most services run performance tests before going online; the product of the two is just the max concurrency of the service.
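For example (hypothetical numbers): if a performance test shows PeakQPS = 2000 and AverageLatency = 30ms, then by Little's Law the max concurrency is roughly 2000 × 0.03 = 60 requests in flight.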
This code is generally placed **after AddService and before Start() of the server**. When a setting fails (namely the method does not exist), the server fails to start and prompts the user to fix the MaxConcurrencyOf setting.
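A minimal sketch (the numbers and the method name are illustrative):

```c++
brpc::ServerOptions options;
options.max_concurrency = 1024;   // server-level limit, 0 means unlimited

// ... after server.AddService(...), before server.Start(...):
server.MaxConcurrencyOf("example.EchoService.Echo") = 100;  // method-level limit
```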
- JNI checks the stack layout and cannot run in bthreads.
- Code that extensively uses pthread-local to pass session-level data to functions. Storing data into a pthread-local before an RPC and expecting the value read after the RPC to be the same one stored is problematic, because the code may be resumed on a different pthread. Although tcmalloc uses pthread/LWP-local as well, its calls to malloc do not depend on each other, so it is safe.
brpc offers pthread mode to solve these issues. When **-usercode_in_pthread** is turned on, user code runs in pthreads, and functions that would block bthreads block pthreads instead.
- Synchronous RPCs block worker pthreads, so the server often needs more workers (ServerOptions.num_threads), and scheduling efficiency is slightly lower.
- User code still runs in special bthreads, which use the stacks of pthread workers. These special bthreads are scheduled in the same way as normal bthreads, so the performance difference is negligible.
- bthread supports a unique feature: yielding the pthread worker to a newly created bthread to save a context switch. The brpc client uses this feature to reduce the number of context switches in one RPC from 3 to 2. In a performance-demanding system, reducing context switches significantly improves performance and latency distributions. However, pthread mode is not capable of doing this and is slower in high-QPS systems.
- The number of threads in pthread mode is a hard limit. Once all threads are occupied, requests queue up rapidly and eventually time out. A common example: when many requests to downstream servers time out, the upstream services may also be severely affected by lots of threads blocked on waiting for responses. Consider setting ServerOptions.max_concurrency to protect the server when pthread mode is on. In contrast, the number of bthreads in bthread mode is a soft limit and reacts more smoothly to such issues.
pthread mode makes it easier for legacy code to try brpc, but we still recommend gradually refactoring the code with bthread-local, or even removing the dependency on TLS, so that this option can be turned off in the future.
If your service accepts traffic from the public network (including traffic forwarded by nginx etc.), you need to be aware of some security issues.
### Hide builtin services from public
Builtin services are useful, but they also expose a lot of internal information and shouldn't be reachable from the public network. There are several ways to hide builtin services from the public:
- Set an internal port. Set ServerOptions.internal_port to a port that is **only accessible from the internal network**. You can view builtin services via internal_port, while accesses via the public port (the one passed to Server.Start) see the following error:
```
[a27eda84bcdeef529a76f22872b78305] Not allowed to access builtin services, try ServerOptions.internal_port=... instead if you're inside internal network
```
- Let http proxies forward only specified URLs. nginx etc. can be configured to map different URLs. For example, the configuration below maps public traffic on /MyAPI to `/ServiceName/MethodName` of `target-server`. If builtin services such as /status are accessed from the public, nginx rejects them directly.
proxy_pass http://<target-server>/ServiceName/MethodName$query_string # $query_string is an nginx variable, check out http://nginx.org/en/docs/http/ngx_http_core_module.html for more.
**Don't turn on** -enable_dir_service and -enable_threads_service on public-facing services. Although they're convenient for debugging, they expose too much information about the server. A script to check whether a public service has these options enabled:
Consider returning signatures of addresses instead of the addresses themselves. For example, after setting ServerOptions.internal_port, error information returned by the server replaces addresses with their MD5 signatures.
/health returns "OK" by default. If the content of /health needs to be customized: inherit [HealthReporter](https://github.com/brpc/brpc/blob/master/src/brpc/health_reporter.h) and implement the code generating the page (as you would implement other http services). Assign an instance to ServerOptions.health_reporter; it is not owned by the server and must be valid during the lifetime of the server. Users may return richer status information according to application requirements.
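A minimal sketch, assuming the `GenerateReport` interface declared in health_reporter.h; the reported text is illustrative:

```c++
class MyHealthReporter : public brpc::HealthReporter {
public:
    void GenerateReport(brpc::Controller* cntl,
                        google::protobuf::Closure* done) override {
        brpc::ClosureGuard done_guard(done);
        cntl->http_response().set_content_type("text/plain");
        cntl->response_attachment().append("OK: all backends reachable\n");
    }
};

// ...
static MyHealthReporter hr;    // must outlive the server
brpc::ServerOptions options;
options.health_reporter = &hr;
```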
Search services inside Baidu use [thread-local storage](https://en.wikipedia.org/wiki/Thread-local_storage) (TLS) extensively. Some of them cache frequently-used objects to reduce repeated creation; some of them pass contexts to global functions implicitly. You should avoid the latter usage as much as possible: such functions cannot even run without TLS, which makes them hard to test. brpc provides 3 mechanisms to solve issues related to thread-local storage.
Session-local data is bound to a **server-side RPC**: from entering the service's CallMethod, to calling the server-side done->Run(), no matter whether the service is synchronous or asynchronous. All session-local data are reused as much as possible and are not deleted before the server stops.
After setting ServerOptions.session_local_data_factory, call Controller.session_local_data() to get a session-local data. If ServerOptions.session_local_data_factory is unset, Controller.session_local_data() always returns NULL.
**Example usage:**
If ServerOptions.reserved_session_local_data is greater than 0, the Server creates that many pieces of data before serving.
session_local_data_factory is typed [DataFactory](https://github.com/brpc/brpc/blob/master/src/brpc/data_factory.h). You have to implement CreateData and DestroyData inside.
NOTE: CreateData and DestroyData may be called by multiple threads simultaneously. Thread-safety is a must.
```c++
class MySessionLocalDataFactory : public brpc::DataFactory {
...
...
```
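The full example is elided above; a minimal sketch of the whole pattern, where `MySessionLocalData` is an illustrative type:

```c++
struct MySessionLocalData {
    int x = 0;
};

class MySessionLocalDataFactory : public brpc::DataFactory {
public:
    void* CreateData() const override { return new MySessionLocalData; }
    void DestroyData(void* d) const override { delete static_cast<MySessionLocalData*>(d); }
};

// in main(), before starting the server:
static MySessionLocalDataFactory session_local_data_factory;  // must outlive the server
brpc::ServerOptions options;
options.session_local_data_factory = &session_local_data_factory;

// inside a service method:
MySessionLocalData* sd = static_cast<MySessionLocalData*>(cntl->session_local_data());
```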
Server-thread-local data is bound to **a call of the service's CallMethod**: from entering the service's CallMethod to leaving it. All server-thread-local data are reused as much as possible and are not deleted before the server stops. server-thread-local is implemented as a special bthread-local.
After setting ServerOptions.thread_local_data_factory, call brpc::thread_local_data() to get the thread-local data. If ServerOptions.thread_local_data_factory is unset, brpc::thread_local_data() always returns NULL.
**Example usage:**
If ServerOptions.reserved_thread_local_data is greater than 0, the Server creates that many pieces of data before serving.
**Difference with session-local**
Session-local data has to be obtained from a server-side Controller, while server-thread-local data can be obtained globally from any function running directly or indirectly inside a thread created by the server.
session-local and server-thread-local are similar in a synchronous service, except that the former has to be obtained from a Controller. If the service is asynchronous and the data needs to be accessed from done->Run(), session-local is the only option, because server-thread-local is already invalid after leaving the service's CallMethod.
thread_local_data_factory is typed [DataFactory](https://github.com/brpc/brpc/blob/master/src/brpc/data_factory.h). You need to implement CreateData and DestroyData inside.
NOTE: CreateData and DestroyData may be called by multiple threads simultaneously. Thread-safety is a must.
```c++
class MyThreadLocalDataFactory : public brpc::DataFactory {
...
...
```
Session-local and server-thread-local are enough for most servers. However, in some cases a more general thread-local solution is needed. In that case, you can use bthread_key_create, bthread_key_delete, bthread_getspecific, bthread_setspecific etc., which are similar to their [pthread equivalents](http://linux.die.net/man/3/pthread_key_create).
These functions support both bthreads and pthreads. When they are called in a bthread, the bthread-private variables are returned; when they are called in a pthread, the pthread-private variables are returned. Note that the "pthread-private" here is not created by pthread_key_create: pthread-locals created by pthread_key_create cannot be fetched by bthread_getspecific, and neither can __thread in GCC or thread_local in C++11.
Since brpc creates a bthread for each request, the bthread-local in the server behaves specially: a bthread created by the server does not delete its bthread-local data when it exits; instead the data is returned to a pool in the server for later reuse. This prevents bthread-local data from being constructed and destructed frequently along with the creation and destruction of bthreads. The mechanism is transparent to users.
Create a bthread_key_t, which represents a kind of bthread-local variable. Use bthread_[get|set]specific to get and set bthread-local variables; the first access to a bthread-local variable from a bthread returns NULL. Delete a bthread_key_t when no thread is using the bthread-local data associated with it; if a bthread_key_t is deleted while still in use, the related bthread-local data is leaked.
```c++
static void my_data_destructor(void* data) {
    ...
}

bthread_key_t tls_key;

if (bthread_key_create(&tls_key, my_data_destructor) != 0) {
    ...
}
```
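A sketch of typical usage, where `MyTlsData` is an illustrative type:

```c++
// Lazily create the bthread-local data on first access from this bthread.
void* data = bthread_getspecific(tls_key);
if (data == NULL) {
    data = new MyTlsData;
    bthread_setspecific(tls_key, data);  // returns 0 on success
}
```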
A: The client probably uses pooled or short connections and closes the connection after the RPC times out; when the server writes back the response, it finds that the connection has been closed and reports this error. "Got EOF" just means that the server received an EOF (the remote side closed the connection normally). If the client uses a single connection, the server rarely reports this error.
### Q: Remote side of fd=9 SocketId=2@10.94.66.55:8000 was closed
It's not an error; it's a common warning meaning that the remote side has closed the connection (EOF). This log might be useful for debugging problems.
It is disabled by default. Set gflag -log_connection_close to true to enable it ([modifying it at run-time](flags.md#change-gflag-on-the-fly) is supported).
### Q: Why does setting number of threads at server-side not work
All brpc servers in one process [share worker pthreads](#Number-of-worker-pthreads). If multiple servers are created, the number of worker pthreads is probably the maximum of their ServerOptions.num_threads.
### Q: Why is the client-side latency much larger than the server-side latency
Server-side worker pthreads may not be enough and requests are significantly delayed. Read [Server debugging](server_debugging.md) for tips and steps on debugging server-side issues.
### Q: The program crashes with mysterious coredumps after switching to brpc, as if the stack were corrupted
brpc server runs code in bthreads with stacksize=1MB by default, while stacksize of pthreads is 10MB. It's possible that programs running normally on pthreads may meet stack overflow on bthreads.
NOTE: this does **not** mean that a coredump is likely to be caused by stack overflow; it's just that this factor is easy and quick to verify and rule out.
Solution: add the following gflags to adjust the stack size, for example `--stack_size_normal=10000000 --tc_stack_normal=1`. The first flag sets the stack size to 10MB and the second sets the number of stacks cached by each worker pthread (to avoid getting them from the global pool every time).
### Q: Fail to open /proc/self/io
Some kernels do not provide this file. Correctness of the service is unaffected, but the following bvars are not updated:
```
process_io_read_bytes_second
process_io_write_bytes_second
process_io_read_second
process_io_write_second
```
### Q: json string "[1,2,3]" can't be converted to protobuf message
No. The outermost layer must be a json object (enclosed in braces {}), since the target of the conversion is a protobuf message.