Commit 65d05389 authored by gejun's avatar gejun

reviewed error_code.md

parent 14600ccb
brpc使用[brpc::Controller](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h)设置一次RPC的参数和获取一次RPC的结果,ErrorCode()和ErrorText()是Controller的两个方法,分别是该次RPC的错误码和错误描述,只在RPC结束后才能访问,否则结果未定义。ErrorText()由Controller的基类google::protobuf::RpcController定义,ErrorCode()则是brpc::Controller定义的。Controller还有个Failed()方法告知该次RPC是否失败,这三者的关系是:
[English version](../en/error_code.md)
- 当Failed()为true时,ErrorCode()一定不为0,ErrorText()是非空的错误描述
- 当Failed()为false时,ErrorCode()一定为0,ErrorText()是未定义的(目前在brpc中会为空,但你最好不要依赖这个事实)
brpc使用[brpc::Controller](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h)设置和获取一次RPC的参数,`Controller::ErrorCode()``Controller::ErrorText()`则分别是该次RPC的错误码和错误描述,RPC结束后才能访问,否则结果未定义。ErrorText()由Controller的基类google::protobuf::RpcController定义,ErrorCode()则是brpc::Controller定义的。Controller还有个Failed()方法告知该次RPC是否失败,这三者的关系是:
- 当Failed()为true时,ErrorCode()一定为非0,ErrorText()则为非空。
- 当Failed()为false时,ErrorCode()一定为0,ErrorText()未定义(目前在brpc中会为空,但你最好不要依赖这个事实)
# 标记RPC为错误
brpc的client端和server端都有Controller,都可以通过SetFailed()修改其中的ErrorCode和ErrorText。当多次调用一个Controller的SetFailed时,ErrorCode会被覆盖,ErrorText则是**添加**而不是覆盖在client端,框架会额外加上第几次重试,在server端,框架会额外加上server的地址信息。
brpc的client端和server端都有Controller,都可以通过SetFailed()修改其中的ErrorCode和ErrorText。当多次调用一个Controller的SetFailed时,ErrorCode会被覆盖,ErrorText则是**添加**而不是覆盖在client端,框架会额外加上第几次重试,在server端,框架会额外加上server的地址信息。
client端Controller的SetFailed()常由框架调用,比如发送request失败,接收到的response不符合要求等等。只有在进行较复杂的访问操作时用户才可能需要设置client端的错误,比如在访问后端前做额外的请求检查,发现有错误时需要把RPC设置为失败。
client端Controller的SetFailed()常由框架调用,比如发送request失败,接收到的response不符合要求等等。只有在进行较复杂的访问操作时用户才可能需要设置client端的错误,比如在访问后端前做额外的请求检查,发现有错误时把RPC设置为失败。
server端Controller的SetFailed()常由用户在服务回调中调用。当处理过程发生错误时,一般调用SetFailed()并释放资源后就return了。框架会把错误码和错误信息按交互协议填入response,client端的框架收到后会填入它那边的Controller中,从而让用户在RPC结束后取到。需要注意的是,**server端在SetFailed()时一般不需要再打条日志。**打日志是比较慢的,在繁忙的线上磁盘上,很容易出现巨大的lag。一个错误频发的client容易减慢整个server的速度而影响到其他的client,理论上来说这甚至能成为一种攻击手段。对于希望在server端看到错误信息的场景,可以打开**-log_error_text**开关(已上线服务可访问/flags/log_error_text?setvalue=true动态打开),server会在每次失败的RPC后把对应Controller的ErrorText()打印出来。
server端Controller的SetFailed()常由用户在服务回调中调用。当处理过程发生错误时,一般调用SetFailed()并释放资源后就return了。框架会把错误码和错误信息按交互协议填入response,client端的框架收到后会填入它那边的Controller中,从而让用户在RPC结束后取到。需要注意的是,**server端在SetFailed()时默认不打印送往client的错误**。打日志是比较慢的,在繁忙的线上磁盘上,很容易出现巨大的lag。一个错误频发的client容易减慢整个server的速度而影响到其他的client,理论上来说这甚至能成为一种攻击手段。对于希望在server端看到错误信息的场景,可以打开gflag **-log_error_text**(可动态开关),server会在每次失败的RPC后把对应Controller的ErrorText()打印出来。
# brpc的错误码
brpc使用的所有ErrorCode都定义在[errno.proto](https://github.com/brpc/brpc/blob/master/src/brpc/errno.proto)中,*SYS_*开头的来自linux系统,与/usr/include/errno.h中定义的精确一致,定义在proto中是为了跨语言。其余的是brpc自有的。
[berror(error_code)](https://github.com/brpc/brpc/blob/master/src/butil/errno.h)可获得error_code的描述,berror()可获得[system errno](http://www.cplusplus.com/reference/cerrno/errno/)的描述。**ErrorText() != berror(ErrorCode())**,ErrorText()会包含更具体的错误信息。brpc默认包含berror,你可以直接使用。
[berror(error_code)](https://github.com/brpc/brpc/blob/master/src/butil/errno.h)可获得error_code的描述,berror()可获得当前[system errno](http://www.cplusplus.com/reference/cerrno/errno/)的描述。**ErrorText() != berror(ErrorCode())**,ErrorText()会包含更具体的错误信息。brpc默认包含berror,你可以直接使用。
brpc中常见错误的打印内容列表如下:
......@@ -23,26 +25,26 @@ brpc中常见错误的打印内容列表如下:
| 错误码 | 数值 | 重试 | 说明 | 日志 |
| -------------- | ---- | ---- | ---------------------------------------- | ---------------------------------------- |
| EAGAIN | 11 | 是 | 同时异步发送的请求过多。软限,很少出现。 | Resource temporarily unavailable |
| EAGAIN | 11 | 是 | 同时发送的请求过多。软限,很少出现。 | Resource temporarily unavailable |
| ETIMEDOUT | 110 | 是 | 连接超时。 | Connection timed out |
| ENOSERVICE | 1001 | 否 | 找不到服务,不太出现,一般会返回ENOMETHOD。 | |
| ENOMETHOD | 1002 | 否 | 找不到方法。 | 形式广泛,常见如"Fail to find method=..." |
| EREQUEST | 1003 | 否 | request格式或序列化错误,client端和server端都可能设置 | 形式广泛:"Missing required fields in request: ...""Fail to parse request message, ...""Bad request" |
| EREQUEST | 1003 | 否 | request序列化错误,client端和server端都可能设置 | 形式广泛:"Missing required fields in request: …" "Fail to parse request message, …" "Bad request" |
| EAUTH | 1004 | 否 | 认证失败 | "Authentication failed" |
| ETOOMANYFAILS | 1005 | 否 | ParallelChannel中太多子channel失败 | "%d/%d channels failed, fail_limit=%d" |
| EBACKUPREQUEST | 1007 | 是 | 触发backup request时设置,用户一般在/rpcz里看到 | “reached backup timeout=%dms" |
| EBACKUPREQUEST | 1007 | 是 | 触发backup request时设置,不会出现在ErrorCode中,但可在/rpcz里看到 | “reached backup timeout=%dms" |
| ERPCTIMEDOUT | 1008 | 否 | RPC超时 | "reached timeout=%dms" |
| EFAILEDSOCKET | 1009 | 是 | RPC进行过程中TCP连接出现问题 | "The socket was SetFailed" |
| EHTTP | 1010 | 否 | 失败的HTTP访问(非2xx状态码)均使用这个错误码。默认不重试,可通过RetryPolicy定制 | Bad http call |
| EOVERCROWDED | 1011 | 是 | 连接上有过多的未发送数据,一般是由于同时发起了过多的异步访问。可通过参数-socket_max_unwritten_bytes控制,默认8MB。 | The server is overcrowded |
| EHTTP | 1010 | 否 | 非2xx状态码的HTTP访问结果均认为失败并被设置为这个错误码。默认不重试,可通过RetryPolicy定制 | Bad http call |
| EOVERCROWDED | 1011 | 是 | 连接上有过多的未发送数据,常由同时发起了过多的异步访问导致。可通过参数-socket_max_unwritten_bytes控制,默认8MB。 | The server is overcrowded |
| EINTERNAL | 2001 | 否 | Server端Controller.SetFailed没有指定错误码时使用的默认错误码。 | "Internal Server Error" |
| ERESPONSE | 2002 | 否 | response解析或格式错误,client端和server端都可能设置 | 形式广泛"Missing required fields in response: ...""Fail to parse response message, ""Bad response" |
| ERESPONSE | 2002 | 否 | response解析错误,client端和server端都可能设置 | 形式广泛"Missing required fields in response: ...""Fail to parse response message, ""Bad response" |
| ELOGOFF | 2003 | 是 | Server已经被Stop了 | "Server is going to quit" |
| ELIMIT | 2004 | 是 | 同时处理的请求数超过ServerOptions.max_concurrency了 | "Reached server's limit=%d on concurrent requests", |
| ELIMIT | 2004 | 是 | 同时处理的请求数超过ServerOptions.max_concurrency了 | "Reached server's limit=%d on concurrent requests" |
# 自定义错误码
在C++/C中你可以通过宏、常量、protobuf enum等方式定义ErrorCode:
在C++/C中你可以通过宏、常量、enum等方式定义ErrorCode:
```c++
#define ESTOP -114 // C/C++
static const int EMYERROR = 30; // C/C++
......@@ -53,7 +55,7 @@ const int EMYERROR2 = -31; // C++ only
BAIDU_REGISTER_ERRNO(ESTOP, "the thread is stopping")
BAIDU_REGISTER_ERRNO(EMYERROR, "my error")
```
strerror/strerror_r不认识使用BAIDU_REGISTER_ERRNO定义的错误码,自然地,printf类的函数中的%m也不能转化为对应的描述,你必须使用%s并配以berror()。
strerrorstrerror_r不认识使用BAIDU_REGISTER_ERRNO定义的错误码,自然地,printf类的函数中的%m也不能转化为对应的描述,你必须使用%s并配以berror()。
```c++
errno = ESTOP;
printf("Describe errno: %m\n"); // [Wrong] Describe errno: Unknown error -114
......@@ -61,17 +63,17 @@ printf("Describe errno: %s\n", strerror_r(errno, NULL, 0)); // [Wrong] Describ
printf("Describe errno: %s\n", berror()); // [Correct] Describe errno: the thread is stopping
printf("Describe errno: %s\n", berror(errno)); // [Correct] Describe errno: the thread is stopping
```
当同一个error code被重复注册时,如果都是在C++中定义的,那么会出现链接错误:
当同一个error code被重复注册时,那么会出现链接错误:
```
redefinition of `class BaiduErrnoHelper<30>'
```
否则在程序启动时会abort:
或者在程序启动时会abort:
```
Fail to define EMYERROR(30) which is already defined as `Read-only file system', abort
```
总的来说这和RPC框架没什么关系,直到你希望通过RPC框架传递ErrorCode。这个需求很自然,不过你得确保不同的模块对ErrorCode的理解是相同的,否则当两个模块把一个错误码理解为不同的错误时,它们之间的交互将出现无法预计的行为。为了防止这种情况出现,你最好这么做:
- 优先使用系统错误码,它们的值和含义是固定不变的。
你得确保不同的模块对ErrorCode的理解是相同的,否则当两个模块把一个错误码理解为不同的错误时,它们之间的交互将出现无法预计的行为。为了防止这种情况出现,你最好这么做:
- 优先使用系统错误码,它们的值和含义一般是固定不变的。
- 多个交互的模块使用同一份错误码定义,防止后续修改时产生不一致。
- 使用BAIDU_REGISTER_ERRNO描述新错误码,以确保同一个进程内错误码是互斥的。
brcc use [brpc::Controller](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h) to set the parameters for RPC and fetch RPC result. `ErrorCode()` and `ErrorText()` are two methods of the Controller, which are the error code and error description of the RPC. It's accessible only after RPC finishes, otherwise the result is undefined. `ErrorText()` is defined by the base class of the Controller: `google::protobuf::RpcController`, while `ErrorCode()` is defined by `brpc::Controller`. Controller also has a `Failed()` method to tell whether RPC fails or not. The following shows the relationship among the three:
[中文版](../cn/error_code.md)
- When `Failed()` is true, `ErrorCode()` can't be 0 and `ErrorText()` is a non-empty error description
- When `Failed()` is false, `ErrorCode()` must be 0 and `ErrorText()` is undefined (currently in brpc it will be empty, but you should not rely on this)
brpc use [brpc::Controller](https://github.com/brpc/brpc/blob/master/src/brpc/controller.h) to set and get parameters for one RPC. `Controller::ErrorCode()` and `Controller::ErrorText()` return error code and description of the RPC respectively, only accessible after completion of the RPC, otherwise the result is undefined. `ErrorText()` is defined by the base class of the Controller: `google::protobuf::RpcController`, while `ErrorCode()` is defined by `brpc::Controller`. Controller also has a method `Failed()` to tell whether RPC fails or not. Relations between the three methods:
# Set Error to RPC
- When `Failed()` is true, `ErrorCode()` must be non-zero and `ErrorText()` be non-empty.
- When `Failed()` is false, `ErrorCode()` is 0 and `ErrorText()` is undefined (it's empty in brpc currently, but you'd better not rely on this)
Both client and server side have Controller object, through which you can use `setFailed()` to modify ErrorCode and ErrorText. Multiple calls to `Controller::SetFailed` leaves the last ErrorCode only, but ErrorText will be **concatenated** instead of overwriting. The framework will also add prefix to the ErrorText: the number of retry at the client side and the address information at the server side.
# Mark RPC as failed
`Controller::SetFailed()` at the client side is usually called by the framework, such as sending failure, incomplete response, and so on. Only under some complex situation may the user set error at the client side. For example, you may need to set error to RPC if an error was found during additional check before sending.
Both client and server in brpc have `Controller`, which can be set with `setFailed()` to modify ErrorCode and ErrorText. Multiple calls to `Controller::SetFailed` leave the last ErrorCode and **concatenate** ErrorTexts rather than leaving the last one. The framework elaborates ErrorTexts by adding extra prefixes: number of retries at client-side and address of the server at server-side.
`Controller::SetFailed()` at the server-side is often called by the user in the service callback. Generally speaking when error occurs, a user calls `SetFailed()` and then releases all the resources before return. The framework will fill the error code and error message into response according to communication protocol, and then these will be received and filled into Controller at the client side so that users can fetch them after RPC completes. Note that **it's not common to print additional error log when calling `SetFailed()` at the server side**, as logging may lead to huge lag due to heavy disk IO. An error prone client could easily slow the speed of the entire server, and thus affect other clients. This can even become a security issue in theory. If you really want to see the error message on the server side, you can open the **-log_error_text** gflag (for online service access `/flags/log_error_text?Setvalue=true` to turn it on dynamically). The server will print the ErrorText of the Controller for each failed RPC.
`Controller::SetFailed()` at client-side is usually called by the framework, such as sending failure, incomplete response, and so on. Error may be set at client-side under some situations. For example, you may set error to the RPC if an additional check before sending the request is failed.
`Controller::SetFailed()` at server-side is often called by the user in the service callback. Generally speaking when error occurs, users call `SetFailed()`, release all the resources, and return from the callback. The framework fills the error code and message into the response according to communication protocol. When the response is received, the error inside are set into the client-side Controller so that users can fetch them after end of RPC. Note that **server does not print errors to clients by default**, as frequent loggings may impact performance of the server significantly due to heavy disk IO. A client crazily producing errors could slow the entire server down and affect all other clients, which can even become an attacking method against the server. If you really want to see error messages on the server, turn on the gflag **-log_error_text** (modifiable at run-time), the server will log the ErrorText of corresponding Controller of each failed RPC.
# Error Code in brpc
All error codes in brpc are defined in [errno.proto](https://github.com/brpc/brpc/blob/master/src/brpc/errno.proto), those begin with *SYS_* come from linux system, which are exactly the same as `/usr/include/errno.h`. The reason we put it in proto is to cross language. The rest of the error codes belong to brpc itself.
All error codes in brpc are defined in [errno.proto](https://github.com/brpc/brpc/blob/master/src/brpc/errno.proto), in which those begin with *SYS_* are defined by linux system and exactly same with the ones defined in `/usr/include/errno.h`. The reason that we put it in .proto is to cross language. The rest of the error codes are defined by brpc.
You can use [berror(error_code)](https://github.com/brpc/brpc/blob/master/src/butil/errno.h) to get the error description for an error code, and `berror()` for [system errno](http://www.cplusplus.com/reference/cerrno/errno/). Note that **ErrorText() != berror(ErorCode())**, since `ErrorText()` contains more specific information. brpc includes berror by default so that you can use it in your project directly.
[berror(error_code)](https://github.com/brpc/brpc/blob/master/src/butil/errno.h) gets description for the error code, and `berror()` gets description for current [system errno](http://www.cplusplus.com/reference/cerrno/errno/). Note that **ErrorText() != berror(ErorCode())** since `ErrorText()` contains more specific information. brpc includes berror by default so that you can use it in your project directly.
The following table shows some common error codes and their description:
Following table shows common error codes and their descriptions:
| Error Code | Value | Retry | Situation | Log Message |
| -------------- | ----- | ----- | ---------------------------------------- | ---------------------------------------- |
| EAGAIN | 11 | Yes | Too many requests at the same time. Hardly happens as it's a soft limit. | Resource temporarily unavailable |
| Error Code | Value | Retriable | Description | Logging message |
| -------------- | ----- | --------- | ---------------------------------------- | ---------------------------------------- |
| EAGAIN | 11 | Yes | Too many requests at the same time, hardly happening as it's a soft limit. | Resource temporarily unavailable |
| ETIMEDOUT | 110 | Yes | Connection timeout. | Connection timed out |
| ENOSERVICE | 1001 | No | Can't locate the service. Hardly happens, usually ENOMETHOD instead | |
| ENOMETHOD | 1002 | No | Can't locate the target method. | Fail to find method=... |
| EREQUEST | 1003 | No | request格式或序列化错误,client端和server端都可能设置 | Missing required fields in request: ... |
| | | | | Fail to parse request message, ... |
| | | | | Bad request |
| EAUTH | 1004 | No | Authentication failed | Authentication failed |
| ETOOMANYFAILS | 1005 | No | Too many sub channel failure inside a ParallelChannel. | %d/%d channels failed, fail_limit=%d |
| EBACKUPREQUEST | 1007 | Yes | Trigger the backup request. Can be seen from /rpcz | reached backup timeout=%dms |
| ERPCTIMEDOUT | 1008 | No | RPC timeout. | reached timeout=%dms |
| EFAILEDSOCKET | 1009 | Yes | Connection broken during RPC | The socket was SetFailed |
| EHTTP | 1010 | No | Non 2xx status code of a HTTP request. No retry by default, but it can be changed through RetryPolicy. | Bad http call |
| EOVERCROWDED | 1011 | Yes | Too many buffering message at the sender side. Usually caused by lots of concurrent asynchronous requests. Can be tuned by `-socket_max_unwritten_bytes`, default is 8MB. | The server is overcrowded |
| EINTERNAL | 2001 | No | Default error code when calling `Controller::SetFailed` without one. | Internal Server Error |
| ERESPONSE | 2002 | No | Parsing/Format error in response. Could be set both by the client and the server. | Missing required fields in response: ... |
| | | | | Fail to parse response message, |
| | | | | Bad response |
| ELOGOFF | 2003 | Yes | Server has already been stopped | Server is going to quit |
| ELIMIT | 2004 | Yes | The number of concurrent processing requests exceeds `ServerOptions.max_concurrency` | Reached server's limit=%d on concurrent requests, |
# User-Defined Error Code
In C/C++, you can use macro, or constant or protobuf enum to define your own ErrorCode:
| ENOSERVICE | 1001 | No | Can't locate the service, hardly happening and usually being ENOMETHOD instead | |
| ENOMETHOD | 1002 | No | Can't locate the method. | Misc forms, common ones are "Fail to find method=…" |
| EREQUEST | 1003 | No | fail to serialize the request, may be set on either client-side or server-side | Misc forms: "Missing required fields in request: …" "Fail to parse request message, …" "Bad request" |
| EAUTH | 1004 | No | Authentication failed | "Authentication failed" |
| ETOOMANYFAILS | 1005 | No | Too many sub-channel failures inside a ParallelChannel | "%d/%d channels failed, fail_limit=%d" |
| EBACKUPREQUEST | 1007 | Yes | Set when backup requests are triggered. Not returned by ErrorCode() directly, viewable from spans in /rpcz | "reached backup timeout=%dms" |
| ERPCTIMEDOUT | 1008 | No | RPC timeout. | "reached timeout=%dms" |
| EFAILEDSOCKET | 1009 | Yes | The connection is broken during RPC | "The socket was SetFailed" |
| EHTTP | 1010 | No | HTTP responses with non 2xx status code are treated as failure and set with this code. No retry by default, changeable by customizing RetryPolicy. | Bad http call |
| EOVERCROWDED | 1011 | Yes | Too many messages to buffer at the sender side. Usually caused by lots of concurrent asynchronous requests. Modifiable by `-socket_max_unwritten_bytes`, 8MB by default. | The server is overcrowded |
| EINTERNAL | 2001 | No | The default error for `Controller::SetFailed` without specifying a one. | Internal Server Error |
| ERESPONSE | 2002 | No | fail to serialize the response, may be set on either client-side or server-side | Misc forms: "Missing required fields in response: …" "Fail to parse response message, " "Bad response" |
| ELOGOFF | 2003 | Yes | Server has been stopped | "Server is going to quit" |
| ELIMIT | 2004 | Yes | Number of requests being processed concurrently exceeds `ServerOptions.max_concurrency` | "Reached server's limit=%d on concurrent requests" |
# User-defined Error Code
In C/C++, error code can be defined in macros, constants or enums:
```c++
#define ESTOP -114 // C/C++
......@@ -52,16 +50,14 @@ static const int EMYERROR = 30; // C/C++
const int EMYERROR2 = -31; // C++ only
```
If you need to get the error description through berror, you can register it in the global scope of your c/cpp file by:
`BAIDU_REGISTER_ERRNO(error_code, description)`
If you need to get the error description through `berror`, register it in the global scope of your c/cpp file by `BAIDU_REGISTER_ERRNO(error_code, description)`, for example:
```c++
BAIDU_REGISTER_ERRNO(ESTOP, "the thread is stopping")
BAIDU_REGISTER_ERRNO(EMYERROR, "my error")
```
Note that `strerror/strerror_r` can't recognize error codes defined by `BAIDU_REGISTER_ERRNO`. Neither can `%m` inside `printf`. You must use `%s` along with `berror`:
Note that `strerror` and `strerror_r` do not recognize error codes defined by `BAIDU_REGISTER_ERRNO`. Neither does the `%m` used in `printf`. You must use `%s` paired with `berror`:
```c++
errno = ESTOP;
......@@ -71,20 +67,20 @@ printf("Describe errno: %s\n", berror()); // [Correct] Descr
printf("Describe errno: %s\n", berror(errno)); // [Correct] Describe errno: the thread is stopping
```
When an error code has already been registered, it will cause a link error if it's defined in C++:
When the registration of an error code is duplicated, a linking error is generated provided it's defined in C++:
```
redefinition of `class BaiduErrnoHelper<30>'
```
Otherwise, the program will abort once starts:
Or the program aborts before start:
```
Fail to define EMYERROR(30) which is already defined as `Read-only file system', abort
```
In general this has nothing to do with the RPC framework unless you want to pass ErrorCode through it. It's a natural scenario but you have to make sure that different modules have the same understanding of the same ErrorCode. Otherwise, the result is unpredictable if two modules interpret an error code differently. In order to prevent this from happening, you'd better follow these:
You have to make sure that different modules have same understandings on same ErrorCode. Otherwise, interactions between two modules that interpret an error code differently may be undefined. To prevent this from happening, you'd better follow these:
- Prefer system error codes since their meanings are fixed.
- Use the same code for error definitions among multiple modules to prevent inconsistencies during later modifications.
- Use `BAIDU_REGISTER_ERRNO` to describe a new error code to ensure that the same error code is mutually exclusive inside a process.
\ No newline at end of file
- Prefer system error codes which have fixed values and meanings, generally.
- Share code on error definitions between multiple modules to prevent inconsistencies after modifications.
- Use `BAIDU_REGISTER_ERRNO` to describe new error code to ensure that same error code is defined only once inside a process.
\ No newline at end of file
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment