Commit 727e193e authored by gejun's avatar gejun

review http_client.md and add mutual links between cn/en inside

parent 5cd596b5
http client的例子见[example/http_c++](https://github.com/brpc/brpc/blob/master/example/http_c++/http_client.cpp)
# 示例
[example/http_c++](https://github.com/brpc/brpc/blob/master/example/http_c++/http_client.cpp)
# 创建Channel
......@@ -15,7 +17,7 @@ if (channel.Init("www.baidu.com" /*any url*/, &options) != 0) {
}
```
http channel也支持bns地址。
http channel也支持bns地址或其他NamingService
# GET
......@@ -25,13 +27,13 @@ cntl.http_request().uri() = "www.baidu.com/index.html"; // 设置为待访问
channel.CallMethod(NULL, &cntl, NULL, NULL, NULL/*done*/);
```
HTTP和protobuf无关,所以除了Controller和done,CallMethod的其他参数均为NULL。如果要异步操作,最后一个参数传入done。
HTTP和protobuf关系不大,所以除了Controller和done,CallMethod的其他参数均为NULL。如果要异步操作,最后一个参数传入done。
`cntl.response_attachment()`是回复的body,类型也是butil::IOBuf。注意IOBuf转化为std::string(通过to_string()接口)是需要分配内存并拷贝所有内容的,如果关注性能,你的处理过程应该尽量直接支持IOBuf,而不是要求连续内存。
`cntl.response_attachment()`是回复的body,类型也是butil::IOBuf。IOBuf可通过to_string()转化为std::string,但是需要分配内存并拷贝所有内容,如果关注性能,处理过程应直接支持IOBuf,而不要求连续内存。
# POST
默认的HTTP Method为GET,如果需要做POST,则需要设置。待POST的数据应置入request_attachment(),它([butil::IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h))可以直接append std::string或char*
默认的HTTP Method为GET,可设置为POST或[更多http method](https://github.com/brpc/brpc/blob/master/src/brpc/http_method.h)。待POST的数据应置入request_attachment(),它([butil::IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h))可以直接append std::string或char*
```c++
brpc::Controller cntl;
......@@ -41,7 +43,7 @@ cntl.request_attachment().append("{\"message\":\"hello world!\"}");
channel.CallMethod(NULL, &cntl, NULL, NULL, NULL/*done*/);
```
需要大量打印过程的body建议使用butil::IOBufBuilder,它的用法和std::ostringstream是一样的。对于有大量对象要打印的场景,IOBufBuilder会简化代码,并且效率也更高。
需要大量打印过程的body建议使用butil::IOBufBuilder,它的用法和std::ostringstream是一样的。对于有大量对象要打印的场景,IOBufBuilder简化了代码,效率也可能比c-style printf更高。
```c++
brpc::Controller cntl;
......@@ -79,13 +81,21 @@ URL的一般形式如下图:
// interpretable as extension
```
问题:Channel为什么不直接利用Init时传入的URL,而需要给uri()再设置一次?
在上面例子中可以看到,Channel.Init()和cntl.http_request().uri()被设置了相同的URL。为什么Channel为什么不直接利用Init时传入的URL,而需要给uri()再设置一次?
确实,在简单使用场景下,这两者有所重复,但在复杂场景中,两者差别很大,比如:
- 访问挂在bns下的多个http server。此时Channel.Init传入的是bns节点名称,对uri()的赋值则是包含Host的完整URL(比如"www.foo.com/index.html?name=value"),BNS下所有的http server都会看到"Host: www.foo.com";uri()也可以是只包含路径的URL,比如"/index.html?name=value",框架会以目标server的ip和port为Host,地址为10.46.188.39:8989的http server将会看到"Host: 10.46.188.39:8989"
- 访问名字服务(如BNS)下的多个http server。此时Channel.Init传入的是对该名字服务有意义的名称(如BNS中的节点名称),对uri()的赋值则是包含Host的完整URL(比如"www.foo.com/index.html?name=value")
- 通过http proxy访问目标server。此时Channel.Init传入的是proxy server的地址,但uri()填入的是目标server的URL。
## Host字段
若用户自己填写了host字段(http header),框架不会修改。
若用户没有填且URL中包含host,比如http://www.foo.com/path,则http request中会包含"Host: www.foo.com"。
若用户没有填且URL不包含host,比如"/index.html?name=value",则框架会以目标server的ip和port为Host,地址为10.46.188.39:8989的http server将会看到"Host: 10.46.188.39:8989"。
# 常见设置
以http request为例 (对response的操作自行替换), 常见操作方式如下所示:
......@@ -132,19 +142,17 @@ os.move_to(cntl->request_attachment());
Notes on http header:
- 根据 HTTP 协议[规定](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2), header 的 field_name部分不区分大小写。brpc对于field_name大小写保持不变,且仍然支持大小写不敏感
- 如果 HTTP 头中出现了相同的 field_name, 根据协议[规定](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2),value将被合并到一起, 中间用逗号(,) 分隔, 具体value如何理解,需要用户自己确定.
- 根据[rfc2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2),header的field_name部分不区分大小写。brpc支持大小写不敏感,同时还能在打印时保持field_name大小写与用户设定的相同
- 如果HTTP头中出现了相同的field_name, 根据[rfc2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2),value应合并到一起,用逗号(,)分隔,用户自己确定如何理解和处理此类value.
- query之间用"&"分隔, key和value之间用"="分隔, value可以省略,比如key1=value1&key2&key3=value3中key2是合理的query,值为空字符串。
# 查看client发出的请求和收到的回复
# 查看HTTP消息
打开[-http_verbose](http://brpc.baidu.com:8765/flags/http_verbose)即可在stderr看到所有的http request和response,注意这应该只用于线下调试,而不是线上程序。
# HTTP的错误处理
当Server返回的http status code不是2xx时,该次http访问即视为失败,client端会设置对应的ErrorCode:
# HTTP错误
- 所有错误被统一为EHTTP。如果用户发现`cntl->ErrorCode()`为EHTTP,那么可以检查`cntl->http_response().status_code()`以获得具体的http错误。同时http body会置入`cntl->response_attachment()`,用户可以把代表错误的html或json传递回来。
当Server返回的http status code不是2xx时,该次http访问被视为失败,client端会把`cntl->ErrorCode()`设置为EHTTP,用户可通过`cntl->http_response().status_code()`获得具体的http错误。同时server端可以把代表错误的html或json置入`cntl->response_attachment()`作为http body传递回来。
# 压缩request body
......@@ -156,7 +164,7 @@ Notes on http header:
# 解压response body
出于通用性考虑且解压代码不复杂,brpc不会自动解压response body,用户可以自己做,方法如下:
出于通用性考虑brpc不会自动解压response body,解压代码并不复杂,用户可以自己做,方法如下:
```c++
#include <brpc/policy/gzip_compress.h>
......@@ -175,7 +183,7 @@ if (encoding != NULL && *encoding == "gzip") {
# 持续下载
通常下载一个超长的body时,需要一直等待直到body完整才会视作RPC结束,这个过程中超长body都会存在内存中,如果body是无限长的(比如直播用的flv文件),那么内存会持续增长,直到超时。这样的http client不适合下载大文件。
http client往往需要等待到body下载完整才结束RPC,这个过程中body都会存在内存中,如果body超长或无限长(比如直播用的flv文件),那么内存会持续增长,直到超时。这样的http client不适合下载大文件。
brpc client支持在读取完body前就结束RPC,让用户在RPC结束后再读取持续增长的body。注意这个功能不等同于“支持http chunked mode”,brpc的http实现一直支持解析chunked mode,这里的问题是如何让用户处理超长或无限长的body,和body是否以chunked mode传输无关。
......
Examples for Http Client: [example/http_c++](https://github.com/brpc/brpc/blob/master/example/http_c++/http_client.cpp)
# Example
[example/http_c++](https://github.com/brpc/brpc/blob/master/example/http_c++/http_client.cpp)
# Create Channel
In order to use`brpc::Channel` to access the HTTP service, `ChannelOptions.protocol` must be specified as `PROTOCOL_HTTP`.
In order to use `brpc::Channel` to access HTTP services, `ChannelOptions.protocol` must be set to `PROTOCOL_HTTP`.
After setting the HTTP protocol, the first parameter of `Channel::Init` can be any valid URL. *Note*: We only use the host and port part inside the URL here in order to save the user from additional parsing work. Other parts of the URL in `Channel::Init` will be discarded.
Once the HTTP protocol is set, the first parameter of `Channel::Init` can be any valid URL. *Note*: Only host and port inside the URL are used by Init(), other parts are discarded. Allowing full URL simply saves the user from additional parsing code.
```c++
brpc::ChannelOptions options;
......@@ -15,7 +17,7 @@ if (channel.Init("www.baidu.com" /*any url*/, &options) != 0) {
}
```
http channel also support BNS address.
http channel also support BNS address or other naming services.
# GET
......@@ -25,13 +27,13 @@ cntl.http_request().uri() = "www.baidu.com/index.html"; // Request URL
channel.CallMethod(NULL, &cntl, NULL, NULL, NULL/*done*/);
```
HTTP has nothing to do with protobuf, so every parameters of `CallMethod` are NULL except `Controller` and `done`, which can be used to issue RPC asynchronously.
HTTP does not relate to protobuf much, thus all parameters of `CallMethod` are NULL except `Controller` and `done`. Issue asynchronous RPC with non-NULL `done`.
`cntl.response_attachment ()` is the response body whose type is `butil :: IOBuf`. Note that converting `IOBuf` to `std :: string` using `to_string()` needs to allocate memory and copy all the content. As a result, if performance comes first, you should use `IOBuf` directly rather than continuous memory.
`cntl.response_attachment()` is body of the http response and typed `butil::IOBuf`. `IOBuf` can be converted to `std::string` by `to_string()`, which needs to allocate memory and copy all data. If performance is important, the code should consider supporting `IOBuf` directly rather than requiring continuous memory.
# POST
The default HTTP Method is GET. You can set the method to POST if needed, and you should append the POST data into `request_attachment()`, which ([butil::IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h)) supports `std :: string` or `char *`
The default HTTP Method is GET, which can be changed to POST or [other http methods](https://github.com/brpc/brpc/blob/master/src/brpc/http_method.h). The data to POST should be put into `request_attachment()`, which is typed [butil::IOBuf](https://github.com/brpc/brpc/blob/master/src/butil/iobuf.h) and able to append `std :: string` or `char *` directly.
```c++
brpc::Controller cntl;
......@@ -41,7 +43,7 @@ cntl.request_attachment().append("{\"message\":\"hello world!\"}");
channel.CallMethod(NULL, &cntl, NULL, NULL, NULL/*done*/);
```
If you need a lot print, we suggest using `butil::IOBufBuilder`, which has the same interface as `std::ostringstream`. It's much simpler and more efficient to print lots of objects using `butil::IOBufBuilder`.
If the body needs a lot of printing to build, consider using `butil::IOBufBuilder`, which has same interfaces as `std::ostringstream`, probably simpler and more efficient than c-style printf when lots of objects need to be printed.
```c++
brpc::Controller cntl;
......@@ -55,7 +57,7 @@ channel.CallMethod(NULL, &cntl, NULL, NULL, NULL/*done*/);
# URL
Below is the normal form of an URL:
Genaral form of an URL:
```
// URI scheme : http://en.wikipedia.org/wiki/URI_scheme
......@@ -79,16 +81,24 @@ Below is the normal form of an URL:
// interpretable as extension
```
Here's the question, why to pass URL parameter twice (via `set_uri`) instead of using the URL inside `Channel::Init()` ?
As we saw in examples above, `Channel.Init()` and `cntl.http_request().uri()` both need the URL. Why does `uri()` need to be set additionally rather than using the URL to `Init()` directly?
Indeed, the settings are repeated in simple cases. But they are different in more complex scenes:
- Access multiple servers under a NamingService (for example BNS), in which case `Channel::Init` accepts a name meaningful to the NamingService(for example node names in BNS), while `uri()` is assigned with the URL.
- Access servers via http proxy, in which case `Channel::Init` takes the address of the proxy server, while `uri()` is still assigned with the URL.
## Host header
For most simple cases, it's a repeat work. But in complex scenes, they are very different in:
If user already sets `Host` (a http header), framework makes no change.
- Access multiple servers under a BNS node. At this time `Channel::Init` accepts the BNS node name, the value of `set_uri()` is the whole URL including Host (such as `www.foo.com/index.html?name=value`). As a result, all servers under BNS will see `Host: www.foo.com`. `set_uri()` also takes URL with the path only, such as `/index.html?name=value`. RPC framework will automatically fill the `Host` header using of the target server's ip and port. For example, http server at 10.46.188.39: 8989 will see `Host: 10.46.188.39: 8989`.
- Access the target server via http proxy. At this point `Channel::Init` takes the address of the proxy server, while `set_uri()` takes the URL of the target server.
If user does not set `Host` header and the URL has host, for example http://www.foo.com/path, the http request contains "Host: www.foo.com".
# Basic Usage
If user does not set host header and the URL does not have host as well, for example "/index.html?name=value", framework sets `Host` header with IP and port of the target server. A http server at 10.46.188.39:8989 should see `Host: 10.46.188.39:8989`.
We use `http request` as example (which is the same to `http response`). Here's some basic operations:
# Common usages
Take http request as an example (similar with http response), common operations are listed as follows:
Access an HTTP header named `Foo`
......@@ -132,7 +142,7 @@ Set the `content-type`
cntl->http_request().set_content_type("text/plain");
```
Access HTTP body
Get HTTP body
```c++
butil::IOBuf& buf = cntl->request_attachment();
......@@ -149,29 +159,27 @@ os.move_to(cntl->request_attachment());
Notes on http header:
- The field_name of the header is case-insensitive according to [standard](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2). The framework supports that while leaving the case unchanged.
- If we have multiple headers with the same field_name, according to [standard](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2), values will be merged together separating by comma (,). Users should figure out how to use this value according to own needs.
- Queries are separated by "&", while key and value are partitioned by "=". Value may be omitted. For example, `key1=value1&key2&key3=value3` is a valid query string, and the value for `key2` is an empty string.
# Debug for HTTP client
- field_name of the header is case-insensitive according to [rfc2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2). brpc supports case-insensitive field names and keeps same cases at printing as users set.
- If multiple headers have same field names, according to [rfc2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2), values should be merged and separated by comma (,). Users figure out how to use this kind of values by their own.
- Queries are separated by "&" and key/value in a query are separated by "=". Values can be omitted. For example, `key1=value1&key2&key3=value3` is a valid query string, in which the value for `key2` is an empty string.
Turn on [-http_verbose](http://brpc.baidu.com:8765/flags/http_verbose) so that the framework will print each request and response in stderr. Note that this should only be used for test and debug rather than online cases.
# Debug HTTP messages
# Error Handle for HTTP
Turn on [-http_verbose](http://brpc.baidu.com:8765/flags/http_verbose) so that the framework prints each http request and response in stderr. Note that this should only be used in tests or debuggings rather than online services.
When server returns a non-2xx HTTP status code, the HTTP request is considered to be failed and sets the corresponding ErrorCode:
# HTTP errors
- All errors are unified as `EHTTP`. If you find `cntl->ErrorCode()` as `EHTTP`, you can check `cntl-> http_response().status_code()` to get a more specific HTTP error. In the meanwhile, HTTP body will be placed inside `cntl->response_attachment()`, you can check for error body such as html or json there.
When server returns a non-2xx HTTP status code, the HTTP RPC is considered to be failed and `cntl->ErrorCode()` at client-side is set to `EHTTP`, users can check `cntl-> http_response().status_code()` for more specific HTTP error. In addition, server can put html or json describing the error into `cntl->response_attachment()` which is sent back to the client as http body.
# Compress Request Body
Calling `Controller::set_request_compress_type(brpc::COMPRESS_TYPE_GZIP)` makes framework try to gzip the HTTP body. "try to" means the compression may not happen, because:
`Controller::set_request_compress_type(brpc::COMPRESS_TYPE_GZIP)` makes framework try to gzip the HTTP body. "try to" means the compression may not happen, because:
* Size of body is smaller than bytes specified by -http_body_compress_threshold, which is 512 by default. The reason is that gzip is not a very fast compression algorithm, when body is small, the delay caused by compression may even larger than the latency saved by faster transportation.
# Decompress Response Body
For generality, brpc will not decompress response body automatically. You can do it yourself as the code won't be complicate:
brpc does not decompress bodies of responses automatically due to universality. The decompression code is not complicated and users can do it by themselves. The code is as follows:
```c++
#include <brpc/policy/gzip_compress.h>
......@@ -188,13 +196,15 @@ if (encoding != NULL && *encoding == "gzip") {
// Now cntl->response_attachment() contains the decompressed data
```
# Continuous Download
# Progressively Download
http client normally does not complete the RPC until http body has been fully downloaded. During the process http body is stored in memory. If the body is very large or infinitely large(a FLV file for live streaming), memory grows continuously until the RPC is timedout. Such http clients are not suitable for downloading very large files.
When downloading a large file, normally the client needs to wait until the whole file has been loaded into its memory to finish this RPC. In order to leverage the problem of memory growth and RPC resourses, in brpc the client can end its RPC first and then continuously read the rest of the file. Note that it's not HTTP chunked mode as brpc always supports for parsing chunked mode body. This is the solution to allow user the deal with super large body.
brpc client supports completing RPC before reading the full body, so that users can read http bodies progressively after RPC. Note that this feature does not mean "support for http chunked mode", actually the http implementation in brpc supports chunked mode from the very beginning. The real issue is how to let users handle very or infinitely large http bodies, which does not imply the chunked mode.
Basic usage:
How to use:
1. Implement ProgressiveReader:
1. Implement ProgressiveReader below:
```c++
#include <brpc/progressive_reader.h>
......@@ -218,16 +228,16 @@ Basic usage:
};
```
`OnReadOnePart` is called each time data is read. `OnEndOfMessage` is called each time data has finished or connection has broken. Please refer to comments before implementing.
`OnReadOnePart` is called each time a piece of data is read. `OnEndOfMessage` is called at the end of data or the connection is broken. Read comments carefully before implementing.
2. Set `cntl.response_will_be_read_progressively();` before RPC so that brpc knows to end RPC after reading the header part.
2. Set `cntl.response_will_be_read_progressively();` before RPC to make brpc end RPC just after reading all headers.
3. Call `cntl.ReadProgressiveAttachmentBy(new MyProgressiveReader);` after RPC so that you can use your own implemented object `MyProgressiveReader` . You may delete this object inside `OnEndOfMessage`.
3. Call `cntl.ReadProgressiveAttachmentBy(new MyProgressiveReader);` after RPC. `MyProgressiveReader` is an instance of user-implemented `ProgressiveReader`. User may delete the object inside `OnEndOfMessage`.
# Continuous Upload
# Progressively Upload
Currently the POST data should be intact so that we do not support large POST body.
Currently the POST data should be intact before launching the http call, thus brpc http client is still not suitable for uploading very large bodies.
# Access Server with Authentication
# Access Servers with authentications
Generate `auth_data` according to the server's authentication method and then set it into header `Authorization`. This is the same as using curl to add option `-H "Authorization : <auth_data>"`.
\ No newline at end of file
Generate `auth_data` according to authenticating method of the server and set it into `Authorization` header. If you're using curl, add option `-H "Authorization : <auth_data>"`.
\ No newline at end of file
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment