[English version](../en/combo_channel.md)

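As services scale up, access to downstream services becomes increasingly complex, often involving multiple RPCs issued concurrently or arranged in layered structures. Such code is full of multithreading pitfalls: users may introduce bugs without noticing, and those bugs are hard to reproduce and debug. Moreover, an implementation typically either supports only the synchronous case or has to be rewritten for the asynchronous case. Take "run some code after multiple asynchronous RPCs complete" as an example. The synchronous implementation usually issues the RPCs asynchronously and then waits for each of them to complete. The asynchronous implementation usually uses a callback with a counter: the counter is decremented each time an RPC completes, and the callback is invoked when the counter reaches 0. The drawbacks are clear:

- The synchronous and asynchronous code are inconsistent. Users cannot easily move from one mode to the other. From a design point of view, inconsistency suggests that the essence of the problem has not been captured.
- It usually cannot be canceled. Canceling an operation correctly and promptly is hard, let alone a combined access. Yet cancellation is necessary for ending pointless waits.
- It cannot be composed further. For example, it is hard to turn such an implementation into part of a "larger" access pattern; a different scenario requires another rewrite.

We need a better abstraction. If several channels can be combined into larger channels in different ways, with different access patterns built in, users can perform synchronous, asynchronous and cancel operations through one uniform interface. Such channels are called combo channels in brpc.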
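For comparison, the counter-based asynchronous pattern described above can be sketched in plain C++ as follows. `CountdownCallback` and its members are hypothetical names invented for this illustration, not brpc APIs. The sketch has no cancellation and no synchronous counterpart, which is the kind of limitation combo channels are meant to remove.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not part of brpc): runs `on_all_done` exactly once,
// after `count` completions have been reported.
class CountdownCallback {
public:
    CountdownCallback(int count, std::function<void()> on_all_done)
        : remaining_(count), on_all_done_(std::move(on_all_done)) {
        assert(count > 0);
    }

    // Meant to be called from the completion callback of each RPC, possibly
    // from different threads. Only the caller that moves the counter from
    // 1 to 0 runs the final callback.
    void OnOneDone() {
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            on_all_done_();
        }
    }

private:
    std::atomic<int> remaining_;
    std::function<void()> on_all_done_;
};
```

Each asynchronous RPC would call `OnOneDone()` from its own done callback; the code that needs all results goes into `on_all_done`.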

# ParallelChannel

ParallelChannel (sometimes called "pchan") accesses all of its sub channels concurrently and merges their results. Users can modify requests with a CallMapper and merge results with a ResponseMerger. A ParallelChannel looks just like a Channel:

- It supports synchronous and asynchronous access.
- It can be deleted immediately after an asynchronous operation is issued.
- It can be canceled.
- It supports timeouts.

For sample code, see [example/parallel_echo_c++](https://github.com/brpc/brpc/tree/master/example/parallel_echo_c++/).

Any subclass of brpc::ChannelBase can be added to a ParallelChannel, including ParallelChannel and other combo channels. Users can set ParallelChannelOptions.fail_limit to cap the number of failed sub calls: once the number of failures reaches this limit, the RPC ends immediately instead of waiting for the timeout.

A sub channel can be added to the same ParallelChannel more than once. This is useful when you need to issue multiple asynchronous calls to the same service and wait for all of them to complete.

The internal structure of ParallelChannel is roughly as follows:

![img](../images/pchan.png)

## Add sub channels

A sub channel can be added to a ParallelChannel through the following interface:

```c++
int AddChannel(brpc::ChannelBase* sub_channel,
               ChannelOwnership ownership,
               CallMapper* call_mapper,
               ResponseMerger* response_merger);
```

When ownership is brpc::OWNS_CHANNEL, sub_channel is deleted when the ParallelChannel is destructed. A sub channel may be added to a ParallelChannel multiple times; if any of those additions specifies brpc::OWNS_CHANNEL, the sub channel is deleted at most once when the ParallelChannel is destructed.

Calling AddChannel while the ParallelChannel is being accessed is **not thread-safe**.

## CallMapper

A CallMapper converts a call to the ParallelChannel into a call to a sub channel. If call_mapper is NULL, the request sent to the sub channel is the request of the ParallelChannel, and the response is created by calling New() on the response of the ParallelChannel. If call_mapper is not NULL, it is deleted when the ParallelChannel is destructed. call_mapper is reference-counted internally, so one call_mapper can be associated with multiple sub channels.

```c++
class CallMapper {
public:
    virtual ~CallMapper();
 
    virtual SubCall Map(int channel_index/*starting from 0*/,
                        const google::protobuf::MethodDescriptor* method,
                        const google::protobuf::Message* request,
                        google::protobuf::Message* response) = 0;
};
```

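channel_index: the position of the sub channel inside the ParallelChannel, starting from 0.

method/request/response: the arguments passed to ParallelChannel.CallMethod().

The returned SubCall is used to access the corresponding sub channel. SubCall has two special values:

- Returning SubCall::Bad() makes the call to the ParallelChannel fail immediately, with Controller.ErrorCode() set to EREQUEST.
- Returning SubCall::Skip() skips the call to this sub channel. If all sub channels are skipped, the call fails immediately, with Controller.ErrorCode() set to ECANCELED.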
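
The two special values can be summarized with a small stand-alone model. This is only a sketch of the rules stated above, assuming at least one sub channel; `MapDecision`, `MapOutcome` and `Evaluate` are hypothetical names, not brpc types, and the real library also has to deal with details that are ignored here.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for what Map() returned for each sub channel.
// These are NOT brpc types.
enum class MapDecision { kCall, kSkip, kBad };

// What happens to the call on the ParallelChannel as a whole.
enum class MapOutcome {
    kProceed,       // at least one sub channel is accessed
    kFailRequest,   // corresponds to EREQUEST in the text above
    kFailCanceled   // corresponds to ECANCELED in the text above
};

// Applies the two rules: any Bad() fails the call at once; if every
// sub channel is skipped, the call fails as canceled.
MapOutcome Evaluate(const std::vector<MapDecision>& decisions) {
    assert(!decisions.empty());  // assumption: at least one sub channel
    bool any_call = false;
    for (MapDecision d : decisions) {
        if (d == MapDecision::kBad) {
            return MapOutcome::kFailRequest;
        }
        if (d == MapDecision::kCall) {
            any_call = true;
        }
    }
    return any_call ? MapOutcome::kProceed : MapOutcome::kFailCanceled;
}
```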

Common implementations of Map() include:

- Broadcast the request. This is also the behavior when call_mapper is NULL:
```c++
  class Broadcaster : public CallMapper {
  public:
      SubCall Map(int channel_index/*starting from 0*/,
                  const google::protobuf::MethodDescriptor* method,
                  const google::protobuf::Message* request,
                  google::protobuf::Message* response) {
          // Use the same method and request as the pchan.
          // The response is newly created; the last flag tells the pchan to delete it after the RPC ends.
          return SubCall(method, request, response->New(), DELETE_RESPONSE);
      }
  };
```
- Modify some fields of the request before sending it:
```c++
  class ModifyRequest : public CallMapper {
  public:
    SubCall Map(int channel_index/*starting from 0*/,
                const google::protobuf::MethodDescriptor* method,
                const google::protobuf::Message* request,
                google::protobuf::Message* response) {
        FooRequest* copied_req = brpc::Clone<FooRequest>(request);
        copied_req->set_xxx(...);
        // Copy and modify the request; the last flag tells the pchan to delete both the request and the response after the RPC ends.
        return SubCall(method, copied_req, response->New(), DELETE_REQUEST | DELETE_RESPONSE);
    }
  };
```
- The request and response already contain sub requests/responses; use them directly:
```c++
  class UseFieldAsSubRequest : public CallMapper {
  public:
    SubCall Map(int channel_index/*starting from 0*/,
                const google::protobuf::MethodDescriptor* method,
                const google::protobuf::Message* request,
                google::protobuf::Message* response) {
        if (channel_index >= request->sub_request_size()) {
            // Not enough sub requests: the code that prepared the request disagrees with the number of sub channels in the pchan.
            // Return Bad() to fail this call immediately.
            return SubCall::Bad();
        }
        // Take the corresponding sub request and add a sub response; a flag of 0 tells the pchan to delete nothing.
        return SubCall(sub_method, request->sub_request(channel_index), response->add_sub_response(), 0);
    }
  };
```

## ResponseMerger

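response_merger merges the response of each sub channel into the overall response. When it is NULL, response->MergeFrom(*sub_response) is used, whose behavior can be summarized as "repeated fields are appended, all other fields are overwritten". If you need more complex behavior, implement a ResponseMerger. The response_mergers run one at a time, so you do not need to consider multiple Merge calls running concurrently. response_merger is deleted when the ParallelChannel is destructed. It is reference-counted internally, so one response_merger can be associated with multiple sub channels.

The possible values of Result are:
- MERGED: the merge succeeded.
- FAIL: the sub_response was not merged successfully, which counts as one failure. For example, with 10 sub channels and fail_limit set to 4, the RPC reaches fail_limit and ends immediately as soon as 4 merges return FAIL.
- FAIL_ALL: ends the RPC immediately.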
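
How the three results interact with fail_limit can be illustrated with a small model. This is a sketch under simplifying assumptions: it looks only at merge results arriving one by one and ignores sub calls that fail before merging. `MergeResult`, `RpcProgress` and `Run` are hypothetical names, not the brpc definitions.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the three merge results named above; NOT the brpc enum.
enum class MergeResult { kMerged, kFail, kFailAll };

struct RpcProgress {
    bool stopped;     // true if fail_limit or FAIL_ALL ended the RPC
    int merges_seen;  // merge results consumed before the RPC ended
    int failures;     // number of FAIL results counted so far
};

// Feeds merge results in arrival order and stops as soon as the rules
// described above say the RPC is over.
RpcProgress Run(const std::vector<MergeResult>& results, int fail_limit) {
    RpcProgress p{false, 0, 0};
    for (MergeResult r : results) {
        ++p.merges_seen;
        if (r == MergeResult::kFailAll) {
            p.stopped = true;
            return p;
        }
        if (r == MergeResult::kFail && ++p.failures >= fail_limit) {
            p.stopped = true;
            return p;
        }
    }
    return p;
}
```

With 10 sub channels and a fail_limit of 4, the model stops at the merge that produces the fourth FAIL, regardless of how many results are still outstanding.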


## Get the controller for accessing a sub channel

Sometimes the caller needs details about the access to a sub channel. Controller.sub(i) returns the controller used to access the i-th sub channel.

```c++
// Get the controllers for accessing sub channels in combo channels.
// Ordinary channel:
//   sub_count() is 0 and sub() is always NULL.
// ParallelChannel/PartitionChannel:
//   sub_count() is #sub-channels and sub(i) is the controller for
//   accessing i-th sub channel inside ParallelChannel, if i is outside
//    [0, sub_count() - 1], sub(i) is NULL.
//   NOTE: You must test sub() against NULL, ALWAYS. Even if i is inside
//   range, sub(i) can still be NULL:
//   * the rpc call may fail and terminate before accessing the sub channel
//   * the sub channel was skipped
// SelectiveChannel/DynamicPartitionChannel:
//   sub_count() is always 1 and sub(0) is the controller of successful
//   or last call to sub channels.
int sub_count() const;
const Controller* sub(int index) const;
```
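
The NULL check demanded by the comment above can be sketched as follows. To keep the example self-contained, it uses a minimal stand-in class, `FakeController`, which is hypothetical and is NOT brpc::Controller; it only mimics the two accessors shown above plus a `Failed()` flag. `CountFailedSubCalls` is likewise an illustrative helper.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in exposing sub_count()/sub() like the declarations above.
// It is NOT brpc::Controller and exists only to make the sketch compile.
class FakeController {
public:
    explicit FakeController(bool failed = false) : failed_(failed) {}
    void AddSub(const FakeController* sub) { subs_.push_back(sub); }
    bool Failed() const { return failed_; }
    int sub_count() const { return static_cast<int>(subs_.size()); }
    const FakeController* sub(int index) const {
        if (index < 0 || index >= sub_count()) {
            return nullptr;
        }
        return subs_[index];  // may itself be nullptr (skipped sub channel)
    }

private:
    bool failed_;
    std::vector<const FakeController*> subs_;
};

// Counts failed sub calls, always testing sub(i) against NULL first.
template <typename Cntl>
int CountFailedSubCalls(const Cntl& cntl) {
    int failed = 0;
    for (int i = 0; i < cntl.sub_count(); ++i) {
        const Cntl* sub = cntl.sub(i);
        if (sub == nullptr) {
            continue;  // skipped, or the RPC ended before this sub call
        }
        if (sub->Failed()) {
            ++failed;
        }
    }
    return failed;
}
```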

# SelectiveChannel

[SelectiveChannel](https://github.com/brpc/brpc/blob/master/src/brpc/selective_channel.h) (sometimes called "schan") accesses one of its sub channels chosen by a load balancing algorithm. It works at a higher level than an ordinary Channel: it distributes traffic to sub channels rather than to specific servers. SelectiveChannel is mainly used for load balancing between groups of machines, and it has the main properties of a Channel:

- It supports synchronous and asynchronous access.
- It can be deleted immediately after an asynchronous operation is issued.
- It can be canceled.
- It supports timeouts.

For sample code, see [example/selective_echo_c++](https://github.com/brpc/brpc/tree/master/example/selective_echo_c++/).

Any subclass of brpc::ChannelBase can be added to a SelectiveChannel, including SelectiveChannel and other combo channels.

Retries of a SelectiveChannel are independent of those of its sub channels: when a call to one sub channel fails (the sub channel itself may have retried), the SelectiveChannel retries on another sub channel.

Currently SelectiveChannel requires that **the request remain valid until the RPC ends**; other channels have no such requirement. If you issue asynchronous operations through a SelectiveChannel, make sure the request is deleted only inside done.

## Using SelectiveChannel

Initializing a SelectiveChannel is basically the same as initializing an ordinary Channel, except that Init does not take a naming service: SelectiveChannel adds sub channels dynamically through AddChannel, whereas an ordinary Channel manages servers dynamically through a naming service.

```c++
#include <brpc/selective_channel.h>
...
brpc::SelectiveChannel schan;
brpc::ChannelOptions schan_options;
schan_options.timeout_ms = ...;
schan_options.backup_request_ms = ...;
schan_options.max_retry = ...;
if (schan.Init(load_balancer, &schan_options) != 0) {
    LOG(ERROR) << "Fail to init SelectiveChannel";
    return -1;
}
```

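After initialization, add sub channels through AddChannel.

```c++
// The second argument, a ChannelHandle, is used to remove the sub channel later; pass NULL if removal is not needed.
if (schan.AddChannel(sub_channel, NULL/*ChannelHandle*/) != 0) {
    LOG(ERROR) << "Fail to add sub_channel";
    return -1;
}
```

Notes:

- Unlike ParallelChannel, AddChannel of a SelectiveChannel can be called at any time, even while the SelectiveChannel is being accessed (the change takes effect on the next access).
- SelectiveChannel always owns its sub channels, unlike ParallelChannel, where ownership is optional.
- If the second argument of AddChannel is not NULL, it is filled with a value of type brpc::SelectiveChannel::ChannelHandle, which can be passed to RemoveAndDestroyChannel to remove a channel dynamically.
- SelectiveChannel overrides the timeout specified when a sub channel was initialized with its own timeout. For example, if a sub channel's timeout is 100ms and the SelectiveChannel's timeout is 500ms, the timeout actually used is 500ms.

A SelectiveChannel is accessed in the same way as an ordinary Channel.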

## Example: divide traffic among multiple naming services

In some scenarios traffic has to be divided among machines under several naming services. Possible reasons include:

- Machines serving the same search function are mounted under different naming services.
- Machines are split into groups: traffic is first divided among groups and then among the machines inside a group, and the two levels use different strategies.

Both can be done with SelectiveChannel.

The following code creates a SelectiveChannel and inserts three ordinary Channels that access different bns nodes.

```c++
brpc::SelectiveChannel channel;
brpc::ChannelOptions schan_options;
schan_options.timeout_ms = FLAGS_timeout_ms;
schan_options.max_retry = FLAGS_max_retry;
if (channel.Init("c_murmurhash", &schan_options) != 0) {
    LOG(ERROR) << "Fail to init SelectiveChannel";
    return -1;
}

for (int i = 0; i < 3; ++i) {
    brpc::Channel* sub_channel = new brpc::Channel;
    if (sub_channel->Init(ns_node_name[i], "rr", NULL) != 0) {
        LOG(ERROR) << "Fail to init sub channel " << i;
        return -1;
    }
    if (channel.AddChannel(sub_channel, NULL/*handle for removal*/) != 0) {
        LOG(ERROR) << "Fail to add sub_channel to channel";
        return -1;
    }
}
...
XXXService_Stub stub(&channel);
stub.FooMethod(&cntl, &request, &response, NULL);
...
```

# PartitionChannel

[PartitionChannel](https://github.com/brpc/brpc/blob/master/src/brpc/partition_channel.h) is a special ParallelChannel that automatically creates a sub channel for each partition according to the tags in the naming service. Users can therefore put the machines of all partitions under one naming service and use tags to specify which machine serves which partition. For sample code, see [example/partition_echo_c++](https://github.com/brpc/brpc/tree/master/example/partition_echo_c++/).

PartitionChannel supports only one partitioning scheme. When several schemes need to coexist, or when you need to switch smoothly from one scheme to another, use DynamicPartitionChannel. It dynamically creates a sub PartitionChannel for each partitioning scheme and divides requests among the schemes according to their capacities. For sample code, see [example/dynamic_partition_echo_c++](https://github.com/brpc/brpc/tree/master/example/dynamic_partition_echo_c++/).

If the partitions are under different naming services, users have to assemble them with a ParallelChannel themselves, with each sub channel corresponding to one partition (and using a different naming service). See [above](#ParallelChannel) for how to use ParallelChannel.

## Using PartitionChannel

First, customize a PartitionParser. In this example the tag has the form N/M, where N is the index of the partition and M is the total number of partitions. For example, 0/3 means the first of 3 partitions.

```c++
#include <brpc/partition_channel.h>
...
class MyPartitionParser : public brpc::PartitionParser {
public:
    bool ParseFromTag(const std::string& tag, brpc::Partition* out) {
        // "N/M" : #N partition of M partitions.
        size_t pos = tag.find_first_of('/');
        if (pos == std::string::npos) {
            LOG(ERROR) << "Invalid tag=" << tag;
            return false;
        }
        char* endptr = NULL;
        out->index = strtol(tag.c_str(), &endptr, 10);
        if (endptr != tag.data() + pos) {
            LOG(ERROR) << "Invalid index=" << butil::StringPiece(tag.data(), pos);
            return false;
        }
        out->num_partition_kinds = strtol(tag.c_str() + pos + 1, &endptr, 10);
        if (endptr != tag.c_str() + tag.size()) {
            LOG(ERROR) << "Invalid num=" << tag.data() + pos + 1;
            return false;
        }
        return true;
    }
};
```

Then initialize the PartitionChannel.

```c++
#include <brpc/partition_channel.h>
...
brpc::PartitionChannel channel;

brpc::PartitionChannelOptions options;
options.protocol = ...;   // PartitionChannelOptions inherits from ChannelOptions, so it has every option of the latter
options.timeout_ms = ...; // Same as above
options.fail_limit = 1;   // An option of PartitionChannel itself, with the same meaning as fail_limit in ParallelChannel. 1 means the RPC fails as soon as the access to any one partition fails.

if (channel.Init(num_partition_kinds, new MyPartitionParser(),
                 server_address, load_balancer, &options) != 0) {
    LOG(ERROR) << "Fail to init PartitionChannel";
    return -1;
}
// The channel is accessed in the same way as an ordinary Channel
```

## Using DynamicPartitionChannel

DynamicPartitionChannel is used in basically the same way as PartitionChannel: customize a PartitionParser, then initialize. However, Init does not take num_partition_kinds, because DynamicPartitionChannel dynamically creates a separate sub PartitionChannel for each partitioning scheme.

The following demonstrates a smooth switch from 3 partitions to 4 partitions using DynamicPartitionChannel.

First, start three servers on ports 8004, 8005 and 8006.

```
$ ./echo_server -server_num 3
TRACE: 09-06 10:40:39:   * 0 server.cpp:159] EchoServer is serving on port=8004
TRACE: 09-06 10:40:39:   * 0 server.cpp:159] EchoServer is serving on port=8005
TRACE: 09-06 10:40:39:   * 0 server.cpp:159] EchoServer is serving on port=8006
TRACE: 09-06 10:40:40:   * 0 server.cpp:192] S[0]=0 S[1]=0 S[2]=0 [total=0]
TRACE: 09-06 10:40:41:   * 0 server.cpp:192] S[0]=0 S[1]=0 S[2]=0 [total=0]
TRACE: 09-06 10:40:42:   * 0 server.cpp:192] S[0]=0 S[1]=0 S[2]=0 [total=0]
```

After startup, each server prints the traffic it received in the previous second, which is 0 for now.

Start a client that uses DynamicPartitionChannel locally. Its initialization code is:

```c++
    ...
    brpc::DynamicPartitionChannel channel;
    brpc::PartitionChannelOptions options;
    // The RPC fails if the access to any partition fails. Increase this value to be more tolerant; for example, 2 means the RPC fails only when at least two partitions fail.
    options.fail_limit = 1;
    if (channel.Init(new MyPartitionParser(), "file://server_list", "rr", &options) != 0) {
        LOG(ERROR) << "Fail to init channel";
        return -1;
    }
    ...
```

The content of the naming service "file://server_list" is:
```
0.0.0.0:8004  0/3  # The first partition of the 3-partition scheme; the others follow the same pattern
0.0.0.0:8004  1/3
0.0.0.0:8004  2/3
```

All 3 partitions of the 3-partition scheme are on the server at port 8004. After starting, the client discovers 8004 and sends traffic to it.

```
$ ./echo_client
TRACE: 09-06 10:51:10:   * 0 src/brpc/policy/file_naming_service.cpp:83] Got 3 unique addresses from `server_list'
TRACE: 09-06 10:51:10:   * 0 src/brpc/socket.cpp:779] Connected to 0.0.0.0:8004 via fd=3 SocketId=0 self_port=46544
TRACE: 09-06 10:51:11:   * 0 client.cpp:226] Sending EchoRequest at qps=132472 latency=371
TRACE: 09-06 10:51:12:   * 0 client.cpp:226] Sending EchoRequest at qps=132658 latency=370
TRACE: 09-06 10:51:13:   * 0 client.cpp:226] Sending EchoRequest at qps=133208 latency=369
```

Meanwhile, the server receives 3 times the client's traffic, because each RPC from the client accesses 8004 three times, once for each partition.

```
TRACE: 09-06 10:51:11:   * 0 server.cpp:192] S[0]=398866 S[1]=0 S[2]=0 [total=398866]
TRACE: 09-06 10:51:12:   * 0 server.cpp:192] S[0]=398117 S[1]=0 S[2]=0 [total=398117]
TRACE: 09-06 10:51:13:   * 0 server.cpp:192] S[0]=398873 S[1]=0 S[2]=0 [total=398873]
```

Now start changing the partitioning. Add the 4-partition server 8005 to server_list:
366 367 368 369 370 371 372 373 374 375 376 377 378 379 380

```
0.0.0.0:8004  0/3
0.0.0.0:8004  1/3   
0.0.0.0:8004  2/3 
 
0.0.0.0:8005  0/4        
0.0.0.0:8005  1/4        
0.0.0.0:8005  2/4        
0.0.0.0:8005  3/4
```

Watch how the output of the client and the server changes. The client notices the change to server_list and reloads it, but its QPS barely changes.

```
TRACE: 09-06 10:57:10:   * 0 src/brpc/policy/file_naming_service.cpp:83] Got 7 unique addresses from `server_list'
TRACE: 09-06 10:57:10:   * 0 src/brpc/socket.cpp:779] Connected to 0.0.0.0:8005 via fd=7 SocketId=768 self_port=39171
TRACE: 09-06 10:57:11:   * 0 client.cpp:226] Sending EchoRequest at qps=135346 latency=363
TRACE: 09-06 10:57:12:   * 0 client.cpp:226] Sending EchoRequest at qps=134201 latency=366
TRACE: 09-06 10:57:13:   * 0 client.cpp:226] Sending EchoRequest at qps=137627 latency=356
TRACE: 09-06 10:57:14:   * 0 client.cpp:226] Sending EchoRequest at qps=136775 latency=359
TRACE: 09-06 10:57:15:   * 0 client.cpp:226] Sending EchoRequest at qps=139043 latency=353
```

The server side changes more noticeably: 8005 starts receiving traffic, and the ratio between the traffic of 8005 and 8004 is roughly 4:3.

```
TRACE: 09-06 10:57:09:   * 0 server.cpp:192] S[0]=398597 S[1]=0 S[2]=0 [total=398597]
TRACE: 09-06 10:57:10:   * 0 server.cpp:192] S[0]=392839 S[1]=0 S[2]=0 [total=392839]
TRACE: 09-06 10:57:11:   * 0 server.cpp:192] S[0]=334704 S[1]=83219 S[2]=0 [total=417923]
TRACE: 09-06 10:57:12:   * 0 server.cpp:192] S[0]=206215 S[1]=273873 S[2]=0 [total=480088]
TRACE: 09-06 10:57:13:   * 0 server.cpp:192] S[0]=204520 S[1]=270483 S[2]=0 [total=475003]
TRACE: 09-06 10:57:14:   * 0 server.cpp:192] S[0]=207055 S[1]=273725 S[2]=0 [total=480780]
TRACE: 09-06 10:57:15:   * 0 server.cpp:192] S[0]=208453 S[1]=276803 S[2]=0 [total=485256]
```

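Each RPC accesses 8004 three times or 8005 four times, and the traffic ratio between 8004 and 8005 is 3:4, which means the client accesses the 3-partition scheme and the 4-partition scheme at a ratio of 1:1. This ratio is determined by their capacities. Capacity is computed recursively:

- The capacity of an ordinary Channel is the sum of the capacities of all its servers. If the naming service does not configure weights, the capacity of a single server is 1.
- The capacity of a ParallelChannel or PartitionChannel is the minimum of the capacities of its sub channels.
- The capacity of a SelectiveChannel is the sum of the capacities of its sub channels.
- The capacity of a DynamicPartitionChannel is the sum of the capacities of its sub PartitionChannels.

In this scenario, the capacities of the 3-partition scheme and the 4-partition scheme are both 1, because all 3 partitions are on the single server 8004 and all 4 partitions are on the single server 8005.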
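
The recursive rules above can be written down as a small stand-alone model. This is a sketch, not brpc code: it assumes each partition behaves like an ordinary channel described only by the weights of its servers, and the names `ServerWeights`, `OrdinaryCapacity`, `PartitionCapacity` and `SumCapacity` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Weights of the servers behind one ordinary channel
// (1 per server when the naming service configures no weights).
using ServerWeights = std::vector<int>;

// Ordinary Channel: sum of the capacities of its servers.
int OrdinaryCapacity(const ServerWeights& servers) {
    return std::accumulate(servers.begin(), servers.end(), 0);
}

// ParallelChannel / PartitionChannel: minimum over the sub channels.
// Each element of `partitions` lists the servers of one partition.
int PartitionCapacity(const std::vector<ServerWeights>& partitions) {
    if (partitions.empty()) {
        return 0;
    }
    int cap = OrdinaryCapacity(partitions.front());
    for (const ServerWeights& p : partitions) {
        cap = std::min(cap, OrdinaryCapacity(p));
    }
    return cap;
}

// SelectiveChannel / DynamicPartitionChannel: sum over the sub channels.
int SumCapacity(const std::vector<int>& sub_capacities) {
    return std::accumulate(sub_capacities.begin(), sub_capacities.end(), 0);
}
```

For the scenario above, `PartitionCapacity({{1}, {1}, {1}})` and `PartitionCapacity({{1}, {1}, {1}, {1}})` both evaluate to 1, and a DynamicPartitionChannel holding both schemes has capacity `SumCapacity({1, 1})`, which is 2.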

Add the server at port 8006 to the 4-partition scheme:

```
0.0.0.0:8004  0/3
0.0.0.0:8004  1/3   
0.0.0.0:8004  2/3
 
0.0.0.0:8005  0/4   
0.0.0.0:8005  1/4   
0.0.0.0:8005  2/4   
0.0.0.0:8005  3/4    
 
0.0.0.0:8006 0/4
0.0.0.0:8006 1/4
0.0.0.0:8006 2/4
0.0.0.0:8006 3/4
```

The client still changes little:

```
TRACE: 09-06 11:11:51:   * 0 src/brpc/policy/file_naming_service.cpp:83] Got 11 unique addresses from `server_list'
TRACE: 09-06 11:11:51:   * 0 src/brpc/socket.cpp:779] Connected to 0.0.0.0:8006 via fd=8 SocketId=1280 self_port=40759
TRACE: 09-06 11:11:51:   * 0 client.cpp:226] Sending EchoRequest at qps=131799 latency=372
TRACE: 09-06 11:11:52:   * 0 client.cpp:226] Sending EchoRequest at qps=136217 latency=361
TRACE: 09-06 11:11:53:   * 0 client.cpp:226] Sending EchoRequest at qps=133531 latency=368
TRACE: 09-06 11:11:54:   * 0 client.cpp:226] Sending EchoRequest at qps=136072 latency=361
```

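On the server side, 8006 starts receiving traffic, and the traffic ratio among the three servers is roughly 3:4:4. The capacity of the 3-partition scheme is still 1, while the capacity of the 4-partition scheme has become 2 with the addition of 8006. RPCs are therefore divided 1:2 between the two schemes; since each RPC issues 3 or 4 sub calls respectively, the traffic ratio between the 3-partition scheme and the 4-partition scheme is 3:8. Each partition of the 4-partition scheme has an instance on both 8005 and 8006, and instances of the same partition share traffic by round robin, so 8005 and 8006 split the traffic evenly. The final result is 3:4:4.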

```
TRACE: 09-06 11:11:51:   * 0 server.cpp:192] S[0]=199625 S[1]=263226 S[2]=0 [total=462851]
TRACE: 09-06 11:11:52:   * 0 server.cpp:192] S[0]=143248 S[1]=190717 S[2]=159756 [total=493721]
TRACE: 09-06 11:11:53:   * 0 server.cpp:192] S[0]=133003 S[1]=178328 S[2]=178325 [total=489656]
TRACE: 09-06 11:11:54:   * 0 server.cpp:192] S[0]=135534 S[1]=180386 S[2]=180333 [total=496253]
```
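
The 3:4:4 split reasoned above can be checked with a few lines of arithmetic. The sketch below is a simplified model, not brpc code: it assumes every server of a scheme hosts all partitions of that scheme with weight 1 (so the capacity of a scheme equals its number of servers) and that traffic inside a scheme is spread evenly. `Scheme` and `CallsPerServerPerRpc` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One partitioning scheme under the simplifying assumptions stated above.
struct Scheme {
    int partitions;  // M in the "N/M" tag
    int servers;     // servers that each host every partition of the scheme
};

// Expected number of sub calls received by ONE server of scheme `k`
// for each RPC issued by the client.
double CallsPerServerPerRpc(const std::vector<Scheme>& schemes, std::size_t k) {
    int total_capacity = 0;
    for (const Scheme& s : schemes) {
        total_capacity += s.servers;  // capacity == servers in this model
    }
    if (total_capacity == 0 || schemes[k].servers == 0) {
        return 0.0;
    }
    // Share of RPCs routed to scheme k, proportional to its capacity.
    const double rpc_share =
        static_cast<double>(schemes[k].servers) / total_capacity;
    // Each such RPC issues `partitions` sub calls, spread over the servers.
    return rpc_share * schemes[k].partitions / schemes[k].servers;
}
```

With `{{3, 1}, {4, 2}}` the model gives 1 sub call per RPC for 8004 and 4/3 for each of 8005 and 8006, i.e. 3:4:4. In the last log line above, 135534 / 180386 is about 0.75, which agrees with the model.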

Try removing one partition from the 3-partition scheme (you can comment out a line in file://server_list with #):

```
 0.0.0.0:8004  0/3
 0.0.0.0:8004  1/3
#0.0.0.0:8004  2/3

 0.0.0.0:8005  0/4
 0.0.0.0:8005  1/4
 0.0.0.0:8005  2/4
 0.0.0.0:8005  3/4

 0.0.0.0:8006 0/4
 0.0.0.0:8006 1/4
 0.0.0.0:8006 2/4
 0.0.0.0:8006 3/4
```

The client notices the change.

```
TRACE: 09-06 11:17:47:   * 0 src/brpc/policy/file_naming_service.cpp:83] Got 10 unique addresses from `server_list'
TRACE: 09-06 11:17:47:   * 0 client.cpp:226] Sending EchoRequest at qps=131653 latency=373
TRACE: 09-06 11:17:48:   * 0 client.cpp:226] Sending EchoRequest at qps=120560 latency=407
TRACE: 09-06 11:17:49:   * 0 client.cpp:226] Sending EchoRequest at qps=124100 latency=395
TRACE: 09-06 11:17:50:   * 0 client.cpp:226] Sending EchoRequest at qps=123743 latency=397
```

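The change is more obvious on the server side: 8004 soon receives no traffic at all. The removed entry was the last remaining instance of partition 2/3 in the 3-partition scheme, so after its removal the capacity of the 3-partition scheme becomes 0 and 8004 no longer gets any traffic.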

```
TRACE: 09-06 11:17:47:   * 0 server.cpp:192] S[0]=130864 S[1]=174499 S[2]=174548 [total=479911]
TRACE: 09-06 11:17:48:   * 0 server.cpp:192] S[0]=20063 S[1]=230027 S[2]=230098 [total=480188]
TRACE: 09-06 11:17:49:   * 0 server.cpp:192] S[0]=0 S[1]=245961 S[2]=245888 [total=491849]
TRACE: 09-06 11:17:50:   * 0 server.cpp:192] S[0]=0 S[1]=250198 S[2]=250150 [total=500348]
```

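In a real production environment, we would gradually add servers to the 4-partition scheme while taking servers of the 3-partition scheme offline. DynamicPartitionChannel divides traffic dynamically according to the capacity of each partitioning scheme. Once the capacity of the 3-partition scheme drops to 0, the servers have been smoothly switched from 3 partitions to 4 partitions without any change to the client code.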