

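Recently I have been writing a prototype of a backend middleware whose main job is dispatching and relaying messages. Since it has to be implemented in Java, Netty was the natural first choice for the networking framework; I used Netty 4. Netty is indeed efficient: without much effort it reaches a fairly high TPS. I did run into a few problems along the way, though. I consider them fairly classic, yet material on them is hard to find online, so I am summarizing them here.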

1. Context switch rate too high

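During load testing I monitored the kernel with nmon and saw the context switch count climb past 300,000. That is clearly abnormal, but what in the JVM could be causing context switches? To find out, I went back to my reading notes "Process Scheduling" on the dinosaur book, Operating System Concepts, and to the Wikipedia article on context switches, to review why a process or thread gets context-switched.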



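My earlier reading notes covered the causes of process context switches, so how do thread context switches differ? Sure enough, StackOverflow has a question on exactly this, "thread context switch vs process context switch":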

“The main distinction between a thread switch and a process switch is that during a thread switch, the virtual memory space remains the same, while it does not during a process switch. Both types involve handing control over to the operating system kernel to perform the context switch. The process of switching in and out of the OS kernel along with the cost of switching out the registers is the largest fixed cost of performing a context switch. A more fuzzy cost is that a context switch messes with the processors cacheing mechanisms. Basically, when you context switch, all of the memory addresses that the processor “remembers” in it’s cache effectively become useless. The one big distinction here is that when you change virtual memory spaces, the processor’s Translation Lookaside Buffer (TLB) or equivalent gets flushed making memory accesses much more expensive for a while. This does not happen during a thread switch.”

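From the top-ranked answer I learned that both process and thread context switches involve entering and leaving the kernel and saving and restoring registers, and that this is their largest fixed cost. Compared with a process switch, a thread switch is still somewhat lighter: the biggest difference is that the virtual address space stays the same, so CPU caches such as the TLB are not invalidated. Note, however, that another question, "What is the overhead of a context-switch?", mentions that technology introduced by Intel and AMD in 2008 may keep the TLB from being invalidated. If you are interested, look into it yourself.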
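nmon reports the context switch rate for the whole machine, so it can help to confirm how much of it comes from the JVM itself. On Linux, each thread's status file under /proc/<pid>/task/ exposes voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters. The following is only a minimal sketch, not the tooling used in the test above: the class name CtxSwitchCounter and its methods are made up for illustration, and it assumes a Linux /proc filesystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: sums per-thread context switch counters from Linux /proc. */
final class CtxSwitchCounter {

    /** Returns {voluntary, nonvoluntary} parsed from the text of one status file. */
    static long[] parseStatus(String statusText) {
        long voluntary = 0;
        long nonvoluntary = 0;
        for (String line : statusText.split("\n")) {
            if (line.startsWith("voluntary_ctxt_switches:")) {
                voluntary = Long.parseLong(line.substring(line.indexOf(':') + 1).trim());
            } else if (line.startsWith("nonvoluntary_ctxt_switches:")) {
                nonvoluntary = Long.parseLong(line.substring(line.indexOf(':') + 1).trim());
            }
        }
        return new long[] {voluntary, nonvoluntary};
    }

    /** Sums the counters over every thread listed under /proc/<pid>/task. */
    static long[] sumForProcess(String pid) throws IOException {
        long[] total = new long[2];
        Path taskDir = Paths.get("/proc", pid, "task");
        try (DirectoryStream<Path> threads = Files.newDirectoryStream(taskDir)) {
            for (Path thread : threads) {
                try {
                    byte[] raw = Files.readAllBytes(thread.resolve("status"));
                    long[] counts = parseStatus(new String(raw, StandardCharsets.UTF_8));
                    total[0] += counts[0];
                    total[1] += counts[1];
                } catch (IOException threadExited) {
                    // The thread ended between listing and reading; skip it.
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Pass the JVM's pid as the first argument; "self" inspects this process.
        String pid = args.length > 0 ? args[0] : "self";
        if (!Files.isDirectory(Paths.get("/proc", pid, "task"))) {
            System.out.println("/proc/" + pid + "/task not found; this sketch needs Linux.");
            return;
        }
        long[] before = sumForProcess(pid);
        Thread.sleep(1000);
        long[] after = sumForProcess(pid);
        System.out.println("voluntary switches in ~1s:    " + (after[0] - before[0]));
        System.out.println("nonvoluntary switches in ~1s: " + (after[1] - before[1]));
    }
}
```

A high voluntary count means threads are giving up the CPU because they block (on I/O, locks, or waits), while a high nonvoluntary count means they are being preempted, for example when their time slice runs out.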

1.1 Non-blocking I/O


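Here I also want to explain how to actively write data to a Netty Channel, because the material I found online is all the same: the server writes its response in a Handler after receiving a request, and even the client examples send their data in a Handler once the Channel becomes active. I need to relay messages, and sending to the downstream system has to be asynchronous and non-blocking, so those examples are of no use. Here is my approach.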



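    EventLoopGroup group = new NioEventLoopGroup();
    Bootstrap b = new Bootstrap();
    b.group(group)
     .channel(NioSocketChannel.class)
     .remoteAddress(host, port)
     .handler(new ChannelInitializer<SocketChannel>() {
         @Override
         protected void initChannel(SocketChannel ch) throws Exception {
             ch.pipeline().addLast(…);
         }
     });
    try {
        // Block until the connection is established, then keep the Channel for later writes.
        ChannelFuture future = b.connect().sync();
        this.channel = future.channel();
    } catch (InterruptedException e) {
        // Restore the interrupt flag before surfacing the failure.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Error when starting Netty client: addr=[" + addr + "]", e);
    }

With the Channel kept in a field, any thread can later send a message by calling channel.writeAndFlush(message). Netty hands the write over to the Channel's event loop and returns a ChannelFuture without blocking the caller.

1.2 Reducing the number of threads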

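With too many threads, each one gets a smaller share of CPU time, and for every thread to get a chance to run the CPU has to keep switching, saving and restoring thread context each time. So I checked the EventLoopGroup used for Netty's I/O workers. As analyzed earlier in "Netty 4 Source Code Analysis: Server Startup", the default number of threads in an EventLoopGroup is twice the number of CPU cores. I therefore set the NioEventLoopGroup thread count by hand to cut down the number of I/O threads.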

    public void start(int port) throws InterruptedException { // "start" is a placeholder method name
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        // Cap the I/O worker threads at 4 instead of the default (2 x CPU cores).
        EventLoopGroup workerGroup = new NioEventLoopGroup(4);
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(bossGroup, workerGroup)
                .channel(NioServerSocketChannel.class)
                .localAddress(port)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) throws Exception {
                        ch.pipeline().addLast(…);
                    }
                });
            // Bind and start to accept incoming connections.
            ChannelFuture f = b.bind(port).sync();
            // Wait until the server socket is closed.
            f.channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
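To judge whether a hand-picked value such as 4 is really a reduction on a given machine, it helps to know what the default would have been. As far as I know, Netty 4 uses the io.netty.eventLoopThreads system property when it is set, otherwise twice the number of available processors, and never goes below 1. The sketch below reproduces that rule with the JDK only; the class DefaultEventLoopThreads and its resolve method are hypothetical names, not Netty API.

```java
/** Hypothetical sketch of how Netty 4 appears to pick its default event loop thread count. */
final class DefaultEventLoopThreads {

    /**
     * Returns the override when one is given, otherwise cores * 2,
     * and never less than one thread.
     */
    static int resolve(Integer override, int availableProcessors) {
        int threads = (override != null) ? override : availableProcessors * 2;
        return Math.max(1, threads);
    }

    public static void main(String[] args) {
        // Integer.getInteger returns null when the property is unset or not a number.
        Integer override = Integer.getInteger("io.netty.eventLoopThreads");
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + ", default I/O threads=" + resolve(override, cores));
    }
}
```

On an 8-core machine this gives 16 worker threads by default, so passing 4 to the NioEventLoopGroup constructor cuts the I/O threads to a quarter. Setting the system property with -Dio.netty.eventLoopThreads=4 should have the same effect for groups created without an explicit count.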


    my-dispatcher {
      # Dispatcher is the name of the event-based dispatcher
      type = Dispatcher
      mailbox-type = "akka.dispatch.SingleConsumerOnlyUnboundedMailbox"
      # What kind of ExecutionService to use
      executor = "fork-join-executor"
      # Configuration for the fork join pool
      fork-join-executor {
        # Min number of threads to cap factor-based parallelism number to
        parallelism-min = 2
        # Parallelism (threads) … ceil(available processors * factor)
        parallelism-factor = 1.0
        # Max number of threads to cap factor-based parallelism number to
        parallelism-max = 16
      }
      # Throughput defines the maximum number of messages to be
      # processed per actor before the thread jumps to the next actor.
      # Set to 1 for as fair as possible.
      throughput = 100
    }
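The three fork-join-executor settings above combine into a single thread count: the processor count is scaled by parallelism-factor, rounded up, and then clamped between parallelism-min and parallelism-max. The following sketch spells out that arithmetic as I understand it from the configuration comments; ForkJoinParallelism and its parallelism method are hypothetical names, not part of Akka.

```java
/** Hypothetical sketch of the fork-join-executor sizing rule described in the config comments. */
final class ForkJoinParallelism {

    /** ceil(availableProcessors * factor), clamped to the range [min, max]. */
    static int parallelism(int availableProcessors, double factor, int min, int max) {
        int scaled = (int) Math.ceil(availableProcessors * factor);
        return Math.min(Math.max(scaled, min), max);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Values taken from the my-dispatcher block: min = 2, factor = 1.0, max = 16.
        System.out.println("dispatcher threads on this machine: "
                + parallelism(cores, 1.0, 2, 16));
    }
}
```

With a factor of 1.0 the dispatcher gets one thread per core on machines with 2 to 16 cores, which keeps the actor threads from multiplying on top of the Netty I/O threads.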





