缘起

最近浏览<<Go微服务实战>> (刘金亮, 2021.1)
本系列笔记拟采纳golang练习之

案例需要(聊天服务器)

  • 用户能够连贯到服务器。
  • 用户能够设定本人的用户名。
  • 用户能够向服务器发送音讯,同时服务器也会向其余用户播送该音讯。

指标(Day 3)

  • 在Day 2中, 已根本实现功能性编码, 测试也通过了, 明天尝试压测, 并诊断是否存在内存和routine透露

过程

  • 批改单元测试代码, 并发客户端减少到50个, 每个客户端随机1-3秒发送一条音讯
  • 增加ILoggingService, 以更不便的采集全局日志
  • 应用pprof包的api, 采集heap和goroutine的诊断日志
  • 应用go tool pprof <诊断日志>, 察看是否存在内存和routine透露
  • 发现单元测试总是failed, 总有若干tChatClient, 服务端未打印敞开日志
  • 查看routine诊断日志, 发现存在routine透露
  • 在tChatClient中增加更具体的函数调用日志, 发现是closeChan通道写入阻塞导致
  • 修复routine透露的bug, 加大并发客户端到500个, 并进行多轮测试, 察看是否存在内存和routine透露

单元测试

  • ChatServer_test.go
  • 增大客户端并发数, 并增加pprof诊断
  • 诊断ILoggingService.AllLogs(), 判断所有客户端是否实现连贯/断开的闭环
package chat_serverimport (    "fmt"    cs "learning/gooop/chat_server"    "math/rand"    "os"    "runtime"    "runtime/pprof"    "strings"    "sync"    "testing"    "time")func Test_ChatServer(t *testing.T) {    fnAssertTrue := func(b bool, msg string) {        if !b {            t.Fatal(msg)        }    }    // create dump file    memProfile, err := os.Create("/home/ioly/chat_server_mem.profile")    if err != nil {        t.Fatal(err)    }    cpuProfile, err := os.Create("/home/ioly/chat_server_cpu.profile")    if err != nil {        t.Fatal(err)    }    rtnProfile, err := os.Create("/home/ioly/chat_server_routine.profile")    if err != nil {        t.Fatal(err)    }    err = pprof.StartCPUProfile(cpuProfile)    if err != nil {        t.Fatal(err)    }    defer pprof.StopCPUProfile()    port := 3333    server := cs.NewChatServer()    err = server.Open(port)    if err != nil {        t.Fatal(err)    }    clientCount := 500    wg := &sync.WaitGroup{}    rnd := rand.New(rand.NewSource(time.Now().UnixNano()))    address := fmt.Sprintf("localhost:%v", port)    for i := 0;i < clientCount;i++ {        err, client := cs.DialChatClient(address)        if err != nil {            t.Fatal(err)        }        id := fmt.Sprintf("c%02d", i)        //client.RecvHandler(func(client cs.IChatClient, msg cs.IMsg) {        //    t.Logf("%v recv: %v\n", id, msg)        //})        wg.Add(1)        go func() {            client.SetName(id)            client.Send(&cs.NameMsg{id })            n := 0            duration := rnd.Intn(3) + 1            for range time.Tick(time.Duration(duration) * time.Second) {                client.Send(&cs.ChatMsg{id, fmt.Sprintf("msg %02d from %v", n, id) })                n++                if n > 10 {                    break                }            }            client.Close()            wg.Done()        }()    }    // wait and set status    done := []bool{false}    go func() {        wg.Wait()        done[0] = true    }()    // print passed seconds    passedSeconds := 0    for range time.Tick(time.Second) {        passedSeconds++        t.Logf("%v seconds passed", passedSeconds)        if done[0] {            break        }    }    time.Sleep(5*time.Second)    server.Close()    // dump heap file    runtime.GC()    err = pprof.Lookup("heap").WriteTo(memProfile, 0)    if err != nil {        t.Fatal(err)    }    err = pprof.Lookup("goroutine").WriteTo(rtnProfile, 0)    if err != nil {        t.Fatal(err)    }    // check server logs    logs := cs.Logging.AllLogs()    fnHasLog := func(log string) bool {        for _,it := range logs {            if strings.Contains(it, log) {                return true            }        }        return false    }    for i := 0;i < clientCount;i++ {        msg := fmt.Sprintf("tChatClient.postConnClosed, c%02d, serverFlag=false", i)        fnAssertTrue(fnHasLog(msg), "expecting log: " + msg)        msg = fmt.Sprintf("tChatServer.handleIncomingConn, clientCount=%v", i + 1)        fnAssertTrue(fnHasLog(msg), "expecting log: " + msg)        msg = fmt.Sprintf("tChatClient.postConnClosed, c%02d, serverFlag=true", i)        fnAssertTrue(fnHasLog(msg), "expecting log: " + msg)        msg = fmt.Sprintf("tChatServer.handleClientClosed, c%02d", i)        fnAssertTrue(fnHasLog(msg), "expecting log: " + msg)    }}

测试输入

并发500, 仅开端局部

tChatClient.beginWrite, write error, c496, serverFlag=truetChatClient.beginWrite, <- closeChan, c496, serverFlag=truetChatClient.postConnClosed, c496, serverFlag=truetChatServer.handleClientClosed, c496tChatServer.handleClientClosed, c496, clientCount=1tChatClient.closeConn, c498, serverFlag=falsetChatClient.beginWrite, <- closeChan, c498, serverFlag=falsetChatClient.beginRead, read error, c498, serverFlag=falsetChatClient.postConnClosed, c498, serverFlag=falsetChatClient.beginRead, read error, c498, serverFlag=truetChatClient.closeConn, c498, serverFlag=truetChatClient.beginWrite, <- closeChan, c498, serverFlag=truetChatClient.postConnClosed, c498, serverFlag=truetChatServer.handleClientClosed, c498tChatServer.handleClientClosed, c498, clientCount=0    ChatServer_test.go:97: 34 seconds passed--- PASS: Test_ChatServer (39.50s)PASSok      command-line-arguments  39.535s

内存透露诊断

无泄漏

$ go tool pprof ~/chat_server_mem.profile File: chat_server.testType: inuse_spaceTime: Mar 9, 2021 at 8:56am (CST)Entering interactive mode (type "help" for commands, "o" for options)(pprof) topShowing nodes accounting for 8461.35kB, 100% of 8461.35kB totalShowing top 10 nodes out of 15      flat  flat%   sum%        cum   cum% 5720.44kB 67.61% 67.61%  5720.44kB 67.61%  time.startTimer 1184.27kB 14.00% 81.60%  1184.27kB 14.00%  runtime/pprof.StartCPUProfile 1024.38kB 12.11% 93.71%  1024.38kB 12.11%  runtime.malg  532.26kB  6.29%   100%   532.26kB  6.29%  runtime/pprof.(*profMap).lookup         0     0%   100%  1184.27kB 14.00%  command-line-arguments.Test_ChatServer         0     0%   100%  5720.44kB 67.61%  learning/gooop/chat_server.(*tChatClient).beginWrite         0     0%   100%  1024.38kB 12.11%  runtime.mstart         0     0%   100%  1024.38kB 12.11%  runtime.newproc.func1         0     0%   100%  1024.38kB 12.11%  runtime.newproc1         0     0%   100%  1024.38kB 12.11%  runtime.systemstack(pprof) quit

routine透露诊断(修复前)

有透露时的诊断日志(并发50)

$ go tool pprof ~/chat_server_routine.profile File: chat_server.testType: goroutineTime: Mar 9, 2021 at 9:12am (CST)Entering interactive mode (type "help" for commands, "o" for options)(pprof) topShowing nodes accounting for 4, 100% of 4 totalShowing top 10 nodes out of 21      flat  flat%   sum%        cum   cum%         3 75.00% 75.00%          3 75.00%  runtime.gopark         1 25.00%   100%          1 25.00%  runtime/pprof.runtime_goroutineProfileWithLabels         0     0%   100%          1 25.00%  command-line-arguments.Test_ChatServer         0     0%   100%          1 25.00%  learning/gooop/chat_server.(*tChatClient).beginWrite         0     0%   100%          1 25.00%  learning/gooop/chat_server.(*tChatClient).closeConn         0     0%   100%          1 25.00%  main.main         0     0%   100%          1 25.00%  runtime.chanrecv         0     0%   100%          1 25.00%  runtime.chanrecv1         0     0%   100%          1 25.00%  runtime.chansend         0     0%   100%          1 25.00%  runtime.chansend1(pprof) 

能够看到closeConn函数阻塞

func (me *tChatClient) closeConn() {    if !atomic.CompareAndSwapInt32(&me.closeFlag, 0, 1) {        return    }    Logging.Logf("tChatClient.closeConn, %v, serverFlag=%v", me.name, me.serverFlag)    me.closeChan <- true}

从代码看, closeConn函数阻塞只有一种可能, 就是closeChan阻塞.
这个起因有点诡异, 因为曾经加了atomic.CompareAndSwapInt32爱护,
依然导致阻塞, 惟一起因是并发环境下, closeChan尚未开始读取, 是无奈写入的.

修复routine透露

就是给导致写阻塞的closeChan加上缓冲, 大小等于closeConn函数的可能调用入口数量

func openChatClient(conn net.Conn, serverFlag bool) IChatClient {    it := &tChatClient{        conn: conn,        openFlag: 0,        closeFlag: 0,        serverFlag: serverFlag,        closeChan: make(chan bool, 3),    // 给closeChan增加缓冲区!!        sendChan: make(chan IMsg, gMaxPendingSend),        name: "anonymous",        sendLogs: []IMsg{},        dropLogs: []IMsg{},        recvLogs: []IMsg{},    }    it.open()    return it}

routine透露诊断(修复后)

修复后, 加大并发到500, 未发现显著routine透露(不存在业务代码导致挂起的routine)

$ go tool pprof ~/chat_server_routine.profile File: chat_server.testType: goroutineTime: Mar 9, 2021 at 9:28am (CST)Entering interactive mode (type "help" for commands, "o" for options)(pprof) topShowing nodes accounting for 3, 100% of 3 totalShowing top 10 nodes out of 17      flat  flat%   sum%        cum   cum%         2 66.67% 66.67%          2 66.67%  runtime.gopark         1 33.33%   100%          1 33.33%  runtime/pprof.runtime_goroutineProfileWithLabels         0     0%   100%          1 33.33%  command-line-arguments.Test_ChatServer         0     0%   100%          1 33.33%  main.main         0     0%   100%          1 33.33%  runtime.chanrecv         0     0%   100%          1 33.33%  runtime.chanrecv1         0     0%   100%          1 33.33%  runtime.main         0     0%   100%          1 33.33%  runtime/pprof.(*Profile).WriteTo         0     0%   100%          1 33.33%  runtime/pprof.profileWriter         0     0%   100%          1 33.33%  runtime/pprof.writeGoroutine(pprof) 

(end)