Brad Fitzpatrick在YAPC Asia 2015(Yet Another Perl Conference)上做了一次技术分享,题为:"Go Debugging, Profiling, and Optimization"。个人感觉这篇分享中价值最大的是BradFitz现场演示的一个有关如何对Go程序进行调试、分析和优化的 Demo,Brad将demo上传到了他个人在github.com的repo中,但不知为何,repo中的代码似乎与repo里talk.md中的说明不甚一致(btw,我并没有看video)。于 是打算在这里按照Brad的思路重新走一遍demo的演示流程(所有演示代码在这里可以下载到)。
一、实验环境
$uname -a
Linux pc-tony 3.13.0-61-generic #100~precise1-Ubuntu SMP Wed Jul 29 12:06:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
注意:在Darwin或Windows下,profile的结果可能与这里有很大不同(甚至完全不一样的输出和瓶颈热点)。
$go version
go version go1.5 linux/amd64
$ go env
GOARCH="amd64"
GOBIN="/home1/tonybai/.bin/go15/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home1/tonybai/proj/GoProjects"
GORACE=""
GOROOT="/home1/tonybai/.bin/go15"
GOTOOLDIR="/home1/tonybai/.bin/go15/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
代码基于Brad的github.com/bradfitz/talk-yapc-asia-2015。
二、待优化程序(step0)
待优化程序,也就是原始程序,我们放在step0中:
//go-debug-profile-optimization/step0/demo.go
package main
import (
"fmt"
"log"
"net/http"
"regexp"
)
var visitors int
func handleHi(w http.ResponseWriter, r *http.Request) {
if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitors++
w.Header().Set("Content-Type", "text/html; charset=utf-8")
w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
"'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitors) + "!"))
}
func main() {
log.Printf("Starting on port 8080")
http.HandleFunc("/hi", handleHi)
log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
$go run demo.go
2015/08/25 09:42:35 Starting on port 8080
在浏览器输入:http://localhost:8080/hi
一切顺利的话,页面会显示:
Welcome!
You are visitor number 1!
三、添加测试代码
按照talk.md中的说明,brad repo中demo中根本没有测试代码(commit 2427d0faa12ed1fb05f1e6a1e69307c11259c2b2)。
于是我根据作者的意图,新增了demo_test.go,采用TestHandleHi_Recorder和TestHandleHi_TestServer对HandleHi进行测试:
//go-debug-profile-optimization/step0/demo_test.go
package main
import (
"bufio"
"net/http"
"net/http/httptest"
"strings"
"testing"
)
func TestHandleHi_Recorder(t *testing.T) {
rw := httptest.NewRecorder()
handleHi(rw, req(t, "GET / HTTP/1.0\r\n\r\n"))
if !strings.Contains(rw.Body.String(), "visitor number") {
t.Errorf("Unexpected output: %s", rw.Body)
}
}
func req(t *testing.T, v string) *http.Request {
req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(v)))
if err != nil {
t.Fatal(err)
}
return req
}
func TestHandleHi_TestServer(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(handleHi))
defer ts.Close()
res, err := http.Get(ts.URL)
if err != nil {
t.Error(err)
return
}
if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
t.Errorf("Content-Type = %q; want %q", g, w)
}
slurp, err := ioutil.ReadAll(res.Body)
defer res.Body.Close()
if err != nil {
t.Error(err)
return
}
t.Logf("Got: %s", slurp)
}
$ go test -v
=== RUN TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0 0.007s
测试通过!
至此,step0使命结束。
四、Race Detector(竞态分析)
并发设计使得程序可以更好更有效的利用现代处理器的多核心。但并发设计很容易引入竞态,导致严重bug。Go程序中竞态就是当多个goroutine并发 访问某共享数据且未使用同步机制时,且至少一个goroutine进行了写操作。不过go工具自带race分析功能。在分析优化step0中demo代码 前,我们先要保证demo代码中不存在竞态。
工具的使用方法就是在go test后加上-race标志,在step0目录下:
$ go test -v -race
=== RUN TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0 1.012s
-race通过做运行时分析做竞态分析,虽然不存在误报,但却存在实际有竞态,但工具没发现的情况。接下来我们改造一下测试代码,让test并发起来:
向step1(copy自step0)中demo_test.go中添加一个test method:
//go-debug-profile-optimization/step1/demo_test.go
… …
func TestHandleHi_TestServer_Parallel(t *testing.T) {
ts := httptest.NewServer(http.HandlerFunc(handleHi))
defer ts.Close()
var wg sync.WaitGroup
for i := 0; i < 2; i++ {
wg.Add(1)
go func() {
defer wg.Done()
res, err := http.Get(ts.URL)
if err != nil {
t.Error(err)
return
}
if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
t.Errorf("Content-Type = %q; want %q", g, w)
}
slurp, err := ioutil.ReadAll(res.Body)
defer res.Body.Close()
if err != nil {
t.Error(err)
return
}
t.Logf("Got: %s", slurp)
}()
}
wg.Wait()
}
… …
执行竞态test:
$ go test -v -race
=== RUN TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
demo_test.go:46: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
=== RUN TestHandleHi_TestServer_Parallel
==================
WARNING: DATA RACE
Read by goroutine 22:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0xf5
net/http.HandlerFunc.ServeHTTP()
/tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
/tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
/tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
/tmp/workdir/go/src/net/http/server.go:1361 +0x117c
Previous write by goroutine 25:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0×111
net/http.HandlerFunc.ServeHTTP()
/tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
/tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
/tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
/tmp/workdir/go/src/net/http/server.go:1361 +0x117c
Goroutine 22 (running) created at:
net/http.(*Server).Serve()
/tmp/workdir/go/src/net/http/server.go:1910 +0×464
Goroutine 25 (running) created at:
net/http.(*Server).Serve()
/tmp/workdir/go/src/net/http/server.go:1910 +0×464
==================
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 3!
demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 4!
PASS
Found 1 data race(s)
exit status 66
FAIL _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1 1.023s
工具发现demo.go第17行:
visitors++
是一处潜在的竞态条件。
visitors被多个goroutine访问但未采用同步机制。
既然发现了竞态条件,我们就需要fix it。有多种fix方法可选:
1、使用channel
2、使用Mutex
3、使用atomic
Brad使用了atomic:
//go-debug-profile-optimization/step1/demo.go
… …
var visitors int64 // must be accessed atomically
func handleHi(w http.ResponseWriter, r *http.Request) {
if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitNum := atomic.AddInt64(&visitors, 1)
w.Header().Set("Content-Type", "text/html; charset=utf-8")
w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
"'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …
再做一次测试:
$ go test -v -race
=== RUN TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
demo_test.go:46: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
=== RUN TestHandleHi_TestServer_Parallel
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 3!
demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 4!
PASS
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1 1.020s
竞态条件被消除了!
至此,step1结束了使命!
五、CPU Profiling
要做CPU Profilling,我们需要benchmark数据,Go test提供benchmark test功能,我们只要写对应的Benchmark测试方法即可:
//go-debug-profile-optimization/step2/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()
req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
if err != nil {
b.Fatal(err)
}
for i := 0; i < b.N; i++ {
rw := httptest.NewRecorder()
handleHi(rw, req)
}
}
… …
$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s
开始CPU Profiling:
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 200000 14679 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 3.096s
执行完benchmark test后,step2目录下出现两个新文件prof.cpu和step2.test,这两个文件将作为后续go tool pprof的输入:
$ls
demo.go demo_test.go prof.cpu step2.test*
使用go profile viewer工具:
$ go tool pprof step2.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top
1830ms of 3560ms total (51.40%)
Dropped 53 nodes (cum <= 17.80ms)
Showing top 10 nodes out of 133 (cum >= 1290ms)
flat flat% sum% cum cum%
480ms 13.48% 13.48% 980ms 27.53% runtime.growslice
360ms 10.11% 23.60% 700ms 19.66% runtime.mallocgc
170ms 4.78% 28.37% 170ms 4.78% runtime.heapBitsSetType
170ms 4.78% 33.15% 200ms 5.62% runtime.scanblock
120ms 3.37% 36.52% 1100ms 30.90% regexp.makeOnePass.func2
120ms 3.37% 39.89% 550ms 15.45% runtime.newarray
110ms 3.09% 42.98% 300ms 8.43% runtime.makeslice
110ms 3.09% 46.07% 220ms 6.18% runtime.mapassign1
100ms 2.81% 48.88% 100ms 2.81% runtime.futex
90ms 2.53% 51.40% 1290ms 36.24% regexp.makeOnePass
(pprof) top –cum
0.18s of 3.56s total ( 5.06%)
Dropped 53 nodes (cum <= 0.02s)
Showing top 10 nodes out of 133 (cum >= 1.29s)
flat flat% sum% cum cum%
0 0% 0% 3.26s 91.57% runtime.goexit
0.02s 0.56% 0.56% 2.87s 80.62% BenchmarkHi
0 0% 0.56% 2.87s 80.62% testing.(*B).launch
0 0% 0.56% 2.87s 80.62% testing.(*B).runN
0.03s 0.84% 1.40% 2.80s 78.65% step2.handleHi
0.01s 0.28% 1.69% 2.46s 69.10% regexp.MatchString
0 0% 1.69% 2.24s 62.92% regexp.Compile
0 0% 1.69% 2.24s 62.92% regexp.compile
0.03s 0.84% 2.53% 1.56s 43.82% regexp.compileOnePass
0.09s 2.53% 5.06% 1.29s 36.24% regexp.makeOnePass
(pprof) list handleHi
Total: 3.56s
ROUTINE ======================== handleHi in go-debug-profile-optimization/step2/demo.go
30ms 2.80s (flat, cum) 78.65% of Total
. . 9:)
. . 10:
. . 11:var visitors int64 // must be accessed atomically
. . 12:
. . 13:func handleHi(w http.ResponseWriter, r *http.Request) {
. 2.47s 14: if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
. . 15: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 16: return
. . 17: }
10ms 20ms 18: visitNum := atomic.AddInt64(&visitors, 1)
10ms 90ms 19: w.Header().Set("Content-Type", "text/html; charset=utf-8")
10ms 20ms 20: w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
. 200ms 21: "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
. . 22:}
. . 23:
. . 24:func main() {
. . 25: log.Printf("Starting on port 8080")
. . 26: http.HandleFunc("/hi", handleHi)
(pprof)
从top –cum来看,handleHi消耗cpu较大,而handleHi中,又是MatchString耗时最长。
六、第一次优化
前面已经发现MatchString较为耗时,优化手段:让正则式仅编译一次(step3):
// go-debug-profile-optimization/step3/demo.go
… …
var visitors int64 // must be accessed atomically
var rxOptionalID = regexp.MustCompile(`^\d*$`)
func handleHi(w http.ResponseWriter, r *http.Request) {
if !rxOptionalID.MatchString(r.FormValue("color")) {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitNum := atomic.AddInt64(&visitors, 1)
w.Header().Set("Content-Type", "text/html; charset=utf-8")
w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
"'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …
运行一下bench:
$ go test -bench=.
PASS
BenchmarkHi-4 1000000 1678 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.710s
对比之前在step2中运行的bench结果:
$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s
耗时相同,但优化后的bench运行了100w次,而之前的Bench运行10w次,相当于性能提高10倍。
再看看cpu prof结果:
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=3s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 3000000 1640 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 6.540s
$ go tool pprof step3.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top –cum 30
2.74s of 8.07s total (33.95%)
Dropped 72 nodes (cum <= 0.04s)
Showing top 30 nodes out of 103 (cum >= 0.56s)
flat flat% sum% cum cum%
0 0% 0% 7.17s 88.85% runtime.goexit
0.05s 0.62% 0.62% 6.21s 76.95% step3.BenchmarkHi
0 0% 0.62% 6.21s 76.95% testing.(*B).launch
0 0% 0.62% 6.21s 76.95% testing.(*B).runN
0.06s 0.74% 1.36% 4.96s 61.46% step3.handleHi
1.15s 14.25% 15.61% 2.35s 29.12% runtime.mallocgc
0.02s 0.25% 15.86% 1.63s 20.20% runtime.systemstack
0 0% 15.86% 1.53s 18.96% net/http.Header.Set
0.06s 0.74% 16.60% 1.53s 18.96% net/textproto.MIMEHeader.Set
0.09s 1.12% 17.72% 1.22s 15.12% runtime.newobject
0.05s 0.62% 18.34% 1.09s 13.51% fmt.Sprint
0.20s 2.48% 20.82% 1s 12.39% runtime.mapassign1
0 0% 20.82% 0.81s 10.04% runtime.mcall
0.01s 0.12% 20.94% 0.79s 9.79% runtime.schedule
0.05s 0.62% 21.56% 0.76s 9.42% regexp.(*Regexp).MatchString
0.09s 1.12% 22.68% 0.71s 8.80% regexp.(*Regexp).doExecute
0.01s 0.12% 22.80% 0.71s 8.80% runtime.concatstring5
0.20s 2.48% 25.28% 0.70s 8.67% runtime.concatstrings
0 0% 25.28% 0.69s 8.55% runtime.gosweepone
0.05s 0.62% 25.90% 0.69s 8.55% runtime.mSpan_Sweep
0 0% 25.90% 0.68s 8.43% runtime.bgsweep
0.04s 0.5% 26.39% 0.68s 8.43% runtime.newarray
0.01s 0.12% 26.52% 0.67s 8.30% runtime.goschedImpl
0.01s 0.12% 26.64% 0.65s 8.05% runtime.gosched_m
0 0% 26.64% 0.65s 8.05% runtime.gosweepone.func1
0.01s 0.12% 26.77% 0.65s 8.05% runtime.sweepone
0.28s 3.47% 30.24% 0.62s 7.68% runtime.makemap
0.17s 2.11% 32.34% 0.59s 7.31% runtime.heapBitsSweepSpan
0.02s 0.25% 32.59% 0.58s 7.19% fmt.(*pp).doPrint
0.11s 1.36% 33.95% 0.56s 6.94% fmt.(*pp).printArg
handleHi耗时有一定下降。
七、Mem Profiling
在step3目录下执行bench,获取mem分配数据:
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1657 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 5.002s
使用pprof工具分析mem:
$ go tool pprof –alloc_space step3.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
flat flat% sum% cum cum%
1076.35MB 52.06% 52.06% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
535.54MB 25.90% 77.97% 2066.91MB 100% step3.BenchmarkHi
406.52MB 19.66% 97.63% 1531.37MB 74.07% step3.handleHi
47.50MB 2.30% 99.93% 48.50MB 2.35% fmt.Sprint
0 0% 99.93% 1076.35MB 52.06% net/http.Header.Set
0 0% 99.93% 2066.91MB 100% runtime.goexit
0 0% 99.93% 2066.91MB 100% testing.(*B).launch
0 0% 99.93% 2066.91MB 100% testing.(*B).runN
(pprof) top -cum
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
flat flat% sum% cum cum%
535.54MB 25.90% 25.90% 2066.91MB 100% step3.BenchmarkHi
0 0% 25.90% 2066.91MB 100% runtime.goexit
0 0% 25.90% 2066.91MB 100% testing.(*B).launch
0 0% 25.90% 2066.91MB 100% testing.(*B).runN
406.52MB 19.66% 45.57% 1531.37MB 74.07% step3.handleHi
0 0% 45.57% 1076.35MB 52.06% net/http.Header.Set
1076.35MB 52.06% 97.63% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
47.50MB 2.30% 99.93% 48.50MB 2.35% fmt.Sprint
(pprof) list handleHi
Total: 2.02GB
ROUTINE =========step3.handleHi in step3/demo.go
406.52MB 1.50GB (flat, cum) 74.07% of Total
. . 17: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 18: return
. . 19: }
. . 20:
. . 21: visitNum := atomic.AddInt64(&visitors, 1)
. 1.05GB 22: w.Header().Set("Content-Type", "text/html; charset=utf-8")
. . 23: w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
406.52MB 455.02MB 24: "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
. . 25:}
. . 26:
. . 27:func main() {
. . 28: log.Printf("Starting on port 8080")
. . 29: http.HandleFunc("/hi", handleHi)
(pprof)
可以看到handleHi22、23两行占用了较多内存。
八、第二次优化
第二次优化的方法:
1、删除w.Header().Set这行
2、用fmt.Fprintf替代w.Write
第二次优化的代码在step4目录中:
// go-debug-profile-optimization/step4/demo.go
… …
func handleHi(w http.ResponseWriter, r *http.Request) {
if !rxOptionalID.MatchString(r.FormValue("color")) {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitNum := atomic.AddInt64(&visitors, 1)
fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
}
… …
执行一遍pprof:
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 4.343s
$ go tool pprof –alloc_space step4.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
flat flat% sum% cum cum%
559.54MB 64.42% 64.42% 868.06MB 99.94% step4.BenchmarkHi
219.52MB 25.27% 89.70% 219.52MB 25.27% bytes.makeSlice
89MB 10.25% 99.94% 308.52MB 35.52% step4.handleHi
0 0% 99.94% 219.52MB 25.27% bytes.(*Buffer).Write
0 0% 99.94% 219.52MB 25.27% bytes.(*Buffer).grow
0 0% 99.94% 219.52MB 25.27% fmt.Fprintf
0 0% 99.94% 219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
0 0% 99.94% 868.06MB 99.94% runtime.goexit
0 0% 99.94% 868.06MB 99.94% testing.(*B).launch
0 0% 99.94% 868.06MB 99.94% testing.(*B).runN
(pprof) top –cum
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
flat flat% sum% cum cum%
559.54MB 64.42% 64.42% 868.06MB 99.94% step4.BenchmarkHi
0 0% 64.42% 868.06MB 99.94% runtime.goexit
0 0% 64.42% 868.06MB 99.94% testing.(*B).launch
0 0% 64.42% 868.06MB 99.94% testing.(*B).runN
89MB 10.25% 74.67% 308.52MB 35.52% step4.handleHi
0 0% 74.67% 219.52MB 25.27% bytes.(*Buffer).Write
0 0% 74.67% 219.52MB 25.27% bytes.(*Buffer).grow
219.52MB 25.27% 99.94% 219.52MB 25.27% bytes.makeSlice
0 0% 99.94% 219.52MB 25.27% fmt.Fprintf
0 0% 99.94% 219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 868.56MB
ROUTINE ============ step4.handleHi in step4/demo.go
89MB 308.52MB (flat, cum) 35.52% of Total
. . 17: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 18: return
. . 19: }
. . 20:
. . 21: visitNum := atomic.AddInt64(&visitors, 1)
89MB 308.52MB 22: fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
. . 23:}
. . 24:
. . 25:func main() {
. . 26: log.Printf("Starting on port 8080")
. . 27: http.HandleFunc("/hi", handleHi)
(pprof)
可以看出内存占用大幅减少。
九、Benchcmp
golang.org/x/tools中有一个工具:benchcmp,可以给出两次bench的结果对比。
github.com/golang/tools是golang.org/x/tools的一个镜像。安装benchcmp步骤:
1、go get -u github.com/golang/tools
2、mkdir -p $GOPATH/src/golang.org/x
3、mv $GOPATH/src/github.com/golang/tools $GOPATH/src/golang.org/x
4、go install golang.org/x/tools/cmd/benchcmp
我们分别在step2、step3和step4下执行如下命令:
$ go-debug-profile-optimization/step2$ go test -bench=. -memprofile=prof.mem | tee mem.2
PASS
BenchmarkHi-4 100000 14786 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.644s
go-debug-profile-optimization/step3$ go test -bench=. -memprofile=prof.mem | tee mem.3
PASS
BenchmarkHi-4 1000000 1662 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.694s
go-debug-profile-optimization/step4$ go test -bench=. -memprofile=prof.mem | tee mem.4
PASS
BenchmarkHi-4 1000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 1.456s
利用benchcmp工具对比结果(benchcmp old new):
$ benchcmp step3/mem.3 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1662 1428 -14.08%
benchmark old allocs new allocs delta
BenchmarkHi-4 9 6 -33.33%
benchmark old bytes new bytes delta
BenchmarkHi-4 720 304 -57.78%
$ benchcmp step2/mem.2 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 14786 1428 -90.34%
benchmark old allocs new allocs delta
BenchmarkHi-4 81 6 -92.59%
benchmark old bytes new bytes delta
BenchmarkHi-4 4961 304 -93.87%
可以看出优化后,内存分配大幅减少,gc的时间也随之减少。
十、内存来自哪
我们在BenchmarkHi中清理每次handleHi执行后的内存:
//step5/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()
req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
if err != nil {
b.Fatal(err)
}
for i := 0; i < b.N; i++ {
rw := httptest.NewRecorder()
handleHi(rw, req)
reset(rw)
}
}
func reset(rw *httptest.ResponseRecorder) {
m := rw.HeaderMap
for k := range m {
delete(m, k)
}
body := rw.Body
body.Reset()
*rw = httptest.ResponseRecorder{
Body: body,
HeaderMap: m,
}
}
… …
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1518 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5 4.577s
$ go tool pprof –alloc_space step5.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
290.52MB of 291.52MB total (99.66%)
Dropped 14 nodes (cum <= 1.46MB)
flat flat% sum% cum cum%
0 0% 0% 291.02MB 99.83% runtime.goexit
179.01MB 61.41% 61.41% 290.52MB 99.66% step5.BenchmarkHi
0 0% 61.41% 290.52MB 99.66% testing.(*B).launch
0 0% 61.41% 290.52MB 99.66% testing.(*B).runN
26.50MB 9.09% 70.50% 111.51MB 38.25% step5.handleHi
0 0% 70.50% 85.01MB 29.16% bytes.(*Buffer).Write
0 0% 70.50% 85.01MB 29.16% bytes.(*Buffer).grow
85.01MB 29.16% 99.66% 85.01MB 29.16% bytes.makeSlice
0 0% 99.66% 85.01MB 29.16% fmt.Fprintf
0 0% 99.66% 85.01MB 29.16% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 291.52MB
ROUTINE ======================== _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5.handleHi in /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5/demo.go
26.50MB 111.51MB (flat, cum) 38.25% of Total
. . 17: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 18: return
. . 19: }
. . 20:
. . 21: visitNum := atomic.AddInt64(&visitors, 1)
26.50MB 111.51MB 22: fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
. . 23:}
. . 24:
. . 25:func main() {
. . 26: log.Printf("Starting on port 8080")
. . 27: http.HandleFunc("/hi", handleHi)
(pprof)
内存从300MB降到111MB。内存来自哪?看到list handleHi,fmt.Fprintf分配了111.51MB。
我们来看这一行代码:
fmt.Fprintf(w, "<h1 style='color: %s'>Welcome!</h1>You are visitor number %d!",
r.FormValue("color"), num)
fmt.Fprintf的manual:
$ go doc fmt.Fprintf
func Fprintf(w io.Writer, format string, a …interface{}) (n int, err error)
Fprintf formats according to a format specifier and writes to w. It returns
the number of bytes written and any write error encountered.
这里回顾一下Go type在runtime中的内存占用:
A Go interface is 2 words of memory: (type, pointer).
A Go string is 2 words of memory: (base pointer, length)
A Go slice is 3 words of memory: (base pointer, length, capacity)
每次调用fmt.Fprintf,参数以value值形式传入函数时,程序就要为每个变参分配一个占用16bytes的empty interface,然后用传入的类型初始化该interface value。这就是这块累计分配内存较多的原因。
十一、消除所有内存分配
下面的优化代码可能在实际中并不需要,但一旦真的成为瓶颈,可以这么做:
//go-debug-profile-optimization/step6/demo.go
… …
var bufPool = sync.Pool{
New: func() interface{} {
return new(bytes.Buffer)
},
}
func handleHi(w http.ResponseWriter, r *http.Request) {
if !rxOptionalID.MatchString(r.FormValue("color")) {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitNum := atomic.AddInt64(&visitors, 1)
buf := bufPool.Get().(*bytes.Buffer)
defer bufPool.Put(buf)
buf.Reset()
buf.WriteString("<h1 style='color: ")
buf.WriteString(r.FormValue("color"))
buf.WriteString("'>Welcome!</h1>You are visitor number ")
b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
b = append(b, '!')
w.Write(b)
}
… …
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 5000000 780 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 4.709s
go tool pprof –alloc_space step6.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
1.07GB of 1.07GB total ( 100%)
Dropped 5 nodes (cum <= 0.01GB)
flat flat% sum% cum cum%
1.07GB 100% 100% 1.07GB 100% step6.BenchmarkHi
0 0% 100% 1.07GB 100% runtime.goexit
0 0% 100% 1.07GB 100% testing.(*B).launch
0 0% 100% 1.07GB 100% testing.(*B).runN
$ go test -bench=. -memprofile=prof.mem | tee mem.6
PASS
BenchmarkHi-4 2000000 790 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 2.401s
$ benchcmp step5/mem.5 step6/mem.6
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1513 790 -47.79%
benchmark old allocs new allocs delta
BenchmarkHi-4 6 3 -50.00%
benchmark old bytes new bytes delta
BenchmarkHi-4 304 192 -36.84%
可以看到handleHi已经不在top列表中了。benchcmp结果也显示内存分配又有大幅下降!
十二、竞争(Contention)优化
为handleHi编写一个Parallel benchmark test:
//go-debug-profile-optimization/step7/demo_test.go
… …
func BenchmarkHiParallel(b *testing.B) {
r, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
if err != nil {
b.Fatal(err)
}
b.RunParallel(func(pb *testing.PB) {
rw := httptest.NewRecorder()
for pb.Next() {
handleHi(rw, r)
reset(rw)
}
})
}
… …
执行测试,并分析结果:
$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 5000000 305 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step7 1.947s
$ go tool pprof step7.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
3.68s of 3.72s total (98.82%)
Dropped 29 nodes (cum <= 0.02s)
Showing top 10 nodes out of 20 (cum >= 1.84s)
flat flat% sum% cum cum%
0 0% 0% 3.72s 100% runtime.goexit
1.84s 49.46% 49.46% 1.84s 49.46% runtime.chanrecv1
0 0% 49.46% 1.84s 49.46% main.main
0 0% 49.46% 1.84s 49.46% runtime.main
0 0% 49.46% 1.84s 49.46% testing.(*M).Run
0 0% 49.46% 1.84s 49.43% testing.(*B).run
0 0% 49.46% 1.84s 49.43% testing.RunBenchmarks
0 0% 49.46% 1.84s 49.36% step7.BenchmarkHiParallel
1.84s 49.36% 98.82% 1.84s 49.36% sync.(*WaitGroup).Wait
0 0% 98.82% 1.84s 49.36% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 3.72s
ROUTINE ====== step7.BenchmarkHiParallel in step7/demo_test.go
0 1.84s (flat, cum) 49.36% of Total
. . 113: rw := httptest.NewRecorder()
. . 114: for pb.Next() {
. . 115: handleHi(rw, r)
. . 116: reset(rw)
. . 117: }
. 1.84s 118: })
. . 119:}
ROUTINE ==== step7.BenchmarkHiParallel.func1 in step7/demo_test.go
0 43.02ms (flat, cum) 1.16% of Total
. . 110: }
. . 111:
. . 112: b.RunParallel(func(pb *testing.PB) {
. . 113: rw := httptest.NewRecorder()
. . 114: for pb.Next() {
. 43.02ms 115: handleHi(rw, r)
. . 116: reset(rw)
. . 117: }
. . 118: })
. . 119:}
(pprof) list handleHi
Total: 3.72s
ROUTINE =====step7.handleHi in step7/demo.go
0 43.02ms (flat, cum) 1.16% of Total
. . 18: return new(bytes.Buffer)
. . 19: },
. . 20:}
. . 21:
. . 22:func handleHi(w http.ResponseWriter, r *http.Request) {
. 43.01ms 23: if !rxOptionalID.MatchString(r.FormValue("color")) {
. . 24: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 25: return
. . 26: }
. . 27:
. . 28: visitNum := atomic.AddInt64(&visitors, 1)
. 2.50us 29: buf := bufPool.Get().(*bytes.Buffer)
. . 30: defer bufPool.Put(buf)
. . 31: buf.Reset()
. . 32: buf.WriteString("<h1 style='color: ")
. . 33: buf.WriteString(r.FormValue("color"))
. . 34: buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)
handleHi中MatchString这块是一个焦点,这里耗时较多。
优化方法(step8):
//go-debug-profile-optimization/step8/demo.go
… …
var colorRxPool = sync.Pool{
New: func() interface{} { return regexp.MustCompile(`\w*$`) },
}
func handleHi(w http.ResponseWriter, r *http.Request) {
if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
http.Error(w, "Optional color is invalid", http.StatusBadRequest)
return
}
visitNum := atomic.AddInt64(&visitors, 1)
buf := bufPool.Get().(*bytes.Buffer)
defer bufPool.Put(buf)
buf.Reset()
buf.WriteString("<h1 style='color: ")
buf.WriteString(r.FormValue("color"))
buf.WriteString("'>Welcome!</h1>You are visitor number ")
b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
b = append(b, '!')
w.Write(b)
}
… …
测试执行与分析:
$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 100000 19190 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step8 2.219s
$ go tool pprof step8.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
4.22s of 4.23s total (99.69%)
Dropped 28 nodes (cum <= 0.02s)
Showing top 10 nodes out of 12 (cum >= 2.11s)
flat flat% sum% cum cum%
0 0% 0% 4.23s 100% runtime.goexit
2.11s 49.90% 49.90% 2.11s 49.90% runtime.chanrecv1
0 0% 49.90% 2.11s 49.89% main.main
0 0% 49.90% 2.11s 49.89% runtime.main
0 0% 49.90% 2.11s 49.89% testing.(*M).Run
0 0% 49.90% 2.11s 49.86% testing.(*B).run
0 0% 49.90% 2.11s 49.86% testing.RunBenchmarks
0 0% 49.90% 2.11s 49.79% step8.BenchmarkHiParallel
2.11s 49.79% 99.69% 2.11s 49.79% sync.(*WaitGroup).Wait
0 0% 99.69% 2.11s 49.79% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 4.23s
ROUTINE ======step8.BenchmarkHiParallel in step8/demo_test.go
0 2.11s (flat, cum) 49.79% of Total
. . 113: rw := httptest.NewRecorder()
. . 114: for pb.Next() {
. . 115: handleHi(rw, r)
. . 116: reset(rw)
. . 117: }
. 2.11s 118: })
. . 119:}
ROUTINE ======step8.BenchmarkHiParallel.func1 in step8/demo_test.go
0 11.68ms (flat, cum) 0.28% of Total
. . 110: }
. . 111:
. . 112: b.RunParallel(func(pb *testing.PB) {
. . 113: rw := httptest.NewRecorder()
. . 114: for pb.Next() {
. 11.68ms 115: handleHi(rw, r)
. . 116: reset(rw)
. . 117: }
. . 118: })
. . 119:}
(pprof) list handleHi
Total: 4.23s
ROUTINE ======step8.handleHi in step8/demo.go
0 11.68ms (flat, cum) 0.28% of Total
. . 21:var colorRxPool = sync.Pool{
. . 22: New: func() interface{} { return regexp.MustCompile(`\w*$`) },
. . 23:}
. . 24:
. . 25:func handleHi(w http.ResponseWriter, r *http.Request) {
. 5.66ms 26: if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
. . 27: http.Error(w, "Optional color is invalid", http.StatusBadRequest)
. . 28: return
. . 29: }
. . 30:
. . 31: visitNum := atomic.AddInt64(&visitors, 1)
. 6.02ms 32: buf := bufPool.Get().(*bytes.Buffer)
. . 33: defer bufPool.Put(buf)
. . 34: buf.Reset()
. . 35: buf.WriteString("<h1 style='color: ")
. . 36: buf.WriteString(r.FormValue("color"))
. . 37: buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)
优化后,MatchString从43ms降到5.66ms。
评论