标签 goroutine 下的文章

Go程序调试、分析与优化

Brad FitzpatrickYAPC Asia 2015(Yet Another Perl Conference)上做了一次技术分享,题为:"Go Debugging, Profiling, and Optimization"。个人感觉这篇分享中价值最大的是BradFitz现场演示的一个有关如何对Go程序进行调试、分析和优化的 Demo,Brad将demo上传到了他个人在github.com的repo中,但不知为何,repo中的代码似乎与repo里talk.md中的说明不甚一致(btw,我并没有看video)。于 是打算在这里按照Brad的思路重新走一遍demo的演示流程(所有演示代码在这里可以下载到)。

一、实验环境

$uname -a
Linux pc-tony 3.13.0-61-generic #100~precise1-Ubuntu SMP Wed Jul 29 12:06:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

注意:在Darwin或Windows下,profile的结果可能与这里有很大不同(甚至完全不一样的输出和瓶颈热点)。

$go version
go version go1.5 linux/amd64

$ go env
GOARCH="amd64"
GOBIN="/home1/tonybai/.bin/go15/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home1/tonybai/proj/GoProjects"
GORACE=""
GOROOT="/home1/tonybai/.bin/go15"
GOTOOLDIR="/home1/tonybai/.bin/go15/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

代码基于Brad的github.com/bradfitz/talk-yapc-asia-2015

二、待优化程序(step0)

待优化程序,也就是原始程序,我们放在step0中:

//go-debug-profile-optimization/step0/demo.go

package main

import (
    "fmt"
    "log"
    "net/http"
    "regexp"
)

var visitors int

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitors++
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitors) + "!"))
}

func main() {
    log.Printf("Starting on port 8080")
    http.HandleFunc("/hi", handleHi)
    log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}

$go run demo.go
2015/08/25 09:42:35 Starting on port 8080

在浏览器输入:http://localhost:8080/hi

一切顺利的话,页面会显示:

Welcome!

You are visitor number 1!

三、添加测试代码

按照talk.md中的说明,brad repo中demo中根本没有测试代码(commit 2427d0faa12ed1fb05f1e6a1e69307c11259c2b2)。

于是我根据作者的意图,新增了demo_test.go,采用TestHandleHi_Recorder和TestHandleHi_TestServer对HandleHi进行测试:

//go-debug-profile-optimization/step0/demo_test.go
package main

import (
    "bufio"
    "net/http"
    "net/http/httptest"
    "strings"
    "testing"
)

func TestHandleHi_Recorder(t *testing.T) {
    rw := httptest.NewRecorder()
    handleHi(rw, req(t, "GET / HTTP/1.0\r\n\r\n"))
    if !strings.Contains(rw.Body.String(), "visitor number") {
        t.Errorf("Unexpected output: %s", rw.Body)
    }
}

func req(t *testing.T, v string) *http.Request {
    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(v)))
    if err != nil {
        t.Fatal(err)
    }
    return req
}

func TestHandleHi_TestServer(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    res, err := http.Get(ts.URL)
    if err != nil {
        t.Error(err)
        return
    }
    if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
        t.Errorf("Content-Type = %q; want %q", g, w)
    }
    slurp, err := ioutil.ReadAll(res.Body)
    defer res.Body.Close()
    if err != nil {
        t.Error(err)
        return
    }
    t.Logf("Got: %s", slurp)
}

$ go test -v
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0    0.007s

测试通过!

至此,step0使命结束。

四、Race Detector(竞态分析)

并发设计使得程序可以更好更有效的利用现代处理器的多核心。但并发设计很容易引入竞态,导致严重bug。Go程序中竞态就是当多个goroutine并发 访问某共享数据且未使用同步机制时,且至少一个goroutine进行了写操作。不过go工具自带race分析功能。在分析优化step0中demo代码 前,我们先要保证demo代码中不存在竞态。

工具的使用方法就是在go test后加上-race标志,在step0目录下:

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0    1.012s

-race通过做运行时分析做竞态分析,虽然不存在误报,但却存在实际有竞态,但工具没发现的情况。接下来我们改造一下测试代码,让test并发起来:

向step1(copy自step0)中demo_test.go中添加一个test method:

//go-debug-profile-optimization/step1/demo_test.go
… …

func TestHandleHi_TestServer_Parallel(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            res, err := http.Get(ts.URL)
            if err != nil {
                t.Error(err)
                return
            }
            if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
                t.Errorf("Content-Type = %q; want %q", g, w)
            }
            slurp, err := ioutil.ReadAll(res.Body)
            defer res.Body.Close()
            if err != nil {
                t.Error(err)
                return
            }
            t.Logf("Got: %s", slurp)
        }()
    }
    wg.Wait()
}
… …

执行竞态test:

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:46: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
=== RUN   TestHandleHi_TestServer_Parallel
==================
WARNING: DATA RACE
Read by goroutine 22:
  _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0xf5
  net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
  net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
  net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
  net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Previous write by goroutine 25:
  _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0×111
  net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
  net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
  net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
  net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Goroutine 22 (running) created at:
  net/http.(*Server).Serve()
      /tmp/workdir/go/src/net/http/server.go:1910 +0×464

Goroutine 25 (running) created at:
  net/http.(*Server).Serve()
      /tmp/workdir/go/src/net/http/server.go:1910 +0×464
==================
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 3!
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 4!
PASS
Found 1 data race(s)
exit status 66
FAIL    _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1    1.023s

工具发现demo.go第17行:
        visitors++
是一处潜在的竞态条件。

visitors被多个goroutine访问但未采用同步机制。

既然发现了竞态条件,我们就需要fix it。有多种fix方法可选:

1、使用channel
2、使用Mutex
3、使用atomic

Brad使用了atomic:

//go-debug-profile-optimization/step1/demo.go
… …
var visitors int64 // must be accessed atomically

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

再做一次测试:

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:46: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
=== RUN   TestHandleHi_TestServer_Parallel
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 3!
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 4!
PASS
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1    1.020s

竞态条件被消除了!

至此,step1结束了使命!

五、CPU Profiling

要做CPU Profilling,我们需要benchmark数据,Go test提供benchmark test功能,我们只要写对应的Benchmark测试方法即可:

//go-debug-profile-optimization/step2/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
    b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
    }
}
… …

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4      100000         14808 ns/op        4961 B/op          81 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2    1.648s

开始CPU Profiling:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4      200000         14679 ns/op        4961 B/op          81 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2    3.096s

执行完benchmark test后,step2目录下出现两个新文件prof.cpu和step2.test,这两个文件将作为后续go tool pprof的输入:
$ls
demo.go        demo_test.go    prof.cpu    step2.test*

使用go profile viewer工具:

$ go tool pprof step2.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top
1830ms of 3560ms total (51.40%)
Dropped 53 nodes (cum <= 17.80ms)
Showing top 10 nodes out of 133 (cum >= 1290ms)
      flat  flat%   sum%        cum   cum%
     480ms 13.48% 13.48%      980ms 27.53%  runtime.growslice
     360ms 10.11% 23.60%      700ms 19.66%  runtime.mallocgc
     170ms  4.78% 28.37%      170ms  4.78%  runtime.heapBitsSetType
     170ms  4.78% 33.15%      200ms  5.62%  runtime.scanblock
     120ms  3.37% 36.52%     1100ms 30.90%  regexp.makeOnePass.func2
     120ms  3.37% 39.89%      550ms 15.45%  runtime.newarray
     110ms  3.09% 42.98%      300ms  8.43%  runtime.makeslice
     110ms  3.09% 46.07%      220ms  6.18%  runtime.mapassign1
     100ms  2.81% 48.88%      100ms  2.81%  runtime.futex
      90ms  2.53% 51.40%     1290ms 36.24%  regexp.makeOnePass

(pprof) top –cum
0.18s of 3.56s total ( 5.06%)
Dropped 53 nodes (cum <= 0.02s)
Showing top 10 nodes out of 133 (cum >= 1.29s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      3.26s 91.57%  runtime.goexit
     0.02s  0.56%  0.56%      2.87s 80.62%  BenchmarkHi
         0     0%  0.56%      2.87s 80.62%  testing.(*B).launch
         0     0%  0.56%      2.87s 80.62%  testing.(*B).runN
     0.03s  0.84%  1.40%      2.80s 78.65%  step2.handleHi
     0.01s  0.28%  1.69%      2.46s 69.10%  regexp.MatchString
         0     0%  1.69%      2.24s 62.92%  regexp.Compile
         0     0%  1.69%      2.24s 62.92%  regexp.compile
     0.03s  0.84%  2.53%      1.56s 43.82%  regexp.compileOnePass
     0.09s  2.53%  5.06%      1.29s 36.24%  regexp.makeOnePass

(pprof) list handleHi
Total: 3.56s
ROUTINE ======================== handleHi in go-debug-profile-optimization/step2/demo.go
      30ms      2.80s (flat, cum) 78.65% of Total
         .          .      9:)
         .          .     10:
         .          .     11:var visitors int64 // must be accessed atomically
         .          .     12:
         .          .     13:func handleHi(w http.ResponseWriter, r *http.Request) {
         .      2.47s     14:    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
         .          .     15:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     16:        return
         .          .     17:    }
      10ms       20ms     18:    visitNum := atomic.AddInt64(&visitors, 1)
      10ms       90ms     19:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
      10ms       20ms     20:    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
         .      200ms     21:        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     22:}
         .          .     23:
         .          .     24:func main() {
         .          .     25:    log.Printf("Starting on port 8080")
         .          .     26:    http.HandleFunc("/hi", handleHi)
(pprof)

从top –cum来看,handleHi消耗cpu较大,而handleHi中,又是MatchString耗时最长。

六、第一次优化

前面已经发现MatchString较为耗时,优化手段:让正则式仅编译一次(step3):

// go-debug-profile-optimization/step3/demo.go

… …
var visitors int64 // must be accessed atomically

var rxOptionalID = regexp.MustCompile(`^\d*$`)

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

运行一下bench:

$ go test -bench=.
PASS
BenchmarkHi-4     1000000          1678 ns/op         720 B/op           9 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3    1.710s

对比之前在step2中运行的bench结果:

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4      100000         14808 ns/op        4961 B/op          81 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2    1.648s

耗时相同,但优化后的bench运行了100w次,而之前的Bench运行10w次,相当于性能提高10倍。

再看看cpu prof结果:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=3s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4     3000000          1640 ns/op         720 B/op           9 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3    6.540s

$ go tool pprof step3.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top –cum 30
2.74s of 8.07s total (33.95%)
Dropped 72 nodes (cum <= 0.04s)
Showing top 30 nodes out of 103 (cum >= 0.56s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      7.17s 88.85%  runtime.goexit
     0.05s  0.62%  0.62%      6.21s 76.95%  step3.BenchmarkHi
         0     0%  0.62%      6.21s 76.95%  testing.(*B).launch
         0     0%  0.62%      6.21s 76.95%  testing.(*B).runN
     0.06s  0.74%  1.36%      4.96s 61.46%  step3.handleHi
     1.15s 14.25% 15.61%      2.35s 29.12%  runtime.mallocgc
     0.02s  0.25% 15.86%      1.63s 20.20%  runtime.systemstack
         0     0% 15.86%      1.53s 18.96%  net/http.Header.Set
     0.06s  0.74% 16.60%      1.53s 18.96%  net/textproto.MIMEHeader.Set
     0.09s  1.12% 17.72%      1.22s 15.12%  runtime.newobject
     0.05s  0.62% 18.34%      1.09s 13.51%  fmt.Sprint
     0.20s  2.48% 20.82%         1s 12.39%  runtime.mapassign1
         0     0% 20.82%      0.81s 10.04%  runtime.mcall
     0.01s  0.12% 20.94%      0.79s  9.79%  runtime.schedule
     0.05s  0.62% 21.56%      0.76s  9.42%  regexp.(*Regexp).MatchString
     0.09s  1.12% 22.68%      0.71s  8.80%  regexp.(*Regexp).doExecute
     0.01s  0.12% 22.80%      0.71s  8.80%  runtime.concatstring5
     0.20s  2.48% 25.28%      0.70s  8.67%  runtime.concatstrings
         0     0% 25.28%      0.69s  8.55%  runtime.gosweepone
     0.05s  0.62% 25.90%      0.69s  8.55%  runtime.mSpan_Sweep
         0     0% 25.90%      0.68s  8.43%  runtime.bgsweep
     0.04s   0.5% 26.39%      0.68s  8.43%  runtime.newarray
     0.01s  0.12% 26.52%      0.67s  8.30%  runtime.goschedImpl
     0.01s  0.12% 26.64%      0.65s  8.05%  runtime.gosched_m
         0     0% 26.64%      0.65s  8.05%  runtime.gosweepone.func1
     0.01s  0.12% 26.77%      0.65s  8.05%  runtime.sweepone
     0.28s  3.47% 30.24%      0.62s  7.68%  runtime.makemap
     0.17s  2.11% 32.34%      0.59s  7.31%  runtime.heapBitsSweepSpan
     0.02s  0.25% 32.59%      0.58s  7.19%  fmt.(*pp).doPrint
     0.11s  1.36% 33.95%      0.56s  6.94%  fmt.(*pp).printArg

handleHi耗时有一定下降。

七、Mem Profiling

在step3目录下执行bench,获取mem分配数据:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4     2000000          1657 ns/op         720 B/op           9 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3    5.002s

使用pprof工具分析mem:

$ go tool pprof –alloc_space step3.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat  flat%   sum%        cum   cum%
 1076.35MB 52.06% 52.06%  1076.35MB 52.06%  net/textproto.MIMEHeader.Set
  535.54MB 25.90% 77.97%  2066.91MB   100%  step3.BenchmarkHi
  406.52MB 19.66% 97.63%  1531.37MB 74.07%  step3.handleHi
   47.50MB  2.30% 99.93%    48.50MB  2.35%  fmt.Sprint
         0     0% 99.93%  1076.35MB 52.06%  net/http.Header.Set
         0     0% 99.93%  2066.91MB   100%  runtime.goexit
         0     0% 99.93%  2066.91MB   100%  testing.(*B).launch
         0     0% 99.93%  2066.91MB   100%  testing.(*B).runN

(pprof) top -cum
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat  flat%   sum%        cum   cum%
  535.54MB 25.90% 25.90%  2066.91MB   100%  step3.BenchmarkHi
         0     0% 25.90%  2066.91MB   100%  runtime.goexit
         0     0% 25.90%  2066.91MB   100%  testing.(*B).launch
         0     0% 25.90%  2066.91MB   100%  testing.(*B).runN
  406.52MB 19.66% 45.57%  1531.37MB 74.07%  step3.handleHi
         0     0% 45.57%  1076.35MB 52.06%  net/http.Header.Set
 1076.35MB 52.06% 97.63%  1076.35MB 52.06%  net/textproto.MIMEHeader.Set
   47.50MB  2.30% 99.93%    48.50MB  2.35%  fmt.Sprint

(pprof) list handleHi
Total: 2.02GB
     ROUTINE =========step3.handleHi in step3/demo.go
  406.52MB     1.50GB (flat, cum) 74.07% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
         .     1.05GB     22:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
         .          .     23:    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
  406.52MB   455.02MB     24:        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     25:}
         .          .     26:
         .          .     27:func main() {
         .          .     28:    log.Printf("Starting on port 8080")
         .          .     29:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看到handleHi22、23两行占用了较多内存。

八、第二次优化

第二次优化的方法:
1、删除w.Header().Set这行
2、用fmt.Fprintf替代w.Write

第二次优化的代码在step4目录中:

// go-debug-profile-optimization/step4/demo.go
… …
func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
}
… …

执行一遍pprof:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4     2000000          1428 ns/op         304 B/op           6 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4    4.343s

$ go tool pprof –alloc_space step4.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat  flat%   sum%        cum   cum%
  559.54MB 64.42% 64.42%   868.06MB 99.94%  step4.BenchmarkHi
  219.52MB 25.27% 89.70%   219.52MB 25.27%  bytes.makeSlice
      89MB 10.25% 99.94%   308.52MB 35.52%  step4.handleHi
         0     0% 99.94%   219.52MB 25.27%  bytes.(*Buffer).Write
         0     0% 99.94%   219.52MB 25.27%  bytes.(*Buffer).grow
         0     0% 99.94%   219.52MB 25.27%  fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27%  net/http/httptest.(*ResponseRecorder).Write
         0     0% 99.94%   868.06MB 99.94%  runtime.goexit
         0     0% 99.94%   868.06MB 99.94%  testing.(*B).launch
         0     0% 99.94%   868.06MB 99.94%  testing.(*B).runN
(pprof) top –cum
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat  flat%   sum%        cum   cum%
  559.54MB 64.42% 64.42%   868.06MB 99.94%  step4.BenchmarkHi
         0     0% 64.42%   868.06MB 99.94%  runtime.goexit
         0     0% 64.42%   868.06MB 99.94%  testing.(*B).launch
         0     0% 64.42%   868.06MB 99.94%  testing.(*B).runN
      89MB 10.25% 74.67%   308.52MB 35.52%  step4.handleHi
         0     0% 74.67%   219.52MB 25.27%  bytes.(*Buffer).Write
         0     0% 74.67%   219.52MB 25.27%  bytes.(*Buffer).grow
  219.52MB 25.27% 99.94%   219.52MB 25.27%  bytes.makeSlice
         0     0% 99.94%   219.52MB 25.27%  fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27%  net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 868.56MB
ROUTINE ============ step4.handleHi in step4/demo.go
      89MB   308.52MB (flat, cum) 35.52% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
      89MB   308.52MB     22:    fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看出内存占用大幅减少。

九、Benchcmp

golang.org/x/tools中有一个工具:benchcmp,可以给出两次bench的结果对比。

github.com/golang/tools是golang.org/x/tools的一个镜像。安装benchcmp步骤:

1、go get -u github.com/golang/tools
2、mkdir -p $GOPATH/src/golang.org/x
3、mv $GOPATH/src/github.com/golang/tools $GOPATH/src/golang.org/x
4、go install golang.org/x/tools/cmd/benchcmp

我们分别在step2、step3和step4下执行如下命令:

$ go-debug-profile-optimization/step2$ go test -bench=. -memprofile=prof.mem | tee mem.2
PASS
BenchmarkHi-4      100000         14786 ns/op        4961 B/op          81 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2    1.644s

go-debug-profile-optimization/step3$ go test -bench=. -memprofile=prof.mem | tee mem.3
PASS
BenchmarkHi-4     1000000          1662 ns/op         720 B/op           9 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3    1.694s

go-debug-profile-optimization/step4$ go test -bench=. -memprofile=prof.mem | tee mem.4
PASS
BenchmarkHi-4     1000000          1428 ns/op         304 B/op           6 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4    1.456s

利用benchcmp工具对比结果(benchcmp old new):

$ benchcmp step3/mem.3 step4/mem.4
benchmark         old ns/op     new ns/op     delta
BenchmarkHi-4     1662          1428          -14.08%

benchmark         old allocs     new allocs     delta
BenchmarkHi-4     9              6              -33.33%

benchmark         old bytes     new bytes     delta
BenchmarkHi-4     720           304           -57.78%

$ benchcmp step2/mem.2 step4/mem.4
benchmark         old ns/op     new ns/op     delta
BenchmarkHi-4     14786         1428          -90.34%

benchmark         old allocs     new allocs     delta
BenchmarkHi-4     81             6              -92.59%

benchmark         old bytes     new bytes     delta
BenchmarkHi-4     4961          304           -93.87%

可以看出优化后,内存分配大幅减少,gc的时间也随之减少。

十、内存来自哪

我们在BenchmarkHi中清理每次handleHi执行后的内存:

//step5/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
    b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
        reset(rw)
    }
}

func reset(rw *httptest.ResponseRecorder) {
    m := rw.HeaderMap
    for k := range m {
        delete(m, k)
    }
    body := rw.Body
    body.Reset()
    *rw = httptest.ResponseRecorder{
        Body:      body,
        HeaderMap: m,
    }
}

… …
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4     2000000          1518 ns/op         304 B/op           6 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5    4.577s

$ go tool pprof –alloc_space step5.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
290.52MB of 291.52MB total (99.66%)
Dropped 14 nodes (cum <= 1.46MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%   291.02MB 99.83%  runtime.goexit
  179.01MB 61.41% 61.41%   290.52MB 99.66%  step5.BenchmarkHi
         0     0% 61.41%   290.52MB 99.66%  testing.(*B).launch
         0     0% 61.41%   290.52MB 99.66%  testing.(*B).runN
   26.50MB  9.09% 70.50%   111.51MB 38.25%  step5.handleHi
         0     0% 70.50%    85.01MB 29.16%  bytes.(*Buffer).Write
         0     0% 70.50%    85.01MB 29.16%  bytes.(*Buffer).grow
   85.01MB 29.16% 99.66%    85.01MB 29.16%  bytes.makeSlice
         0     0% 99.66%    85.01MB 29.16%  fmt.Fprintf
         0     0% 99.66%    85.01MB 29.16%  net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 291.52MB
ROUTINE ======================== _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5.handleHi in /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5/demo.go
   26.50MB   111.51MB (flat, cum) 38.25% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
   26.50MB   111.51MB     22:    fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

内存从300MB降到111MB。内存来自哪?看到list handleHi,fmt.Fprintf分配了111.51MB。

我们来看这一行代码:
fmt.Fprintf(w, "<h1 style='color: %s'>Welcome!</h1>You are visitor number %d!",
               r.FormValue("color"), num)

fmt.Fprintf的manual:

$ go doc fmt.Fprintf
func Fprintf(w io.Writer, format string, a …interface{}) (n int, err error)

    Fprintf formats according to a format specifier and writes to w. It returns
    the number of bytes written and any write error encountered.

这里回顾一下Go type在runtime中的内存占用:

A Go interface is 2 words of memory: (type, pointer).
A Go string is 2 words of memory: (base pointer, length)
A Go slice is 3 words of memory: (base pointer, length, capacity)

每次调用fmt.Fprintf,参数以value值形式传入函数时,程序就要为每个变参分配一个占用16bytes的empty interface,然后用传入的类型初始化该interface value。这就是这块累计分配内存较多的原因。

十一、消除所有内存分配

下面的优化代码可能在实际中并不需要,但一旦真的成为瓶颈,可以这么做:

//go-debug-profile-optimization/step6/demo.go
… …
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset()
    buf.WriteString("<h1 style='color: ")
    buf.WriteString(r.FormValue("color"))
    buf.WriteString("'>Welcome!</h1>You are visitor number ")
    b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
    b = append(b, '!')
    w.Write(b)
}

… …

$  go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4     5000000           780 ns/op         192 B/op           3 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6    4.709s

 go tool pprof –alloc_space step6.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
1.07GB of 1.07GB total (  100%)
Dropped 5 nodes (cum <= 0.01GB)
      flat  flat%   sum%        cum   cum%
    1.07GB   100%   100%     1.07GB   100%  step6.BenchmarkHi
         0     0%   100%     1.07GB   100%  runtime.goexit
         0     0%   100%     1.07GB   100%  testing.(*B).launch
         0     0%   100%     1.07GB   100%  testing.(*B).runN

$ go test -bench=. -memprofile=prof.mem | tee mem.6
PASS
BenchmarkHi-4     2000000           790 ns/op         192 B/op           3 allocs/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6    2.401s

$ benchcmp step5/mem.5 step6/mem.6
benchmark         old ns/op     new ns/op     delta
BenchmarkHi-4     1513          790           -47.79%

benchmark         old allocs     new allocs     delta
BenchmarkHi-4     6              3              -50.00%

benchmark         old bytes     new bytes     delta
BenchmarkHi-4     304           192           -36.84%

可以看到handleHi已经不在top列表中了。benchcmp结果也显示内存分配又有大幅下降!

十二、竞争(Contention)优化

为handleHi编写一个Parallel benchmark test:

//go-debug-profile-optimization/step7/demo_test.go
… …
func BenchmarkHiParallel(b *testing.B) {
    r, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    b.RunParallel(func(pb *testing.PB) {
        rw := httptest.NewRecorder()
        for pb.Next() {
            handleHi(rw, r)
            reset(rw)
        }
    })
}
… …

执行测试,并分析结果:

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4     5000000           305 ns/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step7    1.947s

$ go tool pprof step7.test  prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
3.68s of 3.72s total (98.82%)
Dropped 29 nodes (cum <= 0.02s)
Showing top 10 nodes out of 20 (cum >= 1.84s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      3.72s   100%  runtime.goexit
     1.84s 49.46% 49.46%      1.84s 49.46%  runtime.chanrecv1
         0     0% 49.46%      1.84s 49.46%  main.main
         0     0% 49.46%      1.84s 49.46%  runtime.main
         0     0% 49.46%      1.84s 49.46%  testing.(*M).Run
         0     0% 49.46%      1.84s 49.43%  testing.(*B).run
         0     0% 49.46%      1.84s 49.43%  testing.RunBenchmarks
         0     0% 49.46%      1.84s 49.36%  step7.BenchmarkHiParallel
     1.84s 49.36% 98.82%      1.84s 49.36%  sync.(*WaitGroup).Wait
         0     0% 98.82%      1.84s 49.36%  testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 3.72s
ROUTINE ====== step7.BenchmarkHiParallel in step7/demo_test.go
         0      1.84s (flat, cum) 49.36% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      1.84s    118:    })
         .          .    119:}
ROUTINE ==== step7.BenchmarkHiParallel.func1 in step7/demo_test.go
         0    43.02ms (flat, cum)  1.16% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    43.02ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 3.72s
ROUTINE =====step7.handleHi in step7/demo.go
         0    43.02ms (flat, cum)  1.16% of Total
         .          .     18:        return new(bytes.Buffer)
         .          .     19:    },
         .          .     20:}
         .          .     21:
         .          .     22:func handleHi(w http.ResponseWriter, r *http.Request) {
         .    43.01ms     23:    if !rxOptionalID.MatchString(r.FormValue("color")) {
         .          .     24:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     25:        return
         .          .     26:    }
         .          .     27:
         .          .     28:    visitNum := atomic.AddInt64(&visitors, 1)
         .     2.50us     29:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     30:    defer bufPool.Put(buf)
         .          .     31:    buf.Reset()
         .          .     32:    buf.WriteString("<h1 style='color: ")
         .          .     33:    buf.WriteString(r.FormValue("color"))
         .          .     34:    buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)

handleHi中MatchString这块是一个焦点,这里耗时较多。

优化方法(step8):

//go-debug-profile-optimization/step8/demo.go
… …
var colorRxPool = sync.Pool{
    New: func() interface{} { return regexp.MustCompile(`\w*$`) },
}

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset()
    buf.WriteString("<h1 style='color: ")
    buf.WriteString(r.FormValue("color"))
    buf.WriteString("'>Welcome!</h1>You are visitor number ")
    b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
    b = append(b, '!')
    w.Write(b)
}
… …

测试执行与分析:

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4      100000         19190 ns/op
ok      _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step8    2.219s

$ go tool pprof step8.test  prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
4.22s of 4.23s total (99.69%)
Dropped 28 nodes (cum <= 0.02s)
Showing top 10 nodes out of 12 (cum >= 2.11s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%      4.23s   100%  runtime.goexit
     2.11s 49.90% 49.90%      2.11s 49.90%  runtime.chanrecv1
         0     0% 49.90%      2.11s 49.89%  main.main
         0     0% 49.90%      2.11s 49.89%  runtime.main
         0     0% 49.90%      2.11s 49.89%  testing.(*M).Run
         0     0% 49.90%      2.11s 49.86%  testing.(*B).run
         0     0% 49.90%      2.11s 49.86%  testing.RunBenchmarks
         0     0% 49.90%      2.11s 49.79%  step8.BenchmarkHiParallel
     2.11s 49.79% 99.69%      2.11s 49.79%  sync.(*WaitGroup).Wait
         0     0% 99.69%      2.11s 49.79%  testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 4.23s
ROUTINE ======step8.BenchmarkHiParallel in step8/demo_test.go
         0      2.11s (flat, cum) 49.79% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      2.11s    118:    })
         .          .    119:}
ROUTINE ======step8.BenchmarkHiParallel.func1 in step8/demo_test.go
         0    11.68ms (flat, cum)  0.28% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    11.68ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 4.23s
ROUTINE ======step8.handleHi in step8/demo.go
         0    11.68ms (flat, cum)  0.28% of Total
         .          .     21:var colorRxPool = sync.Pool{
         .          .     22:    New: func() interface{} { return regexp.MustCompile(`\w*$`) },
         .          .     23:}
         .          .     24:
         .          .     25:func handleHi(w http.ResponseWriter, r *http.Request) {
         .     5.66ms     26:    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
         .          .     27:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     28:        return
         .          .     29:    }
         .          .     30:
         .          .     31:    visitNum := atomic.AddInt64(&visitors, 1)
         .     6.02ms     32:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     33:    defer bufPool.Put(buf)
         .          .     34:    buf.Reset()
         .          .     35:    buf.WriteString("<h1 style='color: ")
         .          .     36:    buf.WriteString(r.FormValue("color"))
         .          .     37:    buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)

优化后,MatchString从43ms降到5.66ms。

Go 1.5中值得关注的几个变化

GopherCon2015开幕之 际,Google Go Team终于放出了Go 1.5Beta1版本的安装包。在go 1.5Beta1的发布说明中,Go Team也诚恳地承认Go 1.5将打破之前6个月一个版本的发布周期,这是因为Go 1.5变动太大,需要更多时间来准备这次发布(fix bug, Write doc)。关于Go 1.5的变化,之前Go Team staff在各种golang技术会议的slide  中暴露不少,包括:

- 编译器和运行时由C改为Go(及少量汇编语言)重写,实现了Go的self Bootstrap(自举)
- Garbage Collector优化,大幅降低GC延迟(Stop The World),实现Gc在单独的goroutine中与其他user goroutine并行运行。
- 标准库变更以及一些go tools的引入。

每项变动都会让gopher激动不已。但之前也只是激动,这次beta1出来后,我们可以实际体会一下这些变动带来的“快感”了。Go 1.5beta1的发布文档目前还不全,有些地方还有“待补充”字样,可能与最终go 1.5发布时的版本有一定差异,不过大体内容应该是固定不变的了。这篇文章就想和大家一起浅显地体验一下go 1.5都给gophers们带来了哪些变化吧。

一、语言

【map literal】

go 1.5依旧兼容Go 1 language specification,但修正了之前的一个“小疏忽”。

Go 1.4及之前版本中,我们只能这么来写代码:

//testmapliteral.go
package main

import (
    "fmt"
)

type Point struct {
    x int
    y int
}

func main() {
    var sl = []Point{{3, 4}, {5, 6}}
    var m = map[Point]string{
        Point{3,4}:"foo1",
        Point{5,6}:"foo2",
    }
    fmt.Println(sl)
    fmt.Println(m)
}

可以看到,对于Point这个struct来说,在初始化一个slice时,slice value literal中无需显式的带上元素类型Point,即

var sl = []Point{{3, 4}, {5, 6}}

而不是

var sl = []Point{Point{3, 4}, Point{5, 6}}

但当Point作为map类型的key类型时,初始化map时则要显式带上元素类型Point。Go team承认这是当初的一个疏忽,在本次Go 1.5中将该问题fix掉了。也就是说,下面的代码在Go 1.5中可以顺利编译通过:

func main() {
    var sl = []Point{{3, 4}, {5, 6}}
    var m = map[Point]string{
        {3,4}:"foo1",
        {5,6}:"foo2",
    }
    fmt.Println(sl)
    fmt.Println(m)
}

【GOMAXPROCS】

就像这次GopherCon2015上现任Google Go project Tech Lead的Russ Cox的开幕Keynote中所说的那样:Go目标定位于高度并发的云环境。Go 1.5中将标识并发系统线程个数的GOMAXPROCS的初始值由1改为了运行环境的CPU核数。

// testgomaxprocs.go
package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println(runtime.GOMAXPROCS(-1))
    fmt.Println(runtime.NumGoroutine())
}

这个代码在Go 1.4下(Mac OS X 4核)运行结果是:

$go run testgomaxprocs.go
1
4

而在go 1.5beta1下,结果为:

$go run testgomaxprocs.go
4
4

二、编译

【简化跨平台编译】

1.5之前的版本要想实现跨平台编译,需要到$GOROOT/src下重新执行一遍make.bash,执行前设置好目标环境的环境变量(GOOS和 GOARCH),Go 1.5大大简化这个过程,使得跨平台编译几乎与普通编译一样简单。下面是一个简单的例子:

//testcrosscompile.go
package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println(runtime.GOOS)
}

在我的Mac上,本地编译执行:
$go build -o testcrosscompile_darwin testcrosscompile.go
$testcrosscompile_darwin
darwin

跨平台编译linux amd64上的目标程序:

$GOOS=linux GOARCH=amd64 go build -o testcrosscompile_linux testcrosscompile.go

上传testcrosscompile_linux到ubuntu 14.04上执行:
$testcrosscompile_linux
linux

虽然从用户角度跨平台编译命令很简单,但事实是go替你做了很多事情,我们可以通过build -x -v选项来输出编译的详细过程,你会发现go会先进入到$GOROOT/src重新编译runtime.a以及一些平台相关的包。编译输出的信息 太多,这里就不贴出来了。但在1.5中这个过程非常快(10秒以内),与1.4之前版本的跨平台编译相比,完全不是一个级别,这也许就是编译器用Go重写完的好处之一吧。

除了直接使用go build,我们还可以使用go tool compile和go tool link来编译程序,实际上go build也是调用这两个工具完成编译过程的。

$go tool compile testcrosscompile.go
testcrosscompile.o
$go tool link testcrosscompile.o
a.out
$a.out
darwin

go 1.5移除了以前的6a,6l之类的编译连接工具,将这些工具整合到go tool中。并且go tool compile的输出默认改为.o文件,链接器输出默认改为了a.out。

【动态共享库】

个人不是很赞同Go语言增加对动态共享库的支持,.so和.dll这类十多年前的技术在如今内存、磁盘空间都“非常大”的前提下,似乎已经失去了以往的魅 力。并且动态共享库所带来的弊端:"DLL hell"会让程序后续的运维痛苦不已。Docker等轻量级容器的兴起,面向不变性的架构(immutable architecture)受到更多的关注。人们更多地会在container这一层进行操作,一个纯static link的应用在部署和维护方面将会有天然优势,.so只会增加复杂性。如果单纯从与c等其他语言互操作的角度,似乎用途也不会很广泛(但游戏或ui领域 可能会用到)。不过go 1.5还是增加了对动态链接库的支持,不过从go tool compile和link的doc说明来看,目前似乎还处于实验阶段。

既然go 1.5已经支持了shared library,我们就来实验一下。我们先规划一下测试repository的目录结构:

$GOPATH
    /src
        /testsharedlib
            /shlib
                – lib.go
        /app
            /main.go

lib.go中的代码很简单:

//lib.go
package shlib

import "fmt"

// export Method1
func Method1() {
    fmt.Println("shlib -Method1")
}

对于希望导出的方法,采用export标记。

我们来将这个lib.go编译成shared lib,注意目前似乎只有linux平台支持编译go shared library:

$ go build -buildmode=shared testsharedlib/shlib
# /tmp/go-build709704006/libtestsharedlib-shlib.so
warning: unable to find runtime/cgo.a

编译ok,那个warning是何含义不是很理解。

要想.so被其他go程序使用,需要将.so安装到相关目录下。我们install一下试试:

$ go install -buildmode=shared testsharedlib/shlib
multiple roots /home1/tonybai/test/go/go15/pkg/linux_amd64_dynlink & /home1/tonybai/.bin/go15beta1/go/pkg/linux_amd64_dynlink

go工具居然纠结了,不知道选择放在哪里,一个是$GOPATH/pkg/linux_amd64_dynlink,另外一个则是$GOROOT/pkg/linux_amd64_dynlink,我不清楚这是不是一个bug。

在Google了之后,我尝试了网上的一个解决方法,先编译出runtime的动态共享库:

$go install -buildmode=shared runtime sync/atomic

编译安装后,你就会在$GOROOT/pkg下面看到多出来一个目录:linux_amd64_dynlink。这个目录下的结构如下:

$ ls -R
.:
libruntime,sync-atomic.so  runtime  runtime.a  runtime.shlibname  sync

./runtime:
cgo.a  cgo.shlibname

./sync:
atomic.a  atomic.shlibname

这里看到了之前warning提到的runtime/cgo.a,我们再来重新执行一下build,看看能不能消除warning:

$ go build -buildmode=shared testsharedlib/shlib
# /tmp/go-build086398801/libtestsharedlib-shlib.so
/home1/tonybai/.bin/go15beta1/go/pkg/tool/linux_amd64/link: cannot implicitly include runtime/cgo in a shared library

这回连warnning都没有了,直接是一个error。这里提示:无法在一个共享库中隐式包含runtime/cgo。也就是说我们在构建 testshared/shlib这个动态共享库时,还需要显式的link到runtime/cgo,这里就需要另外一个命令行标志:- linkshared。我们再来试试:

$ go build  -linkshared -buildmode=shared testsharedlib/shlib

这回build成功!我们再来试试install:

$ go install  -linkshared -buildmode=shared testsharedlib/shlib

同样成功了。并且我们在$GOPATH/pkg/linux_amd64_dynlink下发现了共享库:

$ ls -R
.:
libtestsharedlib-shlib.so  testsharedlib

./testsharedlib:
shlib.a  shlib.shlibname

$ ldd libtestsharedlib-shlib.so
    linux-vdso.so.1 =>  (0x00007fff93983000)
    libruntime,sync-atomic.so => /home1/tonybai/.bin/go15beta1/go/pkg/linux_amd64_dynlink/libruntime,sync-atomic.so (0x00007fa150f1b000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa150b3f000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa150921000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa1517a7000)

好了,既然共享库编译出来了。我们就来用一下这个共享库。

//app/main.go

package main

import (
    "testsharedlib/shlib"
)

func main() {
    shlib.Method1()
}

$ go build -linkshared main.go
$ ldd main
    linux-vdso.so.1 =>  (0x00007fff579f7000)
    libruntime,sync-atomic.so => /home1/tonybai/.bin/go15beta1/go/pkg/linux_amd64_dynlink/libruntime,sync-atomic.so (0x00007fa8d6df2000)
    libtestsharedlib-shlib.so => /home1/tonybai/test/go/go15/pkg/linux_amd64_dynlink/libtestsharedlib-shlib.so (0x00007fa8d6962000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8d6586000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa8d6369000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa8d71ef000)

$ main
shlib -Method1

编译执行ok。从输出结果来看,我们可以清晰看到main依赖的.so以及so的路径。我们再来试试,如果将testsharedlib源码目录移除后,是否还能编译ok:

$ go build -linkshared main.go
main.go:4:2: cannot find package "testsharedlib/shlib" in any of:
    /home1/tonybai/.bin/go15beta1/go/src/testsharedlib/shlib (from $GOROOT)
    /home1/tonybai/test/go/go15/src/testsharedlib/shlib (from $GOPATH)

go编译器无法找到shlib,也就说即便是动态链接,我们也要有动态共享库的源码,应用才能编译通过。

internal package

internal包不是go 1.5的原创,在go 1.4中就已经提出对internal package的支持了。但go 1.4发布时,internal package只能用于GOROOT下的go core核心包,用户层面GOPATH不支持internal package。按原计划,go 1.5中会将internal包机制工作范围全面扩大到所有repository的。我原以为1.5beta1以及将internal package机制生效了,但实际结果呢,我们来看看示例代码:

测试目录结构如下:

testinternal/src
    mypkg/
        /internal
            /foo
                foo.go
        /pkg1
            main.go

    otherpkg/
            main.go

按照internal包的原理,预期mypkg/pkg1下的代码是可以import "mypkg/internal/foo"的,otherpkg/下的代码是不能import "mypkg/internal/foo"的。

//foo.go
package foo

import "fmt"

func Foo() {
    fmt.Println("mypkg/internal/foo")
}

//main.go
package main

import "mypkg/internal/foo"

func main() {
    foo.Foo()
}

在pkg1和otherpkg下分别run main.go:

mypkg/pkg1$ go run main.go
mypkg/internal/foo

otherpkg$ go run main.go
mypkg/internal/foo

可以看到在otherpkg下执行时,并没有任何build error出现。看来internal机制并未生效。

我们再来试试import $GOROOT下某些internal包,看看是否可以成功:

package main

import (
    "fmt"
    "image/internal/imageutil"
)

func main() {
    fmt.Println(imageutil.DrawYCbCr)
}

我们run这个代码:

$go run main.go
0x6b7f0

同样没有出现任何error。

不是很清楚为何在1.5beta1中internal依旧无效。难道非要等最终1.5 release版么?

【Vendor】
Vendor机制是go team为了解决go第三方包依赖和管理而引入的实验性技术。你执行以下go env:

$go env
GOARCH="amd64"
GOBIN="/Users/tony/.bin/go15beta1/go/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tony/Test/GoToolsProjects"
GORACE=""
GOROOT="/Users/tony/.bin/go15beta1/go"
GOTOOLDIR="/Users/tony/.bin/go15beta1/go/pkg/tool/darwin_amd64"
GO15VENDOREXPERIMENT=""
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
CXX="clang++"
CGO_ENABLED="1"

从结果中你会看到新增一个GO15VENDOREXPERIMENT变量,这个就是用来控制vendor机制是否开启的环境变量,默认不开启。若要开启,可以在环境变量文件中设置或export GO15VENDOREXPERIMENT=1临时设置。

vendor机制是在go 1.5beta1发布前不长时间临时决定加入到go 1.5中的,Russ Cox在Keith Rarick之前的一个Proposal的基础上重新做了设计而成,大致机制内容:

If there is a source directory d/vendor, then,
    when compiling a source file within the subtree rooted at d,
    import "p" is interpreted as import "d/vendor/p" if that exists.

    When there are multiple possible resolutions,
    the most specific (longest) path wins.

    The short form must always be used: no import path can
    contain “/vendor/” explicitly.

    Import comments are ignored in vendored packages.

下面我们来测试一下这个机制。首先我们临时开启vendor机制,export GO15VENDOREXPERIMENT=1,我们的测试目录规划如下:

testvendor
    vendor/
        tonybai.com/
            foolib/
                foo.go
    main/
        main.go

$GOPATH/src/tonybai.com/foolib/foo.go

//vendor/tonybai.com/foolib/foo.go
package foo

import "fmt"

func Hello() {
    fmt.Println("foo in vendor")
}

//$GOPATH/src/tonybai.com/foolib/foo.go
package foo

import "fmt"

func Hello() {
    fmt.Println("foo in gopath")
}

vendor和gopath下的foo.go稍有不同,主要在输出内容上,以方便后续区分。

现在我们编译执行main.go

//main/main.go
package main

import (
    "tonybai.com/foolib"
)

func main() {
    foo.Hello()
}

$go run main.go
foo in gopath

显然结果与预期不符,我们通过go list -json来看main.go的依赖包路径:

$go list -json
{
… …
    "Imports": [
        "tonybai.com/foolib"
    ],
    "Deps": [
        "errors",
        "fmt",
        "io",
        "math",
        "os",
        "reflect",
        "runtime",
        "strconv",
        "sync",
        "sync/atomic",
        "syscall",
        "time",
        "tonybai.com/foolib",
        "unicode/utf8",
        "unsafe"
    ]
}

可以看出并没有看到vendor路径,main.go import的是$GOPATH下的foo。难道是go 1.5beta1的Bug?于是翻看各种资料,最后在go 1.5beta1发布前最后提交的revison的commit log中得到了帮助:

cmd/go: disable vendoredImportPath for code outside $GOPATH
It was crashing.
This fixes the build for
GO15VENDOREXPERIMENT=1 go test -short runtime

Fixes #11416.

Change-Id: I74a9114cdd8ebafcc9d2a6f40bf500db19c6e825
Reviewed-on: https://go-review.googlesource.com/11964
Reviewed-by: Russ Cox <rsc@golang.org>

从commit log来看,大致意思是$GOPATH之外的代码的vendor机制被disable了(因为某个bug)。也就是说只有$GOPATH路径下的包在 import时才会考虑vendor路径,我们的代码的确没有在$GOPATH下,我们重新设置一下$GOPATH。

$export GOPATH=~/test/go/go15
[tony@TonydeMacBook-Air-2 ~/test/go/go15/src/testvendor/main]$go list -json
{
  
  … …
    "Imports": [
        "testvendor/vendor/tonybai.com/foolib"
    ],
    "Deps": [
        "errors",
        "fmt",
        "io",
        "math",
        "os",
        "reflect",
        "runtime",
        "strconv",
        "sync",
        "sync/atomic",
        "syscall",
        "testvendor/vendor/tonybai.com/foolib",
        "time",
        "unicode/utf8",
        "unsafe"
    ]
}

这回可以看到vendor机制生效了。执行main.go:

$go run main.go
foo in vendor

这回与预期结果就相符了。

前面提到,关闭GOPATH外的vendor机制是因为一个bug,相信go 1.5正式版发布时,这块会被enable的。

三、小结

Go 1.5还增加了很多工具,如trace,但因文档不全,尚不知如何使用。

Go 1.5标准库也有很多小的变化,这个只有到使用时才能具体深入了解。

Go 1.5更多是Go语言骨子里的变化,也就是runtime和编译器重写。语法由于兼容Go 1,所以基本frozen,因此从外在看来,基本没啥变动了。

至于Go 1.5的性能,官方的说法是,有的程序用1.5编译后会变得慢点,有的会快些。官方bench的结果是总体比1.4快上一些。但Go 1.5在性能方面主要是为了减少gc延迟,后续版本才会在性能上做进一步优化,优化空间还较大的,这次runtime、编译器由c变go,很多地方的go 代码并非是最优的,多是自动翻译,相信经过Go team的优化后,更idiomatic的Go code会让Go整体性能更为优异。

如发现本站页面被黑,比如:挂载广告、挖矿等恶意代码,请朋友们及时联系我。十分感谢! Go语言第一课 Go语言进阶课 Go语言精进之路1 Go语言精进之路2 Go语言第一课 Go语言编程指南
商务合作请联系bigwhite.cn AT aliyun.com

欢迎使用邮件订阅我的博客

输入邮箱订阅本站,只要有新文章发布,就会第一时间发送邮件通知你哦!

这里是 Tony Bai的个人Blog,欢迎访问、订阅和留言! 订阅Feed请点击上面图片

如果您觉得这里的文章对您有帮助,请扫描上方二维码进行捐赠 ,加油后的Tony Bai将会为您呈现更多精彩的文章,谢谢!

如果您希望通过微信捐赠,请用微信客户端扫描下方赞赏码:

如果您希望通过比特币或以太币捐赠,可以扫描下方二维码:

比特币:

以太币:

如果您喜欢通过微信浏览本站内容,可以扫描下方二维码,订阅本站官方微信订阅号“iamtonybai”;点击二维码,可直达本人官方微博主页^_^:
本站Powered by Digital Ocean VPS。
选择Digital Ocean VPS主机,即可获得10美元现金充值,可 免费使用两个月哟! 著名主机提供商Linode 10$优惠码:linode10,在 这里注册即可免费获 得。阿里云推荐码: 1WFZ0V立享9折!


View Tony Bai's profile on LinkedIn
DigitalOcean Referral Badge

文章

评论

  • 正在加载...

分类

标签

归档



View My Stats