Pprof | Tony Bai

标签 pprof 下的文章

Go 1.8中值得关注的几个变化

二月 3, 2017
3 条评论

在已经过去的2016年，Go语言继在2009年之后再次成为编程语言界的明星- 问鼎TIOBE 2016年度语言。这与Go team、Go community和全世界的Gophers的努力是分不开的。按计划在这个2月份，Go team将正式发布Go 1.8版本(截至目前，Go的最新版本是Go 1.8rc3)。在这里我们一起来看一下在Go 1.8版本中都有哪些值得Gopher们关注的变化。

一、语言（Language）

Go 1.8版本依旧坚守Go Team之前的承诺，即Go1兼容性：使用Go 1.7及以前版本编写的Go代码，理论上都可以通过Go 1.8进行编译并运行。因此在臆想中的Go 2.0变成现实之前，每个Go Release版本在语言这方面的“改变”都会是十分微小的。

1、仅tags不同的两个struct可以相互做显式类型转换

在Go 1.8版本以前，两个struct即便字段个数相同且每个字段类型均一样，但如果某个字段的tag描述不一样，这两个struct相互间也不能做显式类型转换，比如：

//go18-examples/language/structtag.go
package main

import "fmt"

type XmlEventRegRequest struct {
    AppID     string `xml:"appid"`
    NeedReply int    `xml:"Reply,omitempty"`
}

type JsonEventRegRequest struct {
    AppID     string `json:"appid"`
    NeedReply int    `json:"reply,omitempty"`
}

func convert(in *XmlEventRegRequest) *JsonEventRegRequest {
    out := &JsonEventRegRequest{}
    *out = (JsonEventRegRequest)(*in)
    return out
}

func main() {
    in := XmlEventRegRequest{
        AppID:     "wx12345678",
        NeedReply: 1,
    }
    out := convert(&in)
    fmt.Println(out)
}

采用Go 1.7.4版本go compiler进行编译，我们会得到如下错误输出：

$go build structtag.go
# command-line-arguments
./structtag.go:17: cannot convert *in (type XmlEventRegRequest) to type JsonEventRegRequest

但在Go 1.8中，gc将忽略tag值的不同，使得显式类型转换成为可能：

$go run structtag.go
&{wx12345678 1}

改变虽小，但带来的便利却不小，否则针对上面代码中的convert，我们只能做逐一字段赋值了。

2、浮点常量的指数部分至少支持16bits长

在Go 1.8版本之前的The Go Programming Language Specificaton中，关于浮点数常量的指数部分的描述如下：

Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.

在Go 1.8版本中，文档中对于浮点数常量指数部分的长度的实现的条件放宽了，由支持最少32bit，放宽到最少支持16bits：

Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.

但Go 1.8版本go compiler实际仍然支持至少32bits的指数部分长度，因此这个改变对现存的所有Go源码不会造成影响。

二、标准库（Standard Library）

Go号称是一门”Batteries Included”编程语言。“Batteries Included”指的就是Go语言强大的标准库。使用Go标准库，你可以完成绝大部分你想要的功能，而无需再使用第三方库。Go语言的每次版本更新，都会在标准库环节增加强大的功能、提升性能或是提高使用上的便利性。每次版本更新，标准库也是改动最大的部分。这次也不例外，我们逐一来看。

1、便于slice sort的sort.Slice函数

在Go 1.8之前我们要对一个slice进行sort，需要定义出实现了下面接口的slice type：

//$GOROOT/src/sort.go
... ...
type Interface interface {
    // Len is the number of elements in the collection.
    Len() int
    // Less reports whether the element with
    // index i should sort before the element with index j.
    Less(i, j int) bool
    // Swap swaps the elements with indexes i and j.
    Swap(i, j int)
}

标准库定义了一些应对常见类型slice的sort类型以及对应的函数：

StringSlice -> sort.Strings
IntSlice -> sort.Ints
Float64Slice -> sort.Float64s

但即便如此，对于用户定义的struct或其他自定义类型的slice进行排序仍需定义一个新type，比如下面这个例子中的TiboeIndexByRank：

//go18-examples/stdlib/sort/sortslice-before-go18.go
package main

import (
    "fmt"
    "sort"
)

type Lang struct {
    Name string
    Rank int
}

type TiboeIndexByRank []Lang

func (l TiboeIndexByRank) Len() int           { return len(l) }
func (l TiboeIndexByRank) Less(i, j int) bool { return l[i].Rank < l[j].Rank }
func (l TiboeIndexByRank) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }

func main() {
    langs := []Lang{
        {"rust", 2},
        {"go", 1},
        {"swift", 3},
    }
    sort.Sort(TiboeIndexByRank(langs))
    fmt.Printf("%v\n", langs)
}

$go run sortslice-before-go18.go
[{go 1} {rust 2} {swift 3}]

从上面的例子可以看到，我们要对[]Lang这个slice进行排序，我们就需要为之定义一个专门用于排序的类型：这里是TiboeIndexByRank，并让其实现sort.Interface接口。使用过sort包的gophers们可能都意识到了，我们在为新的slice type实现sort.Interface接口时，那三个方法的Body几乎每次都是一样的。为了使得gopher们在排序slice时编码更为简化和便捷，减少copy&paste，Go 1.8为slice type新增了三个函数：Slice、SliceStable和SliceIsSorted。我们重新用Go 1.8的sort.Slice函数实现上面例子中的排序需求，代码如下：

//go18-examples/stdlib/sort/sortslice-in-go18.go
package main

import (
    "fmt"
    "sort"
)

type Lang struct {
    Name string
    Rank int
}

func main() {
    langs := []Lang{
        {"rust", 2},
        {"go", 1},
        {"swift", 3},
    }
    sort.Slice(langs, func(i, j int) bool { return langs[i].Rank < langs[j].Rank })
    fmt.Printf("%v\n", langs)
}

$go run sortslice-in-go18.go
[{go 1} {rust 2} {swift 3}]

实现sort，需要三要素：Len、Swap和Less。在1.8之前，我们通过实现sort.Interface实现了这三个要素；而在1.8版本里，Slice函数通过reflect获取到swap和length，通过结合闭包实现的less参数让Less要素也具备了。我们从下面sort.Slice的源码可以看出这一点：

// $GOROOT/src/sort/sort.go
... ...
func Slice(slice interface{}, less func(i, j int) bool) {
    rv := reflect.ValueOf(slice)
    swap := reflect.Swapper(slice)
    length := rv.Len()
    quickSort_func(lessSwap{less, swap}, 0, length, maxDepth(length))
}

2、支持HTTP/2 Push

继在Go 1.6版本全面支持HTTP/2之后，Go 1.8又新增了对HTTP/2 Push的支持。HTTP/2是在HTTPS的基础上的下一代HTTP协议，虽然当前HTTPS的应用尚不是十分广泛。而HTTP/2 Push是HTTP/2的一个重要特性，无疑其提出的初衷也仍然是为了改善网络传输性能，提高Web服务的用户侧体验。这里我们可以借用知名网络提供商Cloudflare blog上的一幅示意图来诠释HTTP/2 Push究竟是什么：

img{512x368}

从上图中，我们可以看到：当Browser向Server发起Get page.html请求后，在同一条TCP Connection上，Server主动将style.css和image.png两个资源文件推送(Push)给了Browser。这是由于Server端启用了HTTP/2 Push机制，并预测判断Browser很可能会在接下来发起Get style.css和image.png两个资源的请求。这是一种典型的：“你可能会需要，但即使你不要，我也推给你”的处世哲学^0^。这种机制虽然在一定程度上能改善网络传输性能（减少Client发起Get的次数），但也可能造成带宽的浪费，因为这些主动推送给Browser的资源很可能是Browser所不需要的或是已经在Browser cache中存在的资源。

接下来，我们来看看Go 1.8是如何在net/http包中提供对HTTP/2 Push的支持的。由于HTTP/2是基于HTTPS的，因此我们先使用generate_cert.go生成程序所需的私钥和证书：

// 在go18-examples/stdlib/http2-push目录下，执行：

$go run $GOROOT/src/crypto/tls/generate_cert.go --host 127.0.0.1
2017/01/27 10:58:01 written cert.pem
2017/01/27 10:58:01 written key.pem

支持HTTP/2 Push的server端代码如下：

// go18-examples/stdlib/http2-push/server.go

package main

import (
    "fmt"
    "log"
    "net/http"
)

const mainJS = `document.write('Hello World!');`

func main() {
    http.Handle("/static/", http.StripPrefix("/static/", http.FileServer(http.Dir("./static"))))

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        if r.URL.Path != "/" {
            http.NotFound(w, r)
            return
        }
        pusher, ok := w.(http.Pusher)
        if ok {
            // If it's a HTTP/2 Server.
            // Push is supported. Try pushing rather than waiting for the browser.
            if err := pusher.Push("/static/img/gopherizeme.png", nil); err != nil {
                log.Printf("Failed to push: %v", err)
            }
        }
        fmt.Fprintf(w, `<html>
<head>
<title>Hello Go 1.8</title>
</head>
<body>
    <img src="/static/img/gopherizeme.png"></img>
</body>
</html>
`)
    })
    log.Fatal(http.ListenAndServeTLS(":8080", "./cert.pem", "./key.pem", nil))
}

运行这段代码，打开Google Chrome浏览器，输入：https://127.0.0.1:8080，忽略浏览器的访问非受信网站的警告，继续浏览你就能看到下面的页面（这里打开了Chrome的“检查”功能）：

img{512x368}

从示例图中的“检查”窗口，我们可以看到gopherizeme.png这个image资源就是Server主动推送给客户端的，这样浏览器在Get /后无需再发起一次Get /static/img/gopherizeme.png的请求了。

而这一切的背后，其实是HTTP/2的ResponseWriter实现了Go 1.8新增的http.Pusher interface：

// $GOROOT/src/net/http/http.go

// Pusher is the interface implemented by ResponseWriters that support
// HTTP/2 server push. For more background, see
// https://tools.ietf.org/html/rfc7540#section-8.2.
type Pusher interface {
    ... ...
    Push(target string, opts *PushOptions) error
}

3、支持HTTP Server优雅退出

Go 1.8中增加对HTTP Server优雅退出(gracefullly exit)的支持，对应的新增方法为：

func (srv *Server) Shutdown(ctx context.Context) error

和server.Close在调用时瞬间关闭所有active的Listeners和所有状态为New、Active或idle的connections不同，server.Shutdown首先关闭所有active Listeners和所有处于idle状态的Connections，然后无限等待那些处于active状态的connection变为idle状态后，关闭它们并server退出。如果有一个connection依然处于active状态，那么server将一直block在那里。因此Shutdown接受一个context参数，调用者可以通过context传入一个Shutdown等待的超时时间。一旦超时，Shutdown将直接返回。对于仍然处理active状态的Connection，就任其自生自灭（通常是进程退出后，自动关闭）。通过Shutdown的源码我们也可以看出大致的原理：

// $GOROOT/src/net/http/server.go
... ...
func (srv *Server) Shutdown(ctx context.Context) error {
    atomic.AddInt32(&srv.inShutdown, 1)
    defer atomic.AddInt32(&srv.inShutdown, -1)

    srv.mu.Lock()
    lnerr := srv.closeListenersLocked()
    srv.closeDoneChanLocked()
    srv.mu.Unlock()

    ticker := time.NewTicker(shutdownPollInterval)
    defer ticker.Stop()
    for {
        if srv.closeIdleConns() {
            return lnerr
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
        }
    }
}

我们来编写一个例子：

// go18-examples/stdlib/graceful/server.go

import (
    "context"
    "io"
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"
)

func main() {
    exit := make(chan os.Signal)
    signal.Notify(exit, os.Interrupt)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        log.Println("Handle a new request:", *r)
        time.Sleep(10 * time.Second)
        log.Println("Handle the request ok!")
        io.WriteString(w, "Finished!")
    })

    srv := &http.Server{
        Addr:    ":8080",
        Handler: http.DefaultServeMux,
    }

    go func() {
        if err := srv.ListenAndServe(); err != nil {
            log.Printf("listen: %s\n", err)
        }
    }()

    <-exit // wait for SIGINT
    log.Println("Shutting down server...")

    // Wait no longer than 30 seconds before halting
    ctx, _ := context.WithTimeout(context.Background(), 30*time.Second)
    err := srv.Shutdown(ctx)

    log.Println("Server gracefully stopped:", err)
}

在上述例子中，我们通过设置Linux Signal的处理函数来拦截Linux Interrupt信号并处理。我们通过context给Shutdown传入30s的超时参数，这样Shutdown在退出之前会给各个Active connections 30s的退出时间。下面分为几种情况run一下这个例子：

a) 当前无active connections

在这种情况下，我们run上述demo，ctrl + C后，上述demo直接退出：

$go run server.go
^C2017/02/02 15:13:16 Shutting down server...
2017/02/02 15:13:16 Server gracefully stopped: <nil>

b) 当前有未处理完的active connections，ctx 超时

为了模拟这一情况，我们修改一下参数。让每个request handler的sleep时间为30s，而Shutdown ctx的超时时间改为10s。我们再来运行这个demo，并通过curl命令连接该server(curl -v http://localhost:8080)，待连接成功后，再立即ctrl+c停止Server，待约10s后，我们得到如下日志：

$go run server.go
2017/02/02 15:15:57 Handle a new request: {GET / HTTP/1.1 1 1 map[User-Agent:[curl/7.30.0] Accept:[*/*]] {} <nil> 0 [] false localhost:8080 map[] map[] <nil> map[] [::1]:52590 / <nil> <nil> <nil> 0xc420016700}
^C2017/02/02 15:15:59 Shutting down server...
2017/02/02 15:15:59 listen: http: Server closed
2017/02/02 15:16:09 Server gracefully stopped: context deadline exceeded

c) 当前有未处理完的active connections，ctx超时之前，这些connections处理ok了

我们将上述demo的参数还原，即request handler sleep 10s，而Shutdown ctx超时时间为30s，运行这个Demo后，通过curl命令连接该server，待连接成功后，再立即ctrl+c停止Server。等待约10s后，我们得到如下日志：

$go run server.go
2017/02/02 15:19:56 Handle a new request: {GET / HTTP/1.1 1 1 map[User-Agent:[curl/7.30.0] Accept:[*/*]] {} <nil> 0 [] false localhost:8080 map[] map[] <nil> map[] [::1]:52605 / <nil> <nil> <nil> 0xc420078500}
^C2017/02/02 15:19:59 Shutting down server...
2017/02/02 15:19:59 listen: http: Server closed
2017/02/02 15:20:06 Handle the request ok!
2017/02/02 15:20:06 Server gracefully stopped: <nil>

可以看出，当ctx超时之前，request处理ok，connection关闭。这时不再有active connection和idle connection了，Shutdown成功返回，server立即退出。

4、Mutex Contention Profiling

Go 1.8中runtime新增了对Mutex和RWMutex的profiling(剖析)支持。golang team成员，负责从go user角度去看待go team的work是否满足用户需求的Jaana B. Dogan在其个人站点上写了一篇介绍mutex profiling的文章，这里借用一下其中的Demo：

//go18-examples/stdlib/mutexprofile/mutexprofile.go

package main

import (
    "net/http"
    _ "net/http/pprof"
    "runtime"
    "sync"
)

func main() {
    var mu sync.Mutex
    var items = make(map[int]struct{})

    runtime.SetMutexProfileFraction(5)
    for i := 0; i < 1000*1000; i++ {
        go func(i int) {
            mu.Lock()
            defer mu.Unlock()
            items[i] = struct{}{}
        }(i)
    }

    http.ListenAndServe(":8888", nil)
}

运行该程序后，在浏览器中输入：http://localhost:8888/debug/pprof/mutex，你就可以看到有关该程序的mutex profile（耐心等待一小会儿，因为数据的采样需要一点点时间^0^）：

--- mutex:
cycles/second=2000012082
sampling period=5
378803564 776 @ 0x106c4d1 0x13112ab 0x1059991

构建该程序，然后通过下面命令：

go build mutexprofile.go
./mutexprofile
go tool pprof mutexprofile http://localhost:8888/debug/pprof/mutex?debug=1

可以进入pprof交互界面，这个是所有用过go pprof工具的gophers们所熟知的：

$go tool pprof mutexprofile http://localhost:8888/debug/pprof/mutex?debug=1
Fetching profile from http://localhost:8888/debug/pprof/mutex?debug=1
Saved profile in /Users/tony/pprof/pprof.mutexprofile.localhost:8888.contentions.delay.003.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) list
Total: 12.98s
ROUTINE ======================== main.main.func1 in /Users/tony/Test/GoToolsProjects/src/github.com/bigwhite/experiments/go18-examples/stdlib/mutexprofile/mutexprofile.go
         0     12.98s (flat, cum)   100% of Total
         .          .     17:            mu.Lock()
         .          .     18:            defer mu.Unlock()
         .          .     19:            items[i] = struct{}{}
         .          .     20:        }(i)
         .          .     21:    }
         .     12.98s     22:
         .          .     23:    http.ListenAndServe(":8888", nil)
         .          .     24:}
ROUTINE ======================== runtime.goexit in /Users/tony/.bin/go18rc2/src/runtime/asm_amd64.s
         0     12.98s (flat, cum)   100% of Total
         .          .   2192:    RET
         .          .   2193:
         .          .   2194:// The top-most function running on a goroutine
         .          .   2195:// returns to goexit+PCQuantum.
         .          .   2196:TEXT runtime·goexit(SB),NOSPLIT,$0-0
         .     12.98s   2197:    BYTE    $0x90    // NOP
         .          .   2198:    CALL    runtime·goexit1(SB)    // does not return
         .          .   2199:    // traceback from goexit1 must hit code range of goexit
         .          .   2200:    BYTE    $0x90    // NOP
         .          .   2201:
         .          .   2202:TEXT runtime·prefetcht0(SB),NOSPLIT,$0-8
ROUTINE ======================== sync.(*Mutex).Unlock in /Users/tony/.bin/go18rc2/src/sync/mutex.go
    12.98s     12.98s (flat, cum)   100% of Total
         .          .    121:            return
         .          .    122:        }
         .          .    123:        // Grab the right to wake someone.
         .          .    124:        new = (old - 1<<mutexWaiterShift) | mutexWoken
         .          .    125:        if atomic.CompareAndSwapInt32(&m.state, old, new) {
    12.98s     12.98s    126:            runtime_Semrelease(&m.sema)
         .          .    127:            return
         .          .    128:        }
         .          .    129:        old = m.state
         .          .    130:    }
         .          .    131:}
(pprof) top10
1.29s of 1.29s total (  100%)
      flat  flat%   sum%        cum   cum%
     1.29s   100%   100%      1.29s   100%  sync.(*Mutex).Unlock
         0     0%   100%      1.29s   100%  main.main.func1
         0     0%   100%      1.29s   100%  runtime.goexit

go pprof的另外一个用法就是在go test时，mutexprofile同样支持这一点：

go test -mutexprofile=mutex.out
go tool pprof <test.binary> mutex.out

5、其他重要改动

Go 1.8标准库还有两个值得注意的改动，一个是：crypto/tls，另一个是database/sql。

在HTTPS逐渐成为主流的今天，各个编程语言对HTTPS连接的底层加密协议- TLS协议支持的成熟度日益被人们所关注。Go 1.8给广大Gophers们带来了一个更为成熟、性能更好、更为安全的TLS实现，同时也增加了对一些TLS领域最新协议规范的支持。无论你是实现TLS Server端，还是Client端，都将从中获益。

Go 1.8在crypto/tls中提供了基于ChaCha20-Poly1305的cipher suite，其中ChaCha20是一种stream cipher算法；而Poly1305则是一种code authenticator算法。它们共同组成一个TLS suite。使用这个suite，将使得你的web service或站点具有更好的mobile浏览性能，这是因为传统的AES算法实现在没有硬件支持的情况下cost更多。因此，如果你在使用tls时没有指定cipher suite，那么Go 1.8会根据硬件支持情况（是否有AES的硬件支持），来决定是使用ChaCha20还是AES算法。除此之外，crypto/tls还实现了更为安全和高效的X25519密钥交换算法等。

Go 1.4以来，database/sql包的变化很小，但对于该包的feature需求却在与日俱增。终于在Go 1.8这个dev cycle中，govendor的作者Daniel Theophanes在Brad Fitzpatrick的“指导”下，开始对database/sql进行“大规模”的改善。在Go 1.8中，借助于context.Context的帮助，database/sql增加了Cancelable Queries、SQL Database Type、Multiple Result Sets、Database ping、Named Parameters和Transaction Isolation等新Features。在GopherAcademy的Advent 2016系列文章中，我们可以看到Daniel Theophanes亲手撰写的文章，文章针对Go 1.8 database/sql包新增的features作了详细解释。

三、Go工具链（Go Toolchain）

在目前市面上的主流编程语言中，如果说Go的工具链在成熟度和完善度方面排第二，那没有语言敢称自己是第一吧^_^。Go 1.8在Go Toolchain上继续做着持续地改进，下面我们来逐一看看。

1、Plugins

Go在1.8版本中提供了对Plugin的初步支持，并且这种支持仅限于Linux。plugin这个术语在不同语言、不同情景上下文中有着不同的含义，那么什么是Go Plugin呢？

Go Plugin为Go程序提供了一种在运行时加载代码、执行代码以改变运行行为的能力，它实质上由两个部分组成：

go build -buildmode=plugin xx.go 构建xx.so plugin文件
利用plugin包在运行时动态加载xx.so并执行xx.so中的代码

C程序员看到这里肯定会有似曾相识的赶脚，因为这和传统的动态共享库在概念上十分类似：

go build -buildmode=plugin xx.go 类似于 gcc -o xx.so -shared xx.c
go plugin包 类似于 linux上的dlopen/dlsym或windows上的LoadLibrary

我们来看一个例子！我们先来建立一个名为foo.so的go plugin：

//go18-examples/gotoolchain/plugins/foo.go

package main

import "fmt"

var V int
var v int

func init() {
        V = 17
        v = 23
        fmt.Println("init function in plugin foo")
}

func Foo(in string) string {
        return "Hello, " + in
}

func foo(in string) string {
        return "hello, " + in
}

通过go build命令将foo.go编译为foo.so：

# go build -buildmode=plugin foo.go
# ldd foo.so
    linux-vdso.so.1 =>  (0x00007ffe47f67000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9d06f4b000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9d06b82000)
    /lib64/ld-linux-x86-64.so.2 (0x000055c69cfcf000)

# nm foo.so|grep Foo
0000000000150010 t local.plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879.Foo
0000000000150010 T plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879.Foo
000000000036a0dc D type..namedata.Foo.

我们看到go plugin的.so文件就是一个标准的Linux动态共享库文件，我们可以通过nm命令查看.so中定义的各种符号。接下来，我们来load这个.so，并查找并调用相应符号：

//go18-examples/gotoolchain/plugins/main.go

package main

import (
        "fmt"
        "plugin"
        "time"
)

func init() {
        fmt.Println("init in main program")
}

func loadPlugin(i int) {
        fmt.Println("load plugin #", i)
        var err error
        fmt.Println("before opening the foo.so")

        p, err := plugin.Open("foo.so")
        if err != nil {
                fmt.Println("plugin Open error:", err)
                return
        }
        fmt.Println("after opening the foo.so")

        f, err := p.Lookup("Foo")
        if err != nil {
                fmt.Println("plugin Lookup symbol Foo error:", err)
        } else {
                fmt.Println(f.(func(string) string)("gophers"))
        }

        f, err = p.Lookup("foo")
        if err != nil {
                fmt.Println("plugin Lookup symbol foo error:", err)
        } else {
                fmt.Println(f.(func(string) string)("gophers"))
        }

        v, err := p.Lookup("V")
        if err != nil {
                fmt.Println("plugin Lookup symbol V error:", err)
        } else {
                fmt.Println(*v.(*int))
        }

        v, err = p.Lookup("v")
        if err != nil {
                fmt.Println("plugin Lookup symbol v error:", err)
        } else {
                fmt.Println(*v.(*int))
        }
        fmt.Println("load plugin #", i, "done")
}

func main() {
        var counter int = 1
        for {
                loadPlugin(counter)
                counter++
                time.Sleep(time.Second * 30)
        }
}

执行这个程序：

# go run main.go
init in main program
load plugin # 1
before opening the foo.so
init function in plugin foo
after opening the foo.so
Hello, gophers
plugin Lookup symbol foo error: plugin: symbol foo not found in plugin plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879
17
plugin Lookup symbol v error: plugin: symbol v not found in plugin plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879
load plugin # 1 done

load plugin # 2
before opening the foo.so
after opening the foo.so
Hello, gophers
plugin Lookup symbol foo error: plugin: symbol foo not found in plugin plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879
17
plugin Lookup symbol v error: plugin: symbol v not found in plugin plugin/unnamed-69e21ef38d16a3fee5eb7b9e515c27a389067879
load plugin # 2 done
... ...

我们来分析一下这个执行结果！

a) foo.go中的代码也包含在main package下，但只是当foo.so被第一次加载时，foo.go中的init函数才会被执行；
b) foo.go中的exported function和variable才能被Lookup到，如Foo、V；查找unexported的变量和函数符号将得到error信息，如：“symbol foo not found in plugin”；
c) Lookup返回的是plugin.Symbol类型的值，plugin.Symbol是一个指向plugin中变量或函数的指针；
d) foo.go中的init在后续重复加载中并不会被执行。

注意：plugin.Lookup是goroutine-safe的。

在golang-dev group上，有人曾问过：buildmode=c-shared和buildmode=plugin有何差别？Go team member给出的答案如下：

The difference is mainly on the program that loads the shared library.

For c-shared, we can't assume anything about the host, so the c-shared dynamic library must be self-contained, but for plugin, we know the host program will be a Go program built with the same runtime version, so the toolchain can omit at least the runtime package from the dynamic library, and possibly more if it's certain that some packages are linked into the host program. (This optimization hasn't be implemented yet, but we need the distinction to enable this kind of optimization in the future.)

2、默认的GOPATH

Go team在Go 1.8以及后续版本会更加注重”Go语言的亲民性”，即进一步降低Go的入门使用门槛，让大家更加Happy的使用Go。对于一个Go初学者来说，一上来就进行GOPATH的设置很可能让其感到有些迷惑，甚至有挫折感，就像建立Java开发环境需要设置JAVA_HOME和CLASSPATH一样。Gophers们期望能做到Go的安装即可用。因此Go 1.8就在这方面做出了改进：支持默认的GOPATH。

在Linux/Mac系下，默认的GOPATH为$HOME/go，在Windows下，GOPATH默认路径为：%USERPROFILE%/go。你可以通过下面命令查看到这一结果：

$ go env
GOARCH="amd64"
GOBIN="/home/tonybai/.bin/go18rc3/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/tonybai/go"
GORACE=""
GOROOT="/home/tonybai/.bin/go18rc3"
GOTOOLDIR="/home/tonybai/.bin/go18rc3/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build313929093=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

BTW，在Linux/Mac下，默认的GOROOT为/usr/local/go，如果你的Go环境没有安装到这个路径下，在没有设置$GOROOT环境变量的情况下，当你执行go subcommand相关命令时，你会看到如下错误：

$go env
go: cannot find GOROOT directory: /usr/local/go

3、其他变化

Go 1.8删除了Go 1.7中增加的用于关闭ssa新后端的”-ssa=0” compiler flag，并且将ssa backend扩展到所有architecture中，对ssa后端也进一步做了优化。与此同时，为了将来进一步的性能优化打基础，Go 1.8还引入了一个新编译器前端，当然这对于普通Gopher的Go使用并没有什么影响。

Go 1.8还新增go bug子命令，该命令会自动使用默认浏览器打开new issue页面，并将采集到的issue提交者的系统信息填入issue模板，以帮助gopher提交符合要求的go issue，下面是go bug打开的issue page的图示：

img{512x368}

四、性能变化（Performance Improvement）

无论是Gotoolchain、还是runtime（包括GC）的性能，一直都是Go team重点关注的领域。本次Go 1.8依旧给广大Gophers们带来了性能提升方面的惊喜。

首先，Go SSA后端扩展到所有architecture和新编译器前端的引入，将会给除X86-64之外架构上运行的Go代码带来约20-30%的运行性能提升。对于x86-64，虽然Go 1.7就已经开启了SSA，但Go 1.8对SSA做了进一步优化，x86-64上的Go代码依旧可能会得到10%以内的性能提升。

其次，Go 1.8持续对Go compiler和linker做性能优化，和1.7相比，平均编译链接的性能提升幅度在15%左右。虽然依旧没有达到Go 1.4的性能水准。不过，优化依旧在持续进行中，目标的达成是可期的。

再次，GC在低延迟方面的优化给了我们最大的惊喜。在Go 1.8中，由于消除了GC的“stop-the-world stack re-scanning”，使得GC STW(stop-the-world)的时间通常低于100微秒，甚至经常低于10微秒。当然这或多或少是以牺牲“吞吐”作为代价的。因此在Go 1.9中，GC的改进将持续进行，会在吞吐和低延迟上做一个很好的平衡。

最后，defer的性能消耗在Go 1.8中下降了一半，与此下降幅度相同的还有通过cgo在go中调用C代码的性能消耗。

五、小结兼参考资料

Go 1.8的变化不仅仅是以上这些，更多变化以及详细的描述请参考下面参考资料中的“Go 1.8 Release Notes”：

以上demo中的代码在这里可以找到。

Go程序调试、分析与优化

八月 25, 2015
18 条评论

Brad Fitzpatrick在YAPC Asia 2015（Yet Another Perl Conference）上做了一次技术分享，题为："Go Debugging, Profiling, and Optimization"。个人感觉这篇分享中价值最大的是BradFitz现场演示的一个有关如何对Go程序进行调试、分析和优化的 Demo，Brad将demo上传到了他个人在github.com的repo中，但不知为何，repo中的代码似乎与repo里talk.md中的说明不甚一致(btw，我并没有看video)。于是打算在这里按照Brad的思路重新走一遍demo的演示流程(所有演示代码在这里可以下载到)。

一、实验环境

$uname -a
Linux pc-tony 3.13.0-61-generic #100~precise1-Ubuntu SMP Wed Jul 29 12:06:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

注意:在Darwin或Windows下，profile的结果可能与这里有很大不同(甚至完全不一样的输出和瓶颈热点)。

$go version
go version go1.5 linux/amd64

$ go env
GOARCH="amd64"
GOBIN="/home1/tonybai/.bin/go15/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home1/tonybai/proj/GoProjects"
GORACE=""
GOROOT="/home1/tonybai/.bin/go15"
GOTOOLDIR="/home1/tonybai/.bin/go15/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

代码基于Brad的github.com/bradfitz/talk-yapc-asia-2015。

二、待优化程序(step0)

待优化程序，也就是原始程序，我们放在step0中：

//go-debug-profile-optimization/step0/demo.go

package main

import (
    "fmt"
    "log"
    "net/http"
    "regexp"
)

var visitors int

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitors++
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitors) + "!"))
}

func main() {
    log.Printf("Starting on port 8080")
    http.HandleFunc("/hi", handleHi)
    log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}

$go run demo.go
2015/08/25 09:42:35 Starting on port 8080

在浏览器输入：http://localhost:8080/hi

一切顺利的话，页面会显示：

Welcome!

You are visitor number 1!

三、添加测试代码

按照talk.md中的说明，brad repo中demo中根本没有测试代码(commit 2427d0faa12ed1fb05f1e6a1e69307c11259c2b2)。

于是我根据作者的意图，新增了demo_test.go，采用TestHandleHi_Recorder和TestHandleHi_TestServer对HandleHi进行测试：

//go-debug-profile-optimization/step0/demo_test.go
package main

import (
    "bufio"
    "net/http"
    "net/http/httptest"
    "strings"
    "testing"
)

func TestHandleHi_Recorder(t *testing.T) {
    rw := httptest.NewRecorder()
    handleHi(rw, req(t, "GET / HTTP/1.0\r\n\r\n"))
    if !strings.Contains(rw.Body.String(), "visitor number") {
        t.Errorf("Unexpected output: %s", rw.Body)
    }
}

func req(t *testing.T, v string) *http.Request {
    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(v)))
    if err != nil {
        t.Fatal(err)
    }
    return req
}

func TestHandleHi_TestServer(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    res, err := http.Get(ts.URL)
    if err != nil {
        t.Error(err)
        return
    }
    if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
        t.Errorf("Content-Type = %q; want %q", g, w)
    }
    slurp, err := ioutil.ReadAll(res.Body)
    defer res.Body.Close()
    if err != nil {
        t.Error(err)
        return
    }
    t.Logf("Got: %s", slurp)
}

$ go test -v
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok     _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0    0.007s

测试通过！

至此，step0使命结束。

四、Race Detector(竞态分析）

并发设计使得程序可以更好更有效的利用现代处理器的多核心。但并发设计很容易引入竞态，导致严重bug。Go程序中竞态就是当多个goroutine并发访问某共享数据且未使用同步机制时，且至少一个goroutine进行了写操作。不过go工具自带race分析功能。在分析优化step0中demo代码前，我们先要保证demo代码中不存在竞态。

工具的使用方法就是在go test后加上-race标志，在step0目录下：

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:45: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
PASS
ok     _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step0    1.012s

-race通过做运行时分析做竞态分析，虽然不存在误报，但却存在实际有竞态，但工具没发现的情况。接下来我们改造一下测试代码，让test并发起来：

向step1(copy自step0)中demo_test.go中添加一个test method:

//go-debug-profile-optimization/step1/demo_test.go
… …
func TestHandleHi_TestServer_Parallel(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(handleHi))
    defer ts.Close()
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            res, err := http.Get(ts.URL)
            if err != nil {
                t.Error(err)
                return
            }
            if g, w := res.Header.Get("Content-Type"), "text/html; charset=utf-8"; g != w {
                t.Errorf("Content-Type = %q; want %q", g, w)
            }
            slurp, err := ioutil.ReadAll(res.Body)
            defer res.Body.Close()
            if err != nil {
                t.Error(err)
                return
            }
            t.Logf("Got: %s", slurp)
        }()
    }
    wg.Wait()
}
… …

执行竞态test：

$ go test -v -race
=== RUN   TestHandleHi_Recorder
— PASS: TestHandleHi_Recorder (0.00s)
=== RUN   TestHandleHi_TestServer
— PASS: TestHandleHi_TestServer (0.00s)
    demo_test.go:46: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 2!
=== RUN   TestHandleHi_TestServer_Parallel
==================
WARNING: DATA RACE
Read by goroutine 22:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0xf5
net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Previous write by goroutine 25:
_/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1.handleHi()
      /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1/demo.go:17 +0×111
net/http.HandlerFunc.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1422 +0×47
net/http/httptest.(*waitGroupHandler).ServeHTTP()
      /tmp/workdir/go/src/net/http/httptest/server.go:200 +0xfe
net/http.serverHandler.ServeHTTP()
      /tmp/workdir/go/src/net/http/server.go:1862 +0×206
net/http.(*conn).serve()
      /tmp/workdir/go/src/net/http/server.go:1361 +0x117c

Goroutine 22 (running) created at:
net/http.(*Server).Serve()
/tmp/workdir/go/src/net/http/server.go:1910 +0×464

Goroutine 25 (running) created at:
net/http.(*Server).Serve()
      /tmp/workdir/go/src/net/http/server.go:1910 +0×464
==================
— PASS: TestHandleHi_TestServer_Parallel (0.00s)
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 3!
    demo_test.go:71: Got: <h1 style='color: '>Welcome!</h1>You are visitor number 4!
PASS
Found 1 data race(s)
exit status 66
FAIL    _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step1    1.023s

工具发现demo.go第17行：
visitors++
是一处潜在的竞态条件。

visitors被多个goroutine访问但未采用同步机制。

既然发现了竞态条件，我们就需要fix it。有多种fix方法可选：

1、使用channel
2、使用Mutex
3、使用atomic

Brad使用了atomic：

//go-debug-profile-optimization/step1/demo.go
… …
var visitors int64 // must be accessed atomically

func handleHi(w http.ResponseWriter, r *http.Request) {
    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }
    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

再做一次测试：

竞态条件被消除了！

至此，step1结束了使命！

五、CPU Profiling

要做CPU Profilling，我们需要benchmark数据，Go test提供benchmark test功能，我们只要写对应的Benchmark测试方法即可：

//go-debug-profile-optimization/step2/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
    }
}
… …

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s

开始CPU Profiling：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 200000 14679 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 3.096s

执行完benchmark test后，step2目录下出现两个新文件prof.cpu和step2.test，这两个文件将作为后续go tool pprof的输入：
$ls
demo.go demo_test.go prof.cpu step2.test*

使用go profile viewer工具：

$ go tool pprof step2.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top
1830ms of 3560ms total (51.40%)
Dropped 53 nodes (cum <= 17.80ms)
Showing top 10 nodes out of 133 (cum >= 1290ms)
      flat flat%   sum%        cum   cum%
     480ms 13.48% 13.48%      980ms 27.53% runtime.growslice
     360ms 10.11% 23.60%      700ms 19.66% runtime.mallocgc
     170ms 4.78% 28.37%      170ms 4.78% runtime.heapBitsSetType
     170ms 4.78% 33.15%      200ms 5.62% runtime.scanblock
     120ms 3.37% 36.52%     1100ms 30.90% regexp.makeOnePass.func2
     120ms 3.37% 39.89%      550ms 15.45% runtime.newarray
     110ms 3.09% 42.98%      300ms 8.43% runtime.makeslice
     110ms 3.09% 46.07%      220ms 6.18% runtime.mapassign1
     100ms 2.81% 48.88%      100ms 2.81% runtime.futex
      90ms 2.53% 51.40%     1290ms 36.24% regexp.makeOnePass

(pprof) top –cum
0.18s of 3.56s total ( 5.06%)
Dropped 53 nodes (cum <= 0.02s)
Showing top 10 nodes out of 133 (cum >= 1.29s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      3.26s 91.57% runtime.goexit
     0.02s 0.56% 0.56%      2.87s 80.62% BenchmarkHi
         0     0% 0.56%      2.87s 80.62% testing.(*B).launch
         0     0% 0.56%      2.87s 80.62% testing.(*B).runN
     0.03s 0.84% 1.40%      2.80s 78.65% step2.handleHi
     0.01s 0.28% 1.69%      2.46s 69.10% regexp.MatchString
         0     0% 1.69%      2.24s 62.92% regexp.Compile
         0     0% 1.69%      2.24s 62.92% regexp.compile
     0.03s 0.84% 2.53%      1.56s 43.82% regexp.compileOnePass
     0.09s 2.53% 5.06%      1.29s 36.24% regexp.makeOnePass

(pprof) list handleHi
Total: 3.56s
ROUTINE ======================== handleHi in go-debug-profile-optimization/step2/demo.go
      30ms      2.80s (flat, cum) 78.65% of Total
         .          .      9:)
         .          .     10:
         .          .     11:var visitors int64 // must be accessed atomically
         .          .     12:
         .          .     13:func handleHi(w http.ResponseWriter, r *http.Request) {
         .      2.47s     14:    if match, _ := regexp.MatchString(`^\w*$`, r.FormValue("color")); !match {
         .          .     15:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     16:        return
         .          .     17:    }
      10ms       20ms     18:    visitNum := atomic.AddInt64(&visitors, 1)
      10ms       90ms     19:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
      10ms       20ms     20:    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
         .      200ms     21:        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     22:}
         .          .     23:
         .          .     24:func main() {
         .          .     25:    log.Printf("Starting on port 8080")
         .          .     26:    http.HandleFunc("/hi", handleHi)
(pprof)

从top –cum来看，handleHi消耗cpu较大，而handleHi中，又是MatchString耗时最长。

六、第一次优化

前面已经发现MatchString较为耗时，优化手段：让正则式仅编译一次(step3)：

// go-debug-profile-optimization/step3/demo.go

… …
var visitors int64 // must be accessed atomically

var rxOptionalID = regexp.MustCompile(`^\d*$`)

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

    visitNum := atomic.AddInt64(&visitors, 1)
    w.Header().Set("Content-Type", "text/html; charset=utf-8")
    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
}
… …

运行一下bench：

$ go test -bench=.
PASS
BenchmarkHi-4 1000000 1678 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.710s

对比之前在step2中运行的bench结果：

$ go test -v -run=^$ -bench=.
PASS
BenchmarkHi-4 100000 14808 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.648s

耗时相同，但优化后的bench运行了100w次，而之前的Bench运行10w次，相当于性能提高10倍。

再看看cpu prof结果：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=3s -cpuprofile=prof.cpu
PASS
BenchmarkHi-4 3000000 1640 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 6.540s

$ go tool pprof step3.test prof.cpu
Entering interactive mode (type "help" for commands)
(pprof) top –cum 30
2.74s of 8.07s total (33.95%)
Dropped 72 nodes (cum <= 0.04s)
Showing top 30 nodes out of 103 (cum >= 0.56s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      7.17s 88.85% runtime.goexit
     0.05s 0.62% 0.62%      6.21s 76.95% step3.BenchmarkHi
         0     0% 0.62%      6.21s 76.95% testing.(*B).launch
         0     0% 0.62%      6.21s 76.95% testing.(*B).runN
     0.06s 0.74% 1.36%      4.96s 61.46% step3.handleHi
     1.15s 14.25% 15.61%      2.35s 29.12% runtime.mallocgc
     0.02s 0.25% 15.86%      1.63s 20.20% runtime.systemstack
         0     0% 15.86%      1.53s 18.96% net/http.Header.Set
     0.06s 0.74% 16.60%      1.53s 18.96% net/textproto.MIMEHeader.Set
     0.09s 1.12% 17.72%      1.22s 15.12% runtime.newobject
     0.05s 0.62% 18.34%      1.09s 13.51% fmt.Sprint
     0.20s 2.48% 20.82%         1s 12.39% runtime.mapassign1
         0     0% 20.82%      0.81s 10.04% runtime.mcall
     0.01s 0.12% 20.94%      0.79s 9.79% runtime.schedule
     0.05s 0.62% 21.56%      0.76s 9.42% regexp.(*Regexp).MatchString
     0.09s 1.12% 22.68%      0.71s 8.80% regexp.(*Regexp).doExecute
     0.01s 0.12% 22.80%      0.71s 8.80% runtime.concatstring5
     0.20s 2.48% 25.28%      0.70s 8.67% runtime.concatstrings
         0     0% 25.28%      0.69s 8.55% runtime.gosweepone
     0.05s 0.62% 25.90%      0.69s 8.55% runtime.mSpan_Sweep
         0     0% 25.90%      0.68s 8.43% runtime.bgsweep
     0.04s   0.5% 26.39%      0.68s 8.43% runtime.newarray
     0.01s 0.12% 26.52%      0.67s 8.30% runtime.goschedImpl
     0.01s 0.12% 26.64%      0.65s 8.05% runtime.gosched_m
         0     0% 26.64%      0.65s 8.05% runtime.gosweepone.func1
     0.01s 0.12% 26.77%      0.65s 8.05% runtime.sweepone
     0.28s 3.47% 30.24%      0.62s 7.68% runtime.makemap
     0.17s 2.11% 32.34%      0.59s 7.31% runtime.heapBitsSweepSpan
     0.02s 0.25% 32.59%      0.58s 7.19% fmt.(*pp).doPrint
     0.11s 1.36% 33.95%      0.56s 6.94% fmt.(*pp).printArg

handleHi耗时有一定下降。

七、Mem Profiling

在step3目录下执行bench，获取mem分配数据：

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1657 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 5.002s

使用pprof工具分析mem：

$ go tool pprof –alloc_space step3.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat flat%   sum%        cum   cum%
1076.35MB 52.06% 52.06% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
535.54MB 25.90% 77.97% 2066.91MB   100% step3.BenchmarkHi
406.52MB 19.66% 97.63% 1531.37MB 74.07% step3.handleHi
   47.50MB 2.30% 99.93%    48.50MB 2.35% fmt.Sprint
         0     0% 99.93% 1076.35MB 52.06% net/http.Header.Set
         0     0% 99.93% 2066.91MB   100% runtime.goexit
         0     0% 99.93% 2066.91MB   100% testing.(*B).launch
         0     0% 99.93% 2066.91MB   100% testing.(*B).runN

(pprof) top -cum
2065.91MB of 2067.41MB total (99.93%)
Dropped 14 nodes (cum <= 10.34MB)
      flat flat%   sum%        cum   cum%
535.54MB 25.90% 25.90% 2066.91MB   100% step3.BenchmarkHi
         0     0% 25.90% 2066.91MB   100% runtime.goexit
         0     0% 25.90% 2066.91MB   100% testing.(*B).launch
         0     0% 25.90% 2066.91MB   100% testing.(*B).runN
406.52MB 19.66% 45.57% 1531.37MB 74.07% step3.handleHi
         0     0% 45.57% 1076.35MB 52.06% net/http.Header.Set
1076.35MB 52.06% 97.63% 1076.35MB 52.06% net/textproto.MIMEHeader.Set
   47.50MB 2.30% 99.93%    48.50MB 2.35% fmt.Sprint

(pprof) list handleHi
Total: 2.02GB
     ROUTINE =========step3.handleHi in step3/demo.go
406.52MB     1.50GB (flat, cum) 74.07% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
         .     1.05GB     22:    w.Header().Set("Content-Type", "text/html; charset=utf-8")
         .          .     23:    w.Write([]byte("<h1 style='color: " + r.FormValue("color") +
406.52MB   455.02MB     24:        "'>Welcome!</h1>You are visitor number " + fmt.Sprint(visitNum) + "!"))
         .          .     25:}
         .          .     26:
         .          .     27:func main() {
         .          .     28:    log.Printf("Starting on port 8080")
         .          .     29:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看到handleHi22、23两行占用了较多内存。

八、第二次优化

第二次优化的方法：
1、删除w.Header().Set这行
2、用fmt.Fprintf替代w.Write

第二次优化的代码在step4目录中：

// go-debug-profile-optimization/step4/demo.go
… …
func handleHi(w http.ResponseWriter, r *http.Request) {
    if !rxOptionalID.MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

visitNum := atomic.AddInt64(&visitors, 1)
fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
}
… …

执行一遍pprof:

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 4.343s

$ go tool pprof –alloc_space step4.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat flat%   sum%        cum   cum%
559.54MB 64.42% 64.42%   868.06MB 99.94% step4.BenchmarkHi
219.52MB 25.27% 89.70%   219.52MB 25.27% bytes.makeSlice
      89MB 10.25% 99.94%   308.52MB 35.52% step4.handleHi
         0     0% 99.94%   219.52MB 25.27% bytes.(*Buffer).Write
         0     0% 99.94%   219.52MB 25.27% bytes.(*Buffer).grow
         0     0% 99.94%   219.52MB 25.27% fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
         0     0% 99.94%   868.06MB 99.94% runtime.goexit
         0     0% 99.94%   868.06MB 99.94% testing.(*B).launch
         0     0% 99.94%   868.06MB 99.94% testing.(*B).runN
(pprof) top –cum
868.06MB of 868.56MB total (99.94%)
Dropped 5 nodes (cum <= 4.34MB)
      flat flat%   sum%        cum   cum%
559.54MB 64.42% 64.42%   868.06MB 99.94% step4.BenchmarkHi
         0     0% 64.42%   868.06MB 99.94% runtime.goexit
         0     0% 64.42%   868.06MB 99.94% testing.(*B).launch
         0     0% 64.42%   868.06MB 99.94% testing.(*B).runN
      89MB 10.25% 74.67%   308.52MB 35.52% step4.handleHi
         0     0% 74.67%   219.52MB 25.27% bytes.(*Buffer).Write
         0     0% 74.67%   219.52MB 25.27% bytes.(*Buffer).grow
219.52MB 25.27% 99.94%   219.52MB 25.27% bytes.makeSlice
         0     0% 99.94%   219.52MB 25.27% fmt.Fprintf
         0     0% 99.94%   219.52MB 25.27% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 868.56MB
ROUTINE ============ step4.handleHi in step4/demo.go
      89MB   308.52MB (flat, cum) 35.52% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
      89MB   308.52MB     22:    fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

可以看出内存占用大幅减少。

九、Benchcmp

golang.org/x/tools中有一个工具：benchcmp，可以给出两次bench的结果对比。

github.com/golang/tools是golang.org/x/tools的一个镜像。安装benchcmp步骤：

1、go get -u github.com/golang/tools
2、mkdir -p $GOPATH/src/golang.org/x
3、mv $GOPATH/src/github.com/golang/tools $GOPATH/src/golang.org/x
4、go install golang.org/x/tools/cmd/benchcmp

我们分别在step2、step3和step4下执行如下命令：

$ go-debug-profile-optimization/step2$ go test -bench=. -memprofile=prof.mem | tee mem.2
PASS
BenchmarkHi-4 100000 14786 ns/op 4961 B/op 81 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step2 1.644s

go-debug-profile-optimization/step3$ go test -bench=. -memprofile=prof.mem | tee mem.3
PASS
BenchmarkHi-4 1000000 1662 ns/op 720 B/op 9 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step3 1.694s

go-debug-profile-optimization/step4$ go test -bench=. -memprofile=prof.mem | tee mem.4
PASS
BenchmarkHi-4 1000000 1428 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step4 1.456s

利用benchcmp工具对比结果（benchcmp old new）：

$ benchcmp step3/mem.3 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1662 1428 -14.08%

benchmark old allocs new allocs delta
BenchmarkHi-4 9 6 -33.33%

benchmark old bytes new bytes delta
BenchmarkHi-4 720 304 -57.78%

$ benchcmp step2/mem.2 step4/mem.4
benchmark old ns/op new ns/op delta
BenchmarkHi-4 14786 1428 -90.34%

benchmark old allocs new allocs delta
BenchmarkHi-4 81 6 -92.59%

benchmark old bytes new bytes delta
BenchmarkHi-4 4961 304 -93.87%

可以看出优化后，内存分配大幅减少，gc的时间也随之减少。

十、内存来自哪

我们在BenchmarkHi中清理每次handleHi执行后的内存：

//step5/demo_test.go
… …
func BenchmarkHi(b *testing.B) {
b.ReportAllocs()

    req, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    for i := 0; i < b.N; i++ {
        rw := httptest.NewRecorder()
        handleHi(rw, req)
        reset(rw)
    }
}

func reset(rw *httptest.ResponseRecorder) {
    m := rw.HeaderMap
    for k := range m {
        delete(m, k)
    }
    body := rw.Body
    body.Reset()
    *rw = httptest.ResponseRecorder{
        Body:      body,
        HeaderMap: m,
    }
}

… …
$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 2000000 1518 ns/op 304 B/op 6 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5 4.577s

$ go tool pprof –alloc_space step5.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
290.52MB of 291.52MB total (99.66%)
Dropped 14 nodes (cum <= 1.46MB)
      flat flat%   sum%        cum   cum%
         0     0%     0%   291.02MB 99.83% runtime.goexit
179.01MB 61.41% 61.41%   290.52MB 99.66% step5.BenchmarkHi
         0     0% 61.41%   290.52MB 99.66% testing.(*B).launch
         0     0% 61.41%   290.52MB 99.66% testing.(*B).runN
   26.50MB 9.09% 70.50%   111.51MB 38.25% step5.handleHi
         0     0% 70.50%    85.01MB 29.16% bytes.(*Buffer).Write
         0     0% 70.50%    85.01MB 29.16% bytes.(*Buffer).grow
   85.01MB 29.16% 99.66%    85.01MB 29.16% bytes.makeSlice
         0     0% 99.66%    85.01MB 29.16% fmt.Fprintf
         0     0% 99.66%    85.01MB 29.16% net/http/httptest.(*ResponseRecorder).Write
(pprof) list handleHi
Total: 291.52MB
ROUTINE ======================== _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5.handleHi in /home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step5/demo.go
   26.50MB   111.51MB (flat, cum) 38.25% of Total
         .          .     17:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     18:        return
         .          .     19:    }
         .          .     20:
         .          .     21:    visitNum := atomic.AddInt64(&visitors, 1)
   26.50MB   111.51MB     22:    fmt.Fprintf(w, "<html><h1 stype='color: \"%s\"'>Welcome!</h1>You are visitor number %d!", r.FormValue("color"), visitNum)
         .          .     23:}
         .          .     24:
         .          .     25:func main() {
         .          .     26:    log.Printf("Starting on port 8080")
         .          .     27:    http.HandleFunc("/hi", handleHi)
(pprof)

内存从300MB降到111MB。内存来自哪？看到list handleHi，fmt.Fprintf分配了111.51MB。

我们来看这一行代码：
fmt.Fprintf(w, "<h1 style='color: %s'>Welcome!</h1>You are visitor number %d!",
r.FormValue("color"), num)

fmt.Fprintf的manual：

$ go doc fmt.Fprintf
func Fprintf(w io.Writer, format string, a …interface{}) (n int, err error)

Fprintf formats according to a format specifier and writes to w. It returns
the number of bytes written and any write error encountered.

这里回顾一下Go type在runtime中的内存占用：

A Go interface is 2 words of memory: (type, pointer).
A Go string is 2 words of memory: (base pointer, length)
A Go slice is 3 words of memory: (base pointer, length, capacity)

每次调用fmt.Fprintf，参数以value值形式传入函数时，程序就要为每个变参分配一个占用16bytes的empty interface，然后用传入的类型初始化该interface value。这就是这块累计分配内存较多的原因。

十一、消除所有内存分配

下面的优化代码可能在实际中并不需要，但一旦真的成为瓶颈，可以这么做：

//go-debug-profile-optimization/step6/demo.go
… …
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer)
    },
}

    visitNum := atomic.AddInt64(&visitors, 1)
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf)
    buf.Reset()
    buf.WriteString("<h1 style='color: ")
    buf.WriteString(r.FormValue("color"))
    buf.WriteString("'>Welcome!</h1>You are visitor number ")
    b := strconv.AppendInt(buf.Bytes(), int64(visitNum), 10)
    b = append(b, '!')
    w.Write(b)
}
… …

$ go test -v -run=^$ -bench=^BenchmarkHi$ -benchtime=2s -memprofile=prof.mem
PASS
BenchmarkHi-4 5000000 780 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 4.709s

go tool pprof –alloc_space step6.test prof.mem
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
1.07GB of 1.07GB total ( 100%)
Dropped 5 nodes (cum <= 0.01GB)
      flat flat%   sum%        cum   cum%
    1.07GB   100%   100%     1.07GB   100% step6.BenchmarkHi
         0     0%   100%     1.07GB   100% runtime.goexit
         0     0%   100%     1.07GB   100% testing.(*B).launch
         0     0%   100%     1.07GB   100% testing.(*B).runN

$ go test -bench=. -memprofile=prof.mem | tee mem.6
PASS
BenchmarkHi-4 2000000 790 ns/op 192 B/op 3 allocs/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step6 2.401s

$ benchcmp step5/mem.5 step6/mem.6
benchmark old ns/op new ns/op delta
BenchmarkHi-4 1513 790 -47.79%

benchmark old allocs new allocs delta
BenchmarkHi-4 6 3 -50.00%

benchmark old bytes new bytes delta
BenchmarkHi-4 304 192 -36.84%

可以看到handleHi已经不在top列表中了。benchcmp结果也显示内存分配又有大幅下降！

十二、竞争(Contention)优化

为handleHi编写一个Parallel benchmark test:

//go-debug-profile-optimization/step7/demo_test.go
… …
func BenchmarkHiParallel(b *testing.B) {
    r, err := http.ReadRequest(bufio.NewReader(strings.NewReader("GET / HTTP/1.0\r\n\r\n")))
    if err != nil {
        b.Fatal(err)
    }

    b.RunParallel(func(pb *testing.PB) {
        rw := httptest.NewRecorder()
        for pb.Next() {
            handleHi(rw, r)
            reset(rw)
        }
    })
}
… …

执行测试，并分析结果:

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 5000000 305 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step7 1.947s

$ go tool pprof step7.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
3.68s of 3.72s total (98.82%)
Dropped 29 nodes (cum <= 0.02s)
Showing top 10 nodes out of 20 (cum >= 1.84s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      3.72s   100% runtime.goexit
     1.84s 49.46% 49.46%      1.84s 49.46% runtime.chanrecv1
         0     0% 49.46%      1.84s 49.46% main.main
         0     0% 49.46%      1.84s 49.46% runtime.main
         0     0% 49.46%      1.84s 49.46% testing.(*M).Run
         0     0% 49.46%      1.84s 49.43% testing.(*B).run
         0     0% 49.46%      1.84s 49.43% testing.RunBenchmarks
         0     0% 49.46%      1.84s 49.36% step7.BenchmarkHiParallel
     1.84s 49.36% 98.82%      1.84s 49.36% sync.(*WaitGroup).Wait
         0     0% 98.82%      1.84s 49.36% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 3.72s
ROUTINE ====== step7.BenchmarkHiParallel in step7/demo_test.go
         0      1.84s (flat, cum) 49.36% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      1.84s    118:    })
         .          .    119:}
ROUTINE ==== step7.BenchmarkHiParallel.func1 in step7/demo_test.go
         0    43.02ms (flat, cum) 1.16% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    43.02ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 3.72s
ROUTINE =====step7.handleHi in step7/demo.go
         0    43.02ms (flat, cum) 1.16% of Total
         .          .     18:        return new(bytes.Buffer)
         .          .     19:    },
         .          .     20:}
         .          .     21:
         .          .     22:func handleHi(w http.ResponseWriter, r *http.Request) {
         .    43.01ms     23:    if !rxOptionalID.MatchString(r.FormValue("color")) {
         .          .     24:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     25:        return
         .          .     26:    }
         .          .     27:
         .          .     28:    visitNum := atomic.AddInt64(&visitors, 1)
         .     2.50us     29:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     30:    defer bufPool.Put(buf)
         .          .     31:    buf.Reset()
         .          .     32:    buf.WriteString("<h1 style='color: ")
         .          .     33:    buf.WriteString(r.FormValue("color"))
         .          .     34:    buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)

handleHi中MatchString这块是一个焦点，这里耗时较多。

优化方法（step8）：

//go-debug-profile-optimization/step8/demo.go
… …
var colorRxPool = sync.Pool{
New: func() interface{} { return regexp.MustCompile(`\w*$`) },
}

func handleHi(w http.ResponseWriter, r *http.Request) {
    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
        return
    }

测试执行与分析：

$ go test -bench=Parallel -blockprofile=prof.block
PASS
BenchmarkHiParallel-4 100000 19190 ns/op
ok _/home1/tonybai/proj/opensource/github/experiments/go-debug-profile-optimization/step8 2.219s

$ go tool pprof step8.test prof.block
Entering interactive mode (type "help" for commands)
(pprof) top –cum 10
4.22s of 4.23s total (99.69%)
Dropped 28 nodes (cum <= 0.02s)
Showing top 10 nodes out of 12 (cum >= 2.11s)
      flat flat%   sum%        cum   cum%
         0     0%     0%      4.23s   100% runtime.goexit
     2.11s 49.90% 49.90%      2.11s 49.90% runtime.chanrecv1
         0     0% 49.90%      2.11s 49.89% main.main
         0     0% 49.90%      2.11s 49.89% runtime.main
         0     0% 49.90%      2.11s 49.89% testing.(*M).Run
         0     0% 49.90%      2.11s 49.86% testing.(*B).run
         0     0% 49.90%      2.11s 49.86% testing.RunBenchmarks
         0     0% 49.90%      2.11s 49.79% step8.BenchmarkHiParallel
     2.11s 49.79% 99.69%      2.11s 49.79% sync.(*WaitGroup).Wait
         0     0% 99.69%      2.11s 49.79% testing.(*B).RunParallel
(pprof) list BenchmarkHiParallel
Total: 4.23s
ROUTINE ======step8.BenchmarkHiParallel in step8/demo_test.go
         0      2.11s (flat, cum) 49.79% of Total
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .          .    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .      2.11s    118:    })
         .          .    119:}
ROUTINE ======step8.BenchmarkHiParallel.func1 in step8/demo_test.go
         0    11.68ms (flat, cum) 0.28% of Total
         .          .    110:    }
         .          .    111:
         .          .    112:    b.RunParallel(func(pb *testing.PB) {
         .          .    113:        rw := httptest.NewRecorder()
         .          .    114:        for pb.Next() {
         .    11.68ms    115:            handleHi(rw, r)
         .          .    116:            reset(rw)
         .          .    117:        }
         .          .    118:    })
         .          .    119:}
(pprof) list handleHi
Total: 4.23s
ROUTINE ======step8.handleHi in step8/demo.go
         0    11.68ms (flat, cum) 0.28% of Total
         .          .     21:var colorRxPool = sync.Pool{
         .          .     22:    New: func() interface{} { return regexp.MustCompile(`\w*$`) },
         .          .     23:}
         .          .     24:
         .          .     25:func handleHi(w http.ResponseWriter, r *http.Request) {
         .     5.66ms     26:    if !colorRxPool.Get().(*regexp.Regexp).MatchString(r.FormValue("color")) {
         .          .     27:        http.Error(w, "Optional color is invalid", http.StatusBadRequest)
         .          .     28:        return
         .          .     29:    }
         .          .     30:
         .          .     31:    visitNum := atomic.AddInt64(&visitors, 1)
         .     6.02ms     32:    buf := bufPool.Get().(*bytes.Buffer)
         .          .     33:    defer bufPool.Put(buf)
         .          .     34:    buf.Reset()
         .          .     35:    buf.WriteString("<h1 style='color: ")
         .          .     36:    buf.WriteString(r.FormValue("color"))
         .          .     37:    buf.WriteString("'>Welcome!</h1>You are visitor number ")
(pprof)