GOPATH | Tony Bai

标签 GOPATH 下的文章

Go 1.15中值得关注的几个变化

十月 11, 2020
0 条评论

img{512x368}

Go 1.15版本在8月12日就正式发布了，给我的感觉就是发布的挺痛快^_^。这种感觉来自与之前版本发布时间的对比：Go 1.13版本发布于当年的9月4日，更早的Go 1.11版本发布于当年的8月25日。

不过这个时间恰与我家二宝出生和老婆月子时期有重叠，每天照顾孩子团团转的我实在抽不出时间研究Go 1.15的变化:(。如今，我逐渐从照顾二宝的工作中脱离出来^_^，于是“Go x.xx版本值得关注的几个变化”系列将继续下去。关注Go语言的演变对掌握和精通Go语言大有裨益，凡是致力于成为一名高级Gopher的读者都应该密切关注Go的演进。
截至写稿时，Go 1.15最新版是Go 1.15.2。Go 1.15一如既往的遵循Go1兼容性承诺。语言规范方面没有任何变化。可以说这是一个“面子”上变化较小的一个版本，但“里子”的变化还是不少的，在本文中我就和各位读者一起就重要变化逐一了解一下。

一. 平台移植性

Go 1.15版本不再对darwin/386和darwin/arm两个32位平台提供支持了。Go 1.15及以后版本仅对darwin/amd64和darwin/arm64版本提供支持。并且不再对macOS 10.12版本之前的版本提供支持。

Go 1.14版本中，Go编译器在被传入-race和-msan的情况下，默认会执行-d=checkptr，即对unsafe.Pointer的使用进行合法性检查。-d=checkptr主要检查两项内容：

当将unsafe.Pointer转型为*T时，T的内存对齐系数不能高于原地址的；
做完指针算术后，转换后的unsafe.Pointer仍应指向原先Go堆对象

但在Go 1.14中，这个检查并不适用于Windows操作系统。Go 1.15中增加了对windows系统的支持。

对于RISC-V架构，Go社区展现出十分积极的姿态，早在Go 1.11版本，Go就为RISC-V cpu架构预留了GOARCH值：riscv和riscv64。Go 1.14版本则为64bit RISC-V提供了在linux上的实验性支持(GOOS=linux, GOARCH=riscv64)。在Go 1.15版本中，Go在GOOS=linux, GOARCH=riscv64的环境下的稳定性和性能得到持续提升，并且已经可以支持goroutine异步抢占式调度了。

二. 工具链

1. GOPROXY新增以管道符为分隔符的代理列表值

在Go 1.13版本中，GOPROXY支持设置为多个proxy的列表，多个proxy之间采用逗号分隔。Go工具链会按顺序尝试列表中的proxy以获取依赖包数据，但是当有proxy server服务不可达或者是返回的http状态码不是404也不是410时，go会终止数据获取。但是当列表中的proxy server返回其他错误时，Go命令不会向GOPROXY列表中的下一个值所代表的的proxy server发起请求，这种行为模式没能让所有gopher满意，很多Gopher认为Go工具链应该向后面的proxy server请求，直到所有proxy server都返回失败。Go 1.15版本满足了Go社区的需求，新增以管道符“|”为分隔符的代理列表值。如果GOPROXY配置的proxy server列表值以管道符分隔，则无论某个proxy server返回什么错误码，Go命令都会向列表中的下一个proxy server发起新的尝试请求。

注：Go 1.15版本中GOPROXY环境变量的默认值依旧为https://proxy.golang.org,direct。

2. module cache的存储路径可设置

Go module机制自打在Go 1.11版本中以试验特性的方式引入时就将module的本地缓存默认放在了\$GOPATH/pkg/mod下（如果没有显式设置GOPATH，那么默认值将是~/go；如果GOPATH下面配置了多个路径，那么选择第一个路径），一直到Go 1.14版本，这个位置都是无法配置的。

Go module的引入为去除GOPATH提供了前提，于是module cache的位置也要尽量与GOPATH“脱钩”：Go 1.15提供了GOMODCACHE环境变量用于自定义module cache的存放位置。如果没有显式设置GOMODCACHE，那么module cache的默认存储路径依然是\$GOPATH/pkg/mod。

三. 运行时、编译器和链接器

1. panic展现形式变化

在Go 1.15之前，如果传给panic的值是bool, complex64, complex128, float32, float64, int, int8, int16, int32, int64, string, uint, uint8, uint16, uint32, uint64, uintptr等原生类型的值，那么panic在触发时会输出具体的值，比如：

// go1.15-examples/runtime/panic.go

package main

func foo() {
    var i uint32 = 17
    panic(i)
}

func main() {
    foo()
}

使用Go 1.14运行上述代码，得到如下结果：

$go run panic.go
panic: 17

goroutine 1 [running]:
main.foo(...)
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:5
main.main()
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:9 +0x39
exit status 2

Go 1.15版本亦是如此。但是对于派生于上述原生类型的自定义类型而言，Go 1.14只是输出变量地址：

// go1.15-examples/runtime/panic.go

package main

type myint uint32

func bar() {
    var i myint = 27
    panic(i)
}

func main() {
    bar()
}

使用Go 1.14运行上述代码：

$go run panic.go
panic: (main.myint) (0x105fca0,0xc00008e000)

goroutine 1 [running]:
main.bar(...)
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:12
main.main()
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:17 +0x39
exit status 2

Go 1.15针对此情况作了展示优化，即便是派生于这些原生类型的自定义类型变量，panic也可以输出其值。使用Go 1.15运行上述代码的结果如下：

$go run panic.go
panic: main.myint(27)

goroutine 1 [running]:
main.bar(...)
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:12
main.main()
    /Users/tonybai/go/src/github.com/bigwhite/experiments/go1.15-examples/runtime/panic.go:17 +0x39
exit status 2

2. 将小整数([0,255])转换为interface类型值时将不会额外分配内存

Go 1.15在runtime/iface.go中做了一些优化改动：增加一个名为staticuint64s的数组，预先为[0,255]这256个数分配了内存。然后在convT16、convT32等运行时转换函数中判断要转换的整型值是否小于256(len(staticuint64s))，如果小于，则返回staticuint64s数组中对应的值的地址；否则调用mallocgc分配新内存。

$GOROOT/src/runtime/iface.go

// staticuint64s is used to avoid allocating in convTx for small integer values.
var staticuint64s = [...]uint64{
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
        0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
        0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
        0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
        0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,

        ... ...

        0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
        0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,

}

func convT16(val uint16) (x unsafe.Pointer) {
        if val < uint16(len(staticuint64s)) {
                x = unsafe.Pointer(&staticuint64s[val])
                if sys.BigEndian {
                        x = add(x, 6)
                }
        } else {
                x = mallocgc(2, uint16Type, false)
                *(*uint16)(x) = val
        }
        return
}

func convT32(val uint32) (x unsafe.Pointer) {
        if val < uint32(len(staticuint64s)) {
                x = unsafe.Pointer(&staticuint64s[val])
                if sys.BigEndian {
                        x = add(x, 4)
                }
        } else {
                x = mallocgc(4, uint32Type, false)
                *(*uint32)(x) = val
        }
        return
}

我们可以用下面例子来验证一下：

// go1.15-examples/runtime/tinyint2interface.go

package main

import (
    "math/rand"
)

func convertSmallInteger() interface{} {
    i := rand.Intn(256)
    var j interface{} = i
    return j
}

func main() {
    for i := 0; i < 100000000; i++ {
        convertSmallInteger()
    }
}

我们分别用go 1.14和go 1.15.2编译这个源文件（注意关闭内联等优化，否则很可能看不出效果）：

// go 1.14

go build  -gcflags="-N -l" -o tinyint2interface-go14 tinyint2interface.go

// go 1.15.2

go build  -gcflags="-N -l" -o tinyint2interface-go15 tinyint2interface.go

我们使用下面命令输出程序执行时每次GC的信息：

$env GODEBUG=gctrace=1 ./tinyint2interface-go14
gc 1 @0.025s 0%: 0.009+0.18+0.021 ms clock, 0.079+0.079/0/0.20+0.17 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 2 @0.047s 0%: 0.003+0.14+0.013 ms clock, 0.031+0.099/0.064/0.037+0.10 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 3 @0.064s 0%: 0.008+0.20+0.016 ms clock, 0.071+0.071/0.018/0.081+0.13 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 4 @0.081s 0%: 0.005+0.14+0.013 ms clock, 0.047+0.059/0.023/0.032+0.10 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 5 @0.098s 0%: 0.005+0.10+0.017 ms clock, 0.042+0.073/0.027/0.080+0.13 ms cpu, 4->4->0 MB, 5 MB goal, 8 P

... ...

gc 192 @3.264s 0%: 0.003+0.11+0.013 ms clock, 0.024+0.060/0.005/0.035+0.11 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 193 @3.281s 0%: 0.005+0.13+0.032 ms clock, 0.042+0.075/0.041/0.050+0.25 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 194 @3.298s 0%: 0.004+0.12+0.013 ms clock, 0.033+0.072/0.030/0.033+0.10 ms cpu, 4->4->0 MB, 5 MB goal, 8 P
gc 195 @3.315s 0%: 0.003+0.17+0.023 ms clock, 0.029+0.062/0.055/0.024+0.18 ms cpu, 4->4->0 MB, 5 MB goal, 8 P

$env GODEBUG=gctrace=1 ./tinyint2interface-go15

我们看到和go 1.14编译的程序不断分配内存，不断导致GC相比，go1.15.2没有输出GC信息，间接证实了小整数转interface变量值时不会触发内存分配。

3. 加入更现代化的链接器(linker)

一个新版的现代化linker正在逐渐加入到Go中，Go 1.15是新版linker的起点。后续若干版本，linker优化会逐步加入进来。在Go 1.15中，对于大型项目，新链接器的性能要提高20%，内存占用减少30%。

4. objdump支持输出GNU汇编语法

go 1.15为objdump工具增加了-gnu选项，以在Go汇编的后面，辅助输出GNU汇编，便于对照：

// go 1.14：

$go tool objdump -S tinyint2interface-go15|more
TEXT go.buildid(SB)

  0x1001000             ff20                    JMP 0(AX)
  0x1001002             476f                    OUTSD DS:0(SI), DX
  0x1001004             206275                  ANDB AH, 0x75(DX)
  0x1001007             696c642049443a20        IMULL $0x203a4449, 0x20(SP), BP
... ...

//go 1.15.2：

$go tool objdump  -S -gnu tinyint2interface-go15|more
TEXT go.buildid(SB)

  0x1001000             ff20                    JMP 0(AX)                            // jmpq *(%rax)           

  0x1001002             476f                    OUTSD DS:0(SI), DX                   // rex.RXB outsl %ds:(%rsi),(%dx)
  0x1001004             206275                  ANDB AH, 0x75(DX)                    // and %ah,0x75(%rdx)     

  0x1001007             696c642049443a20        IMULL $0x203a4449, 0x20(SP), BP      // imul $0x203a4449,0x20(%rsp,%riz,2),%ebp

... ...

四. 标准库

和以往发布的版本一样，标准库有大量小改动，这里挑出几个笔者感兴趣的和大家一起看一下。

1. 增加tzdata包

Go time包中很多方法依赖时区数据，但不是所有平台上都自带时区数据。Go time包会以下面顺序搜寻时区数据：

- ZONEINFO环境变量指示的路径中

- 在类Unix系统中一些常见的存放时区数据的路径（zoneinfo_unix.go中的zoneSources数组变量中存放这些常见路径）：

    "/usr/share/zoneinfo/",
    "/usr/share/lib/zoneinfo/",
    "/usr/lib/locale/TZ/"

- 如果平台没有，则尝试使用$GOROOT/lib/time/zoneinfo.zip这个随着go发布包一起发布的时区数据。但在应用部署的环境中，很大可能不会进行go安装。

如果go应用找不到时区数据，那么go应用运行将会受到影响，就如下面这个例子：

// go1.15-examples/stdlib/tzdata.go

package main

import (
    "fmt"
    "time"
)

func main() {
    loc, err := time.LoadLocation("America/New_York")
    if err != nil {
        fmt.Println("LoadLocation error:", err)
        return
    }
    fmt.Println("LoadLocation is:", loc)
}

我们移除系统的时区数据(比如将/usr/share/zoneinfo改名)和Go安装包自带的zoneinfo.zip(改个名)后，在Go 1.15.2下运行该示例：

$ go run tzdata.go
LoadLocation error: unknown time zone America/New_York

为此，Go 1.15提供了一个将时区数据嵌入到Go应用二进制文件中的方法：导入time/tzdata包：

// go1.15-examples/stdlib/tzdata.go

package main

import (
    "fmt"
    "time"
    _ "time/tzdata"
)

func main() {
    loc, err := time.LoadLocation("America/New_York")
    if err != nil {
        fmt.Println("LoadLocation error:", err)
        return
    }
    fmt.Println("LoadLocation is:", loc)
}

我们再用go 1.15.2运行一下上述导入tzdata包的例子：

$go run testtimezone.go
LoadLocation is: America/New_York

不过由于附带tzdata数据，应用二进制文件的size会增大大约800k，下面是在ubuntu下的实测值：

-rwxr-xr-x 1 root root 2.0M Oct 11 02:42 tzdata-withouttzdata*
-rwxr-xr-x 1 root root 2.8M Oct 11 02:42 tzdata-withtzdata*

2. 增加json解码限制

json包是日常使用最多的go标准库包之一，在Go 1.15中，go按照json规范的要求，为json的解码增加了一层限制：

// json规范要求

//https://tools.ietf.org/html/rfc7159#section-9

A JSON parser transforms a JSON text into another representation.  A
   JSON parser MUST accept all texts that conform to the JSON grammar.
   A JSON parser MAY accept non-JSON forms or extensions.

   An implementation may set limits on the size of texts that it
   accepts.  An implementation may set limits on the maximum depth of
   nesting.  An implementation may set limits on the range and precision
   of numbers.  An implementation may set limits on the length and
   character contents of strings.

这个限制就是增加了一个对json文本最大缩进深度值：

// $GOROOT/src/encoding/json/scanner.go

// This limits the max nesting depth to prevent stack overflow.
// This is permitted by https://tools.ietf.org/html/rfc7159#section-9
const maxNestingDepth = 10000

如果一旦传入的json文本数据缩进深度超过maxNestingDepth，那json包就会panic。当然，绝大多数情况下，我们是碰不到缩进10000层的超大json文本的。因此，该limit对于99.9999%的gopher都没啥影响。

3. reflect包

Go 1.15版本之前reflect包存在一处行为不一致的问题，我们看下面例子(例子来源于https://play.golang.org/p/Jnga2_6Rmdf)：

// go1.15-examples/stdlib/reflect.go

package main

import "reflect"

type u struct{}

func (u) M() { println("M") }

type t struct {
    u
    u2 u
}

func call(v reflect.Value) {
    defer func() {
        if err := recover(); err != nil {
            println(err.(string))
        }
    }()
    v.Method(0).Call(nil)
}

func main() {
    v := reflect.ValueOf(t{}) // v := t{}
    call(v)                   // v.M()
    call(v.Field(0))          // v.u.M()
    call(v.Field(1))          // v.u2.M()
}

我们使用Go 1.14版本运行该示例：

$go run reflect.go
M
M
reflect: reflect.flag.mustBeExported using value obtained using unexported field

我们看到同为类型t中的非导出字段(field)的u和u2(u是以嵌入类型方式称为类型t的字段的)，通过reflect包可以调用字段u的导出方法(如输出中的第二行的M)，却无法调用非导出字段u2的导出方法（如输出中的第三行的panic信息）。

这种不一致在Go 1.15版本中被修复，我们使用Go 1.15.2运行上述示例：

$go run reflect.go
M
reflect: reflect.Value.Call using value obtained using unexported field
reflect: reflect.Value.Call using value obtained using unexported field

我们看到reflect无法调用非导出字段u和u2的导出方法了。但是reflect依然可以通过提升到类型t的方法来间接使用u的导出方法，正如运行结果中的第一行输出。
这一改动可能会影响到遗留代码中使用reflect调用以类型嵌入形式存在的非导出字段方法的代码，如果你的代码中存在这样的问题，可以直接通过提升(promote)到包裹类型(如例子中的t)中的方法（如例子中的call(v)）来替代之前的方式。

五. 小结

由于Go 1.15删除了一些GC元数据和一些无用的类型元数据，Go 1.15编译出的二进制文件size会减少5%左右。我用一个中等规模的go项目实测了一下：

-rwxr-xr-x   1 tonybai  staff    23M 10 10 16:54 yunxind*
-rwxr-xr-x   1 tonybai  staff    24M  9 30 11:20 yunxind-go14*

二进制文件size的确有变小，大约4%-5%。

如果你还没有升级到Go 1.15，那么现在正是时候。

本文中涉及的代码可以在这里下载。https://github.com/bigwhite/experiments/tree/master/go1.15-examples

我的Go技术专栏：“改善Go语⾔编程质量的50个有效实践”上线了，欢迎大家订阅学习！

我的网课“Kubernetes实战：高可用集群搭建、配置、运维与应用”在慕课网上线了，感谢小伙伴们学习支持！

我爱发短信：企业级短信平台定制开发专家 https://tonybai.com/
smspush : 可部署在企业内部的定制化短信平台，三网覆盖，不惧大并发接入，可定制扩展；短信内容你来定，不再受约束, 接口丰富，支持长短信，签名可选。

2020年4月8日，中国三大电信运营商联合发布《5G消息白皮书》，51短信平台也会全新升级到“51商用消息平台”，全面支持5G RCS消息。

著名云主机服务厂商DigitalOcean发布最新的主机计划，入门级Droplet配置升级为：1 core CPU、1G内存、25G高速SSD，价格5$/月。有使用DigitalOcean需求的朋友，可以打开这个链接地址：https://m.do.co/c/bff6eed92687 开启你的DO主机之路。

Gopher Daily(Gopher每日新闻)归档仓库 – https://github.com/bigwhite/gopherdaily

我的联系方式：

微博：https://weibo.com/bigwhite20xx
微信公众号：iamtonybai
博客：tonybai.com
github: https://github.com/bigwhite

微信赞赏：
img{512x368}

商务合作方式：撰稿、出书、培训、在线课程、合伙创业、咨询、广告合作。

go protobuf v1败给了gogo protobuf，那v2呢？

四月 24, 2020
1 条评论

近期的一个项目有对结构化数据进行序列化和反序列化的需求，该项目具有performance critical属性，因此我们在选择序列化库包时是要考虑包的性能的。

github上有一个有关Go序列化方法性能比较的repo：go_serialization_benchmarks，这个repo横向比较了数十种数据序列化方法的正确性、性能、内存分配等，并给出了一个结论：推荐gogo protobuf。对于这样一个粗选的结果，我们是直接笑纳的^_^。接下来就是进一步对gogo protobuf做进一步探究。

img{512x368}

一. go protobuf v1 vs. gogo protobuf

gogo protobuf是既go protobuf官方api之外的另一个go protobuf的api实现，它兼容go官方protobuf api(更准确的说是v1版本)。gogo protobuf提供了三种代码生成方式：protoc-gen-gogofast、protoc-gen-gogofaster和protoc-gen-gogoslick。究竟选择哪一个呢？这里我也写了一些benchmark来比较，并顺便将官方go protobuf api也一并加入比较了。

我们首先安装一下gogo protobuf实现的protoc的三个插件，用于生成proto文件对应的Go包源码文件：

go get github.com/gogo/protobuf/protoc-gen-gofast
go get github.com/gogo/protobuf/protoc-gen-gogofaster

go get github.com/gogo/protobuf/protoc-gen-gogoslick

安装后，我们在$GOPATH/bin下将看到这三个文件(protoc-gen-go是go protobuf官方实现的代码生成插件)：

$ls -l $GOPATH/bin|grep proto
-rwxr-xr-x   1 tonybai  staff   6252344  4 24 14:43 protoc-gen-go*
-rwxr-xr-x   1 tonybai  staff   9371384  2 28 09:35 protoc-gen-gofast*
-rwxr-xr-x   1 tonybai  staff   9376152  2 28 09:40 protoc-gen-gogofaster*
-rwxr-xr-x   1 tonybai  staff   9380728  2 28 09:40 protoc-gen-gogoslick*

为了对采用不同插件生成的数据序列化和反序列化方法进行性能基准测试，我们建立了下面repo。在repo中，每一种方法生成的代码放入独立的module中：

$tree -L 2 -F
.
├── IDL/
│   └── submit.proto
├── Makefile
├── gogoprotobuf-fast/
│   ├── go.mod
│   ├── go.sum
│   ├── submit/
│   └── submit_test.go
├── gogoprotobuf-faster/
│   ├── go.mod
│   ├── go.sum
│   ├── submit/
│   └── submit_test.go
├── gogoprotobuf-slick/
│   ├── go.mod
│   ├── go.sum
│   ├── submit/
│   └── submit_test.go
└── goprotobuf/
    ├── go.mod
    ├── go.sum
    ├── submit/
    └── submit_test.go

我们的proto文件如下:

$cat IDL/submit.proto
syntax = "proto3";

option go_package = ".;submit";

package submit;

message request {
        int64 recvtime = 1;
        string uniqueid = 2;
        string token = 3;
        string phone = 4;
        string content = 5;
        string sign = 6;
        string type = 7;
        string extend = 8;
        string version = 9;
}

我们还建立了Makefile，用于简化操作：

$cat Makefile

gen-protobuf: gen-goprotobuf gen-gogoprotobuf-fast gen-gogoprotobuf-faster gen-gogoprotobuf-slick

gen-goprotobuf:
    protoc -I ./IDL submit.proto --go_out=./goprotobuf/submit

gen-gogoprotobuf-fast:
    protoc -I ./IDL submit.proto --gofast_out=./gogoprotobuf-fast/submit

gen-gogoprotobuf-faster:
    protoc -I ./IDL submit.proto --gogofaster_out=./gogoprotobuf-faster/submit

gen-gogoprotobuf-slick:
    protoc -I ./IDL submit.proto --gogoslick_out=./gogoprotobuf-slick/submit

benchmark: goprotobuf-bench gogoprotobuf-fast-bench gogoprotobuf-faster-bench  gogoprotobuf-slick-bench

goprotobuf-bench:
    cd goprotobuf && go test -bench .

gogoprotobuf-fast-bench:
    cd gogoprotobuf-fast && go test -bench .

gogoprotobuf-faster-bench:
    cd gogoprotobuf-faster && go test -bench .

gogoprotobuf-slick-bench:
    cd gogoprotobuf-slick && go test -bench .

针对每一种方法，我们建立一个benchmark test。benchmark test代码都是一样的，我们以gogoprotobuf-fast为例：

// submit_test.go

package protobufbench

import (
    "fmt"
    "os"
    "testing"

    "github.com/bigwhite/protobufbench_gogoprotofast/submit"
    "github.com/gogo/protobuf/proto"
)

var request = submit.Request{
    Recvtime: 170123456,
    Uniqueid: "a1b2c3d4e5f6g7h8i9",
    Token:    "xxxx-1111-yyyy-2222-zzzz-3333",
    Phone:    "13900010002",
    Content:  "Customizing the fields of the messages to be the fields that you actually want to use removes the need to copy between the structs you use and structs you use to serialize. gogoprotobuf also offers more serialization formats and generation of tests and even more methods.",
    Sign:     "tonybaiXZYDFDS",
    Type:     "submit",
    Extend:   "",
    Version:  "v1.0.0",
}

var requestToUnMarshal []byte

func init() {
    var err error
    requestToUnMarshal, err = proto.Marshal(&request)
    if err != nil {
        fmt.Printf("marshal err:%s\n", err)
        os.Exit(1)
    }
}

func BenchmarkMarshal(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _, _ = proto.Marshal(&request)
    }
}
func BenchmarkUnmarshal(b *testing.B) {
    b.ReportAllocs()
    var request submit.Request
    for i := 0; i < b.N; i++ {
        _ = proto.Unmarshal(requestToUnMarshal, &request)

    }
}

func BenchmarkMarshalInParalell(b *testing.B) {
    b.ReportAllocs()

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            _, _ = proto.Marshal(&request)
        }
    })
}
func BenchmarkUnmarshalParalell(b *testing.B) {
    b.ReportAllocs()
    var request submit.Request
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            _ = proto.Unmarshal(requestToUnMarshal, &request)
        }
    })
}

我们看到，对每种方法生成的代码，我们都会进行顺序和并行的marshal和unmarshal基准测试。

我们首先分别使用不同方式生成对应的go代码：

$make gen-protobuf
protoc -I ./IDL submit.proto --go_out=./goprotobuf/submit
protoc -I ./IDL submit.proto --gofast_out=./gogoprotobuf-fast/submit
protoc -I ./IDL submit.proto --gogofaster_out=./gogoprotobuf-faster/submit
protoc -I ./IDL submit.proto --gogoslick_out=./gogoprotobuf-slick/submit

然后运行基准测试(使用macos上的go 1.14)：

$make benchmark
cd goprotobuf && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_goproto
BenchmarkMarshal-8                  2437068           483 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                2262229           529 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8        7592120           162 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        5306744           225 ns/op         400 B/op           7 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_goproto    6.239s
cd gogoprotobuf-fast && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_gogoprotofast
BenchmarkMarshal-8                  7186828           164 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                4706794           251 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8       15107896            83.0 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        6258507           179 ns/op         400 B/op           7 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_gogoprotofast    5.449s
cd gogoprotobuf-faster && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_gogoprotofaster
BenchmarkMarshal-8                  7036842           166 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                4666698           256 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8       15444961            83.2 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        6936337           202 ns/op         400 B/op           7 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_gogoprotofaster    5.750s
cd gogoprotobuf-slick && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_gogoprotoslick
BenchmarkMarshal-8                  6529311           176 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                4737463           252 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8       15700746            81.8 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        6528390           202 ns/op         400 B/op           7 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_gogoprotoslick    5.668s

在我的macpro(4核8线程)上，我们看到两点结论：

官方go protobuf实现生成的代码性能的确弱于gogo protobuf生成的代码，在顺序测试中，差距还较大；
针对我预置的proto文件中数据格式，gogo protobuf的三种生成方法产生的代码的性能差异并不大，选择protoc-gen-gofast生成的代码在性能上即可满足。

二. go protobuf v2

今年三月份初，Go官方发布了protobuf的新API版本，这个版本与原go protobuf并不兼容。新版API旨在使protobuf的类型系统与go类型系统充分融合，提供反射功能和自定义消息实现。那么该版本生成的序列/反序列化代码在性能上有提升吗？我们将其加入我们的benchmark。

我们先下载go protobuf v2的代码生成插件(注意：由于go protobuf v1和go protobuf v2的插件名称相同，需要先备份好原先已经安装的protoc-gen-go)：

$  go get google.golang.org/protobuf/cmd/protoc-gen-go
go: found google.golang.org/protobuf/cmd/protoc-gen-go in google.golang.org/protobuf v1.21.0

然后将新安装的插件名称改为protoc-gen-gov2，这样$GOPATH/bin下的插件文件列表如下：

$ls -l $GOPATH/bin/|grep proto
-rwxr-xr-x   1 tonybai  staff   6252344  4 24 14:43 protoc-gen-go*
-rwxr-xr-x   1 tonybai  staff   9371384  2 28 09:35 protoc-gen-gofast*
-rwxr-xr-x   1 tonybai  staff   9376152  2 28 09:40 protoc-gen-gogofaster*
-rwxr-xr-x   1 tonybai  staff   9380728  2 28 09:40 protoc-gen-gogoslick*
-rwxr-xr-x   1 tonybai  staff   8716064  4 24 14:56 protoc-gen-gov2*

在Makefile中增加针对go protobuf v2的代码生成和Benchmark target：

gen-goprotobufv2:
        protoc -I ./IDL submit.proto --gov2_out=./goprotobufv2/submit

goprotobufv2-bench:
        cd goprotobufv2 && go test -bench .

由于go protobuf v2与v1版本不兼容，因此也无法与gogo protobuf兼容，我们需要修改一下go protobuf v2对应的submit_test.go，将导入的“github.com/gogo/protobuf/proto”包换为“google.golang.org/protobuf/proto”。

重新生成代码：

$make gen-protobuf
protoc -I ./IDL submit.proto --go_out=./goprotobuf/submit
protoc -I ./IDL submit.proto --gov2_out=./goprotobufv2/submit
protoc -I ./IDL submit.proto --gofast_out=./gogoprotobuf-fast/submit
protoc -I ./IDL submit.proto --gogofaster_out=./gogoprotobuf-faster/submit
protoc -I ./IDL submit.proto --gogoslick_out=./gogoprotobuf-slick/submit

运行benchmark:

$make benchmark
cd goprotobuf && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_goproto
BenchmarkMarshal-8                  2420620           485 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                2186240           538 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8        7334412           162 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        4537429           222 ns/op         400 B/op           7 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_goproto    6.052s
cd goprotobufv2 && go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_goprotov2
BenchmarkMarshal-8                  2404473           506 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                1901947           626 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8        6629139           171 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8       panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11d4956]

goroutine 196 [running]:
google.golang.org/protobuf/internal/impl.(*messageState).protoUnwrap(0xc00007e210, 0xc000010360, 0xc00008ce01)
    /Users/tonybai/Go/pkg/mod/google.golang.org/protobuf@v1.21.0/internal/impl/message_reflect_gen.go:27 +0x26
google.golang.org/protobuf/internal/impl.(*messageState).Interface(0xc00007e210, 0xc00007e210, 0xc00012c000)
    /Users/tonybai/Go/pkg/mod/google.golang.org/protobuf@v1.21.0/internal/impl/message_reflect_gen.go:24 +0x2b
google.golang.org/protobuf/proto.UnmarshalOptions.unmarshal(0x0, 0x12acc00, 0xc000010360, 0xc00012c000, 0x177, 0x177, 0x12b23e0, 0xc00007e210, 0xc000200001, 0x0, ...)
    /Users/tonybai/Go/pkg/mod/google.golang.org/protobuf@v1.21.0/proto/decode.go:71 +0x2c5
google.golang.org/protobuf/proto.Unmarshal(0xc00012c000, 0x177, 0x177, 0x12ac180, 0xc00007e210, 0x0, 0x0)
    /Users/tonybai/Go/pkg/mod/google.golang.org/protobuf@v1.21.0/proto/decode.go:48 +0x89
github.com/bigwhite/protobufbench_goprotov2.BenchmarkUnmarshalParalell.func1(0xc0004a8000)
    /Users/tonybai/test/go/protobuf/goprotobufv2/submit_test.go:65 +0x6a
testing.(*B).RunParallel.func1(0xc0000161b0, 0xc0000161a8, 0xc0000161a0, 0xc00010c700, 0xc00004a000)
    /Users/tonybai/.bin/go1.14/src/testing/benchmark.go:763 +0x99
created by testing.(*B).RunParallel
    /Users/tonybai/.bin/go1.14/src/testing/benchmark.go:756 +0x192
exit status 2
FAIL    github.com/bigwhite/protobufbench_goprotov2    4.878s
make: *** [goprotobufv2-bench] Error 1

我们看到go protobuf v2并未完成所有benchmark test，在运行并行unmarshal测试中panic了。目前go protobuf v2官方并未在github开通issue，因此尚不知道哪里去提issue。于是回到test代码，再仔细看一下submit_test.go中 BenchmarkUnmarshalParalell的代码：

func BenchmarkUnmarshalParalell(b *testing.B) {
        b.ReportAllocs()
        var request submit.Request
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        _ = proto.Unmarshal(requestToUnMarshal, &request)
                }
        })
}

这里存在一个“问题”，那就是多goroutine会共享一个request。但在其他几个测试中同样的代码并未引发panic。我修改一下代码，将其放入for循环中：

func BenchmarkUnmarshalParalell(b *testing.B) {
        b.ReportAllocs()
        b.RunParallel(func(pb *testing.PB) {
                for pb.Next() {
                        var request submit.Request
                        _ = proto.Unmarshal(requestToUnMarshal, &request)
                }
        })
}

再运行go protobuf v2的benchmark：

$go test -bench .
goos: darwin
goarch: amd64
pkg: github.com/bigwhite/protobufbench_goprotov2
BenchmarkMarshal-8                  2348630           509 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshal-8                1913904           627 ns/op         400 B/op           7 allocs/op
BenchmarkMarshalInParalell-8        7133936           175 ns/op         384 B/op           1 allocs/op
BenchmarkUnmarshalParalell-8        4841752           232 ns/op         576 B/op           8 allocs/op
PASS
ok      github.com/bigwhite/protobufbench_goprotov2    6.355s

看来的确是这个问题。

从Benchmark结果来看，即便是与go protobuf v1相比，go protobuf v2生成的代码性能也要逊色一些，更不要说与gogo protobuf相比了。

三. 小结

从性能角度考虑，如果要使用go protobuf api，首选gogo protobuf。

如果从功能角度考虑，显然go protobuf v2在成熟稳定了以后，会成为Go语言功能上最为强大的protobuf API。

本文涉及源码可以在这里下载。