Stack | Tony Bai

Articles tagged Stack

Understanding the unsafe-assume-no-moving-gc Package

April 16, 2023

Permalink – https://tonybai.com/2023/04/16/understanding-unsafe-assume-no-moving-gc

1. Background

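You may have noticed that in my earlier article "Go and Neural Networks: Tensor Computation", every example was run with the environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH set in front of the command, like this: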

$ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 go run tensor.go

What is going on here? And what happens if we leave this environment variable out? Let's try:

// https://github.com/bigwhite/experiments/blob/master/go-and-nn/tensor-operations/tensor.go

$go run tensor.go
panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc to declare that it assumes a non-moving garbage collector, but your version of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that it's safe against the go1.20 runtime. If you want to risk it, run with environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 set. Notably, if go1.20 adds a moving garbage collector, this program is unsafe to use.

goroutine 1 [running]:
go4.org/unsafe/assume-no-moving-gc.init.0()
    /Users/tonybai/Go/pkg/mod/go4.org/unsafe/assume-no-moving-gc@v0.0.0-20220617031537-928513b29760/untested.go:25 +0x1ba
exit status 2

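The program panics! The panic message mentions the package go4.org/unsafe/assume-no-moving-gc, so this package is clearly the culprit. What exactly does assume-no-moving-gc do, and why does gorgonia.org/tensor depend on it? That was beyond the scope of "Go and Neural Networks: Tensor Computation", so I did not cover it there. In this article, let's work through the unsafe-assume-no-moving-gc package together.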

2. What exactly is unsafe-assume-no-moving-gc?

The canonical import path of unsafe-assume-no-moving-gc is go4.org/unsafe/assume-no-moving-gc, so it is clearly a package open-sourced by the go4.org organization. Let's look at the go4.org home page (shown below):

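The home page is very bare. Its main value is explaining where the name go4 comes from: it sounds like "gopher". go4.org has open-sourced a number of Go packages, which you can see on its official GitHub page: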

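There are not many projects and not many stars, but browse the contributors of any of them and you will find Brad Fitzpatrick (bradfitz), a former Googler, former Go core team member and designer of the net/http package, as well as Josh Bleecher Snyder (josharian), a core contributor to the Go runtime. Both now appear to work at the startup Tailscale, which builds a secure remote-access platform on top of the WireGuard protocol (think of it simply as a VPN platform). Tailscale has gathered a handful of former Go core developers, and go4.org hosts the miscellaneous Go packages they have open-sourced. unsafe-assume-no-moving-gc is one of them.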

So what does this package actually do? Read on.

3. How unsafe-assume-no-moving-gc works

unsafe-assume-no-moving-gc is a very simple package:

$tree unsafe-assume-no-moving-gc -F
unsafe-assume-no-moving-gc
├── LICENSE
├── README.md
├── assume-no-moving-gc.go
├── assume-no-moving-gc_test.go
├── go.mod
└── untested.go

0 directories, 6 files

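Apart from the test file, it has only two source files: assume-no-moving-gc.go and untested.go. Open them and you will find that the package does not even export any API. So what is it for? Here is the package's README: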

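Roughly speaking: if your code relies on unsafe tricks in Go, it can only work correctly on the premise that the Go runtime's garbage collector is not a moving collector.

A moving collector is one that may relocate some heap objects to other memory addresses during collection. If your program imports unsafe-assume-no-moving-gc, then on any Go release the package has not yet vetted (which could be the one that introduces a moving GC), it alerts you by crashing the program at startup.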

Let's look at an example:

// main.go
package main

import (
    "fmt"

    _ "go4.org/unsafe/assume-no-moving-gc"
)

func main() {
    fmt.Println("unsafe-assume-no-moving-gc demo")
}

After running go mod tidy, run this file with Go 1.20:

$go mod tidy
go: finding module for package go4.org/unsafe/assume-no-moving-gc
go: downloading go4.org/unsafe/assume-no-moving-gc v0.0.0-20230221090011-e4bae7ad2296
go: downloading go4.org v0.0.0-20230225012048-214862532bf5

$go run main.go
unsafe-assume-no-moving-gc demo

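The GC in the current latest release, Go 1.20.x, does not move heap objects, and the latest version of the package has already been updated to vouch for Go 1.20, so running the program above with Go 1.20 does not panic.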

Now let's downgrade unsafe-assume-no-moving-gc to an earlier version, for example v0.0.0-20201222180813-1025295fd063, and run main.go again:

$go get go4.org/unsafe/assume-no-moving-gc@v0.0.0-20201222180813-1025295fd063
go: downgraded go4.org/unsafe/assume-no-moving-gc v0.0.0-20230221090011-e4bae7ad2296 => v0.0.0-20201222180813-1025295fd063

$go run main.go
panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc to declare that it assumes a non-moving garbage collector, but your version of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that it's safe against the go1.20 runtime. If you want to risk it, run with environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 set. Notably, if go1.20 adds a moving garbage collector, this program is unsafe to use.

goroutine 1 [running]:
go4.org/unsafe/assume-no-moving-gc.init.0()
    /Users/tonybai/Go/pkg/mod/go4.org/unsafe/assume-no-moving-gc@v0.0.0-20201222180813-1025295fd063/untested.go:24 +0x1ba
exit status 2

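The panic message tells us that this version of go4.org/unsafe/assume-no-moving-gc has not yet been updated to vouch for Go 1.20, so running the program with Go 1.20 may be risky. If you are sure there is no problem, you can avoid the panic by setting the environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20, as in this output: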

$ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 go run main.go
unsafe-assume-no-moving-gc demo

So how does unsafe-assume-no-moving-gc perform this "check"? The trick lies in the source file untested.go. Let's download the go4.org/unsafe/assume-no-moving-gc source and check out commit 1025295fd063:

$git checkout 1025295fd063
Note: checking out '1025295fd063'.

... ...

HEAD is now at 1025295 flesh out package doc

Here is untested.go:

// Copyright 2020 Brad Fitzpatrick. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// +build go1.18

package assume_no_moving_gc

import (
    "os"
    "runtime"
    "strings"
)

func init() {
    dots := strings.SplitN(runtime.Version(), ".", 3)
    v := runtime.Version()
    if len(dots) >= 2 {
        v = dots[0] + "." + dots[1]
    }
    if os.Getenv(env) == v {
        return
    }
    panic("Something in this program imports go4.org/unsafe/assume-no-moving-gc to declare that it assumes a non-moving garbage collector, but your version of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that it's safe against the " + v + " runtime. If you want to risk it, run with environment variable " + env + "=" + v + " set. Notably, if " + v + " adds a moving garbage collector, this program is unsafe to use.")
}

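This file has two notable features:

It uses the build constraint // +build go1.18, which means the file is compiled only when you build with Go 1.18 or later.
It contains an init function, which runs whenever your code imports the assume_no_moving_gc package, producing a "side effect".

Note: for how build constraints work, see go help buildconstraint.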

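So when we run main.go above with Go 1.20, since 1.20 is newer than 1.18, untested.go is compiled and its init function runs. Unless the environment variable named by the constant env ("ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH") is set to the running major.minor version (here go1.20), init reaches the panic, and the program exits with the panic message.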
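The decision logic of this init function can be pulled out into pure functions so it can be tried without actually panicking. This is a sketch that restates the untested.go logic shown above; majorMinor and wouldPanic are hypothetical names, and the real init calls runtime.Version() and os.Getenv directly.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor trims a runtime.Version() string such as "go1.20.3" to "go1.20",
// using the same SplitN logic as untested.go.
func majorMinor(version string) string {
	dots := strings.SplitN(version, ".", 3)
	if len(dots) >= 2 {
		return dots[0] + "." + dots[1]
	}
	return version
}

// wouldPanic reports whether untested.go's init would panic, given the
// running Go version and the env var value ("" when unset).
func wouldPanic(runtimeVersion, envValue string) bool {
	return envValue != majorMinor(runtimeVersion)
}

func main() {
	fmt.Println(majorMinor("go1.20.3"))           // go1.20
	fmt.Println(wouldPanic("go1.20.3", ""))       // true: env var unset
	fmt.Println(wouldPanic("go1.20.3", "go1.20")) // false: risk accepted
	fmt.Println(wouldPanic("go1.20.3", "go1.19")) // true: wrong version
}
```

Because the comparison is an exact string match on major.minor, the opt-out has to be renewed for every Go release, which is the intended friction.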

Now switch the assume_no_moving_gc package back to the latest version. The build constraint in the latest untested.go is:

  //go:build go1.21
  // +build go1.21

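This means untested.go is compiled only with Go 1.21 or later. If we run main.go with Go 1.20, the side effect of the init function in untested.go is never "triggered", so main.go runs normally.

Note: as of Go 1.20, the Go GC still does not move heap objects.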
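The standard library package go/build/constraint can evaluate such lines, which makes it easy to check which files a given toolchain would pick up. The sketch below assumes, as the go tool does, that a Go 1.N toolchain satisfies the release tags go1.1 through go1.N; releaseTags and included are hypothetical helpers, and every non-release tag (GOOS, GOARCH, custom tags) is simply treated as false.

```go
package main

import (
	"fmt"
	"go/build/constraint"
	"strconv"
	"strings"
)

// releaseTags returns a tag checker that treats go1.1 through go1.<minor>
// as satisfied, mimicking the release tags of a Go 1.<minor> toolchain.
func releaseTags(minor int) func(tag string) bool {
	return func(tag string) bool {
		if !strings.HasPrefix(tag, "go1.") {
			return false
		}
		v, err := strconv.Atoi(strings.TrimPrefix(tag, "go1."))
		return err == nil && v >= 1 && v <= minor
	}
}

// included reports whether a file carrying the given build-constraint line
// would be compiled by a Go 1.<minor> toolchain.
func included(line string, minor int) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(releaseTags(minor)), nil
}

func main() {
	for _, line := range []string{"//go:build go1.18", "//go:build go1.21"} {
		ok, err := included(line, 20)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s under Go 1.20: compiled=%v\n", line, ok)
	}
}
```

Under Go 1.20 it reports the go1.18 file as compiled and the go1.21 file as skipped, which is exactly what separates the old and new versions of untested.go.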

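Before I worked out what this package does, I "consulted" ChatGPT about it. Here is ChatGPT's answer:

As you can see, ChatGPT was basically spouting confident nonsense.

4. Summary

unsafe-assume-no-moving-gc only concerns the GC moving heap objects; it makes no guarantees about stack addresses. As we know, stack addresses in Go can change: a goroutine starts with a stack of only 2KB, and once that is exceeded, the Go runtime grows the stack by allocating a larger memory region and copying the variables from the old stack to the new one, so the addresses of all variables that lived on the old stack change.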

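That said, if your Go code uses unsafe tricks that depend on the addresses of heap objects, I recommend importing unsafe-assume-no-moving-gc. But note that as new Go versions are released, you need to update your unsafe-assume-no-moving-gc dependency promptly; otherwise, when users build with the latest Go, programs depending on your package will alert them with a panic.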

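The "Gopher Tribe" Knowledge Planet aims to build a premium community for learning and advancing in Go! High-quality first-run Go technical articles with three days of exclusive early access, two analyses of the state of Go each year, the fresh Gopher Daily one hour early every day, previews of online courses, technical columns and book content, and guaranteed answers within six hours: everything you need from the Go ecosystem! In 2023, Gopher Tribe will focus further on how to write elegant, idiomatic, readable and testable Go code, emphasize code quality and a deep understanding of Go's core technologies, and keep strengthening interaction with members. Everyone is welcome to join!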


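The well-known cloud hosting provider DigitalOcean has released new hosting plans: the entry-level Droplet is upgraded to 1 CPU core, 1 GB of RAM and a 25 GB high-speed SSD for $5/month. If you need DigitalOcean, open this link to start your DO hosting journey: https://m.do.co/c/bff6eed92687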

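Gopher Daily (daily Gopher news) archive repository – https://github.com/bigwhite/gopherdaily

My contact information:

Weibo (currently unavailable): https://weibo.com/bigwhite20xx
Weibo 2: https://weibo.com/u/6484441286
Blog: tonybai.com
github: https://github.com/bigwhite

Business cooperation: writing, book publishing, training, online courses, startup partnerships, consulting, advertising.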

Hands-on: Implementing a DSL with ANTLR and Go (Part 5): Error Handling

May 30, 2022

Permalink – https://tonybai.com/2022/05/30/an-example-of-implement-dsl-using-antlr-and-go-part5

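Whether it is a client application or a cloud application, one thing must be done well before it goes to production: error handling. In the earlier articles of this series we designed the grammar and syntax and built and validated the semantic model, but paid no special attention to error handling. In this article we fill in that gap.

Designing and implementing a DSL involves several main phases, and the main target of error handling differs from phase to phase, as shown in the figure below:

In the grammar design and validation phase, we focus mostly on the correctness of the grammar. A wrong grammar makes parsing of the syntax samples fail, but this phase comes before the parser code is generated: we mostly debug the grammar with the debugging tools ANTLR provides, and there is no need to write error-handling code ourselves.

In the syntax parsing and parse-tree building phase, the grammar problems have already been solved and the generated parser can parse correct syntax samples. Error handling here focuses on how to deal with syntax errors.

In the semantic model assembly and execution phase, what we care about is whether the element values used to assemble the semantic model make sense. Take windowsRange as an example: in the semantic model it has two elements, low and high, and represents the range [low, high]. If low is greater than high in your source, it is still syntactically valid and passes syntax parsing, but it makes no sense semantically. In the assembly and execution phase we need to detect such problems, report them as errors and handle them.

In this article we briefly explain approaches and techniques for error handling in the latter two phases.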

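I. Error handling during syntax parsing

The syntax parsing phase is like compilation in a static language or parsing in a dynamic language: when a syntax error is found, we should report where in the source it occurred along with helpful details. ANTLR's Go runtime provides the ErrorListener interface and DefaultErrorListener, an empty implementation of it: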

// github.com/antlr/antlr4/runtime/Go/antlr/error_listener.go
type ErrorListener interface {
    SyntaxError(recognizer Recognizer, offendingSymbol interface{}, line, column int, msg string, e RecognitionException)
    ReportAmbiguity(recognizer Parser, dfa *DFA, startIndex, stopIndex int, exact bool, ambigAlts *BitSet, configs ATNConfigSet)
    ReportAttemptingFullContext(recognizer Parser, dfa *DFA, startIndex, stopIndex int, conflictingAlts *BitSet, configs ATNConfigSet)
    ReportContextSensitivity(recognizer Parser, dfa *DFA, startIndex, stopIndex, prediction int, configs ATNConfigSet)
}

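The SyntaxError method of the ErrorListener interface is exactly what we need in this phase: it lets us catch syntax errors while a syntax sample is being parsed.

The parser comes with built-in ErrorListener implementations, such as antlr.ConsoleErrorListener. But this listener printed nothing while parsing the source samples and might as well not have been there, so we need a custom ErrorListener implementation that reports syntax errors in detail.

Below is a Go version of VerboseErrorListener that I implemented based on the Java example in The Definitive ANTLR 4 Reference: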

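// tdat/error_listener.go
package main

import (
    "fmt"

    "github.com/antlr/antlr4/runtime/Go/antlr"
)

// VerboseErrorListener prints the innermost rule and the position of every
// syntax error, and remembers that at least one error occurred.
type VerboseErrorListener struct {
    *antlr.DefaultErrorListener // no-op implementations of the other callbacks
    hasError bool
}

func NewVerboseErrorListener() *VerboseErrorListener {
    return &VerboseErrorListener{DefaultErrorListener: antlr.NewDefaultErrorListener()}
}

// HasError reports whether any syntax error has been seen.
func (d *VerboseErrorListener) HasError() bool {
    return d.hasError
}

// SyntaxError is called back by the parser on every syntax error.
func (d *VerboseErrorListener) SyntaxError(recognizer antlr.Recognizer, offendingSymbol interface{}, line, column int, msg string, e antlr.RecognitionException) {
    d.hasError = true

    // Guard the assertion in case the listener is ever attached to a lexer.
    if p, ok := recognizer.(antlr.Parser); ok {
        if stack := p.GetRuleInvocationStack(p.GetParserRuleContext()); len(stack) > 0 {
            fmt.Printf("rule: %v ", stack[0]) // innermost rule
        }
    }
    fmt.Printf("line %d: %d at %v : %s\n", line, column, offendingSymbol, msg)
}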

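Whenever the parser finds a syntax error in the source, it calls VerboseErrorListener's SyntaxError method. The arguments passed to SyntaxError carry the details of the error; we just need to format and print them as shown above.

In addition, VerboseErrorListener has a hasError boolean field that records whether any syntax error occurred while parsing the source file, so the program can choose its subsequent execution path based on this flag.

Here is how main uses VerboseErrorListener: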

func main() {
    ... ...
    lexer := parser.NewTdatLexer(input)
    stream := antlr.NewCommonTokenStream(lexer, 0)
    p := parser.NewTdatParser(stream)

    el := NewVerboseErrorListener()
    p.RemoveErrorListeners()
    p.AddErrorListener(el)

    tree := p.Prog()

    if el.HasError() {
        return
    }
    ... ...
}

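As the code shows, after creating the TdatParser instance and before parsing the source (p.Prog()), we first remove its default built-in ErrorListener and then add our own VerboseErrorListener instance. main then decides whether to continue based on whether VerboseErrorListener has recorded a syntax error; if it has, the program stops.

Let's add a syntax sample containing syntax errors, sample5-invalid.t: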

// tdat/samples/sample5-invalid.t

r0006: Aach { |1,3| ($speed < 50e) and (($temperature + 1) < 4) or ((roundDown($salinity) <= 600.0) or (roundUp($ph) > 8.0)) } => ();

Having tdat parse sample5-invalid.t gives the following result:

$./tdat samples/sample5-invalid.t
input file: samples/sample5-invalid.t
rule: enumerableFunc line 2: 7 at [@2,8:11='Aach',<29>,2:7] : mismatched input 'Aach' expecting {'Each', 'None', 'Any'}
rule: conditionExpr line 2: 32 at [@13,33:33='e',<29>,2:32] : extraneous input 'e' expecting ')'

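The program prints detailed information about the syntax problems and stops executing.

II. Error handling during semantic model assembly and execution

Unlike syntax parsing, where error handling takes a relatively fixed form, semantic errors come in many more shapes and are spread much more widely: any parse rule may harbor a semantic problem, such as the low > high problem with windowsRange mentioned earlier, or a field named in result that cannot be found in the input data.

Both assembling the semantic model and executing it are tree traversals; the traversal functions are recursive and may nest deeply, so the traditional approach of returning error values is a poor fit. The best approach is panic + recover: when the semantics are wrong at some point, panic immediately, then catch the panic with recover at a higher level and return the information it carries as an error. Let's use the windowsRange semantic problem as an example of how to handle errors during semantic model assembly and execution.

First, we modify the ExitWindowsWithLowAndHighIndex method of ReversePolishExprListener so that it panics when it finds low > high after parsing: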
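Stripped of ANTLR, the panic + recover pattern described above looks like the minimal sketch below. The node type and the walk/check functions are hypothetical; only the windowsRange low/high check is taken from the text. A panic raised several levels deep in walk is turned into an ordinary error by the single deferred recover in check, so the intermediate recursion levels need no error plumbing.

```go
package main

import "fmt"

// node is a hypothetical tree node; only the windowsRange check is modeled.
type node struct {
	low, high int
	children  []*node
}

// walk recurses through the tree and panics on the first semantic problem.
func walk(n *node) {
	if n.high < n.low {
		panic(fmt.Sprintf("windowsRange: low(%d) > high(%d)", n.low, n.high))
	}
	for _, c := range n.children {
		walk(c)
	}
}

// check is the single recovery point: it converts the panic into an
// ordinary error for its caller.
func check(root *node) (err error) {
	defer func() {
		if x := recover(); x != nil {
			err = fmt.Errorf("semantic error: %v", x)
		}
	}()
	walk(root)
	return nil
}

func main() {
	tree := &node{low: 1, high: 3, children: []*node{
		{low: 1, high: 2},
		{low: 3, high: 1, children: []*node{{low: 1, high: 1}}},
	}}
	fmt.Println(check(tree)) // semantic error: windowsRange: low(3) > high(1)
}
```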

// tdat/reverse_polish_expr_listener.go

func (l *ReversePolishExprListener) ExitWindowsWithLowAndHighIndex(c *parser.WindowsWithLowAndHighIndexContext) {
    s := c.GetText()
    s = s[1 : len(s)-1] // remove two '|'

    t := strings.Split(s, ",")

    if t[0] == "" {
        l.low = 1
    } else {
        l.low, _ = strconv.Atoi(t[0])
    }

    if t[1] == "" {
        l.high = windowsRangeMax
    } else {
        l.high, _ = strconv.Atoi(t[1])
    }

    if l.high < l.low {
        panic(fmt.Sprintf("windowsRange: low(%d) > high(%d)", l.low, l.high))
    }
}
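The listener method above discards strconv.Atoi errors with _ and cannot return an error, because it implements a generated listener callback. One option is to move the parsing into a helper that returns an error and have the listener panic only on that error. Below is a sketch of such a helper; parseWindowsRange is hypothetical, and windowsRangeMax is given an arbitrary placeholder value because its real value does not appear in the article.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// windowsRangeMax is a placeholder; the real constant lives in tdat.
const windowsRangeMax = 1 << 16

// parseWindowsRange parses text such as "|3,1|", "|,5|" or "|2,|" and
// reports malformed numbers and low > high as errors instead of ignoring them.
func parseWindowsRange(s string) (low, high int, err error) {
	if len(s) < 2 || s[0] != '|' || s[len(s)-1] != '|' {
		return 0, 0, fmt.Errorf("windowsRange: malformed %q", s)
	}
	t := strings.Split(s[1:len(s)-1], ",")
	if len(t) != 2 {
		return 0, 0, fmt.Errorf("windowsRange: malformed %q", s)
	}
	low, high = 1, windowsRangeMax // defaults for empty bounds
	if t[0] != "" {
		if low, err = strconv.Atoi(t[0]); err != nil {
			return 0, 0, fmt.Errorf("windowsRange: bad low: %w", err)
		}
	}
	if t[1] != "" {
		if high, err = strconv.Atoi(t[1]); err != nil {
			return 0, 0, fmt.Errorf("windowsRange: bad high: %w", err)
		}
	}
	if high < low {
		return 0, 0, fmt.Errorf("windowsRange: low(%d) > high(%d)", low, high)
	}
	return low, high, nil
}

func main() {
	for _, s := range []string{"|1,3|", "|,5|", "|2,|", "|3,1|"} {
		low, high, err := parseWindowsRange(s)
		fmt.Println(s, low, high, err)
	}
}
```

Inside ExitWindowsWithLowAndHighIndex, the call would then be followed by something like `if err != nil { panic(err) }`, so the recover in the caller still applies unchanged.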

To avoid catching the panic directly in main, we take the original tree-walking statement:

antlr.ParseTreeWalkerDefault.Walk(l, tree)

and move it into a new function, extractReversePolishExpr, which catches the panic and returns it to main as an ordinary error:

// tdat/main.go

func extractReversePolishExpr(listener antlr.ParseTreeListener, t antlr.Tree) (err error) {
    defer func() {
        if x := recover(); x != nil {
            err = fmt.Errorf("semantic tree assembly error: %v", x)
        }
    }()

    antlr.ParseTreeWalkerDefault.Walk(listener, t)

    return nil
}
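One refinement worth considering: a bare recover() also swallows genuine bugs, such as a nil pointer dereference in listener code, and reports them as semantic errors. The sketch below recovers only a dedicated panic payload type and re-raises everything else; semanticError and catchSemantic are hypothetical names, not part of the tdat code.

```go
package main

import (
	"errors"
	"fmt"
)

// semanticError is a hypothetical panic payload marking deliberate,
// user-facing semantic problems.
type semanticError struct{ msg string }

func (e semanticError) Error() string { return e.msg }

// catchSemantic runs fn and converts a semanticError panic into an error.
// Any other panic (a real bug) is re-raised instead of being disguised.
func catchSemantic(fn func()) (err error) {
	defer func() {
		if x := recover(); x != nil {
			se, ok := x.(semanticError)
			if !ok {
				panic(x)
			}
			err = fmt.Errorf("semantic tree assembly error: %w", se)
		}
	}()
	fn()
	return nil
}

func main() {
	err := catchSemantic(func() {
		panic(semanticError{"windowsRange: low(3) > high(1)"})
	})
	fmt.Println(err)

	var se semanticError
	fmt.Println(errors.As(err, &se))
}
```

Because semanticError implements error and is wrapped with %w, callers can still single out semantic errors with errors.As.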

In main, we use extractReversePolishExpr like this:

// tdat/main.go

func main() {
    ... ...
    l := NewReversePolishExprListener()
    err = extractReversePolishExpr(l, tree)
    if err != nil {
        fmt.Printf("%s\n", err)
        return
    }
    ... ...
}

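If extractReversePolishExpr returns an error, something went wrong while extracting the reverse Polish expression, and we terminate the program.

Next, let's construct a sample with a semantic error, samples/sample6-windowrange-invalid.t, to see how the program above catches semantic errors: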

// samples/sample6-windowrange-invalid.t
r0006: Each { |3,1| ($speed < 50) and (($temperature + 1) < 4) or ((roundDown($salinity) <= 600.0) or (roundUp($ph) > 8.0)) } => ();

Run the new program:

$./tdat samples/sample6-windowrange-invalid.t
input file: samples/sample6-windowrange-invalid.t
semantic tree assembly error: windowsRange: low(3) > high(1)

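The program successfully catches the expected semantic error.

Later, during execution of the semantic model, the Evaluate function of the semantic package also uses defer + recover to catch any panic that occurs while evaluating the expression tree, and returns it to its caller as an error. Even a semantic problem that slipped through during assembly is caught the same way once it causes an error during execution.

Since the principle is the same, I won't go into error handling during semantic model execution here.

III. Summary

In this article we filled in the error-handling part of designing and implementing a DSL, giving error-handling approaches for two phases: syntax parsing, and semantic model assembly and execution.

In Domain-Specific Languages, Martin Fowler writes: "Parsing and generating output are the easy parts of writing a compiler; the real difficulty lies in giving better error messages." Error handling plays a very important role in a DSL-based processing engine, and a well-designed error-handling scheme pays off greatly in later problem diagnosis, evolution and maintenance of the engine.

The code for this article can be downloaded here – https://github.com/bigwhite/experiments/tree/master/antlr/tdat .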


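I Love SMS (我爱发短信): experts in custom development of enterprise SMS platforms https://tonybai.com/. smspush: a customizable SMS platform that can be deployed inside your enterprise, covering all three carriers, handling high-concurrency access and open to custom extensions; you decide the message content without restrictions, with rich APIs, long-message support and optional signatures. On April 8, 2020, China's three major telecom carriers jointly released the 5G Messaging White Paper; the 51 SMS platform will also be upgraded to the "51 Commercial Messaging Platform" with full support for 5G RCS messages.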
