Twitter | Tony Bai

Articles tagged "twitter"

A Summary of Configuration Approaches for Golang Programs

July 1, 2015

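On Twitter I came across a series of articles summarizing configuration approaches for Golang programs (a mini series of six parts; the original post linked to it). I found it worthwhile, so here is a rough summary (not a full translation) for reference.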

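I. Background

Whatever language an application is written in, it needs configuration data. That data usually arrives in one of a few forms: command-line options and parameters, environment variables (env vars), and configuration files. Golang is no exception. The standard library's flag package parses command-line options and arguments, though it covers only part of the common conventions; the os package gives access to the current environment variables. Golang does not, however, prescribe a standard configuration file format (although XML and JSON are supported out of the box), so reading configuration files is mostly left to third-party packages. There are many such packages, and the approaches the author presents in the series cover the mainstream ones.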
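As a small illustration of the environment-variable route mentioned above, here is a minimal sketch built on os.LookupEnv from the standard library. The helper names envString and envInt, and the variable names APP_ADDR and APP_WORKERS, are assumptions made up for this example; they do not come from the original series.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envString returns the value of the environment variable key,
// or def when the variable is not set at all.
func envString(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

// envInt is like envString but parses the value as an int.
// A missing or malformed value falls back to def.
func envInt(key string, def int) int {
	v, ok := os.LookupEnv(key)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return def
	}
	return n
}

func main() {
	// APP_ADDR and APP_WORKERS are hypothetical names used only in this sketch.
	fmt.Println("addr:", envString("APP_ADDR", "localhost:8080"))
	fmt.Println("workers:", envInt("APP_WORKERS", 4))
}
```

os.LookupEnv is used rather than os.Getenv so that a variable set to the empty string can be told apart from one that is not set.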

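The author argues that a well-designed application has the following configuration layers:
1. Initial default values for configuration items, built into the program.
2. Values from a configuration file, which override the built-in defaults.
3. Command-line options and arguments, which have the highest priority and override both layers above.

The rest of this post follows the author's line of thought and works through Golang configuration approaches step by step.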

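II. Parsing Command-Line Options and Arguments

This section looks at how a Golang program accesses command-line options and arguments.

Golang has built-in support for accessing command-line arguments: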

//cmdlineargs.go
package main

import (
    "os"
    "path/filepath"
)

func main() {
    // os.Args[0] is the path used to invoke the program
    println("I am ", os.Args[0])

    baseName := filepath.Base(os.Args[0])
    println("The base name is ", baseName)

    // os.Args is a slice; the built-in len reports how many arguments it holds
    println("Argument # is ", len(os.Args))

    // the first command-line argument, if any
    if len(os.Args) > 1 {
        println("The first command line argument: ", os.Args[1])
    }
}

Running it produces the following:
$go build cmdlineargs.go
$cmdlineargs test one
I am cmdlineargs
The base name is cmdlineargs
Argument # is 3
The first command line argument: test

For programs with a more complex command line, the least we need is the flag package from the Golang standard library:

//cmdlineflag.go
package main

import (
    "flag"
    "fmt"
    "os"
    "strconv"
)

var (
    // main operation modes
    write = flag.Bool("w", false, "write result back instead of stdout\n\t\tDefault: No write back")

    // layout control
    tabWidth = flag.Int("tabwidth", 8, "tab width\n\t\tDefault: Standard")

    // debugging
    cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file\n\t\tDefault: no default")
)

func usage() {
    // Fprintf allows us to print to a specified file handle or stream
    fmt.Fprintf(os.Stderr, "\nUsage: %s [flags] file [path ...]\n\n",
        "CommandLineFlag") // os.Args[0]
    flag.PrintDefaults()
    os.Exit(0)
}

func main() {
    fmt.Printf("Before parsing the flags\n")
    fmt.Printf("T: %d\nW: %s\nC: '%s'\n",
        *tabWidth, strconv.FormatBool(*write), *cpuprofile)

    flag.Usage = usage
    flag.Parse()

    // At least one non-flag argument is mandatory
    if len(flag.Args()) < 1 {
        usage()
    }

    fmt.Printf("Testing the flag package\n")
    fmt.Printf("T: %d\nW: %s\nC: '%s'\n",
        *tabWidth, strconv.FormatBool(*write), *cpuprofile)

    for index, element := range flag.Args() {
        fmt.Printf("I: %d C: '%s'\n", index, element)
    }
}

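This example shows:
- the use of three flag types: Int, String and Bool;
- that each flag definition consists of a type, the option name used on the command line, a default value, and a description;
- how to handle both flag options and non-option arguments.

Running it without arguments: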

$cmdlineflag
Before parsing the flags
T: 8
W: false
C: ''

Usage: CommandLineFlag [flags] file [path ...]

-cpuprofile="": write cpu profile to this file
        Default: no default
-tabwidth=8: tab width
        Default: Standard
-w=false: write result back instead of stdout
        Default: No write back

Running it with flags and arguments (once without any flag, once with two flags):

$cmdlineflag aa bb
Before parsing the flags
T: 8
W: false
C: ''
Testing the flag package
T: 8
W: false
C: ''
I: 0 C: 'aa'
I: 1 C: 'bb'

$cmdlineflag -tabwidth=2 -w aa
Before parsing the flags
T: 8
W: false
C: ''
Testing the flag package
T: 2
W: true
C: ''
I: 0 C: 'aa'

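As the examples show, in simple cases you do not need to write your own command-line parser or pull in a third-party package; Go's built-in flag package does the job well. Compared with POSIX getopt, the de facto standard for command-line parsing (available from C/C++, Perl and shell scripts), the flag package still falls well short, mainly in these respects:

1. It does not distinguish long options from short options, such as -h and --help.
2. It does not support combining short options, for example ls -l -h <=> ls -hl.
3. Flags cannot be placed anywhere on the command line; for example, they cannot follow a non-flag parameter.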
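The three limitations listed above can be observed directly with the standard flag package. The sketch below is only an illustration: the flag names v, verbose and l and the helper parse are made up for this example. It registers one variable under two names to imitate a short/long pair, then shows that a combined -lv is rejected and that parsing stops at the first non-flag argument.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parse runs the standard flag package over args and reports the value of
// the verbose flag, the arguments left unparsed, and any parse error.
func parse(args []string) (verbose bool, rest []string, err error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet when parsing fails

	// flag has no notion of short vs long options; registering the same
	// variable under two names is the usual workaround.
	fs.BoolVar(&verbose, "v", false, "verbose output (short form)")
	fs.BoolVar(&verbose, "verbose", false, "verbose output (long form)")
	fs.Bool("l", false, "long listing")

	err = fs.Parse(args)
	return verbose, fs.Args(), err
}

func main() {
	cases := [][]string{
		{"--verbose", "file"}, // one or two dashes are both accepted
		{"-lv"},               // treated as a single unknown flag named "lv"
		{"file", "-v"},        // parsing stops at "file"; -v is left in rest
	}
	for _, args := range cases {
		v, rest, err := parse(args)
		fmt.Printf("args=%v verbose=%t rest=%v err=%v\n", args, v, rest, err)
	}
}
```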

Still, flag is part of the standard library, so its functionality comes at no extra cost. Its support for bool flags is another highlight.

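III. TOML, the De Facto Standard for Go Configuration Files (Not Everyone Will Agree)

The command line is one way to configure a program, but more often we store static configuration data in a configuration file. Just as Java pairs with XML, Ruby with YAML, and Windows with INI, Go has its own pairing: TOML (Tom's Obvious, Minimal Language).

At first glance TOML syntax resembles Windows INI, but a closer look shows it is far more powerful. Here is a sample TOML configuration file: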

# This is a TOML document. Boom.

title = "TOML Example"

[owner]
name = "Lance Uppercut"
dob = 1979-05-27T07:32:00-08:00 # First class dates? Why not?

[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

# You can indent as you please. Tabs or spaces. TOML don't care.
[servers.alpha]
ip = "10.0.0.1"
dc = "eqdc10"

[servers.beta]
ip = "10.0.0.2"
dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
"alpha",
"omega"
]

It looks powerful, and complicated too, yet parsing it is simple. Take the following TOML file:

Age = 25
Cats = [ "Cauchy", "Plato" ]
Pi = 3.14
Perfection = [ 6, 28, 496, 8128 ]
DOB = 1987-07-05T05:45:00Z

As with every other configuration file parser, the data in this file can be decoded directly into a Golang struct:

type Config struct {
    Age        int
    Cats       []string
    Pi         float64
    Perfection []int
    DOB        time.Time // requires `import "time"`
}

Decoding takes very little code:

var conf Config
if _, err := toml.Decode(tomlData, &conf); err != nil {
    // handle error
}

It could hardly be simpler.

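TOML has its shortcomings, though. Suppose you need values given as command-line options to override the options in the configuration file: how would you do it? In practice we often run into a three-layer configuration structure like this:

1. Initial default values for configuration items, built into the program.
2. Values from a configuration file, which override the built-in defaults.
3. Command-line options and arguments, which have the highest priority and override both layers above.

In Go, the fields of the struct that TOML is mapped onto have no initial values. Nor does the built-in flag package parse command-line values into a Go struct; it yields separate variables. Third-party tools solve both problems, but if you would rather not use one, you can do it yourself as shown below, although the result is less pretty.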

func ConfigGet() *Config {
    var err error
    var cf *Config = NewConfig()

    // set default values defined in the program
    cf.ConfigFromFlag()
    //log.Printf("P: %d, B: '%s', F: '%s'\n", cf.MaxProcs, cf.Webapp.Path)

    // Load config file, from flag or env (if specified)
    _, err = cf.ConfigFromFile(*configFile, os.Getenv("APPCONFIG"))
    if err != nil {
        log.Fatal(err)
    }
    //log.Printf("P: %d, B: '%s', F: '%s'\n", cf.MaxProcs, cf.Webapp.Path)

    // Override values from command line flags
    cf.ConfigToFlag()
    flag.Usage = usage
    flag.Parse()
    cf.ConfigFromFlag()
    //log.Printf("P: %d, B: '%s', F: '%s'\n", cf.MaxProcs, cf.Webapp.Path)

    cf.ConfigApply()

    return cf
}

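As the code above shows, you need to:
1. Set the configuration (cf) defaults from the command-line flag defaults.
2. Load the configuration file.
3. Overwrite the command-line flag variables with the configuration (cf) values.
4. Parse the command-line arguments.
5. Overwrite the configuration (cf) values with the command-line flag variables.

Skip any one of these steps and the three-layer configuration no longer works.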
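The same three-layer idea can be sketched with the standard library alone. The version below is not the author's code: it uses JSON instead of TOML so that it needs no third-party package, and the Config fields and the load function are hypothetical. It relies on two facts: json.Unmarshal leaves fields absent from the input untouched, and a flag registered with the current value as its default keeps that value when the flag is not given.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// Config is a hypothetical configuration struct for this sketch.
type Config struct {
	Addr    string `json:"addr"`
	Workers int    `json:"workers"`
	Debug   bool   `json:"debug"`
}

// load resolves the configuration in three layers: built-in defaults,
// then the contents of a config file, then command-line flags.
func load(fileData []byte, args []string) (Config, error) {
	// Layer 1: defaults compiled into the program.
	cfg := Config{Addr: "localhost:8080", Workers: 4}

	// Layer 2: the file overrides only the keys it mentions.
	if len(fileData) > 0 {
		if err := json.Unmarshal(fileData, &cfg); err != nil {
			return cfg, fmt.Errorf("config file: %w", err)
		}
	}

	// Layer 3: each flag defaults to the value resolved so far, so only
	// flags actually given on the command line change anything.
	fs := flag.NewFlagSet("app", flag.ContinueOnError)
	fs.StringVar(&cfg.Addr, "addr", cfg.Addr, "listen address")
	fs.IntVar(&cfg.Workers, "workers", cfg.Workers, "number of workers")
	fs.BoolVar(&cfg.Debug, "debug", cfg.Debug, "enable debug output")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	file := []byte(`{"workers": 8, "debug": true}`)
	cfg, err := load(file, []string{"-workers=2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg) // prints {Addr:localhost:8080 Workers:2 Debug:true}
}
```

Addr comes from the built-in default, Debug from the file, and Workers from the command line, which is the precedence described above.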

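IV. Beyond TOML

This section looks at how to overcome TOML's limitations.

To that end, many people will say: use viper. Before introducing that heavyweight, though, I want to introduce a lesser-known contender: multiconfig.

Some people always assume that bigger is better, but I believe that whatever fits is better, because:

1. viper is too heavyweight; using it means pulling in another 20 third-party packages that viper depends on.
2. In fact viper alone is not enough; to get its full functionality you need a companion package, which in turn depends on 13 external packages.
3. Compared with viper, multiconfig is simpler to use.

So let us recap the problem we face:

1. Define the default configuration in the program, so we do not have to define it again in TOML.
2. Override the defaults with data from the TOML configuration file.
3. Override the configuration read from TOML with values from the command line or environment variables.

Here is an example showing how to use multiconfig: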

func main() {
    m := multiconfig.NewWithPath("config.toml") // supports TOML and JSON

    // Get an empty struct for your configuration
    serverConf := new(Server)

    // Populate the serverConf struct; MustLoad panics on error
    m.MustLoad(serverConf)

    fmt.Println("After Loading: ")
    fmt.Printf("%+v\n", serverConf)

    if serverConf.Enabled {
        fmt.Println("Enabled field is set to true")
    } else {
        fmt.Println("Enabled field is set to false")
    }
}

The TOML file used in this example:

Name              = "koding"
Enabled           = false
Port              = 6066
Users             = ["ankara", "istanbul"]

[Postgres]
Enabled           = true
Port              = 5432
Hosts             = ["192.168.2.1", "192.168.2.2", "192.168.2.3"]
AvailabilityRatio = 8.23

The Go structs that the TOML maps onto:

type (
    // Server holds supported types by the multiconfig package
    Server struct {
        Name     string
        Port     int `default:"6060"`
        Enabled bool
        Users    []string
        Postgres Postgres
    }

    // Postgres is here for embedded struct feature
    Postgres struct {
        Enabled           bool
        Port              int
        Hosts             []string
        DBName            string
        AvailabilityRatio float64
    }
)

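Using multiconfig is simple, isn't it? After the comparison with viper later on, you will agree with me.

multiconfig supports default values as well as explicitly required fields.
It supports TOML, JSON, struct tags and environment variables.
You can define a custom configuration source (a remote server, for example) if you want to.
It is highly extensible through the loader interface; you can write your own loader.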
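To give a feel for how a `default` struct tag such as the one on Port can be honored, here is a minimal reflection-based sketch. It is not multiconfig's implementation: applyDefaults is a hypothetical helper that handles only string, int and bool fields, and the Server type is a cut-down copy of the struct above.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Server is a reduced copy of the struct above; only Port carries a tag.
type Server struct {
	Name    string
	Port    int `default:"6060"`
	Enabled bool
}

// applyDefaults fills zero-valued string, int and bool fields of the struct
// pointed to by ptr from their `default` struct tags.
func applyDefaults(ptr interface{}) error {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("applyDefaults: want pointer to struct, got %T", ptr)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		tag, ok := t.Field(i).Tag.Lookup("default")
		f := v.Field(i)
		if !ok || !f.CanSet() || !f.IsZero() {
			continue // no tag, unexported field, or value already set
		}
		switch f.Kind() {
		case reflect.String:
			f.SetString(tag)
		case reflect.Int:
			n, err := strconv.Atoi(tag)
			if err != nil {
				return fmt.Errorf("field %s: %w", t.Field(i).Name, err)
			}
			f.SetInt(int64(n))
		case reflect.Bool:
			b, err := strconv.ParseBool(tag)
			if err != nil {
				return fmt.Errorf("field %s: %w", t.Field(i).Name, err)
			}
			f.SetBool(b)
		}
	}
	return nil
}

func main() {
	s := Server{Name: "koding"}
	if err := applyDefaults(&s); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s) // prints {Name:koding Port:6060 Enabled:false}
}
```

In the multiconfig run shown in this section the file sets Port to 6066, which is why the tag default of 6060 does not appear in the output there.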

Here is the output of the example, starting with the usage help:

$cmdlinemulticonfig -help
Usage of cmdlinemulticonfig:
-enabled=false: Change value of Enabled.
-name=koding: Change value of Name.
-port=6066: Change value of Port.
-postgres-availabilityratio=8.23: Change value of Postgres-AvailabilityRatio.
-postgres-dbname=: Change value of Postgres-DBName.
-postgres-enabled=true: Change value of Postgres-Enabled.
-postgres-hosts=[192.168.2.1 192.168.2.2 192.168.2.3]: Change value of Postgres-Hosts.
-postgres-port=5432: Change value of Postgres-Port.
-users=[ankara istanbul]: Change value of Users.

Generated environment variables:
   SERVER_NAME
   SERVER_PORT
   SERVER_ENABLED
   SERVER_USERS
   SERVER_POSTGRES_ENABLED
   SERVER_POSTGRES_PORT
   SERVER_POSTGRES_HOSTS
   SERVER_POSTGRES_DBNAME
   SERVER_POSTGRES_AVAILABILITYRATIO

$cmdlinemulticonfig
After Loading:
&{Name:koding Port:6066 Enabled:false Users:[ankara istanbul] Postgres:{Enabled:true Port:5432 Hosts:[192.168.2.1 192.168.2.2 192.168.2.3] DBName: AvailabilityRatio:8.23}}
Enabled field is set to false

Check the output: does every item match what we expected?
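The "Generated environment variables" list in the help output follows a simple pattern: the struct name, then the field path, upper-cased and joined with underscores. The sketch below reproduces that naming with reflection. It is a guess at the rule based purely on the output shown above, not a copy of multiconfig's code, and envNames and namesFor are hypothetical helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Postgres and Server mirror the structs shown earlier.
type Postgres struct {
	Enabled           bool
	Port              int
	Hosts             []string
	DBName            string
	AvailabilityRatio float64
}

type Server struct {
	Name     string
	Port     int `default:"6060"`
	Enabled  bool
	Users    []string
	Postgres Postgres
}

// envNames walks the struct type t and returns one upper-case,
// underscore-joined name per leaf field, each starting with prefix.
func envNames(prefix string, t reflect.Type) []string {
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := prefix + "_" + strings.ToUpper(f.Name)
		if f.Type.Kind() == reflect.Struct {
			// Nested structs contribute their own fields under a longer prefix.
			names = append(names, envNames(name, f.Type)...)
			continue
		}
		names = append(names, name)
	}
	return names
}

// namesFor derives the prefix from the type name of v, which must be a struct value.
func namesFor(v interface{}) []string {
	t := reflect.TypeOf(v)
	return envNames(strings.ToUpper(t.Name()), t)
}

func main() {
	for _, n := range namesFor(Server{}) {
		fmt.Println(n)
	}
}
```

Run against the Server struct, this prints the same nine names, in the same order, as the help output above.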

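V. Viper

Time for our heavyweight, viper (https://github.com/spf13/viper), to take the stage.

viper is without doubt very powerful. But if you want command-line arguments to override predefined configuration values, viper alone is not enough. To unleash it you need a companion package: cobra (https://github.com/spf13/cobra).

Unlike multiconfig, which focuses on simplifying configuration handling, viper gives you full control. Unfortunately, you have to do some legwork before you get that control.

Let us look again at the code that handles configuration with multiconfig: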

func main() {
    m := multiconfig.NewWithPath("config.toml") // supports TOML and JSON

    // Get an empty struct for your configuration
    serverConf := new(Server)

    // Populate the serverConf struct; MustLoad panics on error
    m.MustLoad(serverConf)

    fmt.Println("After Loading: ")
    fmt.Printf("%+v\n", serverConf)

    if serverConf.Enabled {
        fmt.Println("Enabled field is set to true")
    } else {
        fmt.Println("Enabled field is set to false")
    }
}

That is everything you have to do with multiconfig. Now let us see how to accomplish the same thing with viper and cobra:

func init() {
mainCmd.AddCommand(versionCmd)

viper.SetEnvPrefix("DISPATCH")
viper.AutomaticEnv()

    /*
      When AutomaticEnv called, Viper will check for an environment variable any
      time a viper.Get request is made. It will apply the following rules. It
      will check for a environment variable with a name matching the key
      uppercased and prefixed with the EnvPrefix if set.
    */

flags := mainCmd.Flags()

    flags.Bool("debug", false, "Turn on debugging.")
    flags.String("addr", "localhost:5002", "Address of the service")
    flags.String("smtp-addr", "localhost:25", "Address of the SMTP server")
    flags.String("smtp-user", "", "User to authenticate with the SMTP server")
    flags.String("smtp-password", "", "Password to authenticate with the SMTP server")
    flags.String("email-from", "noreply@example.com", "The from email address.")

    viper.BindPFlag("debug", flags.Lookup("debug"))
    viper.BindPFlag("addr", flags.Lookup("addr"))
    viper.BindPFlag("smtp_addr", flags.Lookup("smtp-addr"))
    viper.BindPFlag("smtp_user", flags.Lookup("smtp-user"))
    viper.BindPFlag("smtp_password", flags.Lookup("smtp-password"))
    viper.BindPFlag("email_from", flags.Lookup("email-from"))

// Viper supports reading from yaml, toml and/or json files. Viper can
// search multiple paths. Paths will be searched in the order they are
// provided. Searches stopped once Config File found.

    viper.SetConfigName("CommandLineCV") // name of config file (without extension)
    viper.AddConfigPath("/tmp")          // path to look for the config file in
    viper.AddConfigPath(".")             // more path to look for the config files

    err := viper.ReadInConfig()
    if err != nil {
        println("No config file found. Using built-in defaults.")
    }
}

As you can see, you need BindPFlag to make viper and cobra work together. But that is not too bad.
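What the binding buys you is a single lookup that consults several layers in priority order. The sketch below imitates that idea with plain maps; it is a deliberately tiny stand-in and not viper's API. The layered type, its fields and the injected lookupEnv function are all hypothetical, and the order used here (flag, then environment, then config file, then default) is the subset of viper's documented precedence that this example touches.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// layered is a hypothetical, much-reduced stand-in for a viper-style registry.
type layered struct {
	envPrefix string
	flags     map[string]string // only flags explicitly set on the command line
	file      map[string]string // values read from a config file
	defaults  map[string]string
	lookupEnv func(string) (string, bool)
}

// envName maps a key such as "smtp_addr" to "DISPATCH_SMTP_ADDR".
func (l *layered) envName(key string) string {
	return strings.ToUpper(l.envPrefix + "_" + key)
}

// Get returns the value from the highest-priority layer that defines key.
func (l *layered) Get(key string) string {
	if v, ok := l.flags[key]; ok {
		return v
	}
	if v, ok := l.lookupEnv(l.envName(key)); ok {
		return v
	}
	if v, ok := l.file[key]; ok {
		return v
	}
	return l.defaults[key]
}

func main() {
	cfg := &layered{
		envPrefix: "DISPATCH",
		flags:     map[string]string{"addr": "localhost:9000"},
		file:      map[string]string{"addr": "0.0.0.0:5002", "smtp_addr": "mail.internal:25"},
		defaults:  map[string]string{"addr": "localhost:5002", "email_from": "noreply@example.com"},
		lookupEnv: os.LookupEnv,
	}
	fmt.Println(cfg.Get("addr"))       // the flag layer wins
	fmt.Println(cfg.Get("smtp_addr"))  // DISPATCH_SMTP_ADDR if set, else the file value
	fmt.Println(cfg.Get("email_from")) // DISPATCH_EMAIL_FROM if set, else the default
}
```

Injecting lookupEnv instead of calling os.LookupEnv directly is a design choice made for this sketch so the environment layer can be faked in tests.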

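cobra's real power lies in its subcommand support. cobra also provides fully POSIX-compatible command-line flag parsing, including long and short flags, nested commands, and custom help or usage text for each command.

Here is example code that defines subcommands: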

// The main command describes the service and defaults to printing the
// help message.
var mainCmd = &cobra.Command{
    Use:   "dispatch",
    Short: "Event dispatch service.",
    Long: `HTTP service that consumes events and dispatches them to subscribers.`,
    Run: func(cmd *cobra.Command, args []string) {
        serve()
    },
}

// The version command prints this service.
var versionCmd = &cobra.Command{
    Use:   "version",
    Short: "Print the version.",
    Long: "The version of the dispatch service.",
    Run: func(cmd *cobra.Command, args []string) {
        fmt.Println(version)
    },
}

With the subcommand definitions above, we get the following help output:

Usage:
dispatch [flags]
dispatch [command]

Available Commands:
version Print the version.
help Help about any command

Flags:
      --addr="localhost:5002": Address of the service
      --debug=false: Turn on debugging.
      --email-from="noreply@example.com": The from email address.
  -h, --help=false: help for dispatch
      --smtp-addr="localhost:25": Address of the SMTP server
      --smtp-password="": Password to authenticate with the SMTP server
      --smtp-user="": User to authenticate with the SMTP server

Use "dispatch help [command]" for more information about a command.
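For comparison, a bare-bones version of the same command layout can be written with the standard flag package by dispatching on the first argument. This is only a sketch of the idea: the run function and the version string are hypothetical, it has none of cobra's generated help, and its flags follow the flag package's conventions rather than POSIX ones.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

const version = "0.0.1" // hypothetical version string for this sketch

// run dispatches on the first argument and returns the text to print.
func run(args []string) (string, error) {
	// Subcommand: "version" takes no flags.
	if len(args) > 0 && args[0] == "version" {
		return version, nil
	}

	// Default command: parse the service flags.
	fs := flag.NewFlagSet("dispatch", flag.ContinueOnError)
	addr := fs.String("addr", "localhost:5002", "Address of the service")
	debug := fs.Bool("debug", false, "Turn on debugging.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return fmt.Sprintf("serving on %s (debug=%t)", *addr, *debug), nil
}

func main() {
	out, err := run(os.Args[1:])
	if err != nil {
		os.Exit(2) // the flag package has already printed the problem
	}
	fmt.Println(out)
}
```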

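VI. Summary

The complete source code for the examples above can be found in the author's GitHub repository.

As for Golang configuration files, I have personally only gone as far as the TOML level, because my configuration needs are simple and I do not need environment variables or the command line to override defaults or configuration file data. The author's examples do show, though, how capable multiconfig and viper are, and I will consider putting them to real use when I build more complex Golang applications.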

Appdash: A Killer Tool for Distributed System Tracing, Implemented in Go

June 17, 2015

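With the "cloud" everywhere today, distributed systems are nothing new. It takes no effort to see that the giant computing platforms behind Google, Baidu, Taobao, Amazon, Twitter and other IT giants are all distributed systems; even the backend of a simple WeChat official-account application is distributed, if only across a handful of machines. Distribution makes a system elastic, letting it scale up and down as requirements change. But it also poses a problem for developers and operators: how to monitor and optimize the behavior of a distributed system.

Take Google as an example. Picture a user issuing a search request from a browser: on Google's backend, hundreds or thousands of machines and dozens or hundreds of application services written in several languages may spring into action to compute the result together. If anything goes wrong at any step in that process, finding and pinpointing the bug is very hard, and that is how distributed tracing systems came about. In 2010 Google published the well-known paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure" (a Chinese translation was linked from the original post). Dapper is the distributed tracing infrastructure used inside Google; compared with earlier tracing systems, it is known for low overhead, transparency to applications, and good scalability. Dapper also leans toward collecting and investigating performance data, helping developers and operators find performance bottlenecks in a distributed system and start optimizing them. After Dapper appeared, the other giants followed suit, for example Twitter's Zipkin (open source), Taobao's "EagleEye", and eBay's Centralized Activity Logging (CAL); these were for the most part designed and implemented with reference to Google's Dapper paper.

Appdash, the subject of this post, is an open-source distributed tracing toolkit written in Go by sourcegraph. It too is modeled on Google's Dapper, and it is currently used for performance tracing and monitoring of the sourcegraph platform.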

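I. How It Works

Appdash implements four main concepts from Google's Dapper:

【Span】

A span is the extent of one service call; in the implementation it is identified by a SpanId. The span of the root caller is the root span, and the spans of next-level service calls made at the root level have the root span as their parent span. Continuing in this way, the chain of service calls forms a tree, and the whole tree constitutes a trace.

In Appdash a SpanId has three parts, TraceID/SpanID/parentSpanID, for example: 34c31a18026f61df/aab2a63e86ac0166/592043d0a5871aaf. The TraceID uniquely identifies one trace and is allocated automatically when the root SpanID is requested.

The schematic above also shows the SpanIDs during one trace. The call chain in the diagram is roughly: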

frontservice:
        call serviceA
        call serviceB
                call serviceB1
        … …
        call serviceN

The spans for these service calls form the following tree:

frontservice: SpanId = xxxxx/nnnn1. This is the root span: traceid=xxxxx, spanid=nnnn1, and the parent span id is empty.
serviceA: SpanId = xxxxx/nnnn2/nnnn1. This is a child span: traceid=xxxxx, spanid=nnnn2, and the parent span id is the root span id, nnnn1.
serviceB: SpanId = xxxxx/nnnn3/nnnn1. This is a child span: traceid=xxxxx, spanid=nnnn3, and the parent span id is the root span id, nnnn1.
… …
serviceN: SpanId = xxxxx/nnnnm/nnnn1. This is a child span: traceid=xxxxx, spanid=nnnnm, and the parent span id is the root span id, nnnn1.
serviceB1: SpanId = xxxxx/nnnn3-1/nnnn3. This is a child span of serviceB: traceid=xxxxx, spanid=nnnn3-1, and the parent span id is serviceB's spanid, nnnn3.
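The parent/child bookkeeping in that tree can be sketched in a few lines of Go. The types below are hypothetical and written only for this post; they follow the three-part TraceID/SpanID/parentSpanID layout described above but are not appdash's own definitions.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// ID is a 64-bit identifier rendered as 16 hex digits.
type ID uint64

func (id ID) String() string { return fmt.Sprintf("%016x", uint64(id)) }

// newID returns a random ID.
func newID() ID {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return ID(binary.BigEndian.Uint64(b[:]))
}

// SpanID is a hypothetical type modelled on the three-part identifier
// described above: trace, span, and parent span.
type SpanID struct {
	Trace, Span, Parent ID
}

// NewRoot starts a new trace; the root span has no parent.
func NewRoot() SpanID { return SpanID{Trace: newID(), Span: newID()} }

// NewChild returns a span in the same trace whose parent is s.
func (s SpanID) NewChild() SpanID {
	return SpanID{Trace: s.Trace, Span: newID(), Parent: s.Span}
}

// IsRoot reports whether the span has no parent.
func (s SpanID) IsRoot() bool { return s.Parent == 0 }

// String renders trace/span for a root span and trace/span/parent otherwise.
func (s SpanID) String() string {
	if s.IsRoot() {
		return fmt.Sprintf("%s/%s", s.Trace, s.Span)
	}
	return fmt.Sprintf("%s/%s/%s", s.Trace, s.Span, s.Parent)
}

func main() {
	front := NewRoot()            // frontservice
	serviceB := front.NewChild()  // called by frontservice
	serviceB1 := serviceB.NewChild() // called by serviceB
	fmt.Println("frontservice:", front)
	fmt.Println("serviceB:    ", serviceB)
	fmt.Println("serviceB1:   ", serviceB1)
}
```

All three spans share one trace ID, and each child carries its caller's span ID as the last component, which is what lets a UI rebuild the call tree.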

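【Event】

As I understand it, an Event in Appdash is a wrapper around the tracing information for a service call. Everything we eventually see in the Appdash UI is carried by events and sent to the Appdash server. In Appdash you can instrument code explicitly with events to emit tracing information, or use the interfaces of the packages Appdash provides, such as httptrace.Transport, to send call traces; those packages are themselves built on events. Before transmission, an event is encoded into Annotations.

【Recorder】

In Appdash a Recorder sends events to Appdash's Collector; each Recorder is associated with one specific span.

【Collector】

A Collector receives Annotations (that is, encoded events) from Recorders. An appdash server typically runs one Collector, which listens on a port for tracing information and saves what it receives in a Store.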
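How Event, Recorder, Collector and Store relate can be shown with a toy model. Everything below is hypothetical and heavily simplified: the type names echo the concepts above, but the fields and method signatures are invented for this sketch and are not appdash's API. The store is also not safe for concurrent use.

```go
package main

import "fmt"

// Annotation is one encoded key/value fragment of an event.
type Annotation struct{ Key, Value string }

// Collector receives annotations for a span.
type Collector interface {
	Collect(spanID string, anns ...Annotation) error
}

// MemoryStore is a Collector that keeps everything in a map.
type MemoryStore struct{ spans map[string][]Annotation }

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{spans: map[string][]Annotation{}}
}

// Collect appends the annotations to whatever is already stored for the span.
func (m *MemoryStore) Collect(spanID string, anns ...Annotation) error {
	m.spans[spanID] = append(m.spans[spanID], anns...)
	return nil
}

// Recorder is bound to one span and forwards events to a Collector.
type Recorder struct {
	spanID string
	c      Collector
}

// Event encodes the event's fields as annotations and sends them on.
func (r Recorder) Event(fields map[string]string) error {
	anns := make([]Annotation, 0, len(fields))
	for k, v := range fields {
		anns = append(anns, Annotation{Key: k, Value: v})
	}
	return r.c.Collect(r.spanID, anns...)
}

func main() {
	store := NewMemoryStore()
	rec := Recorder{spanID: "xxxxx/nnnn2/nnnn1", c: store}

	// A made-up event describing one outgoing call.
	if err := rec.Event(map[string]string{"Name": "serviceA", "Status": "200"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("annotations stored:", len(store.spans["xxxxx/nnnn2/nnnn1"]))
}
```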

II. Installation

appdash is open source; go get fetches the source and installs the example:

go get -u sourcegraph.com/sourcegraph/appdash/cmd/...

appdash ships with an example under examples/cmd/webapp. Run webapp and you will see the following:

$webapp
2015/06/17 13:14:55 Appdash web UI running on HTTP :8700
[negroni] listening on :8699

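This example bundles the appdash server, a frontservice and a fake backend service into one program; its rough structure is shown in the figure below:

Open localhost:8700 in a browser and you will see the appdash server UI, which gives an overview of all traces.

Visiting http://localhost:8699/ triggers one trace. The appdash server UI then shows the following:

The page shows that the webapp made three service calls while handling the user's request, taking 201ms, 202ms and 218ms respectively, with a total elapsed time of 632ms.

A more complex example lives under cmd/appdash; the sample application later in this post is adapted from it, so I will not go into detail here.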

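III. A Sample Application

Here I adapt cmd/appdash into an example that uses appdash; its structure is shown in the figure below:

The example has roughly three parts:
appdash: implements an appdash server with a collector that gathers tracing information and stores it in a memstore; the server also provides a UI, which pulls the information from the memstore and displays it for the operator.
backendservices: implements two simulated backend services for frontservice to call.
frontservice: the starting point of the service calls; a trace is triggered whenever a user accesses the system.

Let us start with backendservice, a simple demo service. It contains two services, ServiceA and ServiceB, which are almost identical, so looking at one is enough: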

//appdash_examples/backendservices/serviceA.go
package main

import (
    "fmt"
    "net/http"
    "time"
)

func handleRequest(w http.ResponseWriter, r *http.Request) {
    var err error
    if err = r.ParseForm(); err != nil {
        fmt.Println("Http parse form err:", err)
        return
    }
    fmt.Println("SpanId =", r.Header.Get("Span-Id"))

    time.Sleep(time.Millisecond * 101)
    w.Write([]byte("service1 ok"))
}

func main() {
    http.HandleFunc("/", handleRequest)
    http.ListenAndServe(":6601", nil)
}

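This is a "hello world"-level web server. Only two points are worth noting:
1. handleRequest deliberately sleeps for 101ms to simulate the time the service takes.
2. It prints the value of the "Span-Id" request header so we can follow how Span-Ids are allocated.

Next comes the appdash server: appdash server = collector + store + ui.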

//appdash.go
var c Server

func init() {
    c = Server{
        CollectorAddr: ":3001",
        HTTPAddr:      ":3000",
    }
}

type Server struct {
CollectorAddr string
HTTPAddr string
}

func main() {
    var (
        memStore = appdash.NewMemoryStore()
        Store    = appdash.Store(memStore)
        Queryer = memStore
    )

    app := traceapp.New(nil)
    app.Store = Store
    app.Queryer = Queryer

    var h http.Handler = app
    var l net.Listener
    var proto string
    var err error
    l, err = net.Listen("tcp", c.CollectorAddr)
    if err != nil {
        log.Fatal(err)
    }
    proto = "plaintext TCP (no security)"
    log.Printf("appdash collector listening on %s (%s)",
                c.CollectorAddr, proto)
    cs := appdash.NewServer(l, appdash.NewLocalCollector(Store))
    go cs.Start()

    log.Printf("appdash HTTP server listening on %s", c.HTTPAddr)
    err = http.ListenAndServe(c.HTTPAddr, h)
    if err != nil {
        fmt.Println("listenandserver listen err:", err)
    }
}

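In appdash a Store holds the collected tracing results, and Store is a superset of the Collector interface. This example uses the memstore (which implements the Collector interface) directly as the local collector, gathering trace data through the store's Collect method. The UI side reads the results from the store and presents them to the user.

Finally, frontservice. frontservice is where a trace starts. When a user accesses port 8080, frontservice calls the two backend services: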

//frontservice.go
func handleRequest(w http.ResponseWriter, r *http.Request) {
    var result string
    span := appdash.NewRootSpanID()
    fmt.Println("span is ", span)
    collector := appdash.NewRemoteCollector(":3001")

    httpClient := &http.Client{
        Transport: &httptrace.Transport{
            Recorder: appdash.NewRecorder(span, collector),
            SetName: true,
        },
    }

    //Service A
    resp, err := httpClient.Get("http://localhost:6601")
    if err != nil {
        log.Println("access serviceA err:", err)
    } else {
        log.Println("access serviceA ok")
        resp.Body.Close()
        result += "access serviceA ok\n"
    }

    //Service B
    resp, err = httpClient.Get("http://localhost:6602")
    if err != nil {
        log.Println("access serviceB err:", err)
        return
    } else {
        log.Println("access serviceB ok")
        resp.Body.Close()
        result += "access serviceB ok\n"
    }
    w.Write([]byte(result))
}

func main() {
    http.HandleFunc("/", handleRequest)
    http.ListenAndServe(":8080", nil)
}

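As the code shows, each request is given a root span, and a traceid is allocated along with it. The example does not instrument the code with a Recorder to send events directly; instead it uses httptrace.Transport, which appdash provides. When httpClient is initialized, the transport instance is associated with the span and a remoteCollector. From then on, every Get/Post made through httpClient automatically invokes the RoundTrip method of httptrace.Transport, which adds a "Span-Id" entry to the request header and calls the Recorder's Event method to send the tracing information to the RemoteCollector: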

//appdash/httptrace/client.go
func (t *Transport) RoundTrip(req *http.Request) (*http.Response, error) {
    var transport http.RoundTripper
    if t.Transport != nil {
        transport = t.Transport
    } else {
        transport = http.DefaultTransport
    }

… …
req = cloneRequest(req)

    child := t.Recorder.Child()
    if t.SetName {
        child.Name(req.URL.Host)
    }
    SetSpanIDHeader(req.Header, child.SpanID)

e := NewClientEvent(req)
e.ClientSend = time.Now()

// Make the HTTP request.
resp, err := transport.RoundTrip(req)

    e.ClientRecv = time.Now()
    if err == nil {
        e.Response = responseInfo(resp)
    } else {
        e.Response.StatusCode = -1
    }
    child.Event(e)

return resp, err
}

This approach makes tracing transparent to the application to some extent.
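The mechanism behind that transparency is the standard http.RoundTripper interface: anything plugged into http.Client.Transport sees every outgoing request. The sketch below shows the pattern in isolation with only the standard library. tracingTransport, its record callback and the fake backend are hypothetical names for this example, and servicea.invalid is a placeholder host that is never actually contacted.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// tracingTransport is a hypothetical, reduced stand-in for a tracing transport:
// it tags each request with a span ID and reports how long the call took.
type tracingTransport struct {
	base   http.RoundTripper
	spanID string
	record func(host string, elapsed time.Duration, status int)
}

func (t *tracingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a copy.
	req = req.Clone(req.Context())
	req.Header.Set("Span-Id", t.spanID)

	start := time.Now()
	resp, err := t.base.RoundTrip(req)
	status := -1 // mirrors the -1 used above when the call fails
	if err == nil {
		status = resp.StatusCode
	}
	t.record(req.URL.Host, time.Since(start), status)
	return resp, err
}

// demo sends one request through the tracing transport to a fake backend and
// returns the header the backend saw plus the host and status that were recorded.
func demo() (header, host string, status int, err error) {
	backend := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		header = r.Header.Get("Span-Id")
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     http.Header{},
			Body:       http.NoBody,
			Request:    r,
		}, nil
	})
	client := &http.Client{Transport: &tracingTransport{
		base:   backend,
		spanID: "xxxxx/nnnn2/nnnn1",
		record: func(h string, _ time.Duration, s int) { host, status = h, s },
	}}
	resp, err := client.Get("http://servicea.invalid/")
	if err != nil {
		return header, host, status, err
	}
	resp.Body.Close()
	return header, host, status, nil
}

func main() {
	header, host, status, err := demo()
	fmt.Println(header, host, status, err)
}
```

The calling code only ever uses client.Get; the header injection and the timing happen inside the transport, which is why application code needs so few changes.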

You can also call the Recorder's Event method explicitly in your code to send trace information to the Collector. Here is a fake SQLEvent being sent:

// SQL event
traceRec := appdash.NewRecorder(span, collector)
traceRec.Name("sqlevent example")

    // A random length for the trace.
    length := time.Duration(rand.Intn(1000)) * time.Millisecond
    startTime := time.Now().Add(-time.Duration(rand.Intn(100)) * time.Minute)
    traceRec.Event(&sqltrace.SQLEvent{
        ClientSend: startTime,
        ClientRecv: startTime.Add(length),
        SQL:        "SELECT * FROM table_name;",
        Tag:        fmt.Sprintf("fakeTag%d", rand.Intn(10)),
    })

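Explicit instrumentation of this kind does, however, require changes to the program.

IV. Summary

There is very little material on Appdash at present, and it seems only its owner, sourcegraph, runs it in production. It has not drawn much attention on github.com either.

appdash is modeled on Google's Dapper, but for now it reproduces only the "form"; calling it a killer tool may be an overstatement ^_^.

First, Dapper stresses transparency to applications and uses thread-local storage. appdash implements a low-level recorder + event mechanism and wraps it in higher-level packages such as httptrace and sqltrace to reduce how much it intrudes on application code. Judging from the sample application above, though, there is still plenty of room to improve transparency.

Second, sourcegraph has not given a clear account of appdash's performance figures or scaling options.

Still, as the first distributed tracing tool implemented in Go, appdash deserves credit. Used in a small distributed system, it can be a real help in optimizing system behavior.

BTW, the complete source code for the examples above is available for download (linked from the original post).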