Tony Bai » nginx

Who "Killed" Your HTTP Connection? Uncovering the Hidden Traps of Connection Pool Configuration in the Cloud

bigwhite — Tue, 25 Nov 2025 00:31:38 +0000

Permalink – https://tonybai.com/2025/11/25/who-killed-your-http-connection-traps-of-connection-pooling

Hi everyone, I'm Tony Bai.

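Have you ever run into sporadic EOF, connection reset by peer, or unexpected end of stream errors in production?
Have you checked the code logic and the firewall rules, maybe even captured packets, found that everything at the application layer looks fine, and still watched requests fail every now and then?
Most puzzling of all, it tends to happen with low-frequency requests, or right when the system "wakes up" after sitting idle.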

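Many developers, whether they write Android or Go, only look for the cause in their own code. In the cloud-native era, however, application code is just one link in a long network path. Starting from a real cross-cloud communication failure, this article takes a close look at how the Idle Timeout of an HTTP connection pool works and, using Go as the example, presents a best-practice configuration.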

The Crime Scene: A "Ghost" Error

Recently, while troubleshooting a failing cross-cloud call, we ran into a classic pattern:

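Client: an application running in a container, using OkHttp's HTTP connection pool (Keep-Alive).
Server: a SaaS service deployed on a public cloud, fronted by a load balancer (LB).
Symptom: network requests occasionally failed with unexpected end of stream.
Investigation: the client-side SNAT kept TCP sessions for as long as 1 hour, and the network path was very stable. Yet the server logs showed the request had "never been received".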

The truth: the connection had been closed "silently".

Under HTTP Keep-Alive, the client reuses idle TCP connections for performance. But every connection traverses a complex network path: client -> NAT gateway -> Internet -> load balancer (LB) -> server.

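This is a classic "barrel effect" (a barrel holds only as much water as its shortest stave): how long a connection actually stays usable is decided by the node with the shortest timeout along the entire path.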

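Suppose the client's connection pool believes a connection can live for 300 seconds (OkHttp's default), while the cloud provider's LB in the middle is configured with a 60-second Idle Timeout: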

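At second 61 of idleness, the LB quietly tears down the connection.
The client has no idea (since it is not sending anything, it may never receive a FIN/RST, or it receives one and does not act on it).
At second 100, the client reuses this "zombie connection" to send a request, hits a wall, and gets EOF.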

The Default "Trap" in Go

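In Go, the net/http standard library provides powerful connection pool management, controlled mainly by the http.Transport struct. However, Go's defaults are not always safe in modern cloud environments either.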

Let's look at the DefaultTransport source in Go (1.25.3):

var DefaultTransport RoundTripper = &Transport{
    Proxy: ProxyFromEnvironment,
    DialContext: defaultTransportDialContext(&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second, // TCP-level keep-alive probe interval
    }),
    ForceAttemptHTTP2:     true,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second, // <--- This is the key point!
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}

Note IdleConnTimeout: 90 * time.Second.

This means Go's HTTP client keeps idle connections for 90 seconds by default.

Where the Conflict Erupts

So what is the default Idle Timeout of today's mainstream public cloud load balancers (AWS ALB, Alibaba Cloud SLB, Google LB, and so on)?

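AWS ALB: 60 seconds by default.
Alibaba Cloud SLB: 60 seconds by default (TCP listeners may differ, but HTTP/layer 7 timeouts are usually shorter).
Nginx (default): keepalive_timeout is usually set to 65 or 75 seconds.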

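The risk is obvious: the Go client considers a connection that has been idle for 60 to 90 seconds still usable, but the cloud LB already killed it at second 60. Within that 30-second window, reusing such a connection is all but guaranteed to fail.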

The Golden Rule: A Guide to Connection Pool Configuration

To fix this for good, developers (whether on Go, Java, or Node.js) must follow one core configuration principle:

Client Idle Timeout < Infrastructure Idle Timeout < Server KeepAlive Timeout

The client's idle timeout must be shorter than that of every intermediate device on the path (LB, NAT, firewall).

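We recommend setting the client idle timeout to the intermediate device's timeout minus a 5-10 second safety buffer. For most public cloud environments, 30 to 45 seconds is a very safe value.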

Go in Practice: Configuring http.Client Correctly

Don't use http.Get() or &http.Client{} directly (both use the default Transport). In production code you should always define the Transport explicitly.

Key Parameters Explained

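IdleConnTimeout (the most important):
- Meaning: how long a connection may stay idle after it has been returned to the pool.
- Recommendation: 30s to 45s. This makes the client close the connection itself instead of passively waiting for the server to send an RST, and so avoids reusing a "stale connection".
MaxIdleConnsPerHost:
- Meaning: the maximum number of idle connections the pool keeps for a single target host. Go's default is 2.
- Pitfall: in high-concurrency microservice scenarios, the default of 2 is tiny. When concurrency spikes, many connections are created, but once the requests complete only 2 can go back to the pool and all the others are closed. The next burst has to do the handshakes all over again.
- Recommendation: size it from your QPS; 10 to 50, or even higher, is typical.
DisableKeepAlives:
- For debugging: if you really can't get a network issue under control, set it to true to force short-lived connections (closed after each use). This significantly hurts performance, so use it only while troubleshooting.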

The Last Line of Defense: Retries

Even with perfect timeouts, network jitter is unavoidable. Pool configuration only lowers the probability of stale connections; it cannot eliminate them completely.

For idempotent requests (such as GET, PUT, and DELETE), the application layer must have a retry mechanism.
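Deciding what is safe to retry comes first. Here is a small helper sketched from the idempotent-method list in RFC 9110 (section 9.2.2); the function name is hypothetical:

```go
package main

import "net/http"

// isIdempotent reports whether a method is idempotent per RFC 9110:
// the safe methods (GET, HEAD, OPTIONS, TRACE) plus PUT and DELETE.
func isIdempotent(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions,
		http.MethodTrace, http.MethodPut, http.MethodDelete:
		return true
	}
	return false
}
```

POST and PATCH fall outside this list, so retrying them blindly can repeat side effects.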

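Beyond one narrow case, Go's net/http does not retry for you: per the http.Transport documentation, it retries a request after a network error only if the connection had already been used successfully, the request is idempotent (GET, HEAD, OPTIONS, TRACE, or carrying an Idempotency-Key header), and it has no body or has Request.GetBody set. For anything beyond that, you can use an excellent open-source library such as hashicorp/go-retryablehttp, or implement simple retry logic yourself: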

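// Simple retry logic (a sketch; assumes req has no body, or one that can be re-sent)
import (
    "errors"
    "io"
    "net/http"
    "syscall"
)

func doWithRetry(client *http.Client, req *http.Request) (resp *http.Response, err error) {
    for i := 0; i < 3; i++ {
        resp, err = client.Do(req)
        // Return on success, or on errors that are not worth retrying;
        // only connection-reset-like errors are retried
        if err == nil || !isConnectionReset(err) {
            return resp, err
        }
    }
    return nil, err
}

// isConnectionReset treats ECONNRESET and EOF as signs of a stale pooled connection
func isConnectionReset(err error) bool {
    return errors.Is(err, syscall.ECONNRESET) || errors.Is(err, io.EOF)
}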

Summary

Infrastructure as Code does not mean your code can ignore the physical limits of the infrastructure.

When it comes to HTTP connection pools, remember these three points:

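Don't trust the defaults: OkHttp's 5 minutes and Go's 90 seconds are both liabilities in front of a public cloud LB with a 60-second timeout.
Back off first: the client's idle timeout must be shorter than those of the server and of every intermediate gateway. Letting the client reclaim connections on its own is always safer than having the server cut them off.
Embrace failure: a sensible retry strategy is a required course for building robust distributed systems.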

Next time you hit unexpected end of stream, don't start questioning your life choices; go check your IdleTimeout settings first!

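Still stuck "copy-pasting code to feed the AI"? My new column, "AI-Native Development Workflow in Practice", will help you:

Say goodbye to inefficiency and reshape your development paradigm
Harness AI agents (Claude Code) to automate your workflow
Evolve from an "AI user" into a spec-driven "workflow conductor"

Scan the QR code below to start your AI-native development journey.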

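Are your Go skills stuck at the plateau between "proficient" and "expert"?

Want to write more idiomatic, more robust Go code, but keep tripping over the details?
Eager to improve your software design skills, but lack a systematic approach to complex Go projects?
Want to build production-grade Go services, but keep hitting walls in engineering practice?

After "Go Language First Lesson", my "Advanced Go Course" is finally live on Geek Time!

My brand-new Geek Time column "Tony Bai · Advanced Go Course" is tailor-made for you! 30+ hardcore lessons to solidify your grasp of the language, sharpen your design thinking, and forge your engineering skills, plus a walkthrough of a hands-on project.

There is only one goal: to help you go from "Go practitioner" to "Go expert"! Join now and take your Go skills to the next level!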

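Business inquiries: writing, book publishing, training, online courses, co-founding ventures, consulting, and advertising. If interested, scan the official account QR code below and send me a private message.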

Is Go Becoming the "New Engine" of "Legacy" Ecosystems? Thoughts on FrankenPHP and the New TypeScript Compiler

bigwhite — Wed, 06 Aug 2025 00:09:10 +0000

Permalink – https://tonybai.com/2025/08/06/go-new-engine-of-old-languages

Hi everyone, I'm Tony Bai.

Let me first describe a programming language ecosystem; see if you can guess which one it is:

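It was born in 1995 to build applications for a new platform of the time called the "World Wide Web". It started as a small project, grew wildly during the dot-com bubble, and became one of the most widely used languages in history. "Serious" programmers mocked it for decades, but it eventually won the backing of tech giants and got a second lease on life. Today it is approaching 30, and the most important piece of its ecosystem, the compiler of one of its supersets, is being rewritten in Go to power its future.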

Your first guess is probably the JavaScript ecosystem. Exactly right. That superset is TypeScript.

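But the same description fits another name perfectly: PHP. It was also born in 1995, also rose with the Web wave, was also mocked, also got a second lease on life, and now a new Go-based project is also powering its future.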

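These two languages are like two sides of the same coin, together defining the client side and the server side of Web programming. What I want to talk about today is the unexpected place where their stories intersect, one that matters to us Gophers: the role of Go.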

The "Toyota Corolla" of Programming Languages

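Before diving in, we need to understand PHP's niche. An excellent blog post compares it to the "Toyota Corolla" of programming languages: boring, sturdy, simple, affordable.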

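It may never appear in the flashiest demo at a tech launch event, but together with its classic LAMP stack (Linux, Apache, MySQL, PHP) it has let millions of ordinary developers worldwide solve one very practical problem at the lowest cost and in the most reliable way: putting up a website that works.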

Bjarne Stroustrup, the creator of C++, famously said: "There are only two kinds of languages: the ones people complain about and the ones nobody uses."

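PHP clearly belongs to the former. It was derided as "a collection of bad design", yet it powers more than 70% of the world's websites (counting the sites whose server-side language can be identified). However critically you look at that number, it is undeniable evidence of enormous success and remarkable staying power.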

Go: An Unexpected "New Engine"

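For years, the huge PHP and JavaScript ecosystems evolved independently on their own tracks. Recently, though, a striking trend has emerged: Go is becoming the "new engine" that drives the modernization of these two "legacy" ecosystems.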

Case 1: FrankenPHP – Giving PHP a "Heart Transplant" with Go

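If you have ever suffered through deploying PHP applications in the container era, you surely remember the complex and fragile Nginx + FPM + Supervisor "trio". Tedious configuration, performance bottlenecks, painful process management: each one a nightmare.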

Enter FrankenPHP. It is a brand-new, high-performance PHP application server written in Go, and it was recently officially adopted by the PHP Foundation.

What makes it revolutionary:

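Dead-simple deployment: it is a single static Go binary. Deploying a PHP application now takes only an extremely simple Dockerfile containing this binary and your PHP code. Nginx, FPM, and Supervisor all get thrown onto the scrap heap of history.
Outstanding performance: it embeds a high-performance HTTP server based on Caddy (another great Go project) and offers a brand-new execution model that far outperforms the traditional setup.
Powerful capabilities: Go's strong concurrency and mature networking libraries give FrankenPHP everything a modern application server needs from day one.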

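It is Go that, by attacking the problem at its root, solved the biggest deployment and operations headache of the PHP ecosystem in the cloud-native era.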

Case 2: The New TypeScript Compiler – Speeding Up with Go

Likewise, at the other end of the Web, the JavaScript ecosystem is getting a boost from Go. Microsoft recently announced an exciting project: rewriting the TypeScript compiler in Go.

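As a superset of JavaScript, TypeScript has become the de facto standard for building large, complex front-end and back-end applications. Its compiler is critical infrastructure for the whole ecosystem.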

Why Go? The answer is simple and direct: performance, although there were other considerations as well.

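A compiler is by nature an extremely CPU-intensive workload. As TypeScript projects grew larger and more complex, the performance of the original compiler gradually became a bottleneck. Go, with runtime efficiency close to C/C++, an excellent concurrency model, and memory safety guarantees, became an ideal choice for building a next-generation high-performance compiler.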

Go's New Role: From "Building New Cities" to "Renovating Old Capitals"

These two cases reveal an emerging new role for Go.

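In the past, talking about Go mostly meant using it to build brand-new cloud-native microservices: "building new cities" on empty land. Now we see Go, thanks to three core strengths, becoming the "infrastructure foundation" for renovating and empowering large existing technology ecosystems. We have started using it to "renovate old capitals".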

These three strengths are:

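Top-tier performance: for system tools that need to squeeze out every bit of performance (such as compilers and servers), Go offers a far safer and more productive option than C/C++.
Unmatched ease of deployment: a single statically linked binary is the "ultimate deliverable" for the container and DevOps era.
A modern concurrency model: goroutines and channels offer the most elegant and efficient language-level answer to the concurrency problems found everywhere in modern software.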

Go is moving down from being purely an application development language to a lower "engine layer" that powers other ecosystems.

Conclusion: Embrace Pragmatism, Not Hype

The deepest lesson from PHP's story, and from its curious marriage with Go, is a spirit of engineering pragmatism that rises above language wars.

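Real technical progress lies not only in creating brand-new, shiny things, but also in using more powerful tools to pragmatically optimize, renovate, and revitalize the huge systems that already keep the world running. That is a deeper and more far-reaching contribution.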

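And Go is playing an ever larger role in this process. As Gophers, we are not only "building new cities"; we are also fitting the "old capitals" of the digital world with stronger, more reliable "new engines". This may well be one of the most exciting chapters in Go's future.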

Reference: https://deprogrammaticaipsum.com/the-toyota-corolla-of-programming/


From Simple to Powerful: Exploring the Charm of the Caddy Server Again

bigwhite — Wed, 06 Nov 2024 22:46:44 +0000

Permalink – https://tonybai.com/2024/11/07/exploring-caddy

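In the more than ten years since Go was born, the community has produced many excellent web servers and reverse proxies. The most notable are undoubtedly Caddy and Traefik. Both give developers and sysadmins simpler, more secure, modern options for deploying web servers and reverse proxies. Their goals differ somewhat: Caddy was originally aimed at letting developers stand up a reverse proxy quickly, with a strong focus on simple configuration, and later added automatic HTTPS and comprehensive API support; Traefik puts more emphasis on cloud-native architecture, suits microservice-based applications (especially those deployed with Docker or Kubernetes), and offers dynamic service discovery and flexible routing.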

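I first tried Caddy in 2015, when it was released as open source, and its super-simple configuration left a deep impression on me. I have followed its development ever since. Once Caddy could automatically obtain free HTTPS certificates for a site's domains via the ACME protocol, I deployed it on my own VPS as the reverse proxy for sites such as Gopher Daily, and it has run very stably. Automatic free HTTPS certificates carry on Caddy's original goal of simplifying site deployment and have won it a wide user base and much praise; this feature made Caddy popular for personal projects and small deployments, and also earned it a place in enterprise applications.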

Nearly 10 years later, in this article I want to explore Caddy again, see what powerful features today's Caddy offers, and lay the groundwork for using it better.

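Note: over nearly 10 years Caddy has gained many standard features as well as non-standard ones (provided by the community, with no guarantee or support from the Caddy project); here I only explore the features I'm interested in. Caddy currently evolves sustainably on sponsorship. All of its standard features are free, but its author Matt Holt also develops custom features for enterprise sponsors.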

1. Running Caddy and Basic Configuration

1.1 Starting and Stopping Caddy

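Caddy is written in Go, so it inherits the usual trait of Go application deployment: a single executable. Put the downloaded caddy binary on your $PATH and you can run it from any directory: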

$caddy version
v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

$caddy run
2024/10/11 07:56:24.664 INFO    admin   admin endpoint started  {"address": "localhost:2019", "enforce_origin": false, "origins": ["//127.0.0.1:2019", "//localhost:2019", "//[::1]:2019"]}

Started this way, caddy keeps running as a foreground process until you stop it. We can also use the start command to run caddy as a background process:

$caddy start
2024/10/11 08:32:07.557 INFO    admin   admin endpoint started  {"address": "localhost:2019", "enforce_origin": false, "origins": ["//127.0.0.1:2019", "//localhost:2019", "//[::1]:2019"]}
2024/10/11 08:32:07.557 INFO    serving initial configuration
Successfully started Caddy (pid=31215) - Caddy is running in the background

The stop command stops the background process:

$caddy stop
2024/10/11 08:32:37.043 INFO    admin.api   received request    {"method": "POST", "host": "localhost:2019", "uri": "/stop", "remote_ip": "127.0.0.1", "remote_port": "65178", "headers": {"Accept-Encoding":["gzip"],"Content-Length":["0"],"Origin":["http://localhost:2019"],"User-Agent":["Go-http-client/1.1"]}}
2024/10/11 08:32:37.043 WARN    admin.api   exiting; byeee!!
2024/10/11 08:32:37.043 INFO    admin   stopped previous server {"address": "localhost:2019"}
2024/10/11 08:32:37.043 INFO    admin.api   shutdown complete   {"exit_code": 0}

1.2 Configuring Sites with a Caddyfile

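A caddy started like this isn't much use yet, though, because it has no site configuration at all. Caddy does provide a config API (on port 2019 by default), which we can query like this: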

$curl localhost:2019/config/
null

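Since there is no configuration yet, the endpoint returns null. Caddy's powerful API can set site configuration dynamically while Caddy is running; we'll come back to that later. When using Caddy for the first time, developers usually prefer to supply the initial configuration in a Caddyfile, which was also the only configuration format when Caddy was first open-sourced. Let's take server1.com as an example and see how simple it is to set up a local reverse proxy for it with caddy. Here is the Caddyfile: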

server1.com {
    tls internal
    reverse_proxy localhost:9001
}

Then we start caddy with this Caddyfile. If no config file is passed explicitly, caddy uses the Caddyfile in the current working directory:

$caddy run
2024/10/11 08:49:36.916 INFO    using adjacent Caddyfile
2024/10/11 08:49:36.920 INFO    adapted config to JSON  {"adapter": "caddyfile"}
2024/10/11 08:49:36.926 INFO    admin   admin endpoint started  {"address": "localhost:2019", "enforce_origin": false, "origins": ["//localhost:2019", "//[::1]:2019", "//127.0.0.1:2019"]}
2024/10/11 08:49:36.928 INFO    tls.cache.maintenance   started background certificate maintenance  {"cache": "0xc0005add80"}
2024/10/11 08:49:36.936 INFO    http.auto_https server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS {"server_name": "srv0", "https_port": 443}
2024/10/11 08:49:36.936 INFO    http.auto_https enabling automatic HTTP->HTTPS redirects    {"server_name": "srv0"}
2024/10/11 08:49:36.964 WARN    pki.ca.local    installing root certificate (you might be prompted for password)    {"path": "storage:pki/authorities/local/root.crt"}
2024/10/11 08:49:37.024 INFO    warning: "certutil" is not available, install "certutil" with "brew install nss" and try again
2024/10/11 08:49:37.024 INFO    define JAVA_HOME environment variable to use the Java trust
Password:
2024/10/11 08:49:41.629 INFO    certificate installed properly in macOS keychain
2024/10/11 08:49:41.629 INFO    http    enabling HTTP/3 listener    {"addr": ":443"}
2024/10/11 08:49:41.632 INFO    http.log    server running  {"name": "srv0", "protocols": ["h1", "h2", "h3"]}
2024/10/11 08:49:41.632 INFO    http.log    server running  {"name": "remaining_auto_https_redirects", "protocols": ["h1", "h2", "h3"]}
2024/10/11 08:49:41.632 INFO    http    enabling automatic TLS certificate management   {"domains": ["server1.com"]}
2024/10/11 08:49:41.656 INFO    tls cleaning storage unit   {"storage": "FileStorage:/Users/tonybai/Library/Application Support/Caddy"}
2024/10/11 08:49:41.656 INFO    autosaved config (load with --resume flag)  {"file": "/Users/tonybai/Library/Application Support/Caddy/autosave.json"}
2024/10/11 08:49:41.656 INFO    serving initial configuration
2024/10/11 08:49:41.657 INFO    tls finished cleaning storage units
2024/10/11 08:49:41.657 INFO    tls.obtain  acquiring lock  {"identifier": "server1.com"}
2024/10/11 08:49:41.676 INFO    tls.obtain  lock acquired   {"identifier": "server1.com"}
2024/10/11 08:49:41.676 INFO    tls.obtain  obtaining certificate   {"identifier": "server1.com"}
2024/10/11 08:49:41.684 INFO    tls.obtain  certificate obtained successfully   {"identifier": "server1.com", "issuer": "local"}
2024/10/11 08:49:41.685 INFO    tls.obtain  releasing lock  {"identifier": "server1.com"}
2024/10/11 08:49:41.686 WARN    tls stapling OCSP   {"error": "no OCSP stapling for [server1.com]: no OCSP server specified in certificate", "identifiers": ["server1.com"]}

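There is a lot of information in this log, and we'll go through it bit by bit later. First, let's verify that server1.com can be reached through caddy. The topology is as follows: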

The server1.com program is as follows:

// server1.go
package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "hello, server1.com")
}

func main() {
    http.HandleFunc("/", handler)
    fmt.Println("Server is listening on port 9001...")
    if err := http.ListenAndServe("localhost:9001", nil); err != nil {
        fmt.Println("Error starting server:", err)
    }
}

After starting server1, we access server1.com with curl (note: first add server1.com to /etc/hosts, mapped to the local 127.0.0.1):

$go run server1.go
$curl https://server1.com
hello, server1.com

Isn't that simple? A few lines of configuration give you a local environment for testing an HTTPS site!

1.3 What Happens Behind the Caddyfile

Now it's time to use the log printed by caddy run above, together with the contents of the Caddyfile, to discuss some of caddy's runtime mechanics.

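First, the native configuration format of current Caddy versions is no longer what we see in the Caddyfile, but JSON. Although we started caddy from a Caddyfile above, internally caddy runs the Caddyfile adapter, converts the Caddyfile to JSON, and then hands that JSON to the rest of its logic as the configuration: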

For example, the Caddyfile above converts to the following JSON configuration:

{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "localhost:9001"
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "server1.com"
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "issuers": [
              {
                "module": "internal"
              }
            ],
            "subjects": [
              "server1.com"
            ]
          }
        ]
      }
    }
  }
}

Of course, caddy can also take this JSON directly as its initial config file at startup:

$caddy run --config caddy.json

Even when started from a Caddyfile, caddy automatically saves the current config (the following log comes from starting caddy on macOS):

2024/10/11 08:49:41.656 INFO    autosaved config (load with --resume flag)  {"file": "/Users/tonybai/Library/Application Support/Caddy/autosave.json"}

Note: on Linux, caddy saves the config to /var/lib/caddy/.config/caddy/autosave.json by default.

As the log says, if you pass the --resume flag on the next start, Caddy starts from the autosaved JSON config!

If caddy is started with --resume but cannot find autosave.json at that path, it falls back to the Caddyfile in the current directory, unless a config file is specified with --config.

In the server1.com site block of the Caddyfile, we used the tls directive:

server1.com {
    tls internal
    reverse_proxy localhost:9001
}

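The value of the tls directive is internal, which means the site's certificate is issued by Caddy's internal, locally trusted CA. Caddy creates a self-signed local CA (named local by default) and tries to install its root certificate into the system trust store; when Caddy runs as an unprivileged user, you may be prompted for a sudo password. Caddy then uses this CA to issue certificates for domains such as server1.com. On macOS, under the user's Library/Application Support/Caddy directory, we can see the CA files as well as the private key and certificate generated for the site's domain: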

➜  /Users/tonybai/Library/Application Support/Caddy git:(master) ✗ $tree
.
├── autosave.json
├── certificates
│   └── local
│       └── server1.com
│           ├── server1.com.crt
│           ├── server1.com.json
│           └── server1.com.key
├── instance.uuid
├── last_clean.json
├── locks
└── pki
    └── authorities
        └── local
            ├── intermediate.crt
            ├── intermediate.key
            ├── root.crt
            └── root.key

1.4 Layer 4 Proxying and gRPC

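In daily work, besides HTTP/HTTPS proxying, the two most common reverse proxy and load balancing setups are pure layer 4 raw TCP/UDP and RPC (gRPC being the most widespread). How well does Caddy support them? Let's take a look.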

1.4.1 Raw TCP and UDP

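The official Caddy release does not currently support layer 4 reverse proxying and load balancing, but plugins can add it. The best known is mholt/caddy-l4, started by Caddy's author himself; it is still a work in progress (WIP), fine for experimenting but not recommended for production.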

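Since Go is not very friendly to dynamically loaded plugins, Caddy, being written in Go, takes the recompilation route: plugins are compiled into the binary. It provides a build tool called xcaddy that makes building caddy with plugins very convenient, which turns Go's strengths in compilation into an advantage.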

If Go is already installed locally, installing xcaddy is easy:

$go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
go: downloading github.com/caddyserver/xcaddy v0.4.2
go: downloading github.com/Masterminds/semver/v3 v3.2.1
go: downloading github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510
go: downloading github.com/josephspurrier/goversioninfo v1.4.0
go: downloading github.com/akavel/rsrc v0.10.2

Next, let's use xcaddy to build caddy with the mholt/caddy-l4 plugin. This takes about 1-2 minutes, mostly spent downloading dependencies:

$xcaddy build --with github.com/mholt/caddy-l4
2024/10/11 12:31:46 [INFO] absolute output file path: /Users/tonybai/caddy
2024/10/11 12:31:46 [INFO] Temporary folder: /Users/tonybai/buildenv_2024-10-17-1231.4160508500
2024/10/11 12:31:46 [INFO] Writing main module: /Users/tonybai/buildenv_2024-10-17-1231.4160508500/main.go
package main

import (
    caddycmd "github.com/caddyserver/caddy/v2/cmd"

    // plug in Caddy modules here
    _ "github.com/caddyserver/caddy/v2/modules/standard"
    _ "github.com/mholt/caddy-l4"
)

func main() {
    caddycmd.Main()
}
2024/10/11 12:31:46 [INFO] Initializing Go module
2024/10/11 12:31:46 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go mod init caddy
go: creating new go.mod: module caddy
go: to add module requirements and sums:
    go mod tidy
2024/10/11 12:31:46 [INFO] Pinning versions
2024/10/11 12:31:46 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go get -d -v github.com/caddyserver/caddy/v2
go: -d flag is deprecated. -d=true is a no-op
go: downloading github.com/caddyserver/caddy v1.0.5
go: downloading github.com/caddyserver/caddy/v2 v2.8.4
go: downloading github.com/caddyserver/certmagic v0.21.3
go: downloading github.com/prometheus/client_golang v1.19.1
go: downloading github.com/quic-go/quic-go v0.44.0
go: downloading github.com/cespare/xxhash v1.1.0
go: downloading go.uber.org/zap/exp v0.2.0
go: downloading golang.org/x/term v0.20.0
go: downloading golang.org/x/time v0.5.0
go: downloading go.uber.org/multierr v1.11.0
... ...
go: added golang.org/x/term v0.20.0
go: added golang.org/x/text v0.15.0
go: added golang.org/x/time v0.5.0
go: added golang.org/x/tools v0.21.0
go: added google.golang.org/protobuf v1.34.1
2024/10/11 12:31:53 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go get -d -v github.com/mholt/caddy-l4 github.com/caddyserver/caddy/v2
go: -d flag is deprecated. -d=true is a no-op
go: downloading github.com/mholt/caddy-l4 v0.0.0-20241012124037-5764d700c21c
go: accepting indirect upgrade from github.com/google/pprof@v0.0.0-20231212022811-ec68065c825e to v0.0.0-20240207164012-fb44976bdcd5
go: accepting indirect upgrade from github.com/miekg/dns@v1.1.59 to v1.1.62
go: accepting indirect upgrade from github.com/onsi/ginkgo/v2@v2.13.2 to v2.15.0
go: accepting indirect upgrade from golang.org/x/crypto@v0.23.0 to v0.28.0
go: accepting indirect upgrade from golang.org/x/mod@v0.17.0 to v0.18.0
go: accepting indirect upgrade from golang.org/x/net@v0.25.0 to v0.30.0
... ...
go: upgraded golang.org/x/sys v0.20.0 => v0.26.0
go: upgraded golang.org/x/term v0.20.0 => v0.25.0
go: upgraded golang.org/x/text v0.15.0 => v0.19.0
go: upgraded golang.org/x/time v0.5.0 => v0.7.0
go: upgraded golang.org/x/tools v0.21.0 => v0.22.0
2024/10/11 12:32:10 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go get -d -v
go: -d flag is deprecated. -d=true is a no-op
go: downloading github.com/go-chi/chi/v5 v5.0.12
go: downloading gopkg.in/natefinch/lumberjack.v2 v2.2.1
go: downloading github.com/fxamacker/cbor/v2 v2.6.0
go: downloading github.com/google/go-tpm v0.9.0
... ...
go: downloading github.com/google/certificate-transparency-go v1.1.8-0.20240110162603-74a5dd331745
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.2.1
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.18.0
2024/10/11 12:32:15 [INFO] Build environment ready
2024/10/11 12:32:15 [INFO] Building Caddy
2024/10/11 12:32:15 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go mod tidy -e
go: downloading github.com/onsi/gomega v1.30.0
... ...
go: downloading golang.org/x/oauth2 v0.20.0
go: downloading cloud.google.com/go/auth/oauth2adapt v0.2.2
go: downloading github.com/google/s2a-go v0.1.7
go: downloading cloud.google.com/go/compute/metadata v0.3.0
go: downloading cloud.google.com/go/compute v1.24.0
go: downloading go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.49.0
go: downloading github.com/googleapis/enterprise-certificate-proxy v0.3.2
2024/10/11 12:32:31 [INFO] exec (timeout=0s): /Users/tonybai/.bin/go1.23.0/bin/go build -o /Users/tonybai/caddy -ldflags -w -s -trimpath -tags nobadger
2024/10/11 12:33:22 [INFO] Build complete: ./caddy
2024/10/11 12:33:22 [INFO] Cleaning up temporary folder: /Users/tonybai/buildenv_2024-10-17-1231.4160508500


The compiled caddy is placed in the current directory:

$./caddy version
v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

To tell it apart from the original caddy, we rename the newly built binary to caddy-with-l4. Now let's look at a layer 4 load balancing example, starting with the Caddyfile:

{
    layer4 {
        127.0.0.1:5000 {
            route {
                proxy localhost:9003 localhost:9004 {
                    lb_policy round_robin
                }
            }
        }
    }
}

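This config is easy to understand! As the diagram below shows, caddy distributes client connections to port 5000 across the two backend services localhost:9003 and localhost:9004 using the round robin algorithm.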
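Conceptually, lb_policy round_robin hands out upstreams in turn, and a layer 4 proxy just shuttles bytes. Below is a rough Go sketch of both ideas, not caddy-l4's actual code; roundRobin and proxyConn are hypothetical names:

```go
package main

import (
	"io"
	"net"
	"sync/atomic"
)

// roundRobin cycles through upstream addresses; safe for concurrent use.
type roundRobin struct {
	upstreams []string
	next      atomic.Uint64
}

// pick returns the next upstream in turn.
func (rr *roundRobin) pick() string {
	n := rr.next.Add(1) - 1
	return rr.upstreams[n%uint64(len(rr.upstreams))]
}

// proxyConn dials the chosen upstream and copies raw bytes both ways,
// with no HTTP parsing at all, which is what makes it layer 4.
func proxyConn(client net.Conn, rr *roundRobin) error {
	defer client.Close()
	upstream, err := net.Dial("tcp", rr.pick())
	if err != nil {
		return err
	}
	defer upstream.Close()
	go io.Copy(upstream, client)       // client -> upstream
	_, err = io.Copy(client, upstream) // upstream -> client
	return err
}
```

An accept loop on 127.0.0.1:5000 would call go proxyConn(conn, rr) for each accepted connection, so successive connections alternate between localhost:9003 and localhost:9004.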

After TCP, let's look at a UDP reverse proxy example. We modify the Caddyfile:

{
    layer4 {
        udp/127.0.0.1:5000 {
            route {
                proxy udp/localhost:9005 udp/localhost:9006 {
                    lb_policy round_robin
                }
            }
        }
    }
}

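This config is just as easy to understand! As the diagram below shows, caddy distributes UDP traffic from clients to port 5000 across the two backend services localhost:9005 and localhost:9006 using round robin.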

Note: the client and server code for the TCP and UDP examples above can be found in caddy-examples under github.com/bigwhite/experiments; to save space it is not reproduced here.

Next, let's look at RPC.

1.4.2 RPC

Let's take the most popular option, gRPC, as the example and see how to configure Caddy. The test topology is as follows:

Add rpc-server.com to /etc/hosts beforehand, pointing to 127.0.0.1 (localhost). Then, following the topology above, update the Caddyfile to:

rpc-server.com {
    tls internal
    reverse_proxy h2c://localhost:9007 h2c://localhost:9008
}

gRPC runs on HTTP/2 frames; the h2c:// scheme makes Caddy talk to the backends over cleartext (non-TLS) HTTP/2.

Note: the code for grpc-client, grpc-server1, and grpc-server2 can be found in the rpc directory of caddy-examples under github.com/bigwhite/experiments; to save space it is not reproduced here.

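That completes our first look at how to run Caddy and the basic configuration for various protocols. Next, let's look at another powerful Caddy feature: dynamic runtime configuration through its API.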

2. Configuring Caddy Dynamically at Runtime via the API

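Caddy provides admin and config APIs that let us configure and manage the server dynamically at runtime. As mentioned earlier, Caddy's default API endpoint is http://localhost:2019/config/. Note, however, that route configuration set through the API lives only in memory and is not persisted. This means that after a Caddy restart, unless you use --resume to restore the configuration from autosave.json, everything you set through the API is lost.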

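Among Caddy's APIs, what we care about most are the settings for servers, routes, handlers (handle), and matchers (match). Take the HTTPS server setup expressed by the following Caddyfile as an example: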

server1.com {
    tls internal
    reverse_proxy localhost:9001
}
server2.com {
    tls internal
    reverse_proxy localhost:9002 localhost:9012
}

The topology corresponding to this Caddyfile is as follows:

Converted to JSON, this Caddyfile becomes:

{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "localhost:9001"
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "server1.com"
                  ]
                }
              ],
              "terminal": true
            },
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "upstreams": [
                            {
                              "dial": "localhost:9002"
                            },
                            {
                              "dial": "localhost:9012"
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "server2.com"
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "issuers": [
              {
                "module": "internal"
              }
            ],
            "subjects": [
              "server1.com",
              "server2.com"
            ]
          }
        ]
      }
    }
  }
}

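The containment relationship among the servers, routes, handlers (handle), and matchers (match) we care about is shown in the diagram below; the rest of the configuration is filled in automatically by Caddy.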
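The same hierarchy can be mirrored with a few Go types. The sketch below decodes only the fields that appear in the JSON above (encoding/json ignores unknown fields such as terminal) and builds a host -> upstreams map. The type and function names are ours, not Caddy's:

```go
package main

import "encoding/json"

// Minimal types for: servers -> routes -> (match, handle) -> subroute -> upstreams.
type config struct {
	Apps struct {
		HTTP struct {
			Servers map[string]server `json:"servers"`
		} `json:"http"`
	} `json:"apps"`
}

type server struct {
	Listen []string `json:"listen"`
	Routes []route  `json:"routes"`
}

type route struct {
	Match []struct {
		Host []string `json:"host"`
	} `json:"match"`
	Handle []handler `json:"handle"`
}

type handler struct {
	Handler string  `json:"handler"`
	Routes  []route `json:"routes"` // present for "subroute"
	Upstreams []struct {
		Dial string `json:"dial"`
	} `json:"upstreams"` // present for "reverse_proxy"
}

// upstreamsByHost maps each host matcher of a top-level route to the dial
// addresses of the reverse_proxy handlers nested under that route.
func upstreamsByHost(raw []byte) (map[string][]string, error) {
	var c config
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, srv := range c.Apps.HTTP.Servers {
		for _, r := range srv.Routes {
			dials := collectDials(r.Handle)
			for _, m := range r.Match {
				for _, h := range m.Host {
					out[h] = append(out[h], dials...)
				}
			}
		}
	}
	return out, nil
}

// collectDials recurses into subroute handlers.
func collectDials(hs []handler) []string {
	var dials []string
	for _, h := range hs {
		for _, u := range h.Upstreams {
			dials = append(dials, u.Dial)
		}
		for _, r := range h.Routes {
			dials = append(dials, collectDials(r.Handle)...)
		}
	}
	return dials
}
```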

Next, using this example, let's see how to carry out some common site configuration tasks through the Caddy API.

2.1 POST /load

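First, the POST /load endpoint, which replaces the configuration wholesale. With it we can overwrite the currently active Caddy config with a new one. On receiving this request, Caddy blocks the call until the new config has finished loading or has failed to load. If loading fails, Caddy rolls back to the previous config. Like the caddy reload command, this endpoint applies the update with zero downtime, whether the load succeeds or fails and is rolled back.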

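Now let's modify the JSON above by removing the upstream server listening on 9012 from the server2.com route, and save the result as caddy-load.json. If you're worried the modified config might be invalid, check caddy-load.json with caddy validate before calling the endpoint: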

$caddy validate -c caddy-load.json
2024/10/11 02:50:28.649 INFO    using config from file  {"file": "caddy-load.json"}
2024/10/11 02:50:28.651 INFO    tls.cache.maintenance   started background certificate maintenance  {"cache": "0xc00012dd00"}
2024/10/11 02:50:28.652 INFO    http.auto_https server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS {"server_name": "srv0", "https_port": 443}
2024/10/11 02:50:28.652 INFO    http.auto_https enabling automatic HTTP->HTTPS redirects    {"server_name": "srv0"}
2024/10/11 02:50:28.652 INFO    tls.cache.maintenance   stopped background certificate maintenance  {"cache": "0xc00012dd00"}
Valid configuration

Then call the load endpoint with the following curl command to load the new config:

$curl "http://localhost:2019/load" \
    -H "Content-Type: application/json" \
    -d @caddy-load.json

Caddy then prints a log similar to the following:

2024/10/11 02:53:15.191 INFO    admin.api   received request    {"method": "POST", "host": "localhost:2019", "uri": "/load", "remote_ip": "127.0.0.1", "remote_port": "60898", "headers": {"Accept":["*/*"],"Content-Length":["1968"],"Content-Type":["application/json"],"Expect":["100-continue"],"User-Agent":["curl/7.54.0"]}}
2024/10/11 02:53:15.226 INFO    admin   admin endpoint started  {"address": "localhost:2019", "enforce_origin": false, "origins": ["//[::1]:2019", "//127.0.0.1:2019", "//localhost:2019"]}
2024/10/11 02:53:15.240 INFO    http.auto_https server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS {"server_name": "srv0", "https_port": 443}
2024/10/11 02:53:15.240 INFO    http.auto_https enabling automatic HTTP->HTTPS redirects    {"server_name": "srv0"}
2024/10/11 02:53:15.254 INFO    pki.ca.local    root certificate is already trusted by system   {"path": "storage:pki/authorities/local/root.crt"}
2024/10/11 02:53:15.256 INFO    http    enabling HTTP/3 listener    {"addr": ":443"}
2024/10/11 02:53:15.257 INFO    http.log    server running  {"name": "srv0", "protocols": ["h1", "h2", "h3"]}
2024/10/11 02:53:15.257 INFO    http.log    server running  {"name": "remaining_auto_https_redirects", "protocols": ["h1", "h2", "h3"]}
2024/10/11 02:53:15.257 INFO    http    enabling automatic TLS certificate management   {"domains": ["server1.com", "server2.com"]}
2024/10/11 02:53:15.257 INFO    http    servers shutting down with eternal grace period
2024/10/11 02:53:15.258 INFO    autosaved config (load with --resume flag)  {"file": "/Users/tonybai/Library/Application Support/Caddy/autosave.json"}
2024/10/11 02:53:15.258 INFO    admin.api   load complete
2024/10/11 02:53:15.263 INFO    admin   stopped previous server {"address": "localhost:2019"}

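After the update, you can inspect the changed config through the config API or in autosave.json, or verify through testing that the new config has taken effect.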

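However, this wholesale replacement is clearly more error-prone, and if Caddy proxies many site routes, the JSON file can get quite large. Maintaining the full config also requires a fairly systematic understanding of Caddy's configuration. In day-to-day maintenance, updating parts of the config by config path is more practical. Next, let's see how to manage servers, routes, handlers (handle), and matchers (match) by config path.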

2.2 /config/[path]

By appending a config path after /config, we can read and update the configuration at that path.

2.2.1 Reading the Config at a Specific Path

An HTTP GET request reads the config at the path given after /config.

Read everything

$curl "http://localhost:2019/config/"

Read the config of all servers

$curl "http://localhost:2019/config/apps/http/servers"
{"srv0":{"listen":[":443"],"routes":[{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9001"}]}]}]}],"match":[{"host":["server1.com"]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9002"},{"dial":"localhost:9012"}]}]}]}],"match":[{"host":["server2.com"]}],"terminal":true}]}}

Read the config of one server

Taking srv0 as an example:

$curl "http://localhost:2019/config/apps/http/servers/srv0"
{"listen":[":443"],"routes":[{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9001"}]}]}]}],"match":[{"host":["server1.com"]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9002"},{"dial":"localhost:9012"}]}]}]}],"match":[{"host":["server2.com"]}],"terminal":true}]}

Read srv0's listen config

$curl "http://localhost:2019/config/apps/http/servers/srv0/listen/"
[":443"]

Read all of srv0's routes

$curl "http://localhost:2019/config/apps/http/servers/srv0/routes/"
[{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9001"}]}]}]}],"match":[{"host":["server1.com"]}],"terminal":true},{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9002"},{"dial":"localhost:9012"}]}]}]}],"match":[{"host":["server2.com"]}],"terminal":true}]

Routes form an array; to read a single route, use its array index, for example:

$curl "http://localhost:2019/config/apps/http/servers/srv0/routes/0/"
{"handle":[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9001"}]}]}]}],"match":[{"host":["server1.com"]}],"terminal":true}

Read a route's handle and match

$curl "http://localhost:2019/config/apps/http/servers/srv0/routes/0/handle/"
[{"handler":"subroute","routes":[{"handle":[{"handler":"reverse_proxy","upstreams":[{"dial":"localhost:9001"}]}]}]}]

$curl "http://localhost:2019/config/apps/http/servers/srv0/routes/0/match/"
[{"host":["server1.com"]}]

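As you can see, by refining the config path step by step like this, we can read any part of the configuration; for arrays, an index selects the corresponding element.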
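In Go, the read side of the API reduces to building the path and decoding generic JSON. A sketch, assuming the default admin address; configURL and getConfig are hypothetical helper names:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// configURL joins path segments under /config/. Array elements are addressed
// by index ("routes", "0", ...), as in the curl examples; no segments means
// the whole config.
func configURL(adminAddr string, segments ...string) string {
	return strings.TrimRight(adminAddr, "/") + "/config/" + strings.Join(segments, "/")
}

// getConfig fetches the value at that path and decodes it into generic JSON
// values (maps, slices, strings, numbers, bools, nil).
func getConfig(client *http.Client, adminAddr string, segments ...string) (any, error) {
	resp, err := client.Get(configURL(adminAddr, segments...))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", resp.Request.URL, resp.Status)
	}
	var v any
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		return nil, err
	}
	return v, nil
}
```

For example, getConfig(http.DefaultClient, "http://localhost:2019", "apps", "http", "servers", "srv0", "routes", "0", "match") corresponds to the last curl call above.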

Next, let's look at how to modify the configuration by path.

2.2.2 Updating the Config at a Specific Path

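An HTTP POST request creates or updates the config at the path given after /config. If the target at that path is an array, POST appends the JSON body as a new element; if the target is an object, POST creates a new object or updates the existing one from the JSON.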

Let's first take the array apps/http/servers/srv0/listen/ as an example and append a new element ":80" to it:

$curl -H "Content-Type: application/json" -d '":80"' "http://localhost:2019/config/apps/http/servers/srv0/listen"

After it succeeds, we can see the change in the listen array:

$curl "http://localhost:2019/config/apps/http/servers/srv0/listen"
[":443",":80"]

To change an array element, we can use a PATCH request, for example changing the ":80" we just added to ":90":

$curl -X PATCH -H "Content-Type: application/json" -d '":90"' "http://localhost:2019/config/apps/http/servers/srv0/listen/1"
$curl "http://localhost:2019/config/apps/http/servers/srv0/listen"
[":443",":90"]

To delete the element we just added, send a DELETE request to the path ending with its index:

$curl -X DELETE  "http://localhost:2019/config/apps/http/servers/srv0/listen/1"
$curl "http://localhost:2019/config/apps/http/servers/srv0/listen"
[":443"]

Next, let's add a srv1 object alongside srv0:

$curl -H "Content-Type: application/json" -d '{ "listen" : [":444"]}' "http://localhost:2019/config/apps/http/servers/srv1/"

After it is created, we get the following configuration:

$curl  "http://localhost:2019/config/apps/http/servers/" | gojq
{
  "srv0": {
    "listen": [
      ":443"
    ],
    "routes": [
      ... ...
    ]
  },
  "srv1": {
    "listen": [
      ":444"
    ]
  }
}

But we must not create it like this:

$curl -H "Content-Type: application/json" -d '{ "srv1" : { "listen" : [":444"]}}' "http://localhost:2019/config/apps/http/servers/"

This overwrites the entire servers object, which then becomes:

$curl  "http://localhost:2019/config/apps/http/servers/" | gojq
{
  "srv1": {
    "listen": [
      ":444"
    ]
  }
}

2.3 @id

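Although paths let us read and update any part of the configuration, Caddy's JSON is deeply nested, which puts a cognitive burden on API callers. Caddy offers a powerful and flexible shortcut for reaching and modifying specific parts of the configuration: the @id identifier. By assigning a unique @id to an element in the configuration, we can refer to that element directly without spelling out its full path. This is especially useful for complex configurations or for parts that change often.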

In Caddy's configuration, @id can be applied to elements at many levels. Specifically, every level under apps/http/servers supports @id, including but not limited to:

Server level
Route (routes) level
Handler (handle) level
Matcher (match) level

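Let's look at concrete examples of using @id at each of these levels. Since the Caddyfile does not support @id, we use a new JSON configuration as the example.

We create the following JSON file as Caddy's startup configuration: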

{
  "apps": {
    "http": {
      "servers": {
        "myserver": {
          "@id": "main_server",
          "listen": [
            ":80"
          ],
          "routes": [
            {
              "@id": "main_route",
              "handle": [
                {
                  "@id": "main_handler",
                  "body": "Hello from main server!",
                  "handler": "static_response"
                }
              ],
              "match": [
                {
                  "@id": "path_matcher",
                  "path": [
                    "/api/*"
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  }
}

First, @id at the server level. Here we gave the server myserver an @id field with the value main_server, so we can read and update this server's configuration with the following paths:

$curl  "http://localhost:2019/id/main_server"
{"@id":"main_server","listen":[":80"],"routes":[{"handle":[{"body":"Hello from main server!","handler":"static_response"}]}]}

$curl  "http://localhost:2019/id/main_server/listen"
[":80"]

Similarly, at the route level we set an @id field with the value main_route on one of the routes, and the following commands read and update that route:

$curl  "http://localhost:2019/id/main_route/"
{"@id":"main_route","handle":[{"@id":"main_handler","body":"Hello from main server!","handler":"static_response"}],"match":[{"@id":"path_matcher","path":["/api/*"]}]}

$curl  "http://localhost:2019/id/main_route/handle"
[{"@id":"main_handler","body":"Hello from main server!","handler":"static_response"}]

With a handler-level @id, we can likewise access the object the @id refers to directly:

$curl  "http://localhost:2019/id/main_handler/"
{"@id":"main_handler","body":"Hello from main server!","handler":"static_response"}

$curl  "http://localhost:2019/id/main_handler/body"
"Hello from main server!"

Finally, accessing the matcher through its @id:

$curl  "http://localhost:2019/id/path_matcher/"
{"@id":"path_matcher","path":["/api/*"]}

$curl  "http://localhost:2019/id/path_matcher/path"
["/api/*"]

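As we can see, @id works like a pointer or a teleporter: it takes us straight to a specific path without typing the path level by level. For large or complex configurations, it gives administrators and developers a more flexible and intuitive way to operate on Caddy's configuration.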
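
From code, reading through an @id works the same way as reading by path; only the URL prefix changes from /config/ to /id/. A minimal sketch using the hypothetical helpers `idURL` and `getByID`, again assuming the admin endpoint is at http://localhost:2019, that decodes the JSON string returned for main_handler's body:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// idURL builds /id/<id>[/<subpath>] on the admin endpoint. Caddy resolves
// the @id to its full config path, so the nesting need not be spelled out.
func idURL(base, id, subpath string) string {
	u := strings.TrimRight(base, "/") + "/id/" + id
	if subpath != "" {
		u += "/" + strings.TrimLeft(subpath, "/")
	}
	return u
}

// getByID fetches the value behind an @id and JSON-decodes it into out.
func getByID(base, id, subpath string, out any) error {
	resp, err := http.Get(idURL(base, id, subpath))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: %s", resp.Status, data)
	}
	return json.Unmarshal(data, out)
}

func main() {
	var body string
	if err := getByID("http://localhost:2019", "main_handler", "body", &body); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body) // the static_response body configured under @id main_handler
}
```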

3. Production practice and ACME

Finally, let's briefly go over some practices for running Caddy in production.

3.1 Configuring Caddy in production

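We have covered many ways to configure Caddy. Which one should be used in production for the initial configuration, for dynamic updates at runtime, and for persisting the configuration?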

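Although the Caddyfile is simple, JSON is the clear choice if you need dynamic configuration updates at runtime in production. First, prepare an initial configuration in the standard JSON format and start Caddy with it; after that first start, this file is no longer needed.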

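When starting Caddy, use the --resume flag. On the first start there is no autosave.json yet, so Caddy starts from the initial configuration; every later restart loads autosave.json instead.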

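At runtime, we modify Caddy's configuration directly through the API. Every change takes effect immediately, with no downtime, and is saved to autosave.json, so even after a restart Caddy loads the latest configuration it had before stopping. None of this requires manual intervention.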

3.2 Automatic HTTPS and ACME

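Besides its extremely simple configuration and reasonably good performance, the main reason to use Caddy in production is automatic HTTPS: it obtains free, trusted certificates from Let's Encrypt or ZeroSSL for the proxied site domains and renews them automatically before they expire. Caddy talks to these two CAs through the ACME protocol to obtain and maintain certificates.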

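ACME (Automatic Certificate Management Environment) is a protocol for automating digital certificate management. It lets servers or client software automatically request, renew, and revoke SSL/TLS certificates from a certificate authority (CA). Its advantages are fewer human errors, support for short-lived certificates, and better certificate security; and because it is automated, it makes large-scale certificate deployment and management possible.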

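The protocol was first introduced by Let's Encrypt in 2015, with the goal of promoting HTTPS and making certificate management automated and standardized.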

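There are two versions of the ACME API. The API v1 specification was released in 2016; it supports certificates for fully qualified domain names such as example.com or cluster.example.com, but not wildcard certificates such as *.example.com. The API v2 specification, known as ACME v2, was released in 2018 and is not backward compatible with v1. v2 supports wildcard certificates such as *.example.com and adds a new challenge type, TLS-ALPN-01.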

In 2019 the IETF officially published ACME as a standard (RFC 8555). In 2021, ACME v1 was deprecated and is no longer supported.

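The main components of ACME are the client, the ACME server (such as Let's Encrypt or ZeroSSL), the challenge mechanism, and the certificate issuance flow. The client first requests a certificate from the ACME server; the server uses a challenge to make the client prove control of the domain, and issues the certificate once validation passes. The challenge mechanism is the most complex part.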

Caddy Server supports the following ACME challenges:

HTTP Challenge

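To perform this challenge, the CA does an authoritative DNS lookup of the candidate hostname's A/AAAA records, then requests a temporary cryptographic resource over HTTP on port 80. If the CA sees the expected resource, it issues the certificate. This challenge requires port 80 to be reachable from outside. In Caddy it is enabled by default and needs no explicit configuration.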
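
To make the HTTP challenge concrete: under RFC 8555 the client serves a "key authorization" (the challenge token, a dot, and the base64url-encoded JWK thumbprint of the ACME account key) at /.well-known/acme-challenge/<token>. Caddy does all of this for you; the Go sketch below only illustrates the responder side, and the token and thumbprint are hypothetical placeholders rather than real ACME client output.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"sync"
)

// challengePrefix is the well-known path used by HTTP-01 (RFC 8555 §8.3).
const challengePrefix = "/.well-known/acme-challenge/"

// keyAuthorization follows RFC 8555 §8.1: token + "." + base64url(JWK
// thumbprint of the account key). The thumbprint is passed in precomputed.
func keyAuthorization(token, thumbprint string) string {
	return token + "." + thumbprint
}

// http01Responder holds pending challenges (token -> key authorization).
type http01Responder struct {
	mu      sync.RWMutex
	pending map[string]string
}

func newHTTP01Responder() *http01Responder {
	return &http01Responder{pending: make(map[string]string)}
}

func (h *http01Responder) Set(token, keyAuth string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.pending[token] = keyAuth
}

// lookup maps a request path to the key authorization the CA expects.
func (h *http01Responder) lookup(path string) (string, bool) {
	if !strings.HasPrefix(path, challengePrefix) {
		return "", false
	}
	token := strings.TrimPrefix(path, challengePrefix)
	h.mu.RLock()
	defer h.mu.RUnlock()
	ka, ok := h.pending[token]
	return ka, ok
}

func (h *http01Responder) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	ka, ok := h.lookup(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "application/octet-stream")
	fmt.Fprint(w, ka)
}

func main() {
	h := newHTTP01Responder()
	// Hypothetical values; a real ACME client receives the token from the CA.
	h.Set("example-token", keyAuthorization("example-token", "example-thumbprint"))
	ka, ok := h.lookup(challengePrefix + "example-token")
	fmt.Println(ok, ka)
	// To answer the CA for real, h would be served on port 80:
	// http.ListenAndServe(":80", h)
}
```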

TLS-ALPN Challenge

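To perform this challenge, the CA does an authoritative DNS lookup of the candidate hostname's A/AAAA records, then requests a temporary cryptographic resource on port 443 using a TLS handshake that carries a special ServerName and ALPN value. If the CA sees the expected resource, it issues the certificate. This challenge requires port 443 to be reachable from outside. In Caddy it is also enabled by default and needs no explicit configuration.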

DNS Challenge

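To perform this challenge, the CA does an authoritative DNS lookup of the candidate hostname's TXT records and looks for a TXT record containing a specific value. If the CA sees the expected value, it issues the certificate.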

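The advantage of this challenge is that no ports need to be opened, and the server requesting the certificate does not have to be reachable from outside. However, Caddy must be configured with credentials (an API token) for the candidate domain's DNS provider, so that it can set (and clear) the special TXT records through the provider's API. If the DNS challenge is enabled, the other challenges are disabled by default.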

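Each of these three challenges has its own advantages in different scenarios. Caddy enables the HTTP and TLS-ALPN challenges by default and automatically tries the challenge type that has been most successful. Caddy also provides plugins for many DNS providers to support the DNS challenge; they can be found at https://github.com/caddy-dns.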

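Go is widely used in the ACME space: many standard ACME clients and servers are written in Go, such as cert-manager, and even the services that power Let's Encrypt itself are built in Go, namely boulder, the open-source CA implementation.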

4. Summary

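In this article, we explored the capabilities and simple configuration of the Caddy server in depth. Caddy's distinctive design simplifies setting up web servers and reverse proxies, and it stands out in automatic HTTPS certificate management and API support. With a simple Caddyfile, users can quickly deploy a secure HTTPS site without tedious steps.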

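In addition, Caddy's dynamic configuration makes it possible to adjust server settings at runtime, which greatly improves flexibility and management efficiency. Although Caddy's support for layer-4 proxying and load balancing still needs improvement, plugins give users a way to extend it.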

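All in all, Caddy is not only suitable for quickly setting up personal projects, but has also shown strong stability and efficiency in enterprise applications. As its community keeps growing, Caddy will remain an important tool for developers and system administrators.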

The source code for this article can be downloaded here.

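In 2024, the Gopher部落 (Gopher Tribe) community on 知识星球 (Knowledge Planet) will keep working to be a high-quality platform for learning and discussing Go. We will continue to offer first-release Go technical articles and a good reading experience. We will also share more about code quality and best practices, including how to write concise, readable, testable Go code, and strengthen interaction among members. Everyone is welcome to ask questions, share experiences, and discuss technology; I will respond as soon as possible. I sincerely hope Gopher Tribe can be a harbor where everyone learns, improves, and connects. Let's meet at Gopher Tribe and enjoy the fun of coding! You are all welcome to join!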

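The well-known cloud hosting provider DigitalOcean has released new plans: the entry-level Droplet is upgraded to 1 CPU core, 1 GB of RAM, and a 25 GB high-speed SSD for $5/month. If you need DigitalOcean, open this link to start your DO journey: https://m.do.co/c/bff6eed92687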

Gopher Daily (daily Gopher news) – https://gopherdaily.tonybai.com

My contact information:

Weibo (currently unavailable): https://weibo.com/bigwhite20xx
Weibo 2: https://weibo.com/u/6484441286
Blog: tonybai.com
GitHub: https://github.com/bigwhite
Gopher Daily archive – https://github.com/bigwhite/gopherdaily
Gopher Daily feed subscription – https://gopherdaily.tonybai.com/feed

Business cooperation: writing, book publishing, training, online courses, startup partnerships, consulting, advertising.

Gopher Daily has been revamped

bigwhite — Sun, 06 Aug 2023 12:24:58 +0000

Permanent link to this article – https://tonybai.com/2023/08/06/gopherdaily-revamped

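I no longer remember exactly when GopherDaily was created. Looking through the project's commit history, I found that this personal project started in September 2019. The content was roughly organized at first, but I was very enthusiastic about editing it and managed to publish almost every day, even on holidays: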

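The project's goal is to bring Gophers fresh Go technical material. Since its creation it has been supported by many Gophers; I often even receive emails or private messages urging the next issue, or reporting problems with the subscription list.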

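Over the past year or so, however, subscribers may have noticed that GopherDaily is no longer really "daily"! The reason is my limited personal energy: each issue takes a lot of time to edit. Yet I didn't want to pause the project. What to do? Recently I have been thinking about how to make producing GopherDaily more efficient.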

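One feasible solution is "semi-automation"! While moving from a "fully manual" to a "semi-automated" process, I also took the opportunity to revamp GopherDaily.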

In this article, I describe how I semi-automated the production of GopherDaily using large language models and the Go tech stack, and how GopherDaily was revamped.

1. The "semi-automated" production process

Producing each issue of GopherDaily used to be very time-consuming and laborious. The process is illustrated below:

Every step here is manual, and collecting material, reading and summarizing, and selecting the best items take the most time.

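Which of these steps can be automated? Collecting, summarizing, translating, generating, and publishing can all be automated; only "selection" needs human involvement. Here is the improved "semi-automated" process: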

The whole process is divided into three stages:

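Stage 1 (stage1): automatically collect material and generate issue-20230805-stage1.json (using August 5, 2023 as an example) as the input to stage 2.
Stage 2 (stage2): select the best material from issue-20230805-stage1.json, deleting unsuitable or low-quality items; excellent material that automatic collection missed can also be added by hand. Then generate issue-20230805-stage2.json from the selected content as the input to stage 3.
Stage 3 (stage3): this stage is fully automated. From the content of stage 2's output, issue-20230805-stage2.json, the program generates a summary for each item and translates the article titles and summaries into Chinese. Finally it produces two files, issue-20230805.html and issue-20230805.md: the former is published to the mailing list and the gopherdaily GitHub Pages site, and the latter is uploaded to the traditional GopherDaily archive project.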

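My personal goal was to keep the whole improved "semi-automated" process under half an hour, and judging from the trial runs, this has basically been achieved!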

Below I briefly explain how each automated step is implemented.

2. Automatically collecting Go technical material

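A key precondition for producing GopherDaily more efficiently is automating the most time-consuming step, "collecting material". Two things are indispensable for that: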

A set of sources
Detecting and fetching the latest articles from those sources

2.1 Where the sources come from

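Where do the sources come from? From past GopherDaily issues! Over four years, thousands of article URLs have accumulated. I extracted the URLs from these issues and ranked them by how often each domain (or domain plus first path segment) appears, which gave the initial source set for the revamped GopherDaily. The approach isn't perfect, but it covers the initial needs, and the sources can be refined by hand over time.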

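There are many ways to extract URLs from text. A common one is a regular expression; here is an example that extracts URLs from markdown or txt files and prints them: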

// extract-url/main.go

package main

import (
    "bufio"
    "fmt"
    "os"
    "path/filepath"
    "regexp"
)

// Compile the URL pattern once instead of once per file.
var urlRegex = regexp.MustCompile(`https?://[^\s]+`)

func main() {
    var allURLs []string

    err := filepath.Walk("/Users/tonybai/blog/gitee.com/gopherdaily", func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }

        if info.IsDir() {
            return nil
        }

        if filepath.Ext(path) != ".txt" && filepath.Ext(path) != ".md" {
            return nil
        }

        file, err := os.Open(path)
        if err != nil {
            return err
        }
        defer file.Close()

        scanner := bufio.NewScanner(file)

        for scanner.Scan() {
            urls := urlRegex.FindAllString(scanner.Text(), -1)
            allURLs = append(allURLs, urls...)
        }

        return scanner.Err()
    })

    if err != nil {
        fmt.Println(err)
        return
    }

    for _, url := range allURLs {
        fmt.Printf("%s\n", url)
    }
    fmt.Println(len(allURLs))
}

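I put the extracted and analyzed URLs into a temporary file. Extracting URLs alone is not enough, though: to use a site as a source, we need its feed address. How do we extract a site's feed address? Consider the following example: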

// extract_rss/main.go

package main

import (
    "fmt"
    "io"
    "net/http"
    "regexp"
)

// Match <link ... type="application/rss+xml" ... href="..."> (or the Atom
// equivalent) in a page's HTML; capture group 1 is the feed href.
var (
    rss  = regexp.MustCompile(`<link[^>]*type="application/rss\+xml"[^>]*href="([^"]+)"`)
    atom = regexp.MustCompile(`<link[^>]*type="application/atom\+xml"[^>]*href="([^"]+)"`)
)

func main() {
    var sites = []string{
        "http://research.swtch.com",
        "https://tonybai.com",
        "https://benhoyt.com/writings",
    }

    for _, url := range sites {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Println("Error fetching URL:", err)
            continue
        }

        // Close the body inside the loop; a deferred Close would only run when main returns.
        body, err := io.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Println("Error reading response body:", err)
            continue
        }

        matches := rss.FindAllStringSubmatch(string(body), -1)
        if len(matches) == 0 {
            matches = atom.FindAllStringSubmatch(string(body), -1)
            if len(matches) == 0 {
                continue
            }
        }

        fmt.Printf("\"%s\" -> rss: \"%s\"\n", url, matches[0][1])
    }
}

Running the program above gives the following result:

"http://research.swtch.com" -> rss: "http://research.swtch.com/feed.atom"
"https://tonybai.com" -> rss: "https://tonybai.com/feed/"
"https://benhoyt.com/writings" -> rss: "/writings/rss.xml"

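As we can see, feed addresses vary from site to site: some are full URLs, while others are paths relative to the site's URL. These need further checking and handling, which I won't go into here.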
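
One way to handle the relative hrefs is to resolve them against the page URL with net/url, which implements RFC 3986 reference resolution. A minimal sketch; `resolveFeedURL` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveFeedURL turns the href found on a page into an absolute feed URL.
// Absolute hrefs pass through unchanged; relative ones are resolved
// against the URL of the page they were found on.
func resolveFeedURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	feed, err := resolveFeedURL("https://benhoyt.com/writings", "/writings/rss.xml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(feed) // https://benhoyt.com/writings/rss.xml
}
```

Note that a relative href without a leading slash, such as rss.xml, resolves against the directory of the page URL: with https://benhoyt.com/writings as the base it becomes https://benhoyt.com/rss.xml, not .../writings/rss.xml.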

We put the extracted and processed feed addresses into feeds.toml as the source set. Each day, producing Gopher Daily starts by reading the sources from this file.

2.2 Detecting and fetching source updates

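With the source set in place, the next step is to periodically detect and fetch the latest updates from the sources (for now, those from the last 24 hours). Put simply, we pull each source's feed, parse it, and obtain information about its latest articles. The Go community has ready-made tools for fetching and parsing feeds; gofeed is one of the more full-featured and stable ones.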

Here is an example that uses gofeed to fetch feeds and obtain article information:

// gofeed/main.go

package main

import (
    "fmt"

    "github.com/mmcdole/gofeed"
)

func main() {

    var feeds = []string{
        "https://research.swtch.com/feed.atom",
        "https://tonybai.com/feed/",
        "https://benhoyt.com/writings/rss.xml",
    }

    fp := gofeed.NewParser()
    for _, feed := range feeds {
        feedInfo, err := fp.ParseURL(feed)
        if err != nil {
            fmt.Printf("parse feed [%s] error: %s\n", feed, err.Error())
            continue
        }
        fmt.Printf("The info of feed url: %s\n", feed)
        for _, item := range feedInfo.Items {
            fmt.Printf("\t title: %s\n", item.Title)
            fmt.Printf("\t link: %s\n", item.Link)
            fmt.Printf("\t published: %s\n", item.Published)
        }
        fmt.Println("")
    }
}

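The program parses the three feed addresses and prints the article information from each, including the title, URL, and publication time. Running it gives the following result: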

$go run main.go
The info of feed url: https://research.swtch.com/feed.atom
     title: Coroutines for Go
     link: http://research.swtch.com/coro
     published: 2023-07-17T14:00:00-04:00
     title: Storing Data in Control Flow
     link: http://research.swtch.com/pcdata
     published: 2023-07-11T14:00:00-04:00
     title: Opting In to Transparent Telemetry
     link: http://research.swtch.com/telemetry-opt-in
     published: 2023-02-24T08:59:00-05:00
     title: Use Cases for Transparent Telemetry
     link: http://research.swtch.com/telemetry-uses
     published: 2023-02-08T08:00:03-05:00
     title: The Design of Transparent Telemetry
     link: http://research.swtch.com/telemetry-design
     published: 2023-02-08T08:00:02-05:00
     title: Transparent Telemetry for Open-Source Projects
     link: http://research.swtch.com/telemetry-intro
     published: 2023-02-08T08:00:01-05:00
     title: Transparent Telemetry
     link: http://research.swtch.com/telemetry
     published: 2023-02-08T08:00:00-05:00
     title: The Magic of Sampling, and its Limitations
     link: http://research.swtch.com/sample
     published: 2023-02-04T12:00:00-05:00
     title: Go’s Version Control History
     link: http://research.swtch.com/govcs
     published: 2022-02-14T10:00:00-05:00
     title: What NPM Should Do Today To Stop A New Colors Attack Tomorrow
     link: http://research.swtch.com/npm-colors
     published: 2022-01-10T11:45:00-05:00
     title: Our Software Dependency Problem
     link: http://research.swtch.com/deps
     published: 2019-01-23T11:00:00-05:00
     title: What is Software Engineering?
     link: http://research.swtch.com/vgo-eng
     published: 2018-05-30T10:00:00-04:00
     title: Go and Dogma
     link: http://research.swtch.com/dogma
     published: 2017-01-09T09:00:00-05:00
     title: A Tour of Acme
     link: http://research.swtch.com/acme
     published: 2012-09-17T11:00:00-04:00
     title: Minimal Boolean Formulas
     link: http://research.swtch.com/boolean
     published: 2011-05-18T00:00:00-04:00
     title: Zip Files All The Way Down
     link: http://research.swtch.com/zip
     published: 2010-03-18T00:00:00-04:00
     title: UTF-8: Bits, Bytes, and Benefits
     link: http://research.swtch.com/utf8
     published: 2010-03-05T00:00:00-05:00
     title: Computing History at Bell Labs
     link: http://research.swtch.com/bell-labs
     published: 2008-04-09T00:00:00-04:00
     title: Using Uninitialized Memory for Fun and Profit
     link: http://research.swtch.com/sparse
     published: 2008-03-14T00:00:00-04:00
     title: Play Tic-Tac-Toe with Knuth
     link: http://research.swtch.com/tictactoe
     published: 2008-01-25T00:00:00-05:00
     title: Crabs, the bitmap terror!
     link: http://research.swtch.com/crabs
     published: 2008-01-09T00:00:00-05:00

The info of feed url: https://tonybai.com/feed/
     title: Go语言开发者的Apache Arrow使用指南：读写Parquet文件
     link: https://tonybai.com/2023/07/31/a-guide-of-using-apache-arrow-for-gopher-part6/
     published: Mon, 31 Jul 2023 13:07:28 +0000
     title: Go语言开发者的Apache Arrow使用指南：扩展compute包
     link: https://tonybai.com/2023/07/22/a-guide-of-using-apache-arrow-for-gopher-part5/
     published: Sat, 22 Jul 2023 13:58:57 +0000
     title: 使用testify包辅助Go测试指南
     link: https://tonybai.com/2023/07/16/the-guide-of-go-testing-with-testify-package/
     published: Sun, 16 Jul 2023 07:09:56 +0000
     title: Go语言开发者的Apache Arrow使用指南：数据操作
     link: https://tonybai.com/2023/07/13/a-guide-of-using-apache-arrow-for-gopher-part4/
     published: Thu, 13 Jul 2023 14:41:25 +0000
     title: Go语言开发者的Apache Arrow使用指南：高级数据结构
     link: https://tonybai.com/2023/07/08/a-guide-of-using-apache-arrow-for-gopher-part3/
     published: Sat, 08 Jul 2023 15:27:54 +0000
     title: Apache Arrow：驱动列式分析性能和连接性的提升[译]
     link: https://tonybai.com/2023/07/01/arrow-columnar-analytics/
     published: Sat, 01 Jul 2023 14:42:29 +0000
     title: Go语言开发者的Apache Arrow使用指南：内存管理
     link: https://tonybai.com/2023/06/30/a-guide-of-using-apache-arrow-for-gopher-part2/
     published: Fri, 30 Jun 2023 14:00:59 +0000
     title: Go语言开发者的Apache Arrow使用指南：数据类型
     link: https://tonybai.com/2023/06/25/a-guide-of-using-apache-arrow-for-gopher-part1/
     published: Sat, 24 Jun 2023 20:43:38 +0000
     title: Go语言包设计指南
     link: https://tonybai.com/2023/06/18/go-package-design-guide/
     published: Sun, 18 Jun 2023 15:03:41 +0000
     title: Go GC：了解便利背后的开销
     link: https://tonybai.com/2023/06/13/understand-go-gc-overhead-behind-the-convenience/
     published: Tue, 13 Jun 2023 14:00:16 +0000

The info of feed url: https://benhoyt.com/writings/rss.xml
     title: The proposal to enhance Go's HTTP router
     link: https://benhoyt.com/writings/go-servemux-enhancements/
     published: Mon, 31 Jul 2023 08:00:00 +1200
     title: Scripting with Go: a 400-line Git client that can create a repo and push itself to GitHub
     link: https://benhoyt.com/writings/gogit/
     published: Sat, 29 Jul 2023 16:30:00 +1200
     title: Names should be as short as possible while still being clear
     link: https://benhoyt.com/writings/short-names/
     published: Mon, 03 Jul 2023 21:00:00 +1200
     title: Lookup Tables (Forth Dimensions XIX.3)
     link: https://benhoyt.com/writings/forth-lookup-tables/
     published: Sat, 01 Jul 2023 22:10:00 +1200
     title: For Python packages, file structure != API
     link: https://benhoyt.com/writings/python-api-file-structure/
     published: Fri, 30 Jun 2023 22:50:00 +1200
     title: Designing Pythonic library APIs
     link: https://benhoyt.com/writings/python-api-design/
     published: Sun, 18 Jun 2023 21:00:00 +1200
     title: From Go on EC2 to Fly.io: +fun, −$9/mo
     link: https://benhoyt.com/writings/flyio/
     published: Mon, 27 Feb 2023 10:00:00 +1300
     title: Code coverage for your AWK programs
     link: https://benhoyt.com/writings/goawk-coverage/
     published: Sat, 10 Dec 2022 13:41:00 +1300
     title: I/O is no longer the bottleneck
     link: https://benhoyt.com/writings/io-is-no-longer-the-bottleneck/
     published: Sat, 26 Nov 2022 22:20:00 +1300
     title: microPledge: our startup that (we wish) competed with Kickstarter
     link: https://benhoyt.com/writings/micropledge/
     published: Mon, 14 Nov 2022 20:00:00 +1200
     title: Rob Pike's simple C regex matcher in Go
     link: https://benhoyt.com/writings/rob-pike-regex/
     published: Fri, 12 Aug 2022 14:00:00 +1200
     title: Tools I use to build my website
     link: https://benhoyt.com/writings/tools-i-use-to-build-my-website/
     published: Tue, 02 Aug 2022 19:00:00 +1200
     title: Modernizing AWK, a 45-year old language, by adding CSV support
     link: https://benhoyt.com/writings/goawk-csv/
     published: Tue, 10 May 2022 09:30:00 +1200
     title: Prig: like AWK, but uses Go for "scripting"
     link: https://benhoyt.com/writings/prig/
     published: Sun, 27 Feb 2022 18:20:00 +0100
     title: Go performance from version 1.2 to 1.18
     link: https://benhoyt.com/writings/go-version-performance/
     published: Fri, 4 Feb 2022 09:30:00 +1300
     title: Optimizing GoAWK with a bytecode compiler and virtual machine
     link: https://benhoyt.com/writings/goawk-compiler-vm/
     published: Thu, 3 Feb 2022 22:25:00 +1300
     title: AWKGo, an AWK-to-Go compiler
     link: https://benhoyt.com/writings/awkgo/
     published: Mon, 22 Nov 2021 00:10:00 +1300
     title: Improving the code from the official Go RESTful API tutorial
     link: https://benhoyt.com/writings/web-service-stdlib/
     published: Wed, 17 Nov 2021 07:00:00 +1300
     title: Simple Lists: a tiny to-do list app written the old-school way (server-side Go, no JS)
     link: https://benhoyt.com/writings/simple-lists/
     published: Mon, 4 Oct 2021 07:30:00 +1300
     title: Structural pattern matching in Python 3.10
     link: https://benhoyt.com/writings/python-pattern-matching/
     published: Mon, 20 Sep 2021 19:30:00 +1200
     title: Mugo, a toy compiler for a subset of Go that can compile itself
     link: https://benhoyt.com/writings/mugo/
     published: Mon, 12 Apr 2021 20:30:00 +1300
     title: How to implement a hash table (in C)
     link: https://benhoyt.com/writings/hash-table-in-c/
     published: Fri, 26 Mar 2021 20:30:00 +1300
     title: Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust
     link: https://benhoyt.com/writings/count-words/
     published: Mon, 15 Mar 2021 20:30:00 +1300
     title: The small web is beautiful
     link: https://benhoyt.com/writings/the-small-web-is-beautiful/
     published: Tue, 2 Mar 2021 06:50:00 +1300
     title: Coming in Go 1.16: ReadDir and DirEntry
     link: https://benhoyt.com/writings/go-readdir/
     published: Fri, 29 Jan 2021 10:00:00 +1300
     title: Fuzzing in Go
     link: https://lwn.net/Articles/829242/
     published: Tue, 25 Aug 2020 08:00:00 +1200
     title: Searching code with Sourcegraph
     link: https://lwn.net/Articles/828748/
     published: Mon, 17 Aug 2020 08:00:00 +1200
     title: Different approaches to HTTP routing in Go
     link: https://benhoyt.com/writings/go-routing/
     published: Fri, 31 Jul 2020 08:00:00 +1200
     title: Go filesystems and file embedding
     link: https://lwn.net/Articles/827215/
     published: Fri, 31 Jul 2020 00:00:00 +1200
     title: The sad, slow-motion death of Do Not Track
     link: https://lwn.net/Articles/826575/
     published: Wed, 22 Jul 2020 11:00:00 +1200
     title: What's new in Lua 5.4
     link: https://lwn.net/Articles/826134/
     published: Wed, 15 Jul 2020 11:00:00 +1200
     title: Hugo: a static-site generator
     link: https://lwn.net/Articles/825507/
     published: Wed, 8 Jul 2020 11:00:00 +1200
     title: Generics for Go
     link: https://lwn.net/Articles/824716/
     published: Wed, 1 Jul 2020 11:00:00 +1200
     title: More alternatives to Google Analytics
     link: https://lwn.net/Articles/824294/
     published: Wed, 24 Jun 2020 11:00:00 +1200
     title: Lightweight Google Analytics alternatives
     link: https://lwn.net/Articles/822568/
     published: Wed, 17 Jun 2020 11:00:00 +1200
     title: An intro to Go for non-Go developers
     link: https://benhoyt.com/writings/go-intro/
     published: Wed, 10 Jun 2020 23:38:00 +1200
     title: ZZT in Go (using a Pascal-to-Go converter)
     link: https://benhoyt.com/writings/zzt-in-go/
     published: Fri, 29 May 2020 17:25:00 +1200
     title: Testing in Go: philosophy and tools
     link: https://lwn.net/Articles/821358/
     published: Wed, 27 May 2020 12:00:00 +1200
     title: The state of the AWK
     link: https://lwn.net/Articles/820829/
     published: Wed, 20 May 2020 12:00:00 +1200
     title: What's coming in Go 1.15
     link: https://lwn.net/Articles/820217/
     published: Wed, 13 May 2020 12:00:00 +1200
     title: Don't try to sanitize input. Escape output.
     link: https://benhoyt.com/writings/dont-sanitize-do-escape/
     published: Thu, 27 Feb 2020 19:27:00 +1200
     title: SEO for Software Engineers
     link: https://benhoyt.com/writings/seo-for-software-engineers/
     published: Thu, 20 Feb 2020 12:00:00 +1200

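Note: item.Description from gofeed is meant to be the article's summary, but it does not necessarily reflect what the article is about; often it is just the first N characters of the article.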

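The other technical challenge in semi-automating Gopher Daily is automatically summarizing the fetched articles and translating their titles and summaries. Let's see how to tackle it.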

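Note: high-quality articles from WeChat official accounts cannot be fetched automatically yet and still have to be selected by hand.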

3. Automatic summarization and translation

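Summarizing and translating text both belong to natural language processing (NLP). Frankly, Go is not very active in this field, and it is hard to find decent open-source algorithm implementations or tools that can be used directly. My solution is to use a cloud vendor's NLP APIs; here I use Microsoft Azure's.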

Before using these APIs, we need to fetch the HTML page at a given URL and extract the text to be summarized.

3.1 Extracting the raw text from HTML

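With http.Get we can fetch the entire HTML page at an article URL, but how do we extract the main text for summarization? Every site's HTML contains a lot of extra content, such as headers, footers, columns, sidebars, and navigation bars, which can skew the generated summary, so ideally we strip it out. Parsing HTML properly is quite complex, though, so my solution is to convert the HTML to markdown before submitting it to the summarization API.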

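html-to-markdown is a good conversion tool. What appeals to me most is that it can remove selected tags from the original HTML and supports custom rules. The following example uses html-to-markdown to get an article's raw text: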

// get-original-text/main.go

package main

import (
    "fmt"
    "io"
    "net/http"

    md "github.com/JohannesKaufmann/html-to-markdown"
)

func main() {
    s, err := getOriginText("http://research.swtch.com/coro")
    if err != nil {
        panic(err)
    }
    fmt.Println(s)
}

// getOriginText fetches the page at url and converts its HTML to markdown,
// dropping page chrome (header, footer, sidebars, navigation) along the way.
func getOriginText(url string) (string, error) {
    resp, err := http.Get(url)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }

    converter := md.NewConverter("", true, nil).Remove("header",
        "footer", "aside", "table", "nav") //"table" is used to store code

    markdown, err := converter.ConvertString(string(body))
    if err != nil {
        return "", err
    }
    return markdown, nil
}

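In this example, we remove the header, footer, sidebars (aside), tables, and navigation bars, keeping as much of the main text as possible. I won't run the example here; you can run it yourself and look at the result.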

3.2 Extracting summaries

We use Microsoft Azure's summarization API. Its free quota is enough for producing Gopher Daily.

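Note: to use Azure's free APIs, you first need to register an Azure account. Currently the summarization API is available only in three regions, North Europe, East US, and UK South, so don't pick the wrong region when creating the API resource. I use East US.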

Note: the Azure console is rather hard to use, so be prepared :).

Microsoft's summarization API is quite complex; below is an example of calling it with curl.

Using the summarization API takes two steps. Step one submits the raw text for summarization, for example:

$curl -i -X POST https://gopherdaily-summarization.cognitiveservices.azure.com/language/analyze-text/jobs?api-version=2022-10-01-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your_api_key" \
-d \
'
{
  "displayName": "Document Abstractive Summarization Task Example",
  "analysisInput": {
    "documents": [
      {
        "id": "1",
        "language": "en",
        "text": "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI services, I have been working with a team of amazing scientists and engineers to turn this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the intersection of all three, there’s magic—what we call XYZ-code as illustrated in Figure 1—a joint representation to create more powerful AI that can speak, hear, see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have pre-trained models that can jointly learn representations to support a broad range of downstream AI tasks, much in the way humans do today. Over the past five years, we have achieved human performance on benchmarks in conversational speech recognition, machine translation, conversational question answering, machine reading comprehension, and image captioning. These five breakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI capabilities, achieving multi-sensory and multilingual learning that is closer in line with how humans learn and understand. I believe the joint XYZ-code is a foundational component of this aspiration, if grounded with external knowledge sources in the downstream AI tasks."
      }
    ]
  },
  "tasks": [
    {
      "kind": "AbstractiveSummarization",
      "taskName": "Document Abstractive Summarization Task 1",
      "parameters": {
        "sentenceCount": 1
      }
    }
  ]
}
'

If the request succeeds, the response includes an Operation-Location header with an address like this:

Operation-Location:[https://gopherdaily-summarization.cognitiveservices.azure.com/language/analyze-text/jobs/66e7e3a1-697c-4fad-864c-d84c647682b4?api-version=2022-10-01-preview]

This address is the request URL for step two, which retrieves the summarized text from it:

$curl -X GET https://gopherdaily-summarization.cognitiveservices.azure.com/language/analyze-text/jobs/66e7e3a1-697c-4fad-864c-d84c647682b4\?api-version\=2022-10-01-preview \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your_api_key"
{"jobId":"66e7e3a1-697c-4fad-864c-d84c647682b4","lastUpdatedDateTime":"2023-07-27T11:09:45Z","createdDateTime":"2023-07-27T11:09:44Z","expirationDateTime":"2023-07-28T11:09:44Z","status":"succeeded","errors":[],"displayName":"Document Abstractive Summarization Task Example","tasks":{"completed":1,"failed":0,"inProgress":0,"total":1,"items":[{"kind":"AbstractiveSummarizationLROResults","taskName":"Document Abstractive Summarization Task 1","lastUpdateDateTime":"2023-07-27T11:09:45.8892126Z","status":"succeeded","results":{"documents":[{"summaries":[{"text":"Microsoft has been working to advance AI beyond existing techniques by taking a more holistic, human-centric approach to learning and understanding, and the Chief Technology Officer of Azure AI services, who enjoys a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text, audio or visual sensory signals, and multilingual, has created XYZ-code, a joint representation to create more powerful AI that can speak, hear, see, and understand humans better.","contexts":[{"offset":0,"length":1619}]}],"id":"1","warnings":[]}],"errors":[],"modelVersion":"latest"}}]}}%

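Using the JSON structures of the request and response, together with a JSON-to-struct tool, you can write the Go code for the Azure summarization API yourself.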
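
As a starting point, here is a sketch of the response side: structs covering only the fields of the step-two JSON above that lead to the summary text, plus a helper that extracts them. The type and function names are my own. Step one returns the polling URL in the Operation-Location response header (readable in Go via `resp.Header.Get("Operation-Location")`), and treating any status other than "succeeded" as "poll again later" is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobResult models only the fields of the step-two response needed to
// reach the summaries; the JSON names follow the sample response.
type jobResult struct {
	JobID  string `json:"jobId"`
	Status string `json:"status"`
	Tasks  struct {
		Items []struct {
			Kind    string `json:"kind"`
			Status  string `json:"status"`
			Results struct {
				Documents []struct {
					ID        string `json:"id"`
					Summaries []struct {
						Text string `json:"text"`
					} `json:"summaries"`
				} `json:"documents"`
			} `json:"results"`
		} `json:"items"`
	} `json:"tasks"`
}

// summaries extracts all summary texts from a step-two response body.
// Any job status other than "succeeded" is reported as not ready yet.
func summaries(body []byte) ([]string, error) {
	var r jobResult
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "succeeded" {
		return nil, fmt.Errorf("job %s not ready: status %q", r.JobID, r.Status)
	}
	var out []string
	for _, item := range r.Tasks.Items {
		for _, doc := range item.Results.Documents {
			for _, s := range doc.Summaries {
				out = append(out, s.Text)
			}
		}
	}
	return out, nil
}

func main() {
	// Trimmed version of the response shown above.
	sample := []byte(`{"jobId":"66e7e3a1-697c-4fad-864c-d84c647682b4","status":"succeeded",
"tasks":{"items":[{"kind":"AbstractiveSummarizationLROResults","status":"succeeded",
"results":{"documents":[{"id":"1","summaries":[{"text":"Microsoft has been working to advance AI ..."}]}]}}]}}`)
	texts, err := summaries(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range texts {
		fmt.Println(t)
	}
}
```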

3.3 Translation

Azure's translation API is much simpler than the summarization API.

Here is an example of the translation API using curl:

$curl -X POST "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&to=zh" \
     -H "Ocp-Apim-Subscription-Key:your_api_key" \
     -H "Ocp-Apim-Subscription-Region:westcentralus" \
     -H "Content-Type: application/json" \
     -d "[{'Text':'Hello, what is your name?'}]"

[{"detectedLanguage":{"language":"en","score":1.0},"translations":[{"text":"你好，你叫什么名字？","to":"zh-Hans"}]}]%

Likewise, using the JSON structures of the request and response with a JSON-to-struct tool, you can write the Go code for the Azure translation API yourself.

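Source articles that are already in Chinese don't need to go through this API. Here is a function that checks whether a string contains Chinese (Han) characters: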

func isChinese(s string) bool {
    for _, r := range s {
        if unicode.Is(unicode.Scripts["Han"], r) {
            return true
        }
    }
    return false
}

4. Page design and HTML generation

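With this revamp, Gopher Daily comes in a web edition and a mailing-list edition, but page design is what I'm worst at. Fortunately, IT has moved on since four years ago: large language models, led by ChatGPT, have sprung up everywhere, and I could use them to help design and implement a simple HTML page. The figure below shows the first version of the revamped page: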

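The page has four main parts: Go; Cloud Native (closely related to Go; content about programmers and architecture also goes here); AI (popular today); and trending tools and projects (currently mainly the top Go projects from the daily GitHub trending list).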

Each entry in each part includes the article title, link, and summary; the summary helps readers preview the article's content.

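Both the HTML and the markdown are generated with Go's template technology, and the templates were also designed and implemented with the help of claude.ai, so I won't go into detail here.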

5. Choosing servers

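Previously, Gopher Daily was just an open-source project on GitHub, which people subscribed to by watching it. In addition, Basten Gao maintained a third-party mailing list; many thanks to Basten Gao for his long-term support of Gopher Daily.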

After the revamp, I provide a web edition of Gopher Daily myself, so I needed to choose a server for it.

For simplicity, I chose GitHub Pages to host the web edition.

For subscribing to and unsubscribing from the mailing list, I developed a small service that runs on a DigitalOcean VPS.

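For the reverse-proxy web server, I passed on nginx and chose Caddy, which is also built with Go. Caddy's biggest advantages are that it is easy to pick up and supports HTTPS automatically by default, so I don't need separate tools to request and maintain certificates from free CAs (such as Let's Encrypt or ZeroSSL).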

6. Summary

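The revamped Gopher Daily lives up to the saying "the sparrow may be small, but it has all the vital organs": I developed three tools and one service for it.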

Of course, Gopher Daily is still being improved and will be adjusted based on Gophers' feedback.

Summarization and translation currently use Azure APIs; they may later be switched to ChatGPT-like APIs.

In addition, members of the Gopher部落 (Gopher Tribe) Knowledge Planet community still get early access.

The sample code for this article can be downloaded here.

Gopher Daily web edition – https://gopherdaily.tonybai.com
Gopher Daily mailing list subscription – https://gopherdaily.tonybai.com/subscribe
Gopher Daily project archive (markdown version) – https://github.com/bigwhite/gopherdaily


Using viper to merge YAML configuration files

bigwhite — Tue, 20 Sep 2022 14:22:04 +0000

Permanent link to this article – https://tonybai.com/2022/09/20/use-viper-to-do-merge-of-yml-configuration-files

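As a small company, our infrastructure is far from complete. During the Mid-Autumn Festival, the project manager told us that our system would soon go into the second-to-last stage environment and into production. To make deployment efficient for the operations staff, we urgently developed a tool that generates one-click install scripts, so that ops can combine it with the actual target environment to produce a one-click install script. The tool's principle is very simple, as shown in the diagram below: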

As the diagram shows, our tool customizes the final configuration and install script from templates, where:

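templates/conf holds the service configuration;
templates/manifests holds the services' k8s YAML manifests;
custom/configure stores the customization data for the service configuration under templates/conf;
custom/manifests stores the customization data for the k8s YAML under templates/manifests;
templates/install.sh is the install script.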

The two customization files under the custom directory are closely tied to the target environment.

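When it comes to templates, Gophers first think of Go's text/template, writing the template config files under the templates directory with template syntax. However, text/template requires identifying every variable that needs customization in advance, which is a lot of work and not flexible enough.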

What other approach could we take? I eventually chose to merge YAML files (both overriding and appending). The approach is illustrated below:

这个示例包含了覆盖和(追加)合并两种情况，我们首先看一下覆盖。

custom/manifests.yml中配置覆盖templates/manifests/*.yaml的配置

以templates/manifests/a.yml为例，该模板中metadata.name的默认值为default，但运维人员根据目标环境定制了(customizing)custom/manifests.yml文件。在该文件中，a.yml文件名作为key值，然后将要覆盖的配置项的全路径配置到该文件中(这里的全路径为metadata.name)：

// custom/manifests.yml
a.yml:
  metadata:
    name: foo

The value foo that custom/manifests.yml assigns to the namespace name overrides default from the original template, as reflected in the final xx_install/manifests/a.yml:

// a.yml
apiVersion: v1
kind: Namespace
metadata:
  name: foo

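Configuration in custom/manifests.yml is appended to the configuration in templates/manifests/*.yaml

Configuration that is absent from the original template but added in custom is appended to the final generated file. Take b.yml as an example; the original b.yml in the template directory reads: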

// templates/manifests/b.yml
log:
  type: file
  level: 0
  compress: true

Here log has only three child items: type, level and compress.

In custom/manifests.yml, however, the operators added several more settings under log, such as access_log and error_log:

// custom/manifests.yml
b.yml:
  log:
    level: 1
    compress: false
    access_log: "access.log"
    error_log: "error.log"
    max_age: 3
    maxbackups: 7
    maxsize: 100

As a result, besides level and compress overriding the values in the original template, all the newly added settings are appended to the generated xx_install/manifests/b.yml:

// b.yml
log:
  type: file
  level: 1
  compress: false
  access_log: "access.log"
  error_log: "error.log"
  max_age: 3
  maxbackups: 7
  maxsize: 100
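The override-and-append behavior above is essentially a recursive map merge: a scalar on the custom side replaces the template's value, and a key that exists only on the custom side is added. A minimal stdlib-only sketch, assuming both YAML documents have already been decoded into `map[string]any` (the `deepMerge` helper is hypothetical, not the tool's code):

```go
package main

import "fmt"

// deepMerge merges src into dst in place and returns dst.
// Nested maps are merged key by key; any other value in src
// (scalars, slices) overrides the value in dst, and keys that
// exist only in src are appended to dst.
func deepMerge(dst, src map[string]any) map[string]any {
	for k, sv := range src {
		if sm, ok := sv.(map[string]any); ok {
			if dm, ok := dst[k].(map[string]any); ok {
				deepMerge(dm, sm)
				continue
			}
		}
		dst[k] = sv
	}
	return dst
}

func main() {
	// Mirrors templates/manifests/b.yml.
	tmpl := map[string]any{
		"log": map[string]any{"type": "file", "level": 0, "compress": true},
	}
	// Mirrors the b.yml section of custom/manifests.yml (abridged).
	custom := map[string]any{
		"log": map[string]any{"level": 1, "compress": false, "access_log": "access.log"},
	}
	merged := deepMerge(tmpl, custom)
	fmt.Println(merged["log"]) // map[access_log:access.log compress:false level:1 type:file]
}
```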

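Good! The design is settled, so how do we merge YAML files? The best-known yaml package in the Go community is https://github.com/go-yaml/yaml (canonical import paths gopkg.in/yaml.v2 or gopkg.in/yaml.v3). It largely implements the YAML 1.2 spec and makes it easy to marshal and unmarshal between YAML and Go structs. Its APIs are fairly low-level, though: merging YAML files would take a good deal of extra work on our side, which our schedule might not allow. Is there a ready-made tool? Yes: viper, which is famous in the Go community!

viper is an open-source Go application configuration framework created by Steve Francia, the author of gohugo and a former product lead on the Go team. viper can take configuration not only from command-line flags but also from many kinds of configuration files, environment variables, and remote configuration systems (etcd and others). It also supports merging configuration files and writing configuration back to files.

Can we simply use viper's Merge family of operations? No! Why not? Because of our design above: we put all environment-specific configuration into the single file custom/manifests.yml, so merging it directly would make every item in custom/manifests.yml show up in every generated templates/xx.yml file.

So let's implement our own merge (override and append) operation!

Let's start with the main function that drives the merge: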

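// github.com/bigwhite/experiments/tree/master/yml-merge-using-viper/main.go
// (main, mergeManifestsFiles and mergeConfig below all live in this file)

package main

import (
    "flag"
    "fmt"
    "os"
    "strings"

    "github.com/spf13/viper"
)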

var (
    sourceDir string
    dstDir    string
)

func init() {
    flag.StringVar(&sourceDir, "s", "./", "template directory path")
    flag.StringVar(&dstDir, "d", "./k8s-install", "the target directory path in which the generated files are put")
}

func main() {
    var err error
    flag.Parse()

    // create target directory if not exist
    err = os.MkdirAll(dstDir+"/conf", 0775)
    if err != nil {
        fmt.Printf("create %s error: %s\n", dstDir+"/conf", err)
        return
    }

    err = os.MkdirAll(dstDir+"/manifests", 0775)
    if err != nil {
        fmt.Printf("create %s error: %s\n", dstDir+"/manifests", err)
        return
    }

    // override manifests files with same config item in custom/manifests.yml,
    // store the final result to the target directory
    err = mergeManifestsFiles()
    if err != nil {
        fmt.Printf("override and generate manifests files error: %s\n", err)
        return
    }
    fmt.Printf("override and generate manifests files ok\n")
}

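The main package uses the standard library flag package to define two command-line flags, -s and -d: the source path that holds templates/custom and the target path for the generated files. In main we first create the manifests and conf directories under the target path to hold the respective configuration files (in this example nothing is generated under conf), then call mergeManifestsFiles to merge the yml files under templates/manifests in the source path with custom/manifests.yml: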

// github.com/bigwhite/experiments/tree/master/yml-merge-using-viper/main.go

var (
    manifestFiles = []string{
        "a.yml",
        "b.yml",
    }
)

func mergeManifestsFiles() error {
    for _, file := range manifestFiles {
        // check whether the file exist
        srcFile := sourceDir + "/templates/manifests/" + file
        _, err := os.Stat(srcFile)
        if os.IsNotExist(err) {
            fmt.Printf("%s not exist, ignore it\n", srcFile)
            continue
        }

        err = mergeConfig("yml", sourceDir+"/templates/manifests", strings.TrimSuffix(file, ".yml"),
            sourceDir+"/custom", "manifests", dstDir+"/manifests/"+file)
        if err != nil {
            fmt.Println("mergeConfig error: ", err)
            return err
        }
        fmt.Printf("mergeConfig %s ok\n", file)

    }
    return nil
}

mergeManifestsFiles iterates over the template files and, for each one, calls mergeConfig, the function that actually merges the yml files:

// github.com/bigwhite/experiments/tree/master/yml-merge-using-viper/main.go

func mergeConfig(configType, srcPath, srcFile, overridePath, overrideFile, target string) error {
    v1 := viper.New()
    v1.SetConfigType(configType) // e.g. "yml"
    v1.AddConfigPath(srcPath)    // file directory
    v1.SetConfigName(srcFile)    // filename(without postfix)
    err := v1.ReadInConfig()
    if err != nil {
        return err
    }

    v2 := viper.New()
    v2.SetConfigType(configType)
    v2.AddConfigPath(overridePath)
    v2.SetConfigName(overrideFile)
    err = v2.ReadInConfig()
    if err != nil {
        return err
    }

    overrideKeys := v2.AllKeys()

    // override special keys
    prefixKey := srcFile + "." + configType + "." // e.g "a.yml."
    for _, key := range overrideKeys {
        if !strings.HasPrefix(key, prefixKey) {
            continue
        }

        stripKey := strings.TrimPrefix(key, prefixKey)
        val := v2.Get(key)
        v1.Set(stripKey, val)
    }

    // write the final result after overriding
    return v1.WriteConfigAs(target)
}

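mergeConfig creates two viper instances (viper.New()), one for the file under templates/manifests and one for custom/manifests.yml, and loads each one's configuration data. It then iterates over the keys of custom/manifests.yml, sets the values of the matching items on the viper instance that represents the template file, and finally writes the merged data of that instance to the target file.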
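The key-selection step in mergeConfig can be tried on its own. The sketch below assumes the override keys have already been flattened into dotted paths, the way viper's AllKeys() returns them; `selectOverrides` is a hypothetical helper, not part of viper. It keeps only the keys under one file's prefix and strips that prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectOverrides returns the override entries that belong to file
// (e.g. "a.yml"), with the "a.yml." prefix removed so that the remaining
// key is a path inside that file, such as "metadata.name".
func selectOverrides(flat map[string]any, file string) map[string]any {
	prefix := file + "."
	out := make(map[string]any)
	for key, val := range flat {
		if !strings.HasPrefix(key, prefix) {
			continue // belongs to another template file
		}
		out[strings.TrimPrefix(key, prefix)] = val
	}
	return out
}

func main() {
	// Flattened view of custom/manifests.yml (abridged).
	flat := map[string]any{
		"a.yml.metadata.name": "foo",
		"b.yml.log.level":     1,
	}
	got := selectOverrides(flat, "a.yml")
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [metadata.name]
}
```

This prefix filter is what keeps each template from picking up the other files' sections: keys under b.yml never match the "a.yml." prefix.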

Build and run the generator:

$make
go build
$./generator
mergeConfig a.yml ok
mergeConfig b.yml ok
override and generate manifests files ok

With the default command-line arguments, the files are generated under the k8s-install path. Let's look at them:

$cat k8s-install/manifests/a.yml
apiversion: v1
kind: Namespace
metadata:
    name: foo

$cat k8s-install/manifests/b.yml
log:
    access_log: access.log
    compress: false
    error_log: error.log
    level: 1
    max_age: 3
    maxbackups: 7
    maxsize: 100
    type: file

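The merge result matches our expectations. (The field order differs from the original, which doesn't matter: viper stores key-value pairs internally in Go maps, which do not preserve the original key order, so the keys come out reordered; in the output above they are sorted alphabetically.)

Careful readers may notice a problem, though: the original apiVersion key in a.yml has become lowercase apiversion in the result, which will make a.yml fail validation when it is submitted to k8s!

Why does this happen? The official viper documentation explains it as follows:

Viper merges configuration from various sources, many of which are either case insensitive or use different casing than the rest of the sources (e.g. env vars). In order to provide the best experience when using multiple sources, the decision has been made to make all keys case insensitive.

There have been several attempts to implement case sensitivity, but unfortunately it's not that trivial. We might take a stab at implementing it in Viper v2...

Well, the maintainers say v2 might support it, but v2 is nowhere in sight, so let's solve the problem with a fork of viper! Developer lnashier once forked viper over exactly this case issue and fixed it. The fork is rather old (and the fix may not be complete), but it's fine as long as it meets our needs. Let's replace spf13/viper with lnashier/viper, then rebuild and run the generator: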

$go mod tidy
go: finding module for package github.com/lnashier/viper
go: found github.com/lnashier/viper in github.com/lnashier/viper v0.0.0-20180730210402-cc7336125d12

$make clean
rm -fr generator k8s-install

$make
go build 

$./generator
mergeConfig a.yml ok
mergeConfig b.yml ok
override and generate manifests files ok

$cat k8s-install/manifests/a.yml
apiVersion: v1
kind: Namespace
metadata:
  name: foo

$cat k8s-install/manifests/b.yml
log:
  access_log: access.log
  compress: false
  error_log: error.log
  level: 1
  max_age: 3
  maxbackups: 7
  maxsize: 100
  type: file

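After switching to lnashier/viper, the apiVersion key in a.yml is no longer lowercased.

The tool is now basically usable. But is it problem-free? Unfortunately not! When the generator is given configuration files in the following two forms, it produces incorrect files: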

//c.yml

apiVersion: v1
data:
  .dockerconfigjson: xxxxyyyyyzzz==
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: mysecret
  namespace: foo

and

//d.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
  namespace: foo
data:
  my-nginx.conf: |
    server {
          listen 80;
          client_body_timeout 60000;
          client_max_body_size 1024m;
          send_timeout 60000;
          proxy_headers_hash_bucket_size 1024;
          proxy_headers_hash_max_size 4096;
          proxy_read_timeout 60000;
          location /dashboard {
             proxy_pass http://localhost:8081;
          }
    }

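These two problems are trickier, and lnashier/viper cannot handle them either. I had no choice but to fork lnashier/viper into bigwhite/viper and fix them myself. Configuration like d.yml is quite specialized rather than general-purpose, so bigwhite/viper is not general-purpose either, and I won't go into the details here; interested readers can read the code (commit diff) to see how the problems above were solved.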

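The code for this article can be downloaded here.

Postscript:

kustomize

kustomize is an official k8s tool. Starting from k8s resource template YAML files (similar to the files under templates/manifests in this article) combined with a kustomization.yaml (similar to custom/manifests.yaml), it lets you customize YAML files for multiple purposes while leaving the original YAML untouched.

However, it only targets k8s-related yaml files and may not help with our business services' own configuration.

CUE data configuration language

CUE is a powerful declarative configuration language that has become popular over the past two years. It was created by former Go core team member Marcel van Lohuizen, who co-created the Borg Configuration Language (BCL), the language Google uses to deploy all of its applications. CUE distills Google's many years of experience with configuration languages and aims to improve the developer experience while avoiding some pitfalls. It is a superset of JSON with additional features. dagger, the new startup project of Docker founder Solomon Hykes, uses CUE heavily, and KubeVela, the enterprise cloud-native application management platform promoted by Alibaba, is also a heavy CUE user.

How to use CUE in place of my approach above remains to be studied in depth later.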

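Developing Kubernetes Operators in Go: The Basic Structure

bigwhite — Mon, 15 Aug 2022 14:47:40 +0000

Permalink to this article – https://tonybai.com/2022/08/15/developing-kubernetes-operators-in-go-part1

Note: the article's header image is adapted from "Kubernetes Operators Explained".

A few years ago I still called Kubernetes the de facto standard for service orchestration and container scheduling; today K8s is the unchallenged "overlord" of this field. Although Kubernetes has grown very complex over the years, its original data model, application patterns and extension mechanisms remain valid, and application patterns and extension mechanisms such as the Operator are increasingly popular with developers and operators alike.

Our platform runs stateful backend services, and deploying and operating stateful services is exactly where k8s operators shine, so it's time to study operators.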

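I. Advantages of Operators

The kubernetes operator concept originally came from CoreOS, a container technology company that was acquired by Red Hat.

Along with the Operator concept, CoreOS also released the first reference implementations: the etcd operator and the prometheus operator.

Note: etcd was open-sourced by CoreOS in 2013; prometheus, the first time-series storage and monitoring system aimed at cloud-native services, was open-sourced by SoundCloud in 2012.

Here is how CoreOS described the Operator concept: an Operator represents human operational knowledge in software, so that an application can be managed reliably.

Figure: CoreOS's description of operators (screenshot from the CoreOS official blog archive)

Operators were created precisely to free up operations staff, and today they are increasingly favored by cloud-native operations developers.

So what exactly do operators buy us? The diagram below compares working with an Operator and without one:

Even if you know little about operators, this diagram should give you a rough sense of their advantages.

With an operator, scaling a stateful application (scaling is just the example here; it could equally be a version upgrade or another operation that is "complex" for stateful applications) takes the operations staff a single simple command, and they don't need to know how k8s scales a stateful application internally.

Without an operator, the operations staff need a deep understanding of the steps involved in scaling the stateful application: they must run a sequence of commands one by one, in order, check each command's response, and retry on failure until the scaling succeeds.

An operator is like an experienced operations engineer built into k8s: it continuously watches the state of the target objects, keeps the complexity to itself, and gives the operations staff a concise interface. It also lowers the chance of mistakes caused by human error.

Good as operators are, though, the barrier to developing one is not low. It shows up in at least the following ways:

Understanding the operator concept builds on understanding k8s, which has grown increasingly complex since it was open-sourced in 2014, so it takes time to learn;
Writing an operator entirely by hand is very verbose and almost nobody does it; most developers learn a development framework and tooling such as kubebuilder or the Operator Framework SDK;
Operators also differ in capability: the Operator Framework defines a five-level operator capability model (CAPABILITY MODEL), shown in the figure below. Developing a high-capability operator in Go requires an in-depth understanding of the APIs in client-go, the official Kubernetes Go client library.

Figure: the operator capability model (screenshot from the Operator Framework website)

Among these barriers, understanding the operator concept is both the foundation and the prerequisite, and understanding operators in turn requires a deep understanding of many Kubernetes concepts, especially resource, resource type, API and controller, and the relationships among them. Let's quickly introduce these concepts next.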

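II. An introduction to Kubernetes resources, resource types, APIs and controllers

As Kubernetes has evolved to where it is today, its essence has become clear:

Kubernetes is a "database" (the data is actually persisted in etcd);
its API is the "SQL statements";
the API follows a resource-based RESTful design, and resource types are the API endpoints;
each kind of resource (i.e., a Resource Type) is a "table", and the spec of a Resource Type corresponds to the "table schema";
each row in a "table" is a resource, i.e., an instance of the Resource Type that the table represents;
this "database" has many built-in "tables", such as Pod, Deployment, DaemonSet and ReplicaSet.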

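Below is a diagram of the relationship between the Kubernetes API and resources:

There are two kinds of resource types. One kind is namespace-scoped, and we operate on instances of these resource types through APIs of the following forms:

VERB /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE - operates on the collection of resource instances of a resource type in a specific namespace
VERB /apis/GROUP/VERSION/namespaces/NAMESPACE/RESOURCETYPE/NAME - operates on one specific resource instance of a resource type in a specific namespace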

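The other kind is namespace-independent, i.e., cluster-scoped, and we operate on instances of these resource types through APIs of the following forms:

VERB /apis/GROUP/VERSION/RESOURCETYPE - operates on the collection of resource instances of a resource type
VERB /apis/GROUP/VERSION/RESOURCETYPE/NAME - operates on one specific resource instance of a resource type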
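The two URL shapes can be captured in a small helper. This sketch is for illustration only (`resourcePath` is hypothetical); it also covers the core API group (group ""), whose resources such as pods and services are served under /api/v1 rather than /apis/GROUP/VERSION.

```go
package main

import "fmt"

// resourcePath builds the REST path for a resource collection or, when
// name is non-empty, for a single named resource. An empty namespace
// means the resource type is cluster-scoped.
func resourcePath(group, version, namespace, resource, name string) string {
	p := "/apis/" + group + "/" + version
	if group == "" {
		p = "/api/" + version // the legacy core group
	}
	if namespace != "" {
		p += "/namespaces/" + namespace
	}
	p += "/" + resource
	if name != "" {
		p += "/" + name
	}
	return p
}

func main() {
	fmt.Println(resourcePath("apps", "v1", "default", "deployments", "nginx"))
	fmt.Println(resourcePath("", "v1", "", "nodes", ""))
}
```

Paths like these can be fetched directly with `kubectl get --raw`, e.g. `kubectl get --raw /api/v1/nodes`.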

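Of course, Kubernetes is not really just a "database": it is the standard platform for service orchestration and container scheduling, and its basic scheduling unit is the Pod (also a resource type), a group of containers. So how are Pods created, updated and deleted? That is the job of controllers. Each kind of resource type generally has a corresponding controller; Pods, for example, are created and managed by the controller of a ReplicaSet.

The controller's run logic is shown in the figure below:

Figure: controller run logic (from the article "Kubernetes Operators Explained")

Once started, a controller tries to obtain the current state of a resource and compares it with the desired state stored in k8s (i.e., the spec). If they differ, the controller calls the corresponding APIs to make adjustments, doing its best to bring the current state in line with the desired state. This process of reaching agreement is called reconciliation, and its logic in pseudo-code is: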

for {
    desired := getDesiredState()
    current := getCurrentState()
    makeChanges(desired, current)
}
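To make the loop concrete, here is a toy, in-memory sketch (not client-go or controller-runtime code): the desired state is a replica count from a hypothetical spec, the current state is a simulated number of running replicas, and each pass makes one corrective change until the two agree.

```go
package main

import "fmt"

// cluster is a stand-in for the real world the controller observes.
type cluster struct{ running int }

// reconcile compares desired and current state and makes at most one
// corrective change per call. It reports whether the states already match.
func reconcile(desired int, c *cluster) (converged bool) {
	current := c.running
	switch {
	case current < desired:
		c.running++ // e.g. create one Pod
	case current > desired:
		c.running-- // e.g. delete one Pod
	default:
		return true
	}
	return false
}

func main() {
	c := &cluster{running: 1}
	desired := 3
	for round := 1; ; round++ {
		if reconcile(desired, c) {
			fmt.Printf("converged after %d rounds: %d replicas\n", round, c.running)
			return
		}
	}
}
```

A real controller does not spin like this; it is woken up by watch events and requeues, but each invocation follows the same compare-then-correct pattern.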

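Note: k8s also has the concept of an object. So what is an object? It is similar to the Object base class in Java or the Object superclass in Ruby. Not only is a resource (an instance of a resource type) an object (is-a), a resource type itself is also an object: it is an instance of the kubernetes concept.

With this preliminary understanding of these k8s concepts, let's now see what an Operator really is!

III. The Operator pattern = operated object (CRD) + control logic (controller)

If the operations staff deal directly with the built-in resource types (deployments, pods and so on), which is the second case in the "with vs. without operator" comparison diagram above, things become very complex and error-prone.

So if we don't want to face the built-in resource types directly, how do we define our own resource types? Kubernetes provides the Custom Resource Definition (CRD) for this purpose (when CoreOS first proposed the operator concept, the CRD's predecessor was the Third Party Resource, TPR).

Following our earlier understanding of resource types, defining a CRD is equivalent to creating a new "table" (resource type). Once the CRD is created, k8s automatically generates the corresponding API endpoint, and we can operate on this "table" via yaml or the API. We can "insert" data into the "table", i.e., create Custom Resources (CRs) based on the CRD, just as creating a Deployment instance inserts data into the Deployment "table".

As with the native built-in resource types, CRs that merely store object state are not enough. Native resource types have controllers responsible for reconciling the creation, scaling and deletion of their instances, and CRs need such a "reconciler" too: we also need to define a controller that watches CR state, manages the creation, scaling and deletion of CRs, and keeps the desired state (spec) consistent with the current state. This controller no longer targets instances of native resource types but CRs, the instances of the CRD.

With a custom operated-object type (the CRD) and a controller for instances of that type, we package the two into one concept: the "Operator pattern". The controller in the operator pattern is also called the operator; it is the entity that maintains CRs in the cluster.

IV. Developing a webserver operator with kubebuilder

Assumption: your local development environment is already fully configured to access an experimental k8s environment, and you can freely operate k8s with the kubectl tool.

No conceptual explanation, however clear, helps as much as hands-on practice, so let's develop a simple Operator.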

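As mentioned earlier, developing an operator is very verbose, so the community provides tools and frameworks to simplify the process. The mainstream ones are the Operator Framework SDK and kubebuilder. The former is a toolset open-sourced and maintained by Red Hat that supports developing operators with Go, Ansible or Helm (Helm-based operators are limited to the lower capability levels, while Go and Ansible can reach level 5); kubebuilder is an operator development tool maintained by an official Kubernetes SIG (Special Interest Group). When you develop an operator with the Operator SDK and Go, the SDK uses kubebuilder under the hood, so here we use kubebuilder directly.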

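In terms of the operator capability model, our operator sits at roughly level 2. We define a Webserver resource type that represents an nginx-based webserver cluster; our operator supports creating webserver instances (an nginx cluster), scaling the nginx cluster, and upgrading the nginx version in the cluster.

Now let's implement this operator with kubebuilder!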

1. Install kubebuilder

Here we install it by building from source, as follows:

$git clone git@github.com:kubernetes-sigs/kubebuilder.git
$cd kubebuilder
$make
$cd bin
$./kubebuilder version
Version: main.version{KubeBuilderVersion:"v3.5.0-101-g5c949c2e",
KubernetesVendor:"unknown",
GitCommit:"5c949c2e50ca8eec80d64878b88e1b2ee30bf0bc",
BuildDate:"2022-08-06T09:12:50Z", GoOs:"linux", GoArch:"amd64"}

Then copy bin/kubebuilder to a directory on your PATH.

2. Create the webserver-operator project

Next, we use kubebuilder to create the webserver-operator project:

$mkdir webserver-operator
$cd webserver-operator
$kubebuilder init  --repo github.com/bigwhite/webserver-operator --project-name webserver-operator

Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.12.2
go: downloading k8s.io/client-go v0.24.2
go: downloading k8s.io/component-base v0.24.2
Update dependencies:
$ go mod tidy
Next: define a resource with:
kubebuilder create api

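Note: --repo specifies the module root path in go.mod; you can use your own module root path.

3. Create the API and generate the initial CRD

An Operator consists of a CRD and a controller. Here we create our own CRD, i.e., a custom resource type, which is also an API endpoint. We use the following kubebuilder create command for this step: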

$kubebuilder create api --version v1 --kind WebServer
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/webserver_types.go
controllers/webserver_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin
test -s /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen || GOBIN=/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

After that, we run make manifests to generate the yaml file for the final CRD:

$make manifests
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases

At this point, the project's directory layout is as follows:

$tree -F .
.
├── api/
│   └── v1/
│       ├── groupversion_info.go
│       ├── webserver_types.go
│       └── zz_generated.deepcopy.go
├── bin/
│   └── controller-gen*
├── config/
│   ├── crd/
│   │   ├── bases/
│   │   │   └── my.domain_webservers.yaml
│   │   ├── kustomization.yaml
│   │   ├── kustomizeconfig.yaml
│   │   └── patches/
│   │       ├── cainjection_in_webservers.yaml
│   │       └── webhook_in_webservers.yaml
│   ├── default/
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager/
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus/
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac/
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── role_binding.yaml
│   │   ├── role.yaml
│   │   ├── service_account.yaml
│   │   ├── webserver_editor_role.yaml
│   │   └── webserver_viewer_role.yaml
│   └── samples/
│       └── _v1_webserver.yaml
├── controllers/
│   ├── suite_test.go
│   └── webserver_controller.go
├── Dockerfile
├── go.mod
├── go.sum
├── hack/
│   └── boilerplate.go.txt
├── main.go
├── Makefile
├── PROJECT
└── README.md

14 directories, 40 files

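4. The basic structure of webserver-operator

Leaving aside parts we don't care about this time, such as leader election and auth_proxy, I've organized the main parts of this operator example into the diagram below:

The parts in the diagram make up the basic structure of an operator generated with kubebuilder.

The webserver operator consists mainly of a CRD and a controller:

CRD

The box in the lower-left corner of the diagram is the CRD yaml file generated above: config/crd/bases/my.domain_webservers.yaml. The CRD is closely tied to api/v1/webserver_types.go: we define the CRD's spec fields in api/v1/webserver_types.go, and the make manifests command then parses the changes in webserver_types.go and updates the CRD yaml file.

controller

As the right side of the diagram shows, the controller itself runs in the k8s cluster as a deployment. It watches the state of CRs (the instances of the CRD) and, in its Reconcile method, checks whether the desired state matches the current state; if not, it takes the corresponding actions.

Others

The upper-left corner of the diagram covers the controller's permissions: the controller accesses the k8s API server through a serviceaccount, and role.yaml and role_binding.yaml define the controller's role and permissions.

5. Add fields to the CRD spec

To reach the functional goals of the Webserver operator, we need to add some state fields to the CRD spec. As mentioned earlier, the CRD is kept in sync with webserver_types.go under api, so we only need to modify webserver_types.go. We add two fields, Replicas and Image, to the WebServerSpec struct; they hold the number of replicas of the webserver instance and the container image to use: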

// api/v1/webserver_types.go

// WebServerSpec defines the desired state of WebServer
type WebServerSpec struct {
    // INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
    // Important: Run "make" to regenerate code after modifying this file

    // The number of replicas that the webserver should have
    Replicas int `json:"replicas,omitempty"`

    // The container image of the webserver
    Image string `json:"image,omitempty"`

    // Foo is an example field of WebServer. Edit webserver_types.go to remove/update
    Foo string `json:"foo,omitempty"`
}

After saving the changes, run make manifests to regenerate config/crd/bases/my.domain_webservers.yaml:

$cat my.domain_webservers.yaml
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.9.2
  creationTimestamp: null
  name: webservers.my.domain
spec:
  group: my.domain
  names:
    kind: WebServer
    listKind: WebServerList
    plural: webservers
    singular: webserver
  scope: Namespaced
  versions:
  - name: v1
    schema:
      openAPIV3Schema:
        description: WebServer is the Schema for the webservers API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: WebServerSpec defines the desired state of WebServer
            properties:
              foo:
                description: Foo is an example field of WebServer. Edit webserver_types.go
                  to remove/update
                type: string
              image:
                description: The container image of the webserver
                type: string
              replicas:
                description: The number of replicas that the webserver should have
                type: integer
            type: object
          status:
            description: WebServerStatus defines the observed state of WebServer
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}

Once the CRD is defined, we can install it into k8s:

$make install
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
test -s /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize || { curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash -s -- 3.8.7 /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin; }
{Version:kustomize/v3.8.7 GitCommit:ad092cc7a91c07fdf63a2e4b7f13fa588a39af4f BuildDate:2020-11-11T23:14:14Z GoOs:linux GoArch:amd64}
kustomize installed to /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/webservers.my.domain created

Check the installation:

$kubectl get crd|grep webservers
webservers.my.domain                                             2022-08-06T21:55:45Z

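6. Modify role.yaml

Before starting on the controller, let's "pave the way" for running it later by setting up the required permissions.

Our controller will create a deployment and a service for each CRD instance, so it needs permission to operate on deployments and services. We therefore modify role.yaml to grant the controller-manager service account permissions on deployments and services (services belong to the core API group, written as "" in apiGroups):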

// config/rbac/role.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: manager-role
rules:
- apiGroups:
  - my.domain
  resources:
  - webservers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - my.domain
  resources:
  - webservers/finalizers
  verbs:
  - update
- apiGroups:
  - my.domain
  resources:
  - webservers/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

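We'll leave the modified role.yaml here for now and deploy it to k8s together with the controller later. Note that make manifests (which make deploy also runs) regenerates config/rbac/role.yaml from the //+kubebuilder:rbac markers in the controller source, so hand edits to this file can be overwritten; declaring the same permissions as markers above Reconcile, e.g. //+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete, keeps them in sync.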

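7. Implement the controller's Reconcile logic

kubebuilder has scaffolded the controller code for us; we only need to implement the Reconcile method of WebServerReconciler in controllers/webserver_controller.go. Below is a simple flowchart of Reconcile, which makes the code much easier to follow:

Here is the corresponding Reconcile code:

// controllers/webserver_controller.go

import (
    "context"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"

    mydomainv1 "github.com/bigwhite/webserver-operator/api/v1"
)

// Note: r.Log below assumes a logr.Logger field named Log has been added to
// WebServerReconciler; the default kubebuilder v3 scaffold uses log.FromContext(ctx) instead.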

func (r *WebServerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("Webserver", req.NamespacedName)

    instance := &mydomainv1.WebServer{}
    err := r.Get(ctx, req.NamespacedName, instance)
    if err != nil {
        if errors.IsNotFound(err) {
            // Request object not found, could have been deleted after reconcile request.
            // Return and don't requeue
            log.Info("Webserver resource not found. Ignoring since object must be deleted")
            return ctrl.Result{}, nil
        }

        // Error reading the object - requeue the request.
        log.Error(err, "Failed to get Webserver")
        return ctrl.Result{RequeueAfter: time.Second * 5}, err
    }

    // Check if the webserver deployment already exists, if not, create a new one
    found := &appsv1.Deployment{}
    err = r.Get(ctx, types.NamespacedName{Name: instance.Name, Namespace: instance.Namespace}, found)
    if err != nil && errors.IsNotFound(err) {
        // Define a new deployment
        dep := r.deploymentForWebserver(instance)
        log.Info("Creating a new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
        err = r.Create(ctx, dep)
        if err != nil {
            log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", dep.Namespace, "Deployment.Name", dep.Name)
            return ctrl.Result{RequeueAfter: time.Second * 5}, err
        }
        // Deployment created successfully - return and requeue
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        log.Error(err, "Failed to get Deployment")
        return ctrl.Result{RequeueAfter: time.Second * 5}, err
    }

    // Ensure the deployment replicas and image are the same as the spec
    var replicas int32 = int32(instance.Spec.Replicas)
    image := instance.Spec.Image

    var needUpd bool
    if *found.Spec.Replicas != replicas {
        log.Info("Deployment spec.replicas change", "from", *found.Spec.Replicas, "to", replicas)
        found.Spec.Replicas = &replicas
        needUpd = true
    }

    if (*found).Spec.Template.Spec.Containers[0].Image != image {
        log.Info("Deployment spec.template.spec.container[0].image change", "from", (*found).Spec.Template.Spec.Containers[0].Image, "to", image)
        found.Spec.Template.Spec.Containers[0].Image = image
        needUpd = true
    }

    if needUpd {
        err = r.Update(ctx, found)
        if err != nil {
            log.Error(err, "Failed to update Deployment", "Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
            return ctrl.Result{RequeueAfter: time.Second * 5}, err
        }
        // Spec updated - return and requeue
        return ctrl.Result{Requeue: true}, nil
    }

    // Check if the webserver service already exists, if not, create a new one
    foundService := &corev1.Service{}
    err = r.Get(ctx, types.NamespacedName{Name: instance.Name + "-service", Namespace: instance.Namespace}, foundService)
    if err != nil && errors.IsNotFound(err) {
        // Define a new service
        srv := r.serviceForWebserver(instance)
        log.Info("Creating a new Service", "Service.Namespace", srv.Namespace, "Service.Name", srv.Name)
        err = r.Create(ctx, srv)
        if err != nil {
            log.Error(err, "Failed to create new Service", "Service.Namespace", srv.Namespace, "Service.Name", srv.Name)
            return ctrl.Result{RequeueAfter: time.Second * 5}, err
        }
        // Service created successfully - return and requeue
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        log.Error(err, "Failed to get Service")
        return ctrl.Result{RequeueAfter: time.Second * 5}, err
    }

    // Tbd: Ensure the service state is the same as the spec, your homework

    // reconcile webserver operator in again 10 seconds
    return ctrl.Result{RequeueAfter: time.Second * 10}, nil
}

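You may have noticed: the CRD's controller ultimately translates the CR into native k8s resources such as services and deployments. Changes in the CR's state (such as replicas and image here) all end up as update operations on native resources like deployments. This is the essence of an operator! Once you understand this, the operator is no longer a mysterious concept.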
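The replica/image comparison at the heart of Reconcile can be pulled out into a pure function, which makes it easy to unit-test without a cluster. This sketch uses hypothetical stand-in types (`webServerSpec`, `deploymentState`) instead of the real mydomainv1 and appsv1 types, so it only illustrates the translation from CR spec to Deployment fields.

```go
package main

import "fmt"

// webServerSpec stands in for the CR's spec (desired state).
type webServerSpec struct {
	Replicas int32
	Image    string
}

// deploymentState stands in for the fields of the Deployment we manage.
type deploymentState struct {
	Replicas int32
	Image    string // image of the first container
}

// syncDeployment copies the desired fields from spec into dep and returns
// a description of each change; an empty result means no update is needed.
func syncDeployment(spec webServerSpec, dep *deploymentState) []string {
	var changes []string
	if dep.Replicas != spec.Replicas {
		changes = append(changes, fmt.Sprintf("replicas %d -> %d", dep.Replicas, spec.Replicas))
		dep.Replicas = spec.Replicas
	}
	if dep.Image != spec.Image {
		changes = append(changes, fmt.Sprintf("image %s -> %s", dep.Image, spec.Image))
		dep.Image = spec.Image
	}
	return changes
}

func main() {
	dep := &deploymentState{Replicas: 3, Image: "nginx:1.22"}
	spec := webServerSpec{Replicas: 4, Image: "nginx:1.23"}
	fmt.Println(syncDeployment(spec, dep))
	fmt.Println(syncDeployment(spec, dep)) // second pass: nothing left to do
}
```

In the real Reconcile, a non-empty change list corresponds to the `needUpd` branch that calls `r.Update`.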

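Some of you may also notice that the flowchart above doesn't handle the deployment and service when a CR instance is deleted, and that's true. For a service that runs in the background 24×7, however, we care much more about changes, scaling and upgrades; deletion is the lowest-priority requirement.

8. Build the controller image

With the controller code written, let's build the controller image. As we saw earlier, the controller is really a pod under a deployment running in k8s, so we need to build its image and deploy it to k8s through a deployment.

The operator project created by kubebuilder includes a Makefile, and make docker-build builds the controller image. docker-build compiles the controller source in the golang builder image, but unless you tweak the Dockerfile slightly you'll have a hard time getting it to compile, because the default GOPROXY is not reachable from within China. The simplest fix is to build with vendor; here is the modified Dockerfile: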

# Build the manager binary
FROM golang:1.18 as builder

ENV GOPROXY https://goproxy.cn
WORKDIR /workspace
# Copy the Go Modules manifests
COPY go.mod go.mod
COPY go.sum go.sum
COPY vendor/ vendor/
# cache deps before building and copying source so that we don't need to re-download as much
# and so that source changes don't invalidate our downloaded layer
#RUN go mod download

# Copy the go source
COPY main.go main.go
COPY api/ api/
COPY controllers/ controllers/

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor -a -o manager main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
#FROM gcr.io/distroless/static:nonroot
FROM katanomi/distroless-static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532

ENTRYPOINT ["/manager"]

The build steps are as follows:

$go mod vendor
$make docker-build

test -s /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen || GOBIN=/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
KUBEBUILDER_ASSETS="/home/tonybai/.local/share/kubebuilder-envtest/k8s/1.24.2-linux-amd64" go test ./... -coverprofile cover.out
?       github.com/bigwhite/webserver-operator    [no test files]
?       github.com/bigwhite/webserver-operator/api/v1    [no test files]
ok      github.com/bigwhite/webserver-operator/controllers    4.530s    coverage: 0.0% of statements
docker build -t bigwhite/webserver-controller:latest .
Sending build context to Docker daemon  47.51MB
Step 1/15 : FROM golang:1.18 as builder
 ---> 2d952adaec1e
Step 2/15 : ENV GOPROXY https://goproxy.cn
 ---> Using cache
 ---> db2b06a078e3
Step 3/15 : WORKDIR /workspace
 ---> Using cache
 ---> cc3c613c19c6
Step 4/15 : COPY go.mod go.mod
 ---> Using cache
 ---> 5fa5c0d89350
Step 5/15 : COPY go.sum go.sum
 ---> Using cache
 ---> 71669cd0fe8e
Step 6/15 : COPY vendor/ vendor/
 ---> Using cache
 ---> 502b280a0e67
Step 7/15 : COPY main.go main.go
 ---> Using cache
 ---> 0c59a69091bb
Step 8/15 : COPY api/ api/
 ---> Using cache
 ---> 2b81131c681f
Step 9/15 : COPY controllers/ controllers/
 ---> Using cache
 ---> e3fd48c88ccb
Step 10/15 : RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor -a -o manager main.go
 ---> Using cache
 ---> 548ac10321a2
Step 11/15 : FROM katanomi/distroless-static:nonroot
 ---> 421f180b71d8
Step 12/15 : WORKDIR /
 ---> Running in ea7cb03027c0
Removing intermediate container ea7cb03027c0
 ---> 9d3c0ea19c3b
Step 13/15 : COPY --from=builder /workspace/manager .
 ---> a4387fe33ab7
Step 14/15 : USER 65532:65532
 ---> Running in 739a32d251b6
Removing intermediate container 739a32d251b6
 ---> 52ae8742f9c5
Step 15/15 : ENTRYPOINT ["/manager"]
 ---> Running in 897893b0c9df
Removing intermediate container 897893b0c9df
 ---> e375cc2adb08
Successfully built e375cc2adb08
Successfully tagged bigwhite/webserver-controller:latest

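Note: before running make, change the initial value of the IMG variable in the Makefile to IMG ?= bigwhite/webserver-controller:latest

Once the build succeeds, run make docker-push to push the image to an image registry (here the public registry provided by Docker, Inc.).

9. Deploy the controller

We already installed the CRD into k8s with make install; once the controller is deployed to k8s as well, our operator is fully deployed. Run make deploy to do so: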

$make deploy
test -s /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen || GOBIN=/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
test -s /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize || { curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash -s -- 3.8.7 /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin; }
cd config/manager && /home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize edit set image controller=bigwhite/webserver-controller:latest
/home/tonybai/test/go/operator/kubebuilder/webserver-operator/bin/kustomize build config/default | kubectl apply -f -
namespace/webserver-operator-system created
customresourcedefinition.apiextensions.k8s.io/webservers.my.domain unchanged
serviceaccount/webserver-operator-controller-manager created
role.rbac.authorization.k8s.io/webserver-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/webserver-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/webserver-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/webserver-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/webserver-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/webserver-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/webserver-operator-proxy-rolebinding created
configmap/webserver-operator-manager-config created
service/webserver-operator-controller-manager-metrics-service created
deployment.apps/webserver-operator-controller-manager created

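As we can see, deploy installs not only the controller, serviceaccount, role and rolebinding; it also creates the namespace and applies the CRD once more. In other words, deploy is a complete operator installation command.

Note: make undeploy removes all of the operator's resources.

Let's check the controller's logs with kubectl logs: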

$kubectl logs -f deployment.apps/webserver-operator-controller-manager -n webserver-operator-system
1.6600280818476188e+09    INFO    controller-runtime.metrics    Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
1.6600280818478029e+09    INFO    setup    starting manager
1.6600280818480284e+09    INFO    Starting server    {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
1.660028081848097e+09    INFO    Starting server    {"kind": "health probe", "addr": "[::]:8081"}
I0809 06:54:41.848093       1 leaderelection.go:248] attempting to acquire leader lease webserver-operator-system/63e5a746.my.domain...
I0809 06:54:57.072336       1 leaderelection.go:258] successfully acquired lease webserver-operator-system/63e5a746.my.domain
1.6600280970724037e+09    DEBUG    events    Normal    {"object": {"kind":"Lease","namespace":"webserver-operator-system","name":"63e5a746.my.domain","uid":"e05aaeb5-4a3a-4272-b036-80d61f0b6788","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5238800"}, "reason": "LeaderElection", "message": "webserver-operator-controller-manager-6f45bc88f7-ptxlc_0e960015-9fbe-466d-a6b1-ff31af63a797 became leader"}
1.6600280970724993e+09    INFO    Starting EventSource    {"controller": "webserver", "controllerGroup": "my.domain", "controllerKind": "WebServer", "source": "kind source: *v1.WebServer"}
1.6600280970725305e+09    INFO    Starting Controller    {"controller": "webserver", "controllerGroup": "my.domain", "controllerKind": "WebServer"}
1.660028097173026e+09    INFO    Starting workers    {"controller": "webserver", "controllerGroup": "my.domain", "controllerKind": "WebServer", "worker count": 1}

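The controller has started successfully and is waiting for events on WebServer CRs (such as creation). Next, let's create a WebServer CR!

10. Creating a WebServer CR

The webserver-operator project ships a sample CR under config/samples. We modify it to set the fields we added to the spec: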

// config/samples/_v1_webserver.yaml 

apiVersion: my.domain/v1
kind: WebServer
metadata:
  name: webserver-sample
spec:
  # TODO(user): Add fields here
  image: nginx:1.23.1
  replicas: 3

Create the WebServer CR with kubectl:

$cd config/samples
$kubectl apply -f _v1_webserver.yaml
webserver.my.domain/webserver-sample created

Watch the controller's logs:

1.6602084232243123e+09  INFO    controllers.WebServer   Creating a new Deployment   {"Webserver": "default/webserver-sample", "Deployment.Namespace": "default", "Deployment.Name": "webserver-sample"}
1.6602084233446114e+09  INFO    controllers.WebServer   Creating a new Service  {"Webserver": "default/webserver-sample", "Service.Namespace": "default", "Service.Name": "webserver-sample-service"}

Once the CR is created, the controller picks up the event and creates the corresponding Deployment and Service. Let's look at the Deployment, the three Pods and the Service created for the CR:

$kubectl get service
NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes                 ClusterIP   172.26.0.1     <none>        443/TCP        22d
webserver-sample-service   NodePort    172.26.173.0   <none>        80:30010/TCP   2m58s

$kubectl get deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
webserver-sample   3/3     3            3           4m44s

$kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
webserver-sample-bc698b9fb-8gq2h   1/1     Running   0          4m52s
webserver-sample-bc698b9fb-vk6gw   1/1     Running   0          4m52s
webserver-sample-bc698b9fb-xgrgb   1/1     Running   0          4m52s

Let's access the service:

$curl http://192.168.10.182:30010
Welcome to nginx!

Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.

For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.

Thank you for using nginx.

The service responds as expected!

11. Scaling, changing versions and Service self-healing

Next, let's perform some common operations on the CR.

Scaling replicas from 3 to 4

We change the CR's replicas from 3 to 4 to scale out the container instances:

// config/samples/_v1_webserver.yaml 

apiVersion: my.domain/v1
kind: WebServer
metadata:
  name: webserver-sample
spec:
  # TODO(user): Add fields here
  image: nginx:1.23.1
  replicas: 4

Then apply the change with kubectl apply:

$kubectl apply -f _v1_webserver.yaml
webserver.my.domain/webserver-sample configured

After the command runs, the operator's controller logs the following:

1.660208962767797e+09   INFO    controllers.WebServer   Deployment spec.replicas change {"Webserver": "default/webserver-sample", "from": 3, "to": 4}

A little later, check the number of pods:

$kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
webserver-sample-bc698b9fb-8gq2h   1/1     Running   0          9m41s
webserver-sample-bc698b9fb-v9gvg   1/1     Running   0          42s
webserver-sample-bc698b9fb-vk6gw   1/1     Running   0          9m41s
webserver-sample-bc698b9fb-xgrgb   1/1     Running   0          9m41s

The webserver pods were successfully scaled from 3 to 4.

Changing the webserver image version

We change the CR's image from nginx:1.23.1 to nginx:1.23.0 and apply it with kubectl apply.

The controller logs the following in response:

1.6602090494113188e+09  INFO    controllers.WebServer   Deployment spec.template.spec.container[0].image change {"Webserver": "default/webserver-sample", "from": "nginx:1.23.1", "to": "nginx:1.23.0"}

The controller updates the Deployment, which triggers a rolling update of the pods it manages:

$kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
webserver-sample-bc698b9fb-8gq2h   1/1     Running             0          10m
webserver-sample-bc698b9fb-vk6gw   1/1     Running             0          10m
webserver-sample-bc698b9fb-xgrgb   1/1     Running             0          10m
webserver-sample-ffcf549ff-g6whk   0/1     ContainerCreating   0          12s
webserver-sample-ffcf549ff-ngjz6   0/1     ContainerCreating   0          12s

After a short wait, the final pod list is:

$kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
webserver-sample-ffcf549ff-g6whk   1/1     Running   0          6m22s
webserver-sample-ffcf549ff-m6z24   1/1     Running   0          3m12s
webserver-sample-ffcf549ff-ngjz6   1/1     Running   0          6m22s
webserver-sample-ffcf549ff-t7gvc   1/1     Running   0          4m16s
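Both operations above boil down to the same step in the controller: compare the desired state in the CR spec with the observed Deployment and update whatever differs. The sketch below illustrates only that comparison. WebServerSpec, DeploymentState and syncDeployment are hypothetical, stdlib-only stand-ins, not the controller-runtime types or the article's actual controller code:

```go
package main

import "fmt"

// WebServerSpec is a hypothetical, simplified stand-in for the WebServer CR spec.
type WebServerSpec struct {
	Image    string
	Replicas int32
}

// DeploymentState is a hypothetical stand-in for the fields of the owned
// Deployment that the controller cares about.
type DeploymentState struct {
	Image    string
	Replicas int32
}

// syncDeployment compares the desired spec with the observed Deployment,
// updates the Deployment in place and returns one description per change.
// Calling it again with the same spec changes nothing (it is idempotent).
func syncDeployment(spec WebServerSpec, d *DeploymentState) []string {
	var changes []string
	if d.Replicas != spec.Replicas {
		changes = append(changes, fmt.Sprintf("spec.replicas: %d -> %d", d.Replicas, spec.Replicas))
		d.Replicas = spec.Replicas
	}
	if d.Image != spec.Image {
		changes = append(changes, fmt.Sprintf("image: %s -> %s", d.Image, spec.Image))
		d.Image = spec.Image
	}
	return changes
}

func main() {
	d := &DeploymentState{Image: "nginx:1.23.1", Replicas: 3}

	// Scale from 3 to 4 replicas.
	fmt.Println(syncDeployment(WebServerSpec{Image: "nginx:1.23.1", Replicas: 4}, d))
	// Switch the image version; replicas stay at 4.
	fmt.Println(syncDeployment(WebServerSpec{Image: "nginx:1.23.0", Replicas: 4}, d))
	// Nothing left to do: the observed state already matches the spec.
	fmt.Println(syncDeployment(WebServerSpec{Image: "nginx:1.23.0", Replicas: 4}, d))
}
```

Because the function only acts on differences, it can run on every event without side effects, which is what lets a controller react to any change in the CR the same way.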

Service self-healing: restoring an accidentally deleted Service

Let's make a deliberate "mistake" and delete webserver-sample-service to see whether the controller can heal it:

$kubectl delete service/webserver-sample-service
service "webserver-sample-service" deleted

Check the controller logs:

1.6602096994710526e+09  INFO    controllers.WebServer   Creating a new Service  {"Webserver": "default/webserver-sample", "Service.Namespace": "default", "Service.Name": "webserver-sample-service"}

The controller noticed that the Service had been deleted and created a new one!

Access the recreated Service:

$curl http://192.168.10.182:30010
Welcome to nginx!

Welcome to nginx!
If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.

For online documentation and support please refer to
nginx.org.
Commercial support is available at
nginx.com.

Thank you for using nginx.

With the controller's help, the Service healed itself!
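The self-healing above follows a "get, and create if not found" pattern inside the reconcile loop. Here is a stdlib-only sketch of that pattern; fakeStore, Service, errNotFound and ensureService are hypothetical stand-ins for the API server and its "not found" error, not controller-runtime APIs. It also assumes, as the log above suggests, that the controller is triggered when a Service it owns changes:

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound plays the role of the API server's "not found" error (hypothetical).
var errNotFound = errors.New("not found")

// Service is a hypothetical, minimal stand-in for a Kubernetes Service.
type Service struct {
	Name     string
	NodePort int
}

// fakeStore is an in-memory stand-in for the API server, keyed by name.
type fakeStore map[string]Service

func (s fakeStore) Get(name string) (Service, error) {
	svc, ok := s[name]
	if !ok {
		return Service{}, errNotFound
	}
	return svc, nil
}

func (s fakeStore) Create(svc Service) error {
	if _, ok := s[svc.Name]; ok {
		return fmt.Errorf("service %q already exists", svc.Name)
	}
	s[svc.Name] = svc
	return nil
}

// ensureService creates the desired Service if it is missing and reports
// whether it had to (re)create it. Any other error is returned unchanged.
func ensureService(s fakeStore, desired Service) (bool, error) {
	_, err := s.Get(desired.Name)
	switch {
	case err == nil:
		return false, nil // already present, nothing to heal
	case errors.Is(err, errNotFound):
		return true, s.Create(desired)
	default:
		return false, err
	}
}

func main() {
	store := fakeStore{}
	desired := Service{Name: "webserver-sample-service", NodePort: 30010}

	created, _ := ensureService(store, desired) // first reconcile creates it
	fmt.Println("created:", created)

	delete(store, desired.Name) // simulate: kubectl delete service/webserver-sample-service

	created, _ = ensureService(store, desired) // next reconcile heals it
	fmt.Println("re-created:", created)
}
```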

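V. Summary

This article gave a first introduction to the concept and benefits of Kubernetes Operators, and used kubebuilder to build an operator with level 2 capability (Seamless Upgrades, in the Operator capability model). The operator is still far from complete; its main purpose is to help you understand what an operator is and how one is typically implemented.

After reading this article, you should have a fairly clear picture of operators, especially their basic structure, and be able to develop a simple operator yourself!

The source code for this article can be downloaded here – https://github.com/bigwhite/experiments/tree/master/webserver-operator.

VI. References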

Kubernetes Operators 101, Part 1: Overview and key features – https://developers.redhat.com/articles/2021/06/11/kubernetes-operators-101-part-1-overview-and-key-features
Kubernetes Operators 101, Part 2: How operators work – https://developers.redhat.com/articles/2021/06/22/kubernetes-operators-101-part-2-how-operators-work
Operator SDK: Build Kubernetes Operators – https://developers.redhat.com/blog/2020/04/28/operator-sdk-build-kubernetes-operators-and-deploy-them-on-openshift
Kubernetes docs: Custom Resources – https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
Kubernetes docs: Operator pattern – https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
Kubernetes docs: API concepts – https://kubernetes.io/docs/reference/using-api/api-concepts/
Introducing Operators: Putting Operational Knowledge into Software (the first article on operators, by CoreOS) – https://web.archive.org/web/20170129131616/https://coreos.com/blog/introducing-operators.html
CNCF Operator White Paper v1.0 – https://github.com/cncf/tag-app-delivery/blob/main/operator-whitepaper/v1/Operator-WhitePaper_v1-0.md
Best practices for building Kubernetes Operators and stateful apps – https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
A deep dive into Kubernetes controllers – https://docs.bitnami.com/tutorials/a-deep-dive-into-kubernetes-controllers
Kubernetes Operators Explained – https://blog.container-solutions.com/kubernetes-operators-explained
Book: Kubernetes Operator – https://book.douban.com/subject/34796009/
Book: Programming Kubernetes – https://book.douban.com/subject/35498478/
Operator SDK Reaches v1.0 – https://cloud.redhat.com/blog/operator-sdk-reaches-v1.0
What is the difference between kubebuilder and operator-sdk – https://github.com/operator-framework/operator-sdk/issues/1758
Kubernetes Operators in Depth – https://www.infoq.com/articles/kubernetes-operators-in-depth/
Get started using Kubernetes Operators – https://developer.ibm.com/learningpaths/kubernetes-operators/
Use Kubernetes operators to extend Kubernetes’ functionality – https://developer.ibm.com/learningpaths/kubernetes-operators/operators-extend-kubernetes/
memcached operator – https://github.com/operator-framework/operator-sdk-samples/tree/master/go/memcached-operator

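Gopher Daily (daily Gopher news) archive repository – https://github.com/bigwhite/gopherdaily

My contact information:

Weibo: https://weibo.com/bigwhite20xx
Blog: tonybai.com
github: https://github.com/bigwhite

Business cooperation: writing, book publishing, training, online courses, startup partnerships, consulting, advertising.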

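Fetching Private Go Modules In-house at a Small Company (Continued)

bigwhite — Sat, 18 Jun 2022 14:10:10 +0000

Permalink – https://tonybai.com/2022/06/18/the-approach-to-go-get-private-go-module-in-house-part2

Since we set up an in-house private Go module proxy at the company last year, it has worked mostly fine. By rights, this sequel should never have existed :).

Time went by, the Go team grew, and the air was full of the "aroma of Go".

Then one day, the business unit decided to replace the gerrit it was using with gitlab. Why gerrit was chosen in the first place is unknown, but my guess is that people wanted its powerful and distinctive code review mechanism and workflow. However, requirements changed so fast and each iteration packed in so many features that the "+2" review mechanism eventually became a mere formality.

Without the gerrit review workflow, what value does gerrit still have? The administrators also reported that gerrit is fairly complex to configure, especially its permissions. Together, these led to the idea of migrating to gitlab, which left the Go team with a task: making our in-house private Go module proxy work with gitlab.

If you are not yet familiar with how our private Go module proxy works, please read "Fetching Private Go Modules In-house at a Small Company" before going further.

Adapting to gitlab

Recall the architecture diagram of our private Go module proxy:

Based on this diagram, we concluded that adapting to gitlab repositories is simple: we only need to change the real repo address of each module in govanityurls' configuration file. This also fits the principle that switching the backend code hosting service should, in theory, be transparent to developers.

Next, we create a foo repo on gitlab whose module path is mycompany.com/go/foo. We fetch gitlab repos over ssh, so we first add the public key of the goproxy host to gitlab's SSH keys. Then we put the clone address shown in gitlab's clone dialog, git@10.10.30.30:go/foo.git, into vanity.yaml: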

//vanity.yaml
  ... ...
  /go/foo:
     repo: ssh://git@10.10.30.30:go/foo.git
     vcs: git

On a development machine we create a test program that imports mycompany.com/go/foo. Running go mod tidy gives:

$go mod tidy
go: finding module for package mycompany.com/go/foo
demo imports
    mycompany.com/go/foo: cannot find module providing package mycompany.com/go/foo: module mycompany.com/go/foo: reading http://10.10.20.20:10000/mycompany.com/go/foo/@v/list: 404 Not Found
    server response:
    go list -m -json -versions mycompany.com/go/foo@latest:
    go: mycompany.com/go/foo@latest: unrecognized import path "mycompany.com/go/foo": http://mycompany.com/go/foo?go-get=1: invalid repo root "ssh://git@10.10.30.30:go/foo.git": parse "ssh://git@10.10.30.30:go/foo.git": invalid port ":go" after host

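Judging from goproxy's response, the go command used by goproxy cannot parse "ssh://git@10.10.30.30:go/foo.git": in URL syntax, the colon after 10.10.30.30 must be followed by a port, not go. (The git@host:path form shown by gitlab is git's scp-like syntax, not a URL.)

We change repo to the following format: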
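The error can be reproduced with Go's own URL parser, net/url, which is presumably what rejects the repo value here (the error text in the log above has exactly url.Parse's format). A small check like this, where checkRepoURL is a hypothetical helper, is a quick way to validate repo values before putting them into vanity.yaml:

```go
package main

import (
	"fmt"
	"net/url"
)

// checkRepoURL reports whether a repo value parses as a URL and,
// if so, which host and port it points at.
func checkRepoURL(raw string) (host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Hostname(), u.Port(), nil
}

func main() {
	for _, raw := range []string{
		"ssh://git@10.10.30.30:go/foo.git",    // scp-style path pasted into a URL: rejected
		"ssh://git@10.10.30.30:80/go/foo.git", // parses, but 80 is taken as the SSH port
		"http://10.10.30.30/go/foo.git",       // plain http URL with no explicit port
	} {
		host, port, err := checkRepoURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("ok: host=%s port=%q\n", host, port)
	}
}
```

Note that parsing successfully only means the value is a well-formed URL; whether anything speaks SSH on that port is a separate question.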

  /go/foo:
     repo: ssh://git@10.10.30.30:80/go/foo.git
     vcs: git

After restarting govanityurls and rerunning go mod tidy, it still fails:

$go mod tidy
go: finding module for package mycompany.com/go/foo
demo imports
    mycompany.com/go/foo: cannot find module providing package mycompany.com/go/foo: module mycompany.com/go/foo: reading http://10.10.20.20:10000/mycompany.com/go/foo/@v/list: 404 Not Found
    server response:
    go list -m -json -versions mycompany.com/go/foo@latest:
    go: module mycompany.com/go/foo: git ls-remote -q origin in /root/.bin/goproxycache/pkg/mod/cache/vcs/4d37c02c151342112bd2d7e6cf9c0508b31b8fe1cf27063da6774aa0f53d872f: exit status 128:
        kex_exchange_identification: Connection closed by remote host
        fatal: Could not read from remote repository.

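Running git clone git@10.10.30.30:80/go/foo.git directly on the host fails too! Since ssh doesn't work, let's try http. Over http, every clone asks for a username and password, which doesn't suit goproxy. Time to bring in a personal token! Create a personal token on gitlab, then create ~/.netrc locally as follows: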

# cat ~/.netrc
machine 10.10.30.30
login tonybai
password [your personal token]

Then change repo in vanity.yaml to:

// vanity.yaml

  /go/foo:
     repo: http://10.10.30.30/go/foo.git
     vcs: git

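Now go mod tidy fetches the foo repository successfully.

Q&A

1. git clone errors

When setting up goproxy, we usually verify by hand on the goproxy server that git can fetch the private repositories. If git clone fails with the error below, what is wrong?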

$ git clone ssh://tonybai@10.10.30.30:29418/go/common
Cloning into 'common'...
Unable to negotiate with 10.10.30.30 port 29418: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

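The error message is actually quite clear: the server on port 29418 (gerrit's SSH port) offers only the diffie-hellman-group14-sha1 and diffie-hellman-group1-sha1 key exchange methods, while the SSH client that git uses enables neither of them by default.

The fix is to add an ~/.ssh/config on the goproxy host: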

// ~/.ssh/config
Host 10.10.30.30
    HostName 10.10.30.30
    User tonybai
    Port 29418
    KexAlgorithms +diffie-hellman-group1-sha1

    IdentityFile ~/.ssh/id_rsa

With this configuration in place, the clone succeeds.

2. Using insecure connections

Some readers run into the following problem with this solution:

$go get mycompany.com/go/common@latest
go: module mycompany.com/go/common: reading http://10.10.30.30:10000/mycompany.com/go/common/@v/list: 404 Not Found
    server response:
    go list -m -json -versions mycompany.com/go/common@latest:
    go list -m: mycompany.com/go/common@latest: unrecognized import path "mycompany.com/go/common": https fetch: Get "https://mycompany.com/go/common?go-get=1": dial tcp 127.0.0.1:443: connect: connection refused

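First, the server response received by go get says it cannot connect to 127.0.0.1:443, yet the nginx access.log on the goproxy host has no entries, so the request never reached nginx. The problem therefore lies with the go list command: why does it connect to 127.0.0.1:443 when our code server is accessed over http, not https? (On the goproxy host, /etc/hosts maps mycompany.com to 127.0.0.1, and nothing listens on port 443 there.)

This reminded me of GOINSECURE, added in Go 1.14. By default the go command accesses code repositories securely, i.e. over https. If modules need not be fetched over https, or https should be used without verifying the server certificate, set the GOINSECURE environment variable, for example: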

export GOINSECURE="mycompany.com"

After that, fetching Go modules under mycompany.com/… no longer produces the error above!

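Building a One-Command Runtime Environment with Docker Compose

bigwhite — Fri, 26 Nov 2021 13:08:10 +0000

Permalink – https://tonybai.com/2021/11/26/build-all-in-one-runtime-environment-with-docker-compose

Like it or not, admit it or not, the microservice architecture is popular. If you, as an architect, still design a system as one big monolith, even bosses who don't understand technology will roll their eyes at you. Fine, I give in! Let's split it up! "Even if you've never eaten pork, you've seen a pig run." If I can't split out 40-50 services, surely I can split out 4-5 ^_^.

At last a few services were split out, and a new headache arrived: with the old monolith, setting up a runtime environment was easy; drop the program on a host, configure it and start it. Since the split, however, setting up developers' debugging, integration and test environments has become extremely hard.

Some will say: isn't everything cloud native now? Haven't you heard of k8s, the cloud-native operating system? Have ops set up the environment on k8s for you. In a typical small company, ops staff are few and busy, so developers must fend for themselves. Developers setting up k8s themselves? Come on! Haven't you noticed how complex k8s has become over the last two years? Stop following its evolution for a year and the concepts in new versions may feel unfamiliar, as if from nowhere. Typical developers simply can't handle it (if you want to, take a look at my hands-on k8s course, guaranteed to get you there ^_^).

So what now? From the corner, docker, the fallen aristocrat of cloud native, spoke up: why not let my brother give it a try!

1. docker compose

Although docker has become a "faded internet celebrity", it is still the mainstream in the container world. At least for developers outside the docker world, "container" still means docker first.

Docker, Inc. has launched many products, and most of them have admittedly failed to win developers over, but we shouldn't dismiss them all: docker itself is usable, and so is its sibling, docker compose.

Compose is a tool for defining and running multi-container Docker applications. With Compose, we use a YAML file to configure all of an application's services. Then, with a single command, we can create and start every service in the configuration.

Isn't that exactly the tool we want! Compose is much like k8s in that both are container orchestration tools; the biggest difference is that Compose is better suited to debugging or integration environments on a single node (it can span hosts, but only through the now-abandoned docker swarm). Compose can greatly improve how efficiently developers and testers set up runtime environments.

2. Choosing a version

Building a runtime environment with docker compose takes only one yml file. But docker compose has evolved over many years and its file format has several versions; so far there are three: 2, 2.x and 3.x. Different format versions also require different docker engine versions; the mapping is shown in the figure below, taken from the docker compose file reference page:

Choosing a version is the most annoying part. Which one? Let's set two conditions:

the docker engine version should be at least 17.xx
the file format version should be at least 3.x

Under these conditions, 3.2 is the lowest fitting version (it requires Docker Engine 17.04.0+), so we choose 3.2: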

// docker-compose.yml
version: "3.2"

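3. Choosing a network

By default, docker compose creates one bridge network for the services in docker-compose.yml, and all services can reach each other on it. Take the following docker-compose.yml as an example: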

// demo1/docker-compose.yml
version: "3.2"
services:
  srv1:
    image: nginx:latest
    container_name: srv1
  srv2:
    image: nginx:latest
    container_name: srv2

Start the services in this yml:

# docker-compose -f docker-compose.yml up -d
Creating network "demo1_default" with the default driver
... ...

docker compose creates a bridge network named demo1_default for this group of containers:

# docker network ls
NETWORK ID          NAME                     DRIVER              SCOPE
f9a6ac1af020        bridge                   bridge              local
7099c68b39ec        demo1_default            bridge              local
... ...

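Details of the demo1_default network are available via docker network inspect 7099c68b39ec.

Services on such a network can't be reached from outside. To reach one, we need to map its port; for example, to expose srv1, we map its listening port 80 to a port on the host, 8080 here. The modified docker-compose.yml: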

version: "3.2"
services:
  srv1:
    image: nginx:latest
    container_name: srv1
    ports:
    - "8080:80"
  srv2:
    image: nginx:latest
    container_name: srv2

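After starting this group of containers, curl localhost:8080 reaches the srv1 service in its container. Discovery between services takes more care in this setup, though: inside the network, containers reach each other by service name through Compose's built-in DNS rather than via localhost, and anything beyond that needs an external discovery service or container links.

Developers usually have just one environment, and different services listen on different ports, so letting containers use the host network is more convenient than creating a separate bridge network. network_mode puts a service on the host network, like this: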

version: "3.2"
services:
  srv1:
    image: bigwhite/srv1:1.0.0
    container_name: srv1
    network_mode: "host"

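On the host network, a container's listening port is a port on the host. Service instances are told apart by port (provided the ports all differ), and localhost serves as the IP.

Another benefit of the host network is that the services are also easy to reach from hosts outside the environment, for example to view the prometheus dashboards.

4. Start dependent middleware first, then seed configuration

Besides their own services, today's microservice systems depend on plenty of middleware, such as redis, kafka (mq), nacos/etcd (service discovery and registration), prometheus (time-series metrics), mysql (relational database), jaeger server (tracing), elastic (log center), pyroscope-server (continuous profiling) and so on.

If this middleware hasn't started successfully, our own services will most likely fail to start, so we must make sure the middleware is up before starting our services.

How? The compose file format has a misleading "depends_on"; for example, in the configuration below srv1 depends on two services, redis and nacos: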

version: "3.2"
services:
  srv1:
    image: bigwhite/srv1:1.0.0
    container_name: srv1
    network_mode: "host"
    depends_on:
      - "redis"
      - "nacos"
    environment:
      - NACOS_SERVICE_ADDR=127.0.0.1:8848
      - REDIS_SERVICE_ADDR=127.0.0.1:6379
    restart: on-failure

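Without a closer look, many people assume depends_on makes the dependencies redis and nacos start first and starts srv1 only after they are ready. In fact, depends_on only guarantees start order: dependencies start before our service. It neither checks whether redis or nacos is ready nor waits for them to be ready before starting our service. So srv1 may still report all sorts of errors after starting, including failing to connect to redis or nacos.

To truly start our own service only after its dependencies are ready, we need an external tool, as the docker compose documentation explains. One approach is the wait-for-it script.

We can change the container image of our own service so that its entrypoint runs a start.sh script instead of the service binary: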

// Dockerfile
... ...
ENTRYPOINT ["/bin/bash", "./start.sh"]

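Now we can "customize" the startup logic in start.sh. Here is an example start.sh:

#! /bin/sh

./wait_for_it.sh $NACOS_SERVICE_ADDR -t 60 --strict -- echo "nacos is up" && \
./wait_for_it.sh $REDIS_SERVICE_ADDR -t 60 --strict -- echo "redis is up" && \
exec ./srv1

In start.sh, wait_for_it.sh waits for nacos and redis. With --strict, wait_for_it.sh exits with a non-zero status when the wait times out instead of running the command after --, so the && chain stops and the script fails; under the restart policy, docker compose then starts our service again. Only once both nacos and redis are ready does our service actually begin its startup.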

Before exec ./srv1, we often also need some configuration initialization, for example writing srv1's preset configuration into nacos so that srv1 can read its configuration from nacos once it starts. Here is start.sh with configuration initialization added:

#! /bin/sh

./wait_for_it.sh $NACOS_SERVICE_ADDR -t 60 --strict -- echo "nacos is up" && \
./wait_for_it.sh $REDIS_SERVICE_ADDR -t 60 --strict -- echo "redis is up" && \
curl -f -X POST --header 'Content-Type: application/x-www-form-urlencoded' -d dataId=srv1.yml --data-urlencode content@./conf/srv1.yml "http://127.0.0.1:8848/nacos/v1/cs/configs?group=MY_GROUP" && \
exec ./srv1

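The curl command writes ./conf/srv1.yml, baked into the image, into the already running nacos, where srv1 reads it at startup.

5. The whole bundle, everything included

As mentioned earlier, today's systems depend heavily on external middleware. Fortunately, mainstream middleware officially supports running under docker, so our development environment can be an all-inclusive "bundle". There is one easily met prerequisite: your machine must be powerful enough to run all of that middleware.

With the whole bundle in place, whether diagnosing problems (logs, traces, metrics) or optimizing performance (continuous profiling data), everything is extremely convenient.

6. Simplifying command-line input with a Makefile

The docker-compose tool has one "serious flaw": its name is too long ^_^. Every operation means typing lots of characters, even more so when your compose file isn't named docker-compose.yml and you must pass its path with the -f option.

To cut down on typing, we can wrap the long docker-compose commands in a Makefile and give each one a short make target name, for example: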

// Makefile

.PHONY: pull pull-my-system up upd up2log down ps log upsrv config

pull:
	docker-compose -f my-docker-compose.yml pull

pull-my-system:
	docker-compose -f my-docker-compose.yml pull srv1 srv2 srv3

up: pull-my-system
	docker-compose -f my-docker-compose.yml up

upd: pull-my-system
	docker-compose -f my-docker-compose.yml up -d

up2log: pull-my-system
	docker-compose -f my-docker-compose.yml up > up.log 2>&1

down:
	docker-compose -f my-docker-compose.yml down

ps:
	docker-compose -f my-docker-compose.yml ps -a

log:
	docker-compose -f my-docker-compose.yml logs -f

# usage example: make upsrv service=srv1
service=
upsrv:
	docker-compose -f my-docker-compose.yml up -d ${service}

config:
	docker-compose -f my-docker-compose.yml config

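In addition, the middleware our services depend on is usually expensive to start and run, and starting and stopping it together with our services every time wastes a lot of time. We can manage the dependencies and our own services in separate compose files, so restarting our services doesn't restart the dependencies, which saves a great deal of waiting.

7. The .env file

Sometimes we need "variables" in a compose file, usually implemented with environment variables. For example, we turn srv1's image version into an environment variable: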

version: "3.2"
services:
  srv1:
    image: bigwhite/srv1:${SRV1_VER}
    container_name: srv1
    network_mode: "host"
  ... ...

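docker compose can supply the values of the variables in docker-compose.yml through a .env file in the same directory, for example:

// .env
SRV1_VER=dev

When starting srv1, docker compose reads SRV1_VER from .env and substitutes it for the variable in the compose file (a variable already set in the shell environment takes precedence over the .env value). This way we can easily switch the image version we use.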

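8. Pros and cons

With docker compose, we can easily get and quickly start an all-in-one runtime environment, greatly speeding up deployment, debugging and testing; at certain stages of the engineering process it helps developers and testers a lot.

Such an environment also has drawbacks, for example:

It demands a fairly powerful machine or VM;
It has limits: it works for functional testing, continuous integration and acceptance testing, but not for load testing. At best a load test there gives a rough baseline whose numbers don't count, since all services run together and interfere with each other;
With many services and middleware components, a full startup takes a while.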

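Fetching Private Go Modules In-house at a Small Company

bigwhite — Fri, 03 Sep 2021 10:29:41 +0000

Permalink – https://tonybai.com/2021/09/03/the-approach-to-go-get-private-go-module-in-house

1. Where the problem comes from

Since Go 1.11 introduced Go modules, fetching the public Go modules a project depends on is no longer a "pain point" for the go command, as shown below:

Figure: fetching public Go modules from inside the company through a public GOPROXY service

Inside a company or organization, we only need to point the GOPROXY environment variable at a public GOPROXY service to fetch any public Go module (public modules being open-source modules).

But as the number of Go users and Go projects in the company grows, "code duplication" appears. Extracting common code into a separate, reusable internal private repository becomes inevitable, and so we need to fetch private Go modules!

Some companies or organizations keep all their code with a public VCS hosting provider (such as github.com), and their private Go modules live in private repositories on that service. If your company does the same, fetching private Go modules hosted in private repositories on a public VCS is easy too, as shown below:

Figure: fetching private Go modules hosted on a public VCS service directly from inside the company

One precondition of this approach is that every developer can access the private Go module repositories on the public VCS service. The credential can take any form, such as a basic auth user and password or a personal access token (like GitHub's), as long as it meets the public VCS's authentication requirements.

But what if the private Go modules live on VCS servers inside the company, as shown below:

Figure: private Go modules on VCS servers inside the organization/company

How can we get the go command to fetch private Go modules from internal servers automatically?

Some gophers will say: "That's easy! It's no different from fetching private Go modules hosted on a public VCS service." Gophers with this view most likely come from big companies. Big companies have complete IT infrastructure for development, and their internal VCS servers are reachable by domain name (e.g. git.bat.com/user/repo), so employees can access private Go modules on internal VCS servers just as they would on a public VCS service, as shown below:

Figure: big-company approach: fetching private Go modules directly from internal VCS repositories

In this approach, the company runs an internal goproxy service (the in-house goproxy in the figure) for two purposes: it gives development and CI machines without direct internet access a way to fetch external Go modules, and its cache speeds up fetching public Go modules. For private Go modules, developer machines list them in the GOPRIVATE environment variable, so the go command bypasses GOPROXY and fetches them directly from the VCS (git.bat.com in the figure).

A big company may also use the approach below, handing both external and private Go modules to a single internal goproxy service:

Figure: big-company approach: unified proxy

In this approach, developers only need to point GOPROXY at the in-house goproxy to fetch both external and private Go modules. However, by default the go command verifies the checksums of the modules it downloads against the public checksum database (sum.golang.org), which has no records of our private Go modules, so developers must list the private Go modules in the GONOSUMDB environment variable to skip checksum verification for them. One caveat: the in-house goproxy must have access to every repo that hosts a private module; only then can every private Go module be fetched successfully.

So here's the question: how can a small company that lacks complete internal IT infrastructure, but still wants to keep private Go modules on internal VCS servers, fetch those modules?

2. A solution small companies can use as a reference

A small company may be small, but its goals shouldn't be. Even if its IT infrastructure is weak or inflexible, it shouldn't put too much extra "burden" on developers. Comparing the two big-company approaches above, we prefer the latter: all the complexity goes to the in-house goproxy node, and developers' setup stays as simple as possible. But a small company has no DNS and can't use domain names... so how do we implement this approach? That is what this section does.

0. Example environment topology

First we prepare an example environment for implementing the solution; its topology is shown below:

1. Choosing a goproxy implementation

After the Go module proxy protocol specification was published, the Go community produced many mature open-source goproxy implementations, from the early athens to two excellent implementations from China, goproxy.cn and goproxy.io. goproxy.io documents on its official site how to deploy it inside an enterprise, so we build our solution on goproxy.io (other goproxy implementations should work too).

On the in-house goproxy node in the figure above, install goproxy with the following steps: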

$mkdir ~/.bin/goproxy
$cd ~/.bin/goproxy
$git clone https://github.com/goproxyio/goproxy.git
$cd goproxy
$make

After the build, the goproxy executable is in the bin directory under the current directory (~/.bin/goproxy/goproxy/bin).

Create goproxy's cache directory:

$mkdir /root/.bin/goproxy/goproxy/bin/cache

Start goproxy:

$./goproxy -listen=0.0.0.0:8081 -cacheDir=/root/.bin/goproxy/goproxy/bin/cache -proxy https://goproxy.io
goproxy.io: ProxyHost https://goproxy.io

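goproxy now listens on port 8081 (also its default port when none is specified), with goproxy.io as the upstream goproxy service.

Note: these startup parameters are not the final ones; here we only want to check that goproxy works as expected.

Next, let's verify that goproxy behaves as we expect.

On a development machine, point the GOPROXY environment variable at 10.10.20.20:8081: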

// .bashrc
export GOPROXY=http://10.10.20.20:8081

After the environment variable takes effect, run:

$go get github.com/pkg/errors

As expected, the development machine downloads the github.com/pkg/errors package successfully.

On the goproxy side, we see the following log:

goproxy.io: ------ --- /github.com/pkg/@v/list [proxy]
goproxy.io: ------ --- /github.com/pkg/errors/@v/list [proxy]
goproxy.io: ------ --- /github.com/@v/list [proxy]
goproxy.io: 0.146s 404 /github.com/@v/list
goproxy.io: 0.156s 404 /github.com/pkg/@v/list
goproxy.io: 0.157s 200 /github.com/pkg/errors/@v/list
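The log shows the go command asking the proxy for the @v/list endpoint of github.com/pkg/errors and of each shorter path prefix, since any of them could be the module that contains the package; only the full path answers 200. A small sketch that builds those request URLs, assuming the module proxy protocol's layout ($GOPROXY/<module>/@v/list) and its case encoding (upper-case letters written as '!' plus the lower-case letter). listURLs and escapePath are illustrative helpers, not part of goproxy:

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath applies the module proxy protocol's case encoding:
// every upper-case ASCII letter becomes '!' followed by its lower-case form.
func escapePath(p string) string {
	var b strings.Builder
	for _, r := range p {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// listURLs returns the <proxy>/<module>/@v/list URLs for pkg and each of its
// shorter path prefixes, longest first: the candidate modules for the package.
func listURLs(proxy, pkg string) []string {
	proxy = strings.TrimSuffix(proxy, "/")
	var urls []string
	for p := pkg; ; {
		urls = append(urls, proxy+"/"+escapePath(p)+"/@v/list")
		i := strings.LastIndex(p, "/")
		if i < 0 {
			return urls
		}
		p = p[:i]
	}
}

func main() {
	for _, u := range listURLs("http://10.10.20.20:8081", "github.com/pkg/errors") {
		fmt.Println(u)
	}
}
```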

And in goproxy's cache directory, we also find the downloaded and cached github.com/pkg/errors package:

$cd /root/.bin/goproxy/goproxy/bin/cache
$tree
.
└── pkg
    └── mod
        └── cache
            └── download
                └── github.com
                    └── pkg
                        └── errors
                            └── @v
                                └── list

8 directories, 1 file

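2. Defining custom import paths and mapping them to internal VCS repositories

A small company may not have assigned a domain name to its VCS server, and we can't put IP addresses in the import paths of private Go packages, so we give our private Go modules a custom path, such as mycompany.com/go/module1. All private Go modules go into repositories under mycompany.com/go.

The next question: when goproxy fetches mycompany.com/go/module1, it must obtain the address of the corresponding module1 repository on the internal VCS, so that it can download module1's code from the internal VCS server.

Figure: how does goproxy get the VCS repository address for mycompany.com/go/module1?

There is more than one way. Here we use a tool named govanityurls, which I have mentioned in earlier articles.

Combining govanityurls and nginx, we can map a private Go module's import path to the real address of its repository on the VCS. The figure below shows how it works:

First, to keep goproxy from forwarding requests for private Go modules (mycompany.com/go/module1) to the public proxy, we tweak its startup parameters, as in this modified goproxy command: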

$./goproxy -listen=0.0.0.0:8081 -cacheDir=/root/.bin/goproxy/goproxy/bin/cache -proxy https://goproxy.io -exclude "mycompany.com/go"

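Now goproxy does not forward requests for modules matching the value after -exclude to goproxy.io; it requests the module's "origin" directly. What the figure above does is translate that "origin" into a repository address on the internal VCS. Since the domain mycompany.com doesn't exist, as the figure shows, we add this entry to /etc/hosts on the goproxy node: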

127.0.0.1 mycompany.com

Requests from goproxy to mycompany.com therefore go to the local machine. As the figure shows, nginx listens on local port 80, with the following configuration for the mycompany.com host:

// /etc/nginx/conf.d/gomodule.conf

server {
        listen 80;
        server_name mycompany.com;

        location /go {
                proxy_pass http://127.0.0.1:8080;
                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
        }
}

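nginx forwards requests for paths like mycompany.com/go/xxx to 127.0.0.1:8080, which is exactly where govanityurls listens.

govanityurls is an open-source tool by Jaana B. Dogan, a former member of the Go core team. It helps gophers quickly set up custom go get import paths for Go packages.

govanityurls works like a "navigation" server. When the go command requests a custom package path, the request actually goes to govanityurls, which returns the real address of the package's repository (read from the vanity.yaml configuration file) to the go command in a <meta name="go-import"> tag; the go command then fetches the package from that real repository address.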

Note: installing govanityurls is simple; just go install/go get github.com/GoogleCloudPlatform/govanityurls.

In our example, vanity.yaml looks like this:

host: mycompany.com

paths:
  /go/module1:
      repo: ssh://admin@10.10.30.30/module1
      vcs: git

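When govanityurls receives a request forwarded by nginx, it matches the request against the module paths configured in vanity.yaml; on a match, it returns the module's real repo address in the response format the go command expects. Here, module1's real repository on the VCS is ssh://admin@10.10.30.30/module1.

goproxy receives this address, sends another request to the real address, and finally caches module1 locally and returns it to the client.

Note: since this solution is the same as the big companies' second approach, goproxy needs access to the real VCS repositories of every Go module under mycompany.com/go.

3. Developer machine (client) settings

In the earlier example, we already set the developer machine's GOPROXY to the goproxy address. But as mentioned, by default the go command verifies the checksum of every Go module fetched through GOPROXY against the public checksum database. We are fetching private Go modules, for which the checksum database has no data, so go build fails and cannot continue.

Developer machines therefore also need mycompany.com/go in the GONOSUMDB environment variable, which tells the go command to skip checksum verification for any Go module matching mycompany.com/go.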
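"Matching" here means something specific: GONOSUMDB (like GOPRIVATE, GONOPROXY and GOINSECURE) is a comma-separated list of glob patterns, each compared with path.Match against the leading path elements of the module path. The function below is a stdlib-only re-implementation sketch of that rule for illustration (the go command's own version lives in golang.org/x/mod/module as MatchPrefixPatterns); matchPrefixPatterns is our name, not an exported API:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether target matches any comma-separated glob
// in globs. A glob with N path elements is matched (via path.Match) against
// the first N path elements of target, so "mycompany.com/go" covers
// "mycompany.com/go/foo" but not "mycompany.com/golang/x".
func matchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		glob = strings.TrimSuffix(glob, "/")
		if glob == "" {
			continue
		}
		// A glob with n slashes must be matched against the first n+1 elements.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // target has fewer path elements than the glob
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	gonosumdb := "mycompany.com/go" // the value used in this article
	for _, mod := range []string{"mycompany.com/go/foo", "mycompany.com/golang/x", "github.com/pkg/errors"} {
		fmt.Printf("%-24s skip checksum DB: %v\n", mod, matchPrefixPatterns(gonosumdb, mod))
	}
}
```

Because matching works on whole path elements, one entry such as mycompany.com/go covers every private module under it, which is exactly why putting all private projects under one prefix keeps the client configuration to a single line.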

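4. "Shortcomings" of the solution

Of course, the solution above isn't perfect; it has its own shortcomings:

Developers still have to configure GONOSUMDB

Because by default the go command verifies checksums of Go modules fetched from GOPROXY, we have to list the private Go modules in GONOSUMDB, a small extra "burden" on developers.

Mitigation: a small company can put all its private Go projects under one specific domain, so GONOSUMDB is configured once rather than for each private project.

Adding a private Go module means updating vanity.yaml by hand

This is the least flexible part of the solution: since govanityurls' features are limited, we may need to configure each private Go module's VCS repository address and VCS type (git, svn or hg) individually.

Mitigation: manage several private Go modules in one VCS repository, as etcd does. Compared with the Go team's original advice of one module per repo, newer Go versions have made great progress in supporting multiple Go modules in one repo.

For a small company, though, this little extra work is nothing compared with the benefits! ^_^

No fine-grained permissions

As mentioned when describing the solution, the goproxy node needs access to every VCS repo that hosts a private Go module, but it cannot grant different permissions to different Go developers: any private Go module goproxy can fetch, every Go developer can fetch.

For most small companies, though, all internal source code is in principle open within the company, so this is hardly a problem. If it is a problem for you, you'll have to use the big companies' first approach above.

3. Summary

Big company or small, as Go use deepens, more people adopt it, and projects grow in number and complexity, fetching private Go modules will inevitably come up.

For gophers at big companies this may not be a problem at all, or may even be invisible to them. But organizations with incomplete internal IT infrastructure, such as small companies, really do have to solve it themselves.

This article offered an idea and a reference implementation for small companies to set up private Go repositories and fetch private Go modules from them.

If the installation and configuration steps above feel tedious, interested readers can pack these programs (goproxy, nginx, govanityurls) into one container image for one-step installation and setup.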

Building High-Performance Object Storage with minio - Part 1: Prototype

bigwhite — Mon, 16 Mar 2020 08:43:43 +0000

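I recently joined a project that needs to store large amounts of unstructured data such as images, short videos and audio. So I first looked in the Go community for an open source project that could meet this need, and that is how minio came onto my radar.

Figure: minio logo

In fact, I learned about minio three years ago and downloaded it to play with (i.e., study) for a while, but back then minio was far less mature than it is today (our requirements were simple at the time, so I chose weedfs, which I knew better). Today minio draws wide attention on GitHub and has plenty of stars (20k+). It is used not only in the Go community but also widely in other language communities. I'll venture an unofficial opinion: in object storage, minio has the same air of undisputed leadership that Kafka (Java stack) has in message queues :).

At GopherChina 2019, an engineer from Tantan (探探) gave the talk "基于MINIO的对象存储方案在探探的实践" (MinIO-based object storage in practice at Tantan). Whether Tantan uses minio in production today is not publicly known, but the talk once again shows minio's strong influence in the object storage field.

Figure: Tantan engineer sharing minio practice at GopherChina 2019

minio comes from a team with many years of experience building network file systems: its founding team came from the original GlusterFS team, and minio, the product of the team's second startup, draws extensively on the lessons of GlusterFS:

Simple deployment: a single binary is everything you need, and many platforms are supported (thanks to Go);
Massive storage: minio scales out by zone (existing zones are unaffected), and a single object can be up to 5TB;
Compatible with the Amazon S3 API, with developers' needs and experience fully taken into account;
Low redundancy and high tolerance of disk failures: the standard, and maximum, data redundancy factor is 2 (storing a 1MB object actually takes 2MB of disk space), yet data can still be read when any n/2 disks fail (n being the number of disks in one erasure coding set). Recovery from such failures also works per object rather than per storage volume.
Excellent read/write performance.

Figure: benchmark data from the minio technical white paper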
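The redundancy figures in the list follow from a simple split: under the default layout described here, each object is cut into n/2 data blocks and n/2 parity blocks spread across the n drives of a set. A small sketch of that arithmetic, assuming the default half-and-half split and equally sized drives (the drive size is an illustrative input), and ignoring block rounding and metadata:

```go
package main

import "fmt"

// erasureSet models one erasure coding set under the default layout the
// text describes: half of the n drives hold data blocks, half hold parity.
// DriveBytes is an illustrative input (drives assumed equal in size).
type erasureSet struct {
	Drives     int
	DriveBytes int64
}

func (s erasureSet) dataBlocks() int   { return s.Drives / 2 }
func (s erasureSet) parityBlocks() int { return s.Drives - s.dataBlocks() }

// readTolerance: objects stay readable while at most this many drives fail.
func (s erasureSet) readTolerance() int { return s.parityBlocks() }

// usableBytes is the capacity left for object data after parity overhead.
func (s erasureSet) usableBytes() int64 { return int64(s.dataBlocks()) * s.DriveBytes }

// rawBytesFor estimates the disk space an object occupies, ignoring block
// rounding and metadata: objBytes * n / (n/2), i.e. twice the object size.
func (s erasureSet) rawBytesFor(objBytes int64) int64 {
	return objBytes * int64(s.Drives) / int64(s.dataBlocks())
}

func main() {
	s := erasureSet{Drives: 4, DriveBytes: 1 << 40} // four 1 TiB drives
	fmt.Println("drive failures tolerated for reads:", s.readTolerance())
	fmt.Println("usable TiB:", s.usableBytes()>>40)
	fmt.Println("raw bytes for a 1 MiB object:", s.rawBytesFor(1<<20))
}
```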

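Given these "advantages", I plan to build this project's object storage for unstructured data on minio. This article covers the prototype design and the setup of an initial minio validation environment.

I. Prototype design

minio-based object storage designs for unstructured data are all much alike. The diagram below shows the prototype we designed for our own requirements:

Figure: prototype design

Using minio's distributed mode, multiple disks on multiple hosts are combined into one logical storage pool, and minio server instances running on different hosts together provide highly available object storage;
Data is written into minio by a separate upload service (which talks to the minio cluster through the SDK minio provides);
Buckets are created with minio's mc tool and their policy is set to "download", so that external users can fetch object data directly from minio, with no intermediate layer other than the LB;
A job or scheduled task uses the mc tool to maintain the data in minio centrally, for example periodically deleting data older than 7 days (if the default data expiration is set to 7 days).

II. minio server startup modes

minio supports several server startup modes:

Figure: minio server startup modes

minio server's standalone mode means all the disks it manages are local to the host. This mode is generally used only for validation and learning in lab and test environments. Standalone mode is further divided into non-erasure code mode and erasure code mode.

In non-erasure code mode, minio server is started with a single local disk directory argument, for example: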

$minio server data

Endpoint:  http://10.10.126.88:9000  http://127.0.0.1:9000
AccessKey: minioadmin
SecretKey: minioadmin

Browser Access:
   http://10.10.126.88:9000  http://127.0.0.1:9000           

Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myminio http://10.10.126.88:9000 minioadmin minioadmin

... ...

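In this mode, minio stores each object directly under data, without creating replicas or enabling erasure coding. Both the server instance and the disk are therefore single points of failure with no high availability whatsoever: a damaged disk means lost data.

Erasure code mode, still with a single minio server, means passing multiple local disk arguments to the minio server instance. As soon as it receives more than one disk argument, minio server automatically enables erasure code mode. Erasure coding has requirements on the number of disks; if they are not met, the instance fails to start: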

$minio server data1 data2
ERROR Invalid command line arguments: Incorrect number of endpoints provided [data1 data2]
      > Please provide an even number of endpoints greater or equal to 4
      HINT:
        For more information, please refer to https://docs.min.io/docs/minio-erasure-code-quickstart-guide

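With erasure coding enabled, minio server requires at least 4 endpoints (in standalone mode, directories on local disks). Once erasure coding is on, minio server automatically divides the disk drives passed to it into multiple erasure coding sets, and each set may contain 4, 6, 8, 10, 12, 14 or 16 disk drives. minio server works out the number of sets and the number of drives per set from the number of drives passed in. In the following example, we pass four endpoints (disk drives) to minio server: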

$minio server data1 data2 data3 data4

Formatting 1 zone, 1 set(s), 4 drives per set.
WARNING: Host local has more than 2 drives of set. A host failure will result in data becoming unavailable.
Status:         4 Online, 0 Offline.
Endpoint:  http://10.10.126.88:9000  http://127.0.0.1:9000
AccessKey: minioadmin
SecretKey: minioadmin

Browser Access:
   http://10.10.126.88:9000  http://127.0.0.1:9000           

Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myminio http://10.10.126.88:9000 minioadmin minioadmin

... ...

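The minio server output shows that these drives were placed into a single erasure coding set. The output also contains the line WARNING: Host local has more than 2 drives of set. A host failure will result in data becoming unavailable. Here minio server warns us that more than two drives of this erasure coding set are on the local host, so if the host goes down, the data becomes unavailable. (Each set has 4 drives; under erasure coding, the set can tolerate at most 4/2=2 failed disks.)

Next, let's look at a piece of "syntactic sugar" for starting minio server, the "ellipsis" syntax: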

$minio server data{1...18}

Formatting 1 zone, 3 set(s), 6 drives per set.
WARNING: Host local has more than 3 drives of set. A host failure will result in data becoming unavailable.
WARNING: Host local has more than 3 drives of set. A host failure will result in data becoming unavailable.
WARNING: Host local has more than 3 drives of set. A host failure will result in data becoming unavailable.
Status:         18 Online, 0 Offline.
Endpoint:  http://10.10.126.88:9000  http://127.0.0.1:9000
AccessKey: minioadmin
SecretKey: minioadmin

Browser Access:
   http://10.10.126.88:9000  http://127.0.0.1:9000           

Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myminio http://10.10.126.88:9000 minioadmin minioadmin

... ...

minio server data{1...18} is equivalent to minio server data1 data2 data3 data4 data5 data6 data7 data8 data9 data10 data11 data12 data13 data14 data15 data16 data17 data18; minio server expands the ellipsis itself. We can see that when we pass 18 disk drives, minio server creates 3 erasure coding sets with 6 disk drives each. Again, minio server prints one WARNING per set: more than three disk drives of each set are on the same host.
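The two layouts so far (4 drives giving 1 set of 4, 18 drives giving 3 sets of 6) are both consistent with a simple rule: use the largest allowed set size that divides the drive count evenly. The sketch below encodes that simplified rule as a quick way to predict a layout before starting minio. It is our approximation, not minio's actual selection code, which may weigh additional factors such as how drives are spread across hosts.

```go
package main

import (
	"errors"
	"fmt"
)

// allowedSetSizes lists the drives-per-set values named in the text, largest first.
var allowedSetSizes = []int{16, 14, 12, 10, 8, 6, 4}

// planSets returns the number of erasure sets and drives per set for a zone of
// n drives, using the simplified "largest allowed size that divides n" rule.
// This is an approximation of minio's behavior, not its real algorithm.
func planSets(n int) (sets, perSet int, err error) {
	if n < 4 || n%2 != 0 {
		return 0, 0, errors.New("need an even number of drives, at least 4")
	}
	for _, size := range allowedSetSizes {
		if n%size == 0 {
			return n / size, size, nil
		}
	}
	return 0, 0, fmt.Errorf("%d drives cannot be split into equal sets of 4-16", n)
}

func main() {
	for _, n := range []int{2, 4, 16, 18, 22} {
		sets, perSet, err := planSets(n)
		if err != nil {
			fmt.Printf("%2d drives: %v\n", n, err)
			continue
		}
		fmt.Printf("%2d drives: %d set(s), %d drives per set\n", n, sets, perSet)
	}
}
```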

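These WARNINGs can be resolved with distributed mode. As the name suggests, in distributed mode the minio server instances and the disk drives they manage are spread across multiple hosts. This removes the single point of failure of one minio server instance and spreads data over different disks on different hosts, providing high availability and improving overall disaster tolerance. Because it handles disks on multiple hosts, distributed mode enables erasure coding sets by default.

In distributed mode, the remote endpoints after minio server are written as HTTP URLs: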

export MINIO_ACCESS_KEY=
export MINIO_SECRET_KEY=
$minio server http://host{1...4}:9000/minio/data{1...4}

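The minio server command in the example above describes 4 hosts, each running one minio server instance, and each instance manages 16 disk drives (both local and remote). The command is equivalent to: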

$minio server http://host1:9000/minio/data1 http://host1:9000/minio/data2 http://host1:9000/minio/data3 http://host1:9000/minio/data4 http://host2:9000/minio/data1 http://host2:9000/minio/data2 http://host2:9000/minio/data3 http://host2:9000/minio/data4 http://host3:9000/minio/data1 http://host3:9000/minio/data2 http://host3:9000/minio/data3 http://host3:9000/minio/data4 http://host4:9000/minio/data1 http://host4:9000/minio/data2 http://host4:9000/minio/data3 http://host4:9000/minio/data4

minio again divides these disk drives into erasure coding sets automatically. Each endpoint is written in the form http://address/disk-drive-path. Note: this command must be run on each of host1, host2, host3 and host4.
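The expansion order matters: the leftmost {a...b} range varies slowest, which is why all four of host1's drives come first in the expanded command above. A minimal sketch of the expansion as a Cartesian product, assuming only plain decimal ranges such as {1...4}; any other range forms minio may accept are not handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rangePattern matches one "{start...end}" decimal range (assumption: only
// plain decimal ranges are supported by this sketch).
var rangePattern = regexp.MustCompile(`\{(\d+)\.\.\.(\d+)\}`)

// expand returns every string produced by substituting each {a...b} range,
// with the leftmost range varying slowest.
func expand(arg string) ([]string, error) {
	loc := rangePattern.FindStringSubmatchIndex(arg)
	if loc == nil {
		return []string{arg}, nil
	}
	start, err := strconv.Atoi(arg[loc[2]:loc[3]])
	if err != nil {
		return nil, err
	}
	end, err := strconv.Atoi(arg[loc[4]:loc[5]])
	if err != nil {
		return nil, err
	}
	if start > end {
		return nil, fmt.Errorf("invalid range in %q", arg)
	}
	head, tail := arg[:loc[0]], arg[loc[1]:]
	rest, err := expand(tail) // expand the remaining ranges first
	if err != nil {
		return nil, err
	}
	var out []string
	for i := start; i <= end; i++ { // outer loop: leftmost range
		for _, r := range rest {
			out = append(out, head+strconv.Itoa(i)+r)
		}
	}
	return out, nil
}

func main() {
	for _, arg := range []string{"data{1...18}", "http://host{1...4}:9000/minio/data{1...4}"} {
		eps, err := expand(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(len(eps), "endpoints, first:", eps[0], "last:", eps[len(eps)-1])
	}
}
```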

minio has the concept of a zone, as in the following example:

$minio server data{1...8} data{9...16}

Formatting 1 zone, 1 set(s), 8 drives per set.
WARNING: Host local has more than 4 drives of set. A host failure will result in data becoming unavailable.
Formatting 2 zone, 1 set(s), 8 drives per set.
WARNING: Host local has more than 4 drives of set. A host failure will result in data becoming unavailable.
Status:         16 Online, 0 Offline.
Endpoint:  http://10.10.126.88:9000  http://127.0.0.1:9000
AccessKey: minioadmin
SecretKey: minioadmin

Browser Access:
   http://10.10.126.88:9000  http://127.0.0.1:9000           

Command-line Access: https://docs.min.io/docs/minio-client-quickstart-guide
   $ mc config host add myminio http://10.10.126.88:9000 minioadmin minioadmin

... ...

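Here we pass minio server two groups of arguments on the command line, each using the "ellipsis" syntax. minio treats each group as a "zone"; with two groups, it creates two zones. Within each zone, minio creates one erasure coding set of 8 disk drives. For an incoming write request, minio server first looks for the zone with more available space, then chooses a set and disk drives within that zone.

Without the "ellipsis" syntax, minio server puts all the disk drives passed to it into a single zone.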
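The write-placement rule described above (prefer the zone with more free space, then pick a set inside it) can be sketched as follows. The zone type and the numbers are hypothetical, the free-space check ignores parity overhead, and ties going to the earlier zone is our assumption; minio's real implementation may weigh zones differently.

```go
package main

import (
	"errors"
	"fmt"
)

// zone is a hypothetical summary of one zone's free capacity.
type zone struct {
	Name      string
	FreeBytes uint64
}

// pickZone returns the index of the zone with the most free space that can
// still hold objBytes (parity overhead ignored). On a tie the earlier zone
// wins. An error is returned if no zone has room for the object.
func pickZone(zones []zone, objBytes uint64) (int, error) {
	best := -1
	for i, z := range zones {
		if z.FreeBytes < objBytes {
			continue
		}
		if best == -1 || z.FreeBytes > zones[best].FreeBytes {
			best = i
		}
	}
	if best == -1 {
		return -1, errors.New("no zone has enough free space")
	}
	return best, nil
}

func main() {
	const gib = 1 << 30
	zones := []zone{
		{Name: "zone1", FreeBytes: 100 * gib},
		{Name: "zone2", FreeBytes: 300 * gib},
	}
	i, err := pickZone(zones, 5*gib)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("write goes to", zones[i].Name)
}
```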

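III. Setting up and configuring the prototype validation environment

1. Deploying a distributed minio cluster on a single machine

Our validation environment uses the smallest distributed minio layout: a single machine, one zone, one erasure coding set and 4 disk drives. Here is the deployment diagram:

Figure: deploying a distributed minio cluster on a single machine

We do not use the "ellipsis" syntax here, because it is awkward to simulate on a single machine. Instead, we start the minio cluster with the following script: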

# cat startup_minio.sh
#!/bin/bash

export MINIO_ACCESS_KEY="minio"
export MINIO_SECRET_KEY="minio123"

for i in {01..04}; do
    nohup minio server --address ":90${i}" http://127.0.0.1:9001/root/minio-install/data1 http://127.0.0.1:9002/root/minio-install/data2 http://127.0.0.1:9003/root/minio-install/data3 http://127.0.0.1:9004/root/minio-install/data4 > "/root/minio-install/90${i}.log" 2>&1 &
done

Start the minio cluster and check its status:

# bash startup_minio.sh

# ps -ef|grep minio

root      1218     1 11 21:58 pts/5    00:00:01 minio server --address :9001 http://127.0.0.1:9001/root/minio-install/data1 http://127.0.0.1:9002/root/minio-install/data2 http://127.0.0.1:9003/root/minio-install/data3 http://127.0.0.1:9004/root/minio-install/data4
root      1219     1 11 21:58 pts/5    00:00:01 minio server --address :9002 http://127.0.0.1:9001/root/minio-install/data1 http://127.0.0.1:9002/root/minio-install/data2 http://127.0.0.1:9003/root/minio-install/data3 http://127.0.0.1:9004/root/minio-install/data4
root      1220     1  3 21:58 pts/5    00:00:00 minio server --address :9003 http://127.0.0.1:9001/root/minio-install/data1 http://127.0.0.1:9002/root/minio-install/data2 http://127.0.0.1:9003/root/minio-install/data3 http://127.0.0.1:9004/root/minio-install/data4
root      1221     1 11 21:58 pts/5    00:00:01 minio server --address :9004 http://127.0.0.1:9001/root/minio-install/data1 http://127.0.0.1:9002/root/minio-install/data2 http://127.0.0.1:9003/root/minio-install/data3 http://127.0.0.1:9004/root/minio-install/data4

root@instance-cspzrq3u:~/minio-install# ls
9001.log  9002.log  9003.log  9004.log  data1  data2  data3  data4  startup_minio.sh
root@instance-cspzrq3u:~/minio-install# tail -100f 9001.log

Formatting 1 zone, 1 set(s), 4 drives per set.
Attempting encryption of all config, IAM users and policies on MinIO backend
Status:         4 Online, 0 Offline.
Endpoint:  http://192.168.16.4:9001  http://172.17.0.1:9001  http://172.18.0.1:9001  http://127.0.0.1:9001       

Browser Access:
   http://192.168.16.4:9001  http://172.17.0.1:9001  http://172.18.0.1:9001  http://127.0.0.1:9001       

.... ...

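2. mc configuration and management

minio officially provides the mc command-line tool for managing minio server. First, we create an mc configuration for managing the local minio server (:9001):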

# mc config host add myminio http://localhost:9001 minio minio123
Added `myminio` successfully.

Here we used mc to add a so-called "host" pointing to the minio server (:9001) created above. Under the hood, this command writes the following configuration into ~/.mc/config.json:

# cat ~/.mc/config.json
{
    "version": "9",
    "hosts": {
        "myminio": {
            "url": "http://localhost:9001",
            "accessKey": "minio",
            "secretKey": "minio123",
            "api": "s3v4",
            "lookup": "auto"
        }
    }
}
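Because mc keeps its host aliases in plain JSON, other tools can read the same definitions. A minimal sketch that parses the version "9" layout shown above with encoding/json. The Go type names are hypothetical, only the fields visible in this file are modeled, and newer mc releases may store aliases under a different layout.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// mcHost mirrors one entry under "hosts" in ~/.mc/config.json (version "9").
type mcHost struct {
	URL       string `json:"url"`
	AccessKey string `json:"accessKey"`
	SecretKey string `json:"secretKey"`
	API       string `json:"api"`
	Lookup    string `json:"lookup"`
}

// mcConfig mirrors the top-level structure of the file.
type mcConfig struct {
	Version string            `json:"version"`
	Hosts   map[string]mcHost `json:"hosts"`
}

// parseMCConfig decodes the JSON bytes of an mc config file.
func parseMCConfig(data []byte) (mcConfig, error) {
	var cfg mcConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot find home directory:", err)
		return
	}
	data, err := os.ReadFile(filepath.Join(home, ".mc", "config.json"))
	if err != nil {
		fmt.Println("cannot read mc config:", err)
		return
	}
	cfg, err := parseMCConfig(data)
	if err != nil {
		fmt.Println("invalid mc config:", err)
		return
	}
	for alias, h := range cfg.Hosts {
		fmt.Printf("%s -> %s (api %s)\n", alias, h.URL, h.API)
	}
}
```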

Next, we use mc to create three buckets in the minio cluster:

root@instance-cspzrq3u:~# mc mb myminio/image
Bucket created successfully `myminio/image`.
root@instance-cspzrq3u:~# mc mb myminio/video
Bucket created successfully `myminio/video`.
root@instance-cspzrq3u:~# mc mb myminio/audio
Bucket created successfully `myminio/audio`.
root@instance-cspzrq3u:~# mc ls myminio
[2020-03-16 15:19:55 CST]      0B audio/
[2020-03-16 15:19:48 CST]      0B image/
[2020-03-16 15:19:52 CST]      0B video/

A newly created bucket's default access policy is none, meaning no external access:

root@instance-cspzrq3u:~# mc policy get myminio/image
Access permission for `myminio/image` is `none`

Following our design, we need to grant external read access to all three buckets. Taking the image bucket as an example:

root@instance-cspzrq3u:~# mc policy set download myminio/image
Access permission for `myminio/image` is set to `download`
root@instance-cspzrq3u:~# mc policy get myminio/image
Access permission for `myminio/image` is `download`

3. Load balancer setup

Here we place an nginx in front of the minio cluster. Below is the nginx configuration file created for minio (/etc/nginx/conf.d/minio.conf):

# /etc/nginx/conf.d/minio.conf

 upstream minio_cluster {
    server localhost:9001;
    server localhost:9002;
    server localhost:9003;
    server localhost:9004;
 }

server {
 listen 9000;
 server_name myminio.tonybai.com;

 # To allow special characters in headers
 ignore_invalid_headers off;
 # Allow any size file to be uploaded.
 # Set to a value such as 1000m; to restrict file size to a specific value
 client_max_body_size 0;
 # To disable buffering
 proxy_buffering off;

location / {

   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_set_header X-Forwarded-Proto $scheme;
   proxy_set_header Host $http_host;

   proxy_connect_timeout 300;
   # Default is HTTP/1, keepalive is only enabled in HTTP/1.1
   proxy_http_version 1.1;
   proxy_set_header Connection "";
   chunked_transfer_encoding off;

   proxy_pass http://minio_cluster;
}

location /image/ {
   proxy_set_header X-Real-IP $remote_addr;
   proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
   proxy_set_header X-Forwarded-Proto $scheme;
   proxy_set_header Host $http_host;

   proxy_connect_timeout 300;
   # Default is HTTP/1, keepalive is only enabled in HTTP/1.1
   proxy_http_version 1.1;
   proxy_set_header Connection "";
   chunked_transfer_encoding off;
   client_max_body_size 1000m;
   proxy_buffering off;

   proxy_pass http://minio_cluster;
 }
}

Reload nginx (nginx -s reload).
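The upstream block above makes nginx spread requests over the four minio instances in round-robin order, nginx's default balancing method. For readers who prefer to stay in Go, the same topology can be sketched with net/http/httputil.ReverseProxy. This is an illustration, not a replacement for the nginx setup, and the demo harness with local stand-in backends is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// newRoundRobinProxy returns a reverse proxy that sends each request to the
// next backend in turn, similar to nginx's default upstream balancing.
func newRoundRobinProxy(backends []string) (*httputil.ReverseProxy, error) {
	if len(backends) == 0 {
		return nil, fmt.Errorf("no backends configured")
	}
	targets := make([]*url.URL, 0, len(backends))
	for _, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			return nil, err
		}
		targets = append(targets, u)
	}
	var counter uint64
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			t := targets[(atomic.AddUint64(&counter, 1)-1)%uint64(len(targets))]
			r.URL.Scheme = t.Scheme
			r.URL.Host = t.Host
			// r.Host is not modified, so the client's Host header reaches minio
			// (cf. "proxy_set_header Host $http_host"); ReverseProxy also adds
			// X-Forwarded-For on its own.
		},
		// Flush responses to the client immediately (cf. "proxy_buffering off").
		FlushInterval: -1,
	}, nil
}

// demoRoundRobin is a hypothetical harness: it starts n local stand-in
// backends that reply with their own name and returns the bodies seen by
// `requests` consecutive requests sent through the proxy.
func demoRoundRobin(n, requests int) ([]string, error) {
	var urls []string
	for i := 1; i <= n; i++ {
		name := fmt.Sprintf("minio-%d", i)
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			io.WriteString(w, name)
		}))
		defer srv.Close()
		urls = append(urls, srv.URL)
	}
	proxy, err := newRoundRobinProxy(urls)
	if err != nil {
		return nil, err
	}
	var seen []string
	for i := 0; i < requests; i++ {
		rec := httptest.NewRecorder()
		proxy.ServeHTTP(rec, httptest.NewRequest("GET", "http://myminio.tonybai.com:9000/image/x.png", nil))
		seen = append(seen, rec.Body.String())
	}
	return seen, nil
}

func main() {
	// In a real deployment one would pass http://localhost:9001 ... :9004 to
	// newRoundRobinProxy and serve it with http.ListenAndServe(":9000", proxy).
	seen, err := demoRoundRobin(4, 8)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(seen)
}
```

Unlike nginx, this sketch has no health checks or failover: a request routed to a dead instance simply fails, which is one reason to keep nginx (or a real load balancer) in front of the cluster.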

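Open http://myminio.tonybai.com:9000/ in a browser; after logging in, you will see the following page:

Figure: accessing the minio web UI from a browser

Select the "image" bucket on the left and click the "+" button at the bottom right to upload an image, gopher-daily-logo.png. After the upload, log out, then open http://myminio.tonybai.com:9000/image/gopher-daily-logo.png to view the image. You can also download it with wget: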

$wget -c http://myminio.tonybai.com:9000/image/gopher-daily-logo.png
--2020-03-16 15:40:20--  http://myminio.tonybai.com:9000/image/gopher-daily-logo.png
Resolving myminio.tonybai.com (myminio.tonybai.com)... 106.12.69.83
Connecting to myminio.tonybai.com (myminio.tonybai.com)|106.12.69.83|:9000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 59736 (58K) [image/png]
Saving to: ‘gopher-daily-logo.png’

gopher-daily-logo.png        100%[============================================>]  58.34K   253KB/s  in 0.2s

2020-03-16 15:40:20 (253 KB/s) - ‘gopher-daily-logo.png’ saved [59736/59736]
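Since the bucket policy is download, a plain unauthenticated HTTP GET is all that is needed, just like the wget call above. A minimal Go equivalent, assuming the path-style URL layout http://<endpoint>/<bucket>/<object> used in this article and URL-safe object names; the demo runs against a hypothetical local stand-in server rather than the real endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchObject downloads bucket/key from a minio endpoint with an anonymous,
// path-style GET (<endpoint>/<bucket>/<key>). It only succeeds when the bucket
// policy allows public reads, e.g. "download". Assumes key is already URL-safe.
func fetchObject(client *http.Client, endpoint, bucket, key string) ([]byte, error) {
	resp, err := client.Get(endpoint + "/" + bucket + "/" + key)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s/%s: unexpected status %s", bucket, key, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newFakeMinio is a hypothetical local stand-in for the real endpoint: only
// image/gopher-daily-logo.png is public, everything else gets 403.
func newFakeMinio() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/image/gopher-daily-logo.png" {
			w.Header().Set("Content-Type", "image/png")
			io.WriteString(w, "fake-png-bytes")
			return
		}
		http.Error(w, "AccessDenied", http.StatusForbidden)
	}))
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}
	// Against the real setup this would be:
	// fetchObject(client, "http://myminio.tonybai.com:9000", "image", "gopher-daily-logo.png")
	srv := newFakeMinio()
	defer srv.Close()
	data, err := fetchObject(client, srv.URL, "image", "gopher-daily-logo.png")
	fmt.Println(len(data), err)
	_, err = fetchObject(client, srv.URL, "video", "clip.mp4")
	fmt.Println(err)
}
```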

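4. Object cleanup

In our requirements, data objects in a bucket have a lifetime of 7 days. We can use a scheduler or a job to remove expired objects with the mc tool, for example by running the following command every 5 minutes:

$mc rm --recursive --force --older-than 7d myminio/image/

This command recursively deletes the data objects in the image bucket that were created more than 7 days ago. The rm command supports various combinations of conditions; see the mc rm manual for details.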
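If the cleanup job is written as a small program instead of a scheduled mc call, its core is a comparison of each object's last-modified time against now minus the retention period. A minimal sketch of just that selection step, assuming a hypothetical object type with a key and a modification time; listing and deleting objects through a minio client is left out.

```go
package main

import (
	"fmt"
	"time"
)

// retention is the object lifetime from the requirements: 7 days.
const retention = 7 * 24 * time.Hour

// object is a hypothetical, minimal view of a stored object.
type object struct {
	Key          string
	LastModified time.Time
}

// expiredKeys returns the keys of objects last modified more than maxAge before now.
func expiredKeys(objs []object, now time.Time, maxAge time.Duration) []string {
	cutoff := now.Add(-maxAge)
	var keys []string
	for _, o := range objs {
		if o.LastModified.Before(cutoff) {
			keys = append(keys, o.Key)
		}
	}
	return keys
}

func main() {
	now := time.Now()
	objs := []object{
		{Key: "image/old.png", LastModified: now.Add(-8 * 24 * time.Hour)},
		{Key: "image/new.png", LastModified: now.Add(-1 * time.Hour)},
	}
	fmt.Println(expiredKeys(objs, now, retention)) // [image/old.png]
}
```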

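IV. Summary

With that, the first step of building high-performance object storage with minio, the prototype, is up and running. As I use and understand minio more deeply, I expect to have more minio content to share with you.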

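Who "Killed" Your HTTP Connection? Uncovering the Hidden Traps of Connection Pool Configuration in Cloud Environments

The scene: a "ghostly" error

Go's default "trap"

Where the conflict erupts

Golden rule: a connection pool configuration guide

Go in practice: how to configure http.Client correctly

Recommended configuration example

Key parameters explained

The last line of defense: retries

Summary

Is Go becoming the "new engine" of "legacy" ecosystems? Starting from FrankenPHP and the new TypeScript compiler

The "Toyota Corolla" of programming languages

Go: an unexpected "new engine"

Go's new role: from "building new cities" to "renovating old capitals"

Conclusion: embrace pragmatism, not the halo

From simple to powerful: exploring the appeal of the Caddy server again

1. Running Caddy and basic configuration

1.1 Starting and stopping Caddy

1.2 Configuring sites with a Caddyfile

1.3 What happens behind the Caddyfile

1.4 Layer-4 proxy configuration and gRPC

1.4.1 Raw TCP and UDP

1.4.2 RPC

2. Configuring Caddy dynamically at runtime via the API

2.1 POST /load

2.2 /config/[path]

2.2.1 Reading the configuration at a specific path

2.2.2 Updating the configuration at a specific path

2.3 @id

3. Production practice and ACME

3.1 Configuring Caddy for production

3.2 Automatic HTTPS and ACME

4. Summary

Gopher Daily has been redesigned

1. A "semi-automated" production workflow

2. Automatic collection of Go technical material

2.1 Where the sources come from

2.2 Detecting and pulling source updates

3. Automatic summarization and translation

3.1 Extracting raw text from HTML

3.2 Extracting summaries

3.3 Translation

4. Page style design and HTML generation

5. Choosing a server

6. Summary

Merging YAML configuration files with viper

Developing a Kubernetes Operator in Go: the basic structure

I. Advantages of Operators

II. Kubernetes resources, resource types, APIs and controllers

III. The Operator pattern = managed objects (CRD) + control logic (controller)

IV. Developing a webserver operator with kubebuilder

1. Installing kubebuilder

2. Creating the webserver-operator project

3. Creating the API and generating the initial CRD

4. The basic structure of webserver-operator

5. Adding fields to the CRD spec

6. Modifying role.yaml

7. Implementing the controller's Reconcile logic

8. Building the controller image

9. Deploying the controller

10. Creating a WebServer CR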

11. Scaling, changing versions and Service self-healing

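V. Summary

VI. References

Fetching private Go modules in small companies (continued)

Adapting to GitLab

Q&A

1. git clone errors

2. Using insecure connections

Building a one-command runtime environment with Docker Compose

1. docker compose

2. Choosing versions

3. Choosing networks

4. Start dependent middleware first, then preset configuration

5. The full bundle, everything included

6. Combining with a Makefile to simplify command-line input

7. The .env file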