RESTAPI | Tony Bai

标签 RESTAPI 下的文章

使用istio治理微服务入门

一月 3, 2018
13 条评论

近两年微服务架构流行，主流互联网厂商内部都已经微服务化，初创企业虽然技术积淀不行，但也通过各种开源工具拥抱微服务。再加上容器技术赋能，Kubernetes又添了一把火，微服务架构已然成为当前软件架构设计的首选。

但微服务化易弄，服务治理难搞！

一、微服务的“痛点”

微服务化没有统一标准，多数是进行业务领域垂直切分，业务按一定的粒度划分职责，并形成清晰、职责单一的服务接口，这样每一块规划为一个微服务。微服务之间的通信方案相对成熟，开源领域选择较多的有RPC或RESTful API方案，比如：gRPC、apache thrift等。这些方案多偏重于数据如何打包、传输与解包，对服务治理的内容涉及甚少。

微服务治理是头疼的事，也是微服务架构中的痛点。治理这个词有多元含义，很难下达一个精确定义，这里可以像小学二年级学生那样列出治理的诸多近义词：管理、控制、规则、掌控、监督、支配、规定、统治等。对于微服务而言，治理体现在以下诸多方面：

服务注册与发现
身份验证与授权
服务的伸缩控制
反向代理与负载均衡
路由控制
流量切换
日志管理
性能度量、监控与调优
分布式跟踪
过载保护
服务降级
服务部署与版本升级策略支持
错误处理
… …

从微服务治理角度来说，微服务其实是一个“大系统”，要想将这个大系统全部落地，绝非易事，尤其是之前尚没有一种特别优雅的技术方案。多数方案(比如：dubbo、go-kit等。)都或多或少地对应用逻辑有一定的侵入性，让业务开发人员不能只focus到业务本身，还要关心那些“治理”逻辑。并且市面上内置了微服务治理逻辑的框架较少，且很多编程语言相关。这种情况下，大厂多选择自研或基于某个框架改造，小厂一般只能“东拼西凑”一些“半成品”凑合着使用，就这样微服务也走过了若干年。

二、Service Mesh横空出世，istio带来“福音”

我不知道在没有TCP/IP协议的年代，主机和主机之间的应用通信时是否需要应用关心底层通信协议实现逻辑。但是和TCP/IP诞生的思想类似，在微服务使用多年后，人们发现需要独立地抽象出一层逻辑网络，专门用于“微服务通信与治理策略的落地”，让应用只关心业务，把服务治理的事情全部交由“这一层”去处理。

img{512x368}
图：传统微服务之间的微服务治理逻辑的位置

img{512x368}
图：微服务治理逻辑被独立出来之后的位置

由“Service Govern Logic”这一层组成的逻辑网络被定义为service mesh，每个微服务都包含一个service mesh的端点。

“Service Mesh”概念还非常年轻，这个词在国内被翻译为“服务网格”或“服务啮合层”，我们这里就用Service Mesh这个英文词。这里摘录一下ServiceMesh中文社区上的一篇名为“年度盘点2017之Service Mesh：群雄逐鹿烽烟起”的文章中对Service Mesh概念的回顾：

在 2016 年年初，“Service Mesh”还只是 Buoyant 公司的内部词汇，而之后，它开始逐步走向社区：
2016 年 9 月 29 日在 SF Microservices 上，“Service Mesh”这个词汇第一次在公开场合被使用。这标志着“Service Mesh”这个词，从 Buoyant 公司走向社区。
2016 年 10 月，Alex Leong 开始在 Buoyant 公司的官方 Blog 中连载系列文章“A Service Mesh for Kubernetes”。随着“The Services must Mesh”口号的喊出，Buoyant 和 Linkerd 开始 Service Mesh 概念的布道。
2017 年 4 月 25 日，William Morgan 发布博文“What’s a service mesh? And why do I need one?”。正式给 Service Mesh 做了一个权威定义。

而Service Mesh真正引起大家关注要源于istio项目的开源发布。为什么呢？个人觉得还是因为“爹好”！istio项目由Google、IBM共同合作创建，lyft公司贡献了envoy项目将作为istio service mesh的data panel。Google、IBM的影响力让Service Mesh概念迅速传播，同时也让大家认识到了istio项目在service mesh领域的重要性，于是纷纷选择积极支持并将自己的产品或项目与istio项目集成。

istio项目是service mesh概念的最新实现，旨在所有主流集群管理平台上提供service mesh层，初期以实现Kubernetes上的服务治理层为目标。它由控制平面和数据平面组成（是不是感觉和SDN的设计理念相似啊）。控制平面由Go语言实现，包括pilot、mixer、auth三个组件；数据平面功能暂由envoy在pod中以sidecar的部署形式提供。下面是官方的架构图：

img{512x368}
图：istio架构图(来自官网)

sidecar中envoy代理了pod中真正业务container的所有进出流量，并对这些流量按照控制平面设定的“治理逻辑”进行处理。而这一切对pod中的业务应用是透明的，开发人员可以专心于业务逻辑，而无需再关心微服务治理的逻辑。istio代表的service mesh的设计理念被认为是下一代“微服务统一框架”，甚至有人认为是微服务框架演化的终点。

istio于2017 年 5 月 24 日发布了0.1 release 版本，截至目前为止istio的版本更新到v0.4.0，演进速度相当快，不过目前依然不要用于生产环境，至少要等到1.0版本发布吧。但对于istio的早期接纳者而言，现在正是深入研究istio的好时机。在本篇的接下来内容中，我们将带领大家感性的认识一下istio，入个门儿。

三、istio安装

istio目前支持最好的就是kubernetes了，因此我们的实验环境就定在kubernetes上。至于版本，istio当前最新版本为0.4.0，这个版本据说要k8s 1.7.4及以上版本用起来才不会发生小毛病:)。我的k8s集群是v1.7.6版本的，恰好满足条件。下面是安装过程：（Node上的os是ubuntu 16.04）

# wget -c https://github.com/istio/istio/releases/download/0.4.0/istio-0.4.0-linux.tar.gz

解压后，进入istio-0.4.0目录，

# ls -F
bin/  install/  istio.VERSION  LICENSE  README.md  samples/

# cat istio.VERSION
# DO NOT EDIT THIS FILE MANUALLY instead use
# install/updateVersion.sh (see install/README.md)
export CA_HUB="docker.io/istio"
export CA_TAG="0.4.0"
export MIXER_HUB="docker.io/istio"
export MIXER_TAG="0.4.0"
export PILOT_HUB="docker.io/istio"
export PILOT_TAG="0.4.0"
export ISTIOCTL_URL="https://storage.googleapis.com/istio-release/releases/0.4.0/istioctl"
export PROXY_TAG="0.4.0"
export ISTIO_NAMESPACE="istio-system"
export AUTH_DEBIAN_URL="https://storage.googleapis.com/istio-release/releases/0.4.0/deb"
export PILOT_DEBIAN_URL="https://storage.googleapis.com/istio-release/releases/0.4.0/deb"
export PROXY_DEBIAN_URL="https://storage.googleapis.com/istio-release/releases/0.4.0/deb"
export FORTIO_HUB="docker.io/istio"
export FORTIO_TAG="0.4.2"

# cd install/kubernetes

我们先不用auth功能，因此使用istio.yaml这个文件进行istio组件安装：

# kubectl apply -f istio.yaml
namespace "istio-system" created
clusterrole "istio-pilot-istio-system" created
clusterrole "istio-initializer-istio-system" created
clusterrole "istio-mixer-istio-system" created
clusterrole "istio-ca-istio-system" created
clusterrole "istio-sidecar-istio-system" created
clusterrolebinding "istio-pilot-admin-role-binding-istio-system" created
clusterrolebinding "istio-initializer-admin-role-binding-istio-system" created
clusterrolebinding "istio-ca-role-binding-istio-system" created
clusterrolebinding "istio-ingress-admin-role-binding-istio-system" created
clusterrolebinding "istio-sidecar-role-binding-istio-system" created
clusterrolebinding "istio-mixer-admin-role-binding-istio-system" created
configmap "istio-mixer" created
service "istio-mixer" created
serviceaccount "istio-mixer-service-account" created
deployment "istio-mixer" created
customresourcedefinition "rules.config.istio.io" created
customresourcedefinition "attributemanifests.config.istio.io" created
... ...
customresourcedefinition "reportnothings.config.istio.io" created
attributemanifest "istioproxy" created
attributemanifest "kubernetes" created
stdio "handler" created
logentry "accesslog" created
rule "stdio" created
metric "requestcount" created
metric "requestduration" created
metric "requestsize" created
metric "responsesize" created
metric "tcpbytesent" created
metric "tcpbytereceived" created
prometheus "handler" created
rule "promhttp" created
rule "promtcp" created
kubernetesenv "handler" created
rule "kubeattrgenrulerule" created
kubernetes "attributes" created
configmap "istio" created
customresourcedefinition "destinationpolicies.config.istio.io" created
customresourcedefinition "egressrules.config.istio.io" created
customresourcedefinition "routerules.config.istio.io" created
service "istio-pilot" created
serviceaccount "istio-pilot-service-account" created
deployment "istio-pilot" created
service "istio-ingress" created
serviceaccount "istio-ingress-service-account" created
deployment "istio-ingress" created
serviceaccount "istio-ca-service-account" created
deployment "istio-ca" created

注：我还曾在k8s v1.7.3上安装过istio 0.3.0版本，但在创建组件时会报下面错误（这个错误可能会导致后续addon安装后工作不正常）：

unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric
unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric
unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric
unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric
unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric
unable to recognize "istio.yaml": no matches for config.istio.io/, Kind=metric

安装后，我们在istio-system这个namespace下会看到如下pod和service在运行（由于istio的各个组件的image size都不小，因此pod状态变为running需要一丢丢时间，耐心等待）：

# kubectl get pods -n istio-system
NAME                             READY     STATUS    RESTARTS   AGE
istio-ca-1363003450-jskp5        1/1       Running   0          3d
istio-ingress-1005666339-c7776   1/1       Running   4          3d
istio-mixer-465004155-twhxq      3/3       Running   24         3d
istio-pilot-1861292947-6v37w     2/2       Running   18         3d

# kubectl get svc -n istio-system
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                   AGE
istio-ingress   10.98.10.87      <pending>     80:31759/TCP,443:25804/TCP                         4d
istio-mixer     10.109.244.155   <none>        9091/TCP,15004/TCP,9093/TCP,9094/TCP,9102/TCP,9125/UDP,42422/TCP   4d
istio-pilot     10.105.80.55     <none>        15003/TCP,443/TCP                                              4d

istio安装成功！

四、服务治理策略验证

接下来我们来用几个例子验证一下istio在服务治理方面的能力！（istio自带一些完整的例子，比如bookinfo，用于验证服务治理的能力，但这里先不打算用这些例子）

1、验证环境和拓扑

我们先来看一下验证环境的示意图：
img{512x368}

我们看到在service mesh中部署了两个service: server_a和service_b，前者调用后者完成某项业务，后者则调用外部服务完成业务逻辑。

service_a: 模拟pay服务，在收到client请求后，进行pay处理，并将处理结果通过service_b提供的msg notify服务下发给user。该服务的endpoint为/pay；
service_b: 模拟notify服务，在收到service_a请求后，将message转发给external service，完成notify逻辑。该服务的endpoint为/notify；
external service: 位于service mesh之外。
client：我们使用curl模拟。

img{512x368}

我们先来部署service_a和service_b的v0.1版本：

以service_a的部署为例, service_a的deployment文件如下：

//svca-v0.1.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: svca
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: svca
        version: v0.1
    spec:
      containers:
      - name: svca
        image: docker.io/bigwhite/istio-demo-svca:v0.1
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: svca
  labels:
    app: svca
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  selector:
    app: svca

注意，我们部署service_a时不能直接使用kubectl apply -f svca-v0.1.yaml，而是要apply经过istioctl(需将istio安装目录下的bin放入PATH)处理过的yaml，以注入sidecar容器。当然也可以配置为自动为每个k8s启动的pod注入sidecar，但我们这里没有使用自动注入。我们执行下面命令：

# kubectl apply -f <(istioctl kube-inject -f svca-v0.1.yaml)
deployment "svca" created
service "svca" created

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
svca-1997590752-tpwjf              2/2       Running   0          2m

同样的方法，我们来创建svcb:v0.1:

# kubectl apply -f <(istioctl kube-inject -f svcb-v0.1.yaml)
deployment "svcb" created
service "svcb" created

我们看到istio向每个pod中插入一个sidecar container，这个就是前面说的envoy，只不过container名字为istio-proxy。

接下来，我们把那个external service启动起来：

# nohup ./msgd > 1.log & 2>&1
[1] 9423

实验环境ok了。下面我们来验证一下业务是否是通的。

2、egress rule

按照之前我们的设定，我们使用curl去访问service_a服务的/pay端点，我们查看一下svca服务的ip和端口：

# kubectl get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)
svca               10.105.38.238    <none>        80/TCP                                         9h
svcb               10.105.119.194   <none>        80/TCP                                         9h

我们访问一下svca服务，svca的服务地址可以通过kubectl get svc查到：

# curl {svca_ip}/pay

查看svca和svcb的日志：

//service_a的日志：

service_a:v0.1 is serving the request...
service_a:v0.1 pays ok
&{500 Internal Server Error 500 HTTP/1.1 1 1 map[X-Content-Type-Options:[nosniff] Date:[Tue, 02 Jan 2018 15:41:50 GMT] Content-Length:[66] Content-Type:[text/plain; charset=utf-8]] 0xc420058d40 66 [] false false map[] 0xc4200eaf00 <nil>}
service_a:v0.1 notify customer ok

// service_b的日志：
&{GET /notify?msg=service_a:v0.1-pays-ok HTTP/1.1 1 1 map[User-Agent:[Go-http-client/1.1] Accept-Encoding:[gzip]] {} <nil> 0 [] false svcb map[] map[] <nil> map[] 127.0.0.1:58778 /notify?msg=service_a:v0.1-pays-ok <nil> <nil> <nil> 0xc4200fa3c0}
service_b:v0.1 is serving the request...
service_b:v0.1 send msg error: Get http://10.100.35.27:9997/send?msg=service_a:v0.1-pays-ok: EOF

我们看到service_a和service_b都返回了错误日志（注意：go http get方法对于non-2xx response不会返回错误，我们只是看到了response中的500状态码才意识到错误的存在）。其中源头在service_b，原因是其连不上那个external service！那么为什么连不上external service呢？这是由于缺省情况下，启用了Istio的服务是无法访问外部URL的，这是因为Pod中的iptables把所有外发传输都转向到了Sidecar代理，而这一代理只处理集群内的访问目标。因此位于service mesh内的服务svcb无法访问外部的服务(msgd)，我们需要显式的添加egressrule规则：

我们创建一个允许svcb访问外部特定服务的EgressRule：

//rules/enable-svcb-engress-rule.yaml

apiVersion: config.istio.io/v1alpha2
kind: EgressRule
metadata:
  name: enable-svcb-engress-rule
spec:
  destination:
    service: 10.100.35.27
  ports:
    - port: 9997
      protocol: http

使规则生效：

# istioctl create -f enable-svcb-engress-rule.yaml
Created config egress-rule/default/enable-svcb-engress-rule at revision 30031258

这时你再尝试curl svca，我们可以看到msgd的日志中出现了下面的内容：

2018/01/02 23:58:16 &{GET /send?msg=service_a:v0.1-pays-ok HTTP/1.1 1 1 map[X-Ot-Span-Context:[2157e7ffb8105330;2157e7ffb8105330;0000000000000000] Content-Length:[0] User-Agent:[Go-http-client/1.1] X-Forwarded-Proto:[http] X-Request-Id:[13c3af6e-2f52-993d-905f-aa6aa4b57e2d] X-Envoy-Decorator-Operation:[default-route] X-B3-Spanid:[2157e7ffb8105330] X-B3-Sampled:[1] Accept-Encoding:[gzip] X-B3-Traceid:[2157e7ffb8105330] X-Istio-Attributes:[Ch8KCXNvdXJjZS5pcBISMhAAAAAAAAAAAAAA//8KLgAMCjoKCnNvdXJjZS51aWQSLBIqa3ViZXJuZXRlczovL3N2Y2ItMjAwODk3Mzc2OS1ncTBsaC5kZWZhdWx0]] {} <nil> 0 [] false 10.100.35.27:9997 map[] map[] <nil> map[] 10.100.35.28:38188 /send?msg=service_a:v0.1-pays-ok <nil> <nil> <nil> 0xc4200584c0}
2018/01/02 23:58:16 Msgd is serving the request...
2018/01/02 23:58:16 Msgd recv msg ok, msg= service_a:v0.1-pays-ok

说明Svcb到外部服务的通信被打通了！

3、迁移流量到新版本svcb:v0.2

我们经常有这样的需求，当svcb运行一段时间后，svcb添加了新feature，版本要升级到v0.2了，这时我们会部署svcb:v0.2，并将流量逐步切到v0.2上。

我们先来部署一下svcb:v0.2：

// svcb-v0.2.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: svcb-v0.2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: svcb
        version: v0.2
    spec:
      containers:
      - name: svcb
        image: docker.io/bigwhite/istio-demo-svcb:v0.2
        imagePullPolicy: Always

我们可以看到，服务名不变，但版本的label变成了v0.2，我们来执行这次部署：

# kubectl apply -f <(istioctl kube-inject -f svcb-v0.2.yaml)
deployment "svcb-v0.2" created

# kubectl get pods
NAME                               READY     STATUS    RESTARTS   AGE
svca-1997590752-pq9zg              2/2       Running   0          9h
svcb-2008973769-gq0lh              2/2       Running   0          9h
svcb-v0.2-3233505404-0g55w         2/2       Running   0          1m

svcb服务下又增加了一个endpoint:

# kubectl describe svc/svcb

.... ...
Selector:        app=svcb
Type:            ClusterIP
IP:            10.105.119.194
Port:            <unset>    80/TCP
Endpoints:        10.40.0.28:8080,10.46.0.12:8080
... ...

此时，如果按照k8s的调度方式，v0.1和v0.2版本的两个svcb pod应该1:1均衡地承载流量。为了方便查看流量分布，我们将每个版本的svcb的pod副本数量都扩展为2个(replicas: 2)，这样service mesh中一共会有4个 svcb endpoints。

通过curl访问svca注入流量后，我们发现流量都集中在一个svcb:v0.2的pod上，并且长时间没有变化。我们通过下面的route rule规则来尝试将流量在svcb:v0.1和svcb:v0.2之间1:1均衡：

// route-rules-svcb-v0.2-50.yaml
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: route-rules-svcb
spec:
  destination:
    name: svcb
  precedence: 1
  route:
  - labels:
      version: v0.1
    weight: 50
  - labels:
      version: v0.2
    weight: 50

# istioctl create -f route-rules-svcb-v0.2-50.yaml
Created config route-rule/default/route-rules-svcb at revision 30080638

按照istio文档中的说法，这个规则的生效需要一些时间。之后我们注入流量，发现流量切换到svcb:v0.1的一个pod上去了，并且很长一段时间不曾变化，未均衡到svcb:v0.2上去。

我们更新一下route rule，将流量全部切到svcb:v0.2上去：

//route-rules-svcb-v0.2-100.yaml
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: route-rules-svcb
spec:
  destination:
    name: svcb
  precedence: 1
  route:
  - labels:
      version: v0.2
    weight: 100

# istioctl replace -f route-rules-svcb-v0.2-100.yaml
Updated config route-rule/default/route-rules-svcb to revision 30082944

我们用istio的replace命令更新了规则：route-rules-svcb。更新后，再次注入流量，这回流量重新集中在svcb:v0.2的一个pod上了，再过一段时间另外一个svcb:v0.2的pod上才有了一些流量。但svcb:v0.1上不再有流量，这个切换是成功的。

在k8s的service的负载均衡中，k8s就利用了iptables的概率转发（random –probability 0.5），因此这种流量均衡并非是精确的，只有在长时间大量流量经过后，才能看到流量的分布与设定的权重是相似的，可能istio也是如此，这里仅是入门，就不深入挖掘了。

当然istio在路由规则设施方面的“能耐”远不止上面例子中所展示的那样，如果要悉数列出，那本文的长度可是要爆掉了。有兴趣的朋友可以去翻看官方文档。

五、插件安装

istio的强大微服务治理能力还体现在其集成了grafana、prometheus、servicegraph、zipkin等addons，应用程序无需做任何改动，就可以具有数据收集、度量与可视化的监控能力、服务的分布式跟踪能力等。我们可以在istio的安装包中找到这些addons的安装文件，我们来逐一试试。

1、prometheus & grafana

我们先来安装一下prometheus 和 grafana插件(位于istio-0.4.0/install/kubernetes/addon下面)：

# kubectl apply -f prometheus.yaml
configmap "prometheus" created
service "prometheus" created
deployment "prometheus" created

# kubectl apply -f grafana.yaml
service "grafana" created
deployment "grafana" created

# kubectl get pods -n istio-system
NAME                             READY     STATUS    RESTARTS   AGE
grafana-3617079618-zpglx         1/1       Running   0          5m
prometheus-168775884-ppfxr       1/1       Running   0          5m
... ...

# kubectl get svc -n istio-system
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)            AGE
grafana         10.105.21.25     <none>        3000/TCP                     16m
prometheus      10.103.160.37    <none>        9090/TCP                16m
... ...

浏览器中输入prometheus的服务地址http://10.103.160.37:9090，访问prometheus:

img{512x368}

点击菜单项：status -> targets，查看各个target的状态是否正常：

img{512x368}

如果像上图所示那样，各个target都是up状态，那就说明istio运行时ok的。否则请参考istio troubleshooting中的内容对istio逐一进行排查，尤其是istio-mesh这个Target在istio-0.3.0+kubernetes 1.7.3的环境中就是Down的状态。

浏览器输入grafana的服务地址：http://10.105.21.25:3000/，打开grafana面板：

img{512x368}

切换到Istio Dashboard，并向istio service mesh注入流量，我们会看到仪表盘变化如下：

img{512x368}

2、servicegraph

servicegraph插件是用来查看服务调用关系的，我们来创建一下该组件：

# kubectl apply -f servicegraph.yaml
deployment "servicegraph" created
service "servicegraph" created

# kubectl get svc -n istio-system
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)                 AGE
servicegraph    10.108.245.21    <none>        8088/TCP                     52s
... ...

创建成功后，向service mesh网络注入流量，然后访问servicegraph：http://{servicegraph_ip}:8088/dotviz，在我的环境里，我看到的图示如下：

img{512x368}

调用关系似乎有些乱，难道是我在程序使用的调用方法不够标准？:(

3、zipkin

istio集成了zipkin，利用zipkin我们可以做分布式服务调用的追踪。之前自己曾经搭建过基于jaeger和opentracing的分布式调用服务，十分繁琐。并且要想使用tracing，对应用代码的侵入必不可少。

我们安装一下zipkin addon:

# kubectl apply -f zipkin.yaml
deployment "zipkin" created
service "zipkin" created

# kubectl get svc -n istio-system
NAME            CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
zipkin          10.105.7.219     <none>        9411/TCP                             1h

我们访问以下zikpin的UI，通过浏览器打开http://{zipkin_service_ip}:9411。

img{512x368}

接下来，我们向service mesh注入一些流量，然后再zipkin首页的“服务名”下拉框中选择”svcb”，查找跟踪情况：

img{512x368}

我们看到：在没有对svca, svcb做任何修改的情况下，我们依然可以在zipkin中找到svcb相关的调用。点击其中一个trace，可以查看细节：

img{512x368}

当然如果你想做内容更为丰富的、更为强大的跟踪，可能需要在应用代码中做些配合，具体可以参见：istio的分布式跟踪。

六、小结

istio项目诞生不到一年，目前离成熟还远。快速积极开发可能会导致istio的接口和实现机制都会发生很大的变化，因此本文不能保证内容将适用于后续所有istio的发布版本。

本文涉及到的源码在这里可以下载到，demo service的镜像可以在我的docker hub上pull。

更多内容可以通过我在慕课网开设的实战课程《Kubernetes实战高可用集群搭建、配置、运维与应用》学习。

微博：@tonybai_cn
微信公众号：iamtonybai
github.com: https://github.com/bigwhite

微信赞赏：
img{512x368}

Kubernetes集群中Service的滚动更新

二月 9, 2017
3 条评论

在移动互联网时代，消费者的消费行为已经“全天候化”，为此，商家的业务系统也要保持7×24小时不间断地提供服务以满足消费者的需求。很难想像如今还会有以“中断业务”为前提的服务系统更新升级。如果微信官方发布公告说：每周六晚23:00~次日凌晨2:00进行例行系统升级，不能提供服务，作为用户的你会怎么想、怎么做呢？因此，各个平台在最初设计时就要考虑到服务的更新升级问题，部署在Kubernetes集群中的Service也不例外。

一、预备知识

1、滚动更新Rolling-update

传统的升级更新，是先将服务全部下线，业务停止后再更新版本和配置，然后重新启动并提供服务。这样的模式已经完全不能满足“时代的需要”了。在并发化、高可用系统普及的今天，服务的升级更新至少要做到“业务不中断”。而滚动更新(Rolling-update)恰是满足这一需求的一种系统更新升级方案。

简单来说，滚动更新就是针对多实例服务的一种不中断服务的更新升级方式。一般情况，对于多实例服务，滚动更新采用对各个实例逐个进行单独更新而非同一时刻对所有实例进行全部更新的方式。“滚动更新”的先进之处在于“滚动”这个概念的引入，笔者觉得它至少有以下两点含义：

a) “滚动”给人一种“圆”的映像，表意：持续，不中断。“滚动”的理念是一种趋势，我们常见的“滚动发布”、“持续交付”都是“滚动”理念的应用。与传统的大版本周期性发布/更新相比，”滚动”可以让用户更快、更及时地使用上新Feature，缩短市场反馈周期，同时滚动式的发布和更新又会将对用户体验的影响降到最小化。

b) “滚动”可向前，也可向后。我们可以在更新过程中及时发现“更新”存在的问题，并“向后滚动”，实现更新的回退，可以最大程度上降低每次更新升级的风险。

对于在Kubernetes集群部署的Service来说，Rolling update就是指一次仅更新一个Pod，并逐个进行更新，而不是在同一时刻将该Service下面的所有Pod shutdown，避免将业务中断的尴尬。

2、Service、Deployment、Replica Set、Replication Controllers和Pod之间的关系

对于我们要部署的Application来说，一般是由多个抽象的Service组成。在Kubernetes中，一个Service通过label selector match出一个Pods集合，这些Pods作为Service的endpoint，是真正承载业务的实体。而Pod在集群内的部署、调度、副本数保持则是通过Deployment或ReplicationControllers这些高level的抽象来管理的，下面是一幅示意图：

img{512x368}

新版本的Kubernetes推荐用Deployment替代ReplicationController，在Deployment这个概念下在保持Pod副本数上实际发挥作用的是隐藏在背后的Replica Set。

因此，我们可以看到Kubernetes上Service的rolling update实质上是对Service所match出来的Pod集合的Rolling update，而控制Pod部署、调度和副本调度的却又恰恰是Deployment和replication controller，因此后两者才是kubernetes service rolling update真正要面对的实体。

二、kubectl rolling-update子命令

kubernetes在kubectl cli工具中仅提供了对Replication Controller的rolling-update支持，通过kubectl -help，我们可以查看到下面的命令usage描述：

# kubectl -help
... ...
Deploy Commands:
  rollout        Manage a deployment rollout
  rolling-update Perform a rolling update of the given ReplicationController
  scale          Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
  autoscale      Auto-scale a Deployment, ReplicaSet, or ReplicationController
... ...

# kubectl help rolling-update
... ...
Usage:
  kubectl rolling-update OLD_CONTROLLER_NAME ([NEW_CONTROLLER_NAME] --image=NEW_CONTAINER_IMAGE | -f
NEW_CONTROLLER_SPEC) [options]
... ...

我们现在来看一个例子，看一下kubectl rolling-update是如何对service下的Pods进行滚动更新的。我们的kubernetes集群有两个版本的Nginx：

# docker images|grep nginx
nginx                                                    1.11.9                     cc1b61406712        2 weeks ago         181.8 MB
nginx                                                    1.10.1                     bf2b4c2d7bf5        4 months ago        180.7 MB

在例子中我们将Service的Pod从nginx 1.10.1版本滚动升级到1.11.9版本。

我们的rc-demo-v0.1.yaml文件内容如下：

apiVersion: v1
kind: ReplicationController
metadata:
  name: rc-demo-nginx-v0.1
spec:
  replicas: 4
  selector:
    app: rc-demo-nginx
    ver: v0.1
  template:
    metadata:
      labels:
        app: rc-demo-nginx
        ver: v0.1
    spec:
      containers:
        - name: rc-demo-nginx
          image: nginx:1.10.1
          ports:
            - containerPort: 80
              protocol: TCP
          env:
            - name: RC_DEMO_VER
              value: v0.1

创建这个replication controller：

# kubectl create -f rc-demo-v0.1.yaml
replicationcontroller "rc-demo-nginx-v0.1" created

# kubectl get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP             NODE
rc-demo-nginx-v0.1-2p7v0   1/1       Running   0          1m        172.30.192.9   iz2ze39jeyizepdxhwqci6z
rc-demo-nginx-v0.1-9pk3t   1/1       Running   0          1m        172.30.192.8   iz2ze39jeyizepdxhwqci6z
rc-demo-nginx-v0.1-hm6b9   1/1       Running   0          1m        172.30.0.9     iz25beglnhtz
rc-demo-nginx-v0.1-vbxpl   1/1       Running   0          1m        172.30.0.10    iz25beglnhtz

Service manifest文件rc-demo-svc.yaml的内容如下：

apiVersion: v1
kind: Service
metadata:
  name: rc-demo-svc
spec:
  ports:
  - port: 80
    protocol: TCP
  selector:
    app: rc-demo-nginx

创建这个service：

# kubectl create -f rc-demo-svc.yaml
service "rc-demo-svc" created

# kubectl describe svc/rc-demo-svc
Name:            rc-demo-svc
Namespace:        default
Labels:            <none>
Selector:        app=rc-demo-nginx
Type:            ClusterIP
IP:            10.96.172.246
Port:            <unset>    80/TCP
Endpoints:        172.30.0.10:80,172.30.0.9:80,172.30.192.8:80 + 1 more...
Session Affinity:    None
No events.

可以看到之前replication controller创建的4个Pod都被置于rc-demo-svc这个service的下面了，我们来访问一下该服务：

# curl -I http://10.96.172.246:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
Date: Wed, 08 Feb 2017 08:45:19 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 31 May 2016 14:17:02 GMT
Connection: keep-alive
ETag: "574d9cde-264"
Accept-Ranges: bytes

# kubectl exec rc-demo-nginx-v0.1-2p7v0  env
... ...
RC_DEMO_VER=v0.1
... ...

通过Response Header中的Server字段，我们可以看到当前Service pods中的nginx版本为1.10.1；通过打印Pod中环境变量，得到RC_DEMO_VER=v0.1。

接下来，我们来rolling-update rc-demo-nginx-v0.1这个rc，我们的新rc manifest文件rc-demo-v0.2.yaml内容如下：

apiVersion: v1
kind: ReplicationController
metadata:
  name: rc-demo-nginx-v0.2
spec:
  replicas: 4
  selector:
    app: rc-demo-nginx
    ver: v0.2
  template:
    metadata:
      labels:
        app: rc-demo-nginx
        ver: v0.2
    spec:
      containers:
        - name: rc-demo-nginx
          image: nginx:1.11.9
          ports:
            - containerPort: 80
              protocol: TCP
          env:
            - name: RC_DEMO_VER
              value: v0.2

rc-demo-new.yaml与rc-demo-old.yaml有几点不同：rc的name、image的版本以及RC_DEMO_VER这个环境变量的值：

# diff rc-demo-v0.2.yaml rc-demo-v0.1.yaml
4c4
<   name: rc-demo-nginx-v0.2
---
>   name: rc-demo-nginx-v0.1
9c9
<     ver: v0.2
---
>     ver: v0.1
14c14
<         ver: v0.2
---
>         ver: v0.1
18c18
<           image: nginx:1.11.9
---
>           image: nginx:1.10.1
24c24
<               value: v0.2
---
>               value: v0.1

我们开始rolling-update，为了便于跟踪update过程，这里将update-period设为10s，即每隔10s更新一个Pod：

#  kubectl rolling-update rc-demo-nginx-v0.1 --update-period=10s -f rc-demo-v0.2.yaml
Created rc-demo-nginx-v0.2
Scaling up rc-demo-nginx-v0.2 from 0 to 4, scaling down rc-demo-nginx-v0.1 from 4 to 0 (keep 4 pods available, don't exceed 5 pods)
Scaling rc-demo-nginx-v0.2 up to 1
Scaling rc-demo-nginx-v0.1 down to 3
Scaling rc-demo-nginx-v0.2 up to 2
Scaling rc-demo-nginx-v0.1 down to 2
Scaling rc-demo-nginx-v0.2 up to 3
Scaling rc-demo-nginx-v0.1 down to 1
Scaling rc-demo-nginx-v0.2 up to 4
Scaling rc-demo-nginx-v0.1 down to 0
Update succeeded. Deleting rc-demo-nginx-v0.1
replicationcontroller "rc-demo-nginx-v0.1" rolling updated to "rc-demo-nginx-v0.2"

从日志可以看出：kubectl rolling-update逐渐增加 rc-demo-nginx-v0.2的scale并同时逐渐减小 rc-demo-nginx-v0.1的scale值直至减到0。

在升级过程中，我们不断访问rc-demo-svc，可以看到新旧Pod版本共存的状态，服务并未中断：

# curl -I http://10.96.172.246:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

# curl -I http://10.96.172.246:80
HTTP/1.1 200 OK
Server: nginx/1.11.9
... ...

# curl -I http://10.96.172.246:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

更新后的一些状态信息：

# kubectl get rc
NAME                 DESIRED   CURRENT   READY     AGE
rc-demo-nginx-v0.2   4         4         4         5m

# kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
rc-demo-nginx-v0.2-25b15   1/1       Running   0          5m
rc-demo-nginx-v0.2-3jlpk   1/1       Running   0          5m
rc-demo-nginx-v0.2-lcnf9   1/1       Running   0          6m
rc-demo-nginx-v0.2-s7pkc   1/1       Running   0          5m

# kubectl exec rc-demo-nginx-v0.2-25b15  env
... ...
RC_DEMO_VER=v0.2
... ...

官方文档说kubectl rolling-update是由client side实现的rolling-update，这是因为roll-update的逻辑都是由kubectl发出N条命令到APIServer完成的，在kubectl的代码中我们可以看到这点：

//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/cmd/rollingupdate.go
... ...
func RunRollingUpdate(f cmdutil.Factory, out io.Writer, cmd *cobra.Command, args []string, options *resource.FilenameOptions) error {
    ... ...
    err = updater.Update(config)
    if err != nil {
        return err
    }
    ... ...
}

//https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/rolling_updater.go
func (r *RollingUpdater) Update(config *RollingUpdaterConfig) error {
    ... ...
    // Scale newRc and oldRc until newRc has the desired number of replicas and
    // oldRc has 0 replicas.
    progressDeadline := time.Now().UnixNano() + config.Timeout.Nanoseconds()
    for newRc.Spec.Replicas != desired || oldRc.Spec.Replicas != 0 {
        // Store the existing replica counts for progress timeout tracking.
        newReplicas := newRc.Spec.Replicas
        oldReplicas := oldRc.Spec.Replicas

        // Scale up as much as possible.
        scaledRc, err := r.scaleUp(newRc, oldRc, desired, maxSurge, maxUnavailable, scaleRetryParams, config)
        if err != nil {
            return err
        }
        newRc = scaledRc
    ... ...
}

在rolling_updater.go中Update方法使用一个for循环完成了逐步减少old rc的replicas和增加new rc的replicas的工作，直到new rc到达期望值，old rc的replicas变为0。

通过kubectl rolling-update实现的滚动更新有很多不足：
- 由kubectl实现，很可能因为网络原因导致update中断；
- 需要创建一个新的rc，名字与要更新的rc不能一样；虽然这个问题不大，但实施起来也蛮别扭的；
- 回滚还需要执行rolling-update，只是用的老版本的rc manifest文件；
- service执行的rolling-update在集群中没有记录，后续无法跟踪rolling-update历史。

不过，由于Replication Controller已被Deployment这个抽象概念所逐渐代替，下面我们来考虑如何实现Deployment的滚动更新以及deployment滚动更新的优势。

三、Deployment的rolling-update

kubernetes Deployment是一个更高级别的抽象，就像文章开头那幅示意图那样，Deployment会创建一个Replica Set，用来保证Deployment中Pod的副本数。由于kubectl rolling-update仅支持replication controllers，因此要想rolling-updata deployment中的Pod，你需要修改Deployment自己的manifest文件并应用。这个修改会创建一个新的Replica Set，在scale up这个Replica Set的Pod数的同时，减少原先的Replica Set的Pod数，直至zero。而这一切都发生在Server端，并不需要kubectl参与。

我们同样来看一个例子。我们建立第一个版本的deployment manifest文件：deployment-demo-v0.1.yaml。

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: deployment-demo
spec:
  replicas: 4
  selector:
    matchLabels:
      app: deployment-demo-nginx
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: deployment-demo-nginx
        version: v0.1
    spec:
      containers:
        - name: deployment-demo
          image: nginx:1.10.1
          ports:
            - containerPort: 80
              protocol: TCP
          env:
            - name: DEPLOYMENT_DEMO_VER
              value: v0.1

创建该deployment：

# kubectl create -f deployment-demo-v0.1.yaml --record
deployment "deployment-demo" created

# kubectl get deployments
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment-demo   4         4         4            0           10s

# kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
deployment-demo-1818355944   4         4         4         13s

# kubectl get pods -o wide
NAME                               READY     STATUS    RESTARTS   AGE       IP             NODE
deployment-demo-1818355944-78spp   1/1       Running   0          24s       172.30.0.10    iz25beglnhtz
deployment-demo-1818355944-7wvxk   1/1       Running   0          24s       172.30.0.9     iz25beglnhtz
deployment-demo-1818355944-hb8tt   1/1       Running   0          24s       172.30.192.9   iz2ze39jeyizepdxhwqci6z
deployment-demo-1818355944-jtxs2   1/1       Running   0          24s       172.30.192.8   iz2ze39jeyizepdxhwqci6z

# kubectl exec deployment-demo-1818355944-78spp env
... ...
DEPLOYMENT_DEMO_VER=v0.1
... ...

deployment-demo创建了ReplicaSet：deployment-demo-1818355944，用于保证Pod的副本数。

我们再来创建使用了该deployment中Pods的Service：

# kubectl create -f deployment-demo-svc.yaml
service "deployment-demo-svc" created

# kubectl get service
NAME                  CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
deployment-demo-svc   10.109.173.225   <none>        80/TCP    5s
kubernetes            10.96.0.1        <none>        443/TCP   42d

# kubectl describe service/deployment-demo-svc
Name:            deployment-demo-svc
Namespace:        default
Labels:            <none>
Selector:        app=deployment-demo-nginx
Type:            ClusterIP
IP:            10.109.173.225
Port:            <unset>    80/TCP
Endpoints:        172.30.0.10:80,172.30.0.9:80,172.30.192.8:80 + 1 more...
Session Affinity:    None
No events.

# curl -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

好了，我们看到该service下有四个pods，Service提供的服务也运行正常。

接下来，我们对该Service进行更新。为了方便说明，我们建立了deployment-demo-v0.2.yaml文件，其实你也大可不必另创建文件，直接再上面的deployment-demo-v0.1.yaml文件中修改也行：

# diff deployment-demo-v0.2.yaml deployment-demo-v0.1.yaml
15c15
<         version: v0.2
---
>         version: v0.1
19c19
<           image: nginx:1.11.9
---
>           image: nginx:1.10.1
25c25
<               value: v0.2
---
>               value: v0.1

我们用deployment-demo-v0.2.yaml文件来更新之前创建的deployments中的Pods：

# kubectl apply -f deployment-demo-v0.2.yaml --record
deployment "deployment-demo" configured

apply命令是瞬间接收到apiserver返回的Response并结束的。但deployment的rolling-update过程还在进行：

# kubectl describe deployment deployment-demo
Name:            deployment-demo
... ...
Replicas:        2 updated | 4 total | 3 available | 2 unavailable
StrategyType:        RollingUpdate
MinReadySeconds:    10
RollingUpdateStrategy:    1 max unavailable, 1 max surge
Conditions:
  Type        Status    Reason
  ----        ------    ------
  Available     True    MinimumReplicasAvailable
OldReplicaSets:    deployment-demo-1818355944 (3/3 replicas created)
NewReplicaSet:    deployment-demo-2775967987 (2/2 replicas created)
Events:
  FirstSeen    LastSeen    Count    From                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                -------------    --------    ------            -------
  12m        12m        1    {deployment-controller }            Normal        ScalingReplicaSet    Scaled up replica set deployment-demo-1818355944 to 4
  11s        11s        1    {deployment-controller }            Normal        ScalingReplicaSet    Scaled up replica set deployment-demo-2775967987 to 1
  11s        11s        1    {deployment-controller }            Normal        ScalingReplicaSet    Scaled down replica set deployment-demo-1818355944 to 3
  11s        11s        1    {deployment-controller }            Normal        ScalingReplicaSet    Scaled up replica set deployment-demo-2775967987 to 2

# kubectl get pods
NAME                               READY     STATUS              RESTARTS   AGE
deployment-demo-1818355944-78spp   1/1       Terminating         0          12m
deployment-demo-1818355944-hb8tt   1/1       Terminating         0          12m
deployment-demo-1818355944-jtxs2   1/1       Running             0          12m
deployment-demo-2775967987-5s9qx   0/1       ContainerCreating   0          0s
deployment-demo-2775967987-lf5gw   1/1       Running             0          12s
deployment-demo-2775967987-lxbx8   1/1       Running             0          12s
deployment-demo-2775967987-pr0hl   0/1       ContainerCreating   0          0s

# kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
deployment-demo-1818355944   1         1         1         12m
deployment-demo-2775967987   4         4         4         17s

我们可以看到这个update过程中ReplicaSet的变化，同时这个过程中服务并未中断，只是新旧版本短暂地交错提供服务：

# curl -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.11.9
... ...

# curl -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

# curl -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

最终所有Pod被替换为了v0.2版本：

kubectl exec deployment-demo-2775967987-5s9qx env
... ...
DEPLOYMENT_DEMO_VER=v0.2
... ...

# curl -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.11.9
... ...

我们发现deployment的create和apply命令都带有一个–record参数，这是告诉apiserver记录update的历史。通过kubectl rollout history可以查看deployment的update history：

#  kubectl rollout history deployment deployment-demo
deployments "deployment-demo"
REVISION    CHANGE-CAUSE
1        kubectl create -f deployment-demo-v0.1.yaml --record
2        kubectl apply -f deployment-demo-v0.2.yaml --record

如果没有加“–record”，那么你得到的历史将会类似这样的结果：

#  kubectl rollout history deployment deployment-demo
deployments "deployment-demo"
REVISION    CHANGE-CAUSE
1        <none>

同时，我们会看到old ReplicaSet并未被删除：

# kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
deployment-demo-1818355944   0         0         0         25m
deployment-demo-2775967987   4         4         4         13m

这些信息都存储在server端，方便回退！

Deployment下Pod的回退操作异常简单，通过rollout undo即可完成。rollout undo会将Deployment回退到record中的上一个revision（见上面rollout history的输出中有revision列）：

# kubectl rollout undo deployment deployment-demo
deployment "deployment-demo" rolled back

rs的状态又颠倒回来：

# kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
deployment-demo-1818355944   4         4         4         28m
deployment-demo-2775967987   0         0         0         15m

查看update历史:

# kubectl rollout history deployment deployment-demo
deployments "deployment-demo"
REVISION    CHANGE-CAUSE
2        kubectl apply -f deployment-demo-v0.2.yaml --record
3        kubectl create -f deployment-demo-v0.1.yaml --record

可以看到history中最多保存了两个revision记录（这个Revision保存的数量应该可以设置）。

四、通过API实现的deployment rolling-update

我们的最终目标是通过API来实现service的rolling-update。Kubernetes提供了针对deployment的Restful API，包括：create、read、replace、delete、patch、rollback等。从这些API的字面意义上看，patch和rollback很可能符合我们的需要，我们需要验证一下。

我们将deployment置为v0.1版本，即：image: nginx:1.10.1，DEPLOYMENT_DEMO_VER=v0.1。然后我们尝试通过patch API将deployment升级为v0.2版本，由于patch API仅接收json格式的body内容，我们将 deployment-demo-v0.2.yaml转换为json格式：deployment-demo-v0.2.json。patch是局部更新，这里偷个懒儿，直接将全部deployment manifest内容发给了APIServer，让server自己做merge^0^。

执行下面curl命令：

# curl -H 'Content-Type:application/strategic-merge-patch+json' -X PATCH --data @deployment-demo-v0.2.json http://localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/deployment-demo

这个命令输出一个merge后的Deployment json文件，由于内容太多，这里就不贴出来了，内容参见：patch-api-output.txt。

跟踪命令执行时的deployment状态，我们可以看到该命令生效了：新旧两个rs的Scale值在此消彼长，两个版本的Pod在交替提供服务。

# kubectl get rs
NAME                         DESIRED   CURRENT   READY     AGE
deployment-demo-1818355944   3         3         3         12h
deployment-demo-2775967987   2         2         2         12h

# curl  -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

# curl  -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.11.9
... ...

# curl  -I http://10.109.173.225:80
HTTP/1.1 200 OK
Server: nginx/1.10.1
... ...

不过通过这种方式update后，通过rollout history查看到的历史就有些“不那么精确了”：

#kubectl rollout history deployment deployment-demo
deployments "deployment-demo"
REVISION    CHANGE-CAUSE
8       kubectl create -f deployment-demo-v0.1.yaml --record
9        kubectl create -f deployment-demo-v0.1.yaml --record

目前尚无好的方法。但rolling update的确是ok了。

Patch API支持三种类型的Content-type：json-patch+json、strategic-merge-patch+json和merge-patch+json。对于后面两种，从测试效果来看，都一样。但json-patch+json这种类型在测试的时候一直报错：

# curl -H 'Content-Type:application/json-patch+json' -X PATCH --data @deployment-demo-v0.2.json http://localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/deployment-demo
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "json: cannot unmarshal object into Go value of type jsonpatch.Patch",
  "code": 500
}

kubectl patch子命令似乎使用的是strategic-merge-patch+json。源码中也没有过多说明三种方式的差别：

//pkg/kubectl/cmd/patch.go
func getPatchedJSON(patchType api.PatchType, originalJS, patchJS []byte, obj runtime.Object) ([]byte, error) {
    switch patchType {
    case api.JSONPatchType:
        patchObj, err := jsonpatch.DecodePatch(patchJS)
        if err != nil {
            return nil, err
        }
        return patchObj.Apply(originalJS)

    case api.MergePatchType:
        return jsonpatch.MergePatch(originalJS, patchJS)

    case api.StrategicMergePatchType:
        return strategicpatch.StrategicMergePatchData(originalJS, patchJS, obj)

    default:
        // only here as a safety net - go-restful filters content-type
        return nil, fmt.Errorf("unknown Content-Type header for patch: %v", patchType)
    }
}

// DecodePatch decodes the passed JSON document as an RFC 6902 patch.

// MergePatch merges the patchData into the docData.

// StrategicMergePatch applies a strategic merge patch. The patch and the original document
// must be json encoded content. A patch can be created from an original and a modified document
// by calling CreateStrategicMergePatch.

接下来，我们使用deployment rollback API实现deployment的rollback。我们创建一个deployment-demo-rollback.json文件作为请求的内容：

//deployment-demo-rollback.json
{
        "name" : "deployment-demo",
        "rollbackTo" : {
                "revision" : 0
        }
}

revision:0 表示回退到上一个revision。执行下面命令实现rollback：

# curl -H 'Content-Type:application/json' -X POST --data @deployment-demo-rollback.json http://localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/deployment-demo/rollback
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "rollback request for deployment \"deployment-demo\" succeeded",
  "code": 200
}

# kubectl describe deployment/deployment-demo
... ...
Events:
  FirstSeen    LastSeen    Count    From                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                -------------    --------    ------            -------
... ...
 27s        27s        1    {deployment-controller }            Normal        DeploymentRollback    Rolled back deployment "deployment-demo" to revision 1
... ...

通过查看deployment状态可以看出rollback成功了。但这个API的response似乎有些bug，明明是succeeded了(code:200)，但status却是”Failure”。

如果你在patch或rollback过程中还遇到什么其他问题，可以通过kubectl describe deployment/deployment-demo 查看输出的Events中是否有异常提示。

五、小结

从上面的实验来看，通过Kubernetes提供的API是可以实现Service中Pods的rolling-update的，但这更适用于无状态的Service。对于那些有状态的Service（通过PetSet或是1.5版本后的Stateful Set实现的），这么做是否还能满足要求还不能确定。由于暂时没有环境，这方面尚未测试。

上述各个manifest的源码可以在这里下载到。