Building a Highly Available Kubernetes Cluster with Kubeadm, Step by Step - Part 2

Continued from the previous part.

5. Step 3: Start the apiservers on emei and wudang

The etcd cluster spanning the three nodes is up and has finished syncing its data. Now comes a key step of the HA conversion: starting the apiservers on wudang and emei.

1. Start the apiservers on emei and wudang

Using /etc/kubernetes/manifests/kube-apiserver.yaml from the shaolin node as a template, create the kube-apiserver.yaml for emei and wudang:

The only option that needs to change is the value of --advertise-address:

wudang:

- --advertise-address=10.24.138.208

emei:

- --advertise-address=10.27.52.72

Place each kube-apiserver.yaml into /etc/kubernetes/manifests on its node. The kubelet on each node will then start kube-apiserver, and each apiserver connects to the etcd on its own node by default:

root@emei:~# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
... ...
kube-system   kube-apiserver-emei               1/1       Running   0          1d        10.27.52.72     emei
kube-system   kube-apiserver-shaolin            1/1       Running   0          1d        10.27.53.32     shaolin
kube-system   kube-apiserver-wudang             1/1       Running   0          2d        10.24.138.208   wudang
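The manifest edit can also be scripted instead of done by hand. Below is a minimal sketch; the two-line manifest fragment stands in for shaolin's real kube-apiserver.yaml (which has many more flags), and the file names are illustrative:

```shell
# Derive a wudang-specific kube-apiserver manifest from the shaolin copy
# by rewriting only the --advertise-address flag.
set -e
cat > kube-apiserver.yaml <<'EOF'
    - --advertise-address=10.27.53.32
    - --insecure-port=0
EOF
NODE_IP=10.24.138.208   # wudang; emei would use 10.27.52.72
sed "s|--advertise-address=.*|--advertise-address=${NODE_IP}|" kube-apiserver.yaml \
    > kube-apiserver-wudang.yaml
grep advertise kube-apiserver-wudang.yaml
```

Dropping the result into /etc/kubernetes/manifests/kube-apiserver.yaml on the target node is then enough for that node's kubelet to start the apiserver.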

2. Point the kubelets on emei and wudang at their own node's apiserver

All the apiservers are now up, so the kubelets on wudang and emei should connect to the apiserver on their own node. Edit each node's /etc/kubernetes/kubelet.conf and change the server entry:

wudang:

server: https://10.24.138.208:6443

emei:

server: https://10.27.52.72:6443

Restart kubelet on each node:

Taking wudang as an example:

root@wudang:~# systemctl daemon-reload
root@wudang:~# systemctl restart kubelet

However, a problem appears. Check the restarted kubelet's log:

root@wudang:~# journalctl -u kubelet -f
-- Logs begin at Mon 2017-05-08 15:12:01 CST. --
May 11 14:33:27 wudang kubelet[8794]: I0511 14:33:27.919223    8794 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
May 11 14:33:27 wudang kubelet[8794]: I0511 14:33:27.921166    8794 kubelet_node_status.go:77] Attempting to register node wudang
May 11 14:33:27 wudang kubelet[8794]: E0511 14:33:27.926865    8794 kubelet_node_status.go:101] Unable to register node "wudang" with API server: Post https://10.24.138.208:6443/api/v1/nodes: x509: certificate is valid for 10.96.0.1, 10.27.53.32, not 10.24.138.208
May 11 14:33:28 wudang kubelet[8794]: E0511 14:33:28.283258    8794 event.go:208] Unable to write event: 'Post https://10.24.138.208:6443/api/v1/namespaces/default/events: x509: certificate is valid for 10.96.0.1, 10.27.53.32, not 10.24.138.208' (may retry after sleeping)
May 11 14:33:28 wudang kubelet[8794]: E0511 14:33:28.499209    8794 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/kubelet.go:390: Failed to list *v1.Node: Get https://10.24.138.208:6443/api/v1/nodes?fieldSelector=metadata.name%3Dwudang&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 10.27.53.32, not 10.24.138.208
May 11 14:33:28 wudang kubelet[8794]: E0511 14:33:28.504593    8794 reflector.go:190] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Failed to list *v1.Pod: Get https://10.24.138.208:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dwudang&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 10.27.53.32, not 10.24.138.208

Judging from the error log, when the kubelet on wudang talks to the kube-apiserver on the same node, it finds that the TLS certificate returned by that apiserver belongs to 10.27.53.32, i.e. the apiserver on the shaolin node rather than the one on wudang, so it reports an error. The cause is clear: the apiserver.crt used by wudang's kube-apiserver was indeed copied over from the shaolin node. To fix this, we need to generate a dedicated certificate for the apiserver on each of wudang and emei.

Let's first look at what apiserver.crt on shaolin contains:

root@shaolin:/etc/kubernetes/pki# openssl x509 -noout -text -in apiserver.crt

Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=kubernetes

Subject: CN=kube-apiserver

X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:shaolin, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.96.0.1, IP Address:10.27.53.32

The certificate uses the X509v3 subject alternative name extension with multiple values, and the apiserver.crt files we generate for wudang and emei should do the same. How? Fortunately we have the cluster's ca.key and ca.crt, which can be used to sign certificate requests. Taking the wudang node as an example, let's generate apiserver-wudang.key and apiserver-wudang.crt for its apiserver:

// generate a 2048-bit key pair
root@wudang:~# openssl genrsa -out apiserver-wudang.key 2048

// generate a certificate signing request
root@wudang:~# openssl req -new -key apiserver-wudang.key -subj "/CN=kube-apiserver" -out apiserver-wudang.csr

// create apiserver-wudang.ext with the following content:
subjectAltName = DNS:wudang,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP:10.96.0.1, IP:10.24.138.208

// sign the CSR with ca.key and ca.crt
root@wudang:~# openssl x509 -req -in apiserver-wudang.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out apiserver-wudang.crt -days 365 -extfile apiserver-wudang.ext
Signature ok
subject=/CN=kube-apiserver
Getting CA Private Key

// inspect the newly generated certificate:
root@wudang:~# openssl x509 -noout -text -in apiserver-wudang.crt
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 16019625340257831745 (0xde51245f10ea0b41)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=kubernetes
        Validity
            Not Before: May 12 08:40:40 2017 GMT
            Not After : May 12 08:40:40 2018 GMT
        Subject: CN=kube-apiserver
        Subject Public Key Info:
            ... ...
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:wudang, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.96.0.1, IP Address:10.24.138.208
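For reference, the whole signing flow above can be condensed into one runnable sketch. The self-signed CA generated below merely stands in for the cluster's /etc/kubernetes/pki/ca.crt and ca.key; on a real node you would point the final signing command at those files instead:

```shell
set -e
# Throwaway CA standing in for the cluster CA.
openssl genrsa -out ca.key 2048
openssl req -x509 -new -key ca.key -subj "/CN=kubernetes" -days 365 -out ca.crt

# Key pair and CSR for the wudang apiserver.
openssl genrsa -out apiserver-wudang.key 2048
openssl req -new -key apiserver-wudang.key -subj "/CN=kube-apiserver" \
    -out apiserver-wudang.csr

# The SAN list must cover every DNS name and IP clients may use.
cat > apiserver-wudang.ext <<'EOF'
subjectAltName = DNS:wudang,DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc,DNS:kubernetes.default.svc.cluster.local,IP:10.96.0.1,IP:10.24.138.208
EOF

# Sign the CSR with the CA, attaching the SAN extension.
openssl x509 -req -in apiserver-wudang.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out apiserver-wudang.crt -days 365 -extfile apiserver-wudang.ext

openssl x509 -noout -text -in apiserver-wudang.crt | grep -A1 "Subject Alternative Name"
```

The final grep should show all seven SAN entries, including IP Address:10.24.138.208.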

Place apiserver-wudang.key and apiserver-wudang.crt into /etc/kubernetes/pki and update kube-apiserver.yaml:

// /etc/kubernetes/manifests/kube-apiserver.yaml
- --tls-cert-file=/etc/kubernetes/pki/apiserver-wudang.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver-wudang.key

After kube-apiserver restarts, check the kubelet log again; everything now runs fine. Perform the same steps on the emei node.

At this point, the state of the whole cluster looks like this:

img{512x368}

6. Step 4: Start kube-controller-manager and kube-scheduler on emei and wudang

For this step, we only need to copy kube-controller-manager.yaml and kube-scheduler.yaml from /etc/kubernetes/manifests on the shaolin node into the corresponding directory on the wudang and emei nodes:

root@emei:~/kubernetes-conf-shaolin/manifests# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
... ...
kube-system   kube-controller-manager-emei      1/1       Running   0          8s        10.27.52.72     emei
kube-system   kube-controller-manager-shaolin   1/1       Running   3          1d        10.27.53.32     shaolin
kube-system   kube-controller-manager-wudang    1/1       Running   0          1m        10.24.138.208   wudang
... ...
kube-system   kube-scheduler-emei               1/1       Running   0          15s       10.27.52.72     emei
kube-system   kube-scheduler-shaolin            1/1       Running   3          1d        10.27.53.32     shaolin
kube-system   kube-scheduler-wudang             1/1       Running   0          3m        10.24.138.208   wudang
... ...

Check the kcm and scheduler logs on each node:

root@wudang:~/demo# kubectl logs -f kube-controller-manager-emei -n kube-system
I0511 07:34:53.804831       1 leaderelection.go:179] attempting to acquire leader lease...

root@wudang:~/demo# kubectl logs -f kube-controller-manager-wudang -n kube-system
I0511 07:33:20.725669       1 leaderelection.go:179] attempting to acquire leader lease...

root@wudang:~/demo# kubectl logs -f kube-scheduler-emei -n kube-system
I0511 07:34:45.711032       1 leaderelection.go:179] attempting to acquire leader lease...

root@wudang:~/demo# kubectl logs -f kube-scheduler-wudang -n kube-system
I0511 07:31:35.077090       1 leaderelection.go:179] attempting to acquire leader lease...

root@wudang:~/demo# kubectl logs -f kube-scheduler-shaolin -n kube-system

I0512 08:55:30.838806       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"my-nginx-2267614806-v1dst", UID:"c075c6c7-36f0-11e7-9c66-00163e000c7f", APIVersion:"v1", ResourceVersion:"166279", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned my-nginx-2267614806-v1dst to emei
I0512 08:55:30.843104       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"my-nginx-2267614806-drnzv", UID:"c075da9f-36f0-11e7-9c66-00163e000c7f", APIVersion:"v1", ResourceVersion:"166278", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned my-nginx-2267614806-drnzv to wudang
I0512 09:13:21.121864       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"my-nginx-2267614806-ld1dr", UID:"3e73d350-36f3-11e7-9c66-00163e000c7f", APIVersion:"v1", ResourceVersion:"168070", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned my-nginx-2267614806-ld1dr to wudang
I0512 09:13:21.124295       1 event.go:217] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"my-nginx-2267614806-cmmkh", UID:"3e73c8b2-36f3-11e7-9c66-00163e000c7f", APIVersion:"v1", ResourceVersion:"168071", FieldPath:""}): type: 'Normal' reason: 'Scheduled' Successfully assigned my-nginx-2267614806-cmmkh to emei

As we can see, the kcm and scheduler on the shaolin node are currently the leaders.

At this point, the state of the whole cluster looks like this:

img{512x368}

7. Step 5: Make wudang and emei master nodes

Let's try creating a pod from the wudang node:

// run-my-nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.10.1
        ports:
        - containerPort: 80

The pods were actually scheduled onto wudang and emei!

NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
default       my-nginx-2267614806-drnzv         1/1       Running   0          5s        172.32.192.1    wudang
default       my-nginx-2267614806-v1dst         1/1       Running   0          5s        172.32.64.0     emei

No taints have been applied to emei and wudang, so why can they carry workloads? Check the current node status:

root@wudang:~# kubectl get node --show-labels
NAME      STATUS    AGE       VERSION   LABELS
emei      Ready     1d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=emei
shaolin   Ready     2d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=shaolin,node-role.kubernetes.io/master=
wudang    Ready     1d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=wudang

The labels show that the STATUS column no longer explicitly says which node is the master, which seems different from versions before 1.5.1. The only difference between emei/wudang and shaolin is that shaolin has the key node-role.kubernetes.io/master. Could this label be what marks the master? Let's apply it to wudang:

root@wudang:~/demo# kubectl label node wudang node-role.kubernetes.io/master=
node "wudang" labeled
root@wudang:~/demo# kubectl get node --show-labels
NAME      STATUS    AGE       VERSION   LABELS
emei      Ready     1d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=emei
shaolin   Ready     2d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=shaolin,node-role.kubernetes.io/master=
wudang    Ready     1d        v1.6.2    beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=wudang,node-role.kubernetes.io/master=

Creating the nginx pods again, we find they are still placed on wudang and emei:

NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
default       my-nginx-2267614806-cmmkh         1/1       Running   0          5s        172.32.64.0     emei
default       my-nginx-2267614806-ld1dr         1/1       Running   0          5s        172.32.192.1    wudang

Let's look further and compare some details:

Check cluster-info:

wudang node:
root@wudang:~/demo# kubectl cluster-info
Kubernetes master is running at https://10.24.138.208:6443
KubeDNS is running at https://10.24.138.208:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns

shaolin node:

root@shaolin:~/k8s-install/demo# kubectl cluster-info
Kubernetes master is running at https://10.27.53.32:6443
KubeDNS is running at https://10.27.53.32:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns

Check detailed node info:

root@wudang:~# kubectl describe node/shaolin

Name:            shaolin
Role:
Labels:            beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=shaolin
            node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:            node-role.kubernetes.io/master:NoSchedule

root@wudang:~# kubectl describe node/wudang

Name:            wudang
Role:
Labels:            beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=wudang
            node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:            <none>

We can see that in the Taints field, shaolin has node-role.kubernetes.io/master:NoSchedule while wudang's is empty. A first guess: this is why pods get assigned to wudang.

Let's set the taint on wudang:

root@wudang:~# kubectl taint nodes wudang node-role.kubernetes.io/master=:NoSchedule
node "wudang" tainted

root@wudang:~# kubectl describe node/wudang|more
Name:            wudang
Role:
Labels:            beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=wudang
            node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:            node-role.kubernetes.io/master:NoSchedule

Create the nginx deployment again:

root@wudang:~/demo# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
default       my-nginx-2267614806-hmz5d         1/1       Running   0          14s       172.32.64.0     emei
default       my-nginx-2267614806-kkt79         1/1       Running   0          14s       172.32.64.1     emei

All the pods now land on emei!

Next, set the taint on emei in the same way; I won't repeat the steps here.

The state of the whole k8s cluster so far:
img{512x368}

8. Step 6: Load balancing

Building an HA Kubernetes cluster is possible thanks to kube-apiserver being stateless. Per the final design, a load balancer should sit in front of the three kube-apiservers. Since the apiserver exposes its service over HTTPS, balancing at layer 7 would require configuring the certificates on the LB, which is a bigger change, so we use a layer-4 LB here. What we build is only a simple, demo-grade L4 LB based on nginx; in production, if you have a hardware LB or your cloud provider offers a comparable LB service, use that directly.

For demo convenience, I install nginx directly on emei (make sure to install an nginx built with --with-stream support; check with nginx -V):

root@emei:~# nginx -V
nginx version: nginx/1.10.3 (Ubuntu)
built with OpenSSL 1.0.2g  1 Mar 2016
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module --with-http_v2_module --with-http_sub_module --with-http_xslt_module --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-threads

Here I modify nginx's default configuration file /etc/nginx/nginx.conf directly, adding the following:

// /etc/nginx/nginx.conf
... ...
stream {
    upstream apiserver {
        server 10.27.53.32:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 10.24.138.208:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 10.27.52.72:6443 weight=5 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 8443;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass apiserver;
    }
}
... ...

After nginx -s reload, the configuration takes effect.

Now let's access the LB with kubectl from wudang. First, some setup:

root@wudang:~# cp /etc/kubernetes/admin.conf ./
root@wudang:~# mv admin.conf admin-lb.conf
root@wudang:~# vi admin-lb.conf

Change the following in admin-lb.conf:
server: https://10.27.52.72:8443

export KUBECONFIG=~/admin-lb.conf

Then run:

root@wudang:~# kubectl get pods -n kube-system
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.27.53.32, not 10.27.52.72
root@wudang:~# kubectl get pods -n kube-system
Unable to connect to the server: x509: certificate is valid for 10.24.138.208, not 10.27.52.72

The two requests above were forwarded by the LB to the apiservers on shaolin and wudang respectively; while validating the server certificate, the client decided the server was "suspect" and reported an error. How do we fix this? When generating apiserver.crt for each apiserver earlier, we put several domain names into the subject alternative name; let's use one of those domain names as the client's target address and try again.

Change this in ~/admin-lb.conf:

server: https://kubernetes.default.svc:8443

Add this to /etc/hosts on the wudang node:

10.27.52.72 kubernetes.default.svc

Access the cluster again:

root@wudang:~# kubectl get pods -n kube-system
NAME                              READY     STATUS    RESTARTS   AGE
etcd-emei                         1/1       Running   0          1d
etcd-shaolin                      1/1       Running   0          1d
etcd-wudang                       1/1       Running   0          4d
kube-apiserver-emei               1/1       Running   0          1d
... ...
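Why the shared DNS name verifies while the bare LB IP does not can be reproduced locally with openssl. This is a sketch using a throwaway CA and the same illustrative IPs; the -verify_hostname and -verify_ip options require a reasonably recent openssl:

```shell
set -e
# Throwaway CA plus a certificate carrying shaolin-style SANs.
openssl genrsa -out ca.key 2048
openssl req -x509 -new -key ca.key -subj "/CN=kubernetes" -days 365 -out ca.crt
openssl genrsa -out srv.key 2048
openssl req -new -key srv.key -subj "/CN=kube-apiserver" -out srv.csr
printf 'subjectAltName = DNS:kubernetes.default.svc,IP:10.27.53.32\n' > srv.ext
openssl x509 -req -in srv.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -out srv.crt -days 365 -extfile srv.ext

# The shared DNS SAN verifies no matter which apiserver the LB picks...
openssl verify -verify_hostname kubernetes.default.svc -CAfile ca.crt srv.crt

# ...but the LB's own IP is not in the SAN list, so IP verification fails.
openssl verify -verify_ip 10.27.52.72 -CAfile ca.crt srv.crt || true
```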

This is only a demo; there are many ways to put an LB in front of the apiservers in your own environment, to be chosen according to your actual situation.

The state of the whole k8s cluster so far:
img{512x368}

9. Step 7: Adjust the kube-proxy configuration

kube-proxy is created by a daemonset:

root@wudang:~# kubectl get ds -n kube-system
NAME         DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
kube-proxy   3         3         3         3            3           <none>          5d

And kube-proxy's configuration comes from a configmap, with no external hook for modification, unlike kube-scheduler.yaml or the .conf files:

root@shaolin:~# kubectl get configmap -n kube-system
NAME                                 DATA      AGE
kube-proxy                           1         5d

root@shaolin:~# kubectl get configmap/kube-proxy -n kube-system -o yaml
apiVersion: v1
data:
  kubeconfig.conf: |
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.27.53.32:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: 2017-05-10T01:48:28Z
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "81"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: c34f7d5f-3522-11e7-8f77-00163e000c7f

In this default configmap, the server address of the cluster kube-proxy connects to is hard-coded as https://10.27.53.32:6443, the public endpoint of the apiserver on the shaolin node. Should shaolin go down, kube-proxy on the other nodes could no longer reach an apiserver to operate normally. Since the kube-proxy pods themselves use the host network, we need to point the server address at the LB, keeping kube-proxy highly available on every node.

We modify the configmap content dumped above and update the kube-proxy configmap:

root@shaolin:~# kubectl get configmap/kube-proxy -n kube-system -o yaml > kube-proxy-configmap.yaml

Change server in kube-proxy-configmap.yaml to:

server: https://kubernetes.default.svc:6443

Save it and update the kube-proxy configmap:

root@shaolin:~# kubectl apply -f kube-proxy-configmap.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
configmap "kube-proxy" configured

root@shaolin:~# kubectl get configmap/kube-proxy -n kube-system -o yaml
apiVersion: v1
data:
  kubeconfig.conf: |
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://kubernetes.default.svc:6443
      name: default
... ...

After restarting kube-proxy (kubectl delete pods/kube-proxy-xxx -n kube-system), check its log:

root@shaolin:~# kubectl logs -f kube-proxy-h5sg8 -n kube-system
I0515 13:57:03.526032       1 server.go:225] Using iptables Proxier.
W0515 13:57:03.621532       1 proxier.go:298] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0515 13:57:03.621578       1 server.go:249] Tearing down userspace rules.
I0515 13:57:03.738015       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0515 13:57:03.741824       1 conntrack.go:66] Setting conntrack hashsize to 32768
I0515 13:57:03.742555       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0515 13:57:03.742731       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600

10. Summary

With this, the final design from Part 1 has been realized. These two articles are still only an exploration of building an HA Kubernetes cluster, though; some deeper issues may not have surfaced yet, so I don't recommend this setup for production. kubeadm will surely gain HA cluster support in later versions, and then building a production-grade HA cluster won't be this much trouble!

Building a Highly Available Kubernetes Cluster with Kubeadm, Step by Step - Part 1

The core of a Kubernetes cluster is its master node, but by default there is only one master. Once it runs into trouble, the cluster is "paralyzed": cluster management, pod scheduling and so on all stop working, even if some user pods keep running. That clearly does not meet the requirements for a Kubernetes cluster running in production; we need a highly available one.

However, official Kubernetes support for building high-availability clusters is still quite limited, offering only rough deployment methods for a few cloud providers, e.g. using the kube-up.sh script on GCE, or kops on AWS.

A highly available cluster is the inevitable direction of Kubernetes' evolution. The official "Building High-Availability Clusters" document gives a rough outline of how to build an HA cluster today. Kubeadm has also put HA on the milestone plan for upcoming releases, and a draft proposal for deploying an HA cluster with kubeadm has already been published.

Before kubeadm truly supports automatically bootstrapping an HA Kubernetes cluster, how should we build one ourselves? This article explores that, giving an approach and concrete steps for building an HA k8s cluster one step at a time. Note: the HA cluster built here has only been tested in a lab and has never run in production, so the approach may have flaws in some unknown details.

1. The test environment

High availability for a Kubernetes cluster is mainly high availability of the master nodes, so we requested three Alibaba Cloud ECS instances in the US West region as the three master nodes, and changed their static hostnames to shaolin, wudang and emei via hostnamectl:

shaolin: 10.27.53.32
wudang: 10.24.138.208
emei: 10.27.52.72

All three hosts run Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-63-generic x86_64), using the root user.

The Docker version is as follows:

root@shaolin:~# docker version
Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

The installation steps for Docker CE on Ubuntu can be found here. Since my servers are in the US West, the "Great Firewall" is not an issue; if your hosts are in mainland China, decide from any errors during installation whether you need a registry mirror. Also, the Docker version used here is fairly new; the version most often mentioned and best validated on the Kubernetes site is still Docker 1.12.x, which you can install instead.

2. An approach to master-node high availability

From exploring the single-master node, we know the master runs the following Kubernetes components:

  • kube-apiserver: the cluster core; the cluster's API endpoint and the hub through which the components communicate; also handles cluster security control;
  • etcd: the cluster's data store;
  • kube-scheduler: the scheduling center for the cluster's pods;
  • kube-controller-manager: the cluster state manager; when the cluster state diverges from the desired state, kcm works to bring it back, e.g. when a pod dies, kcm creates a new pod to restore the state the corresponding replica set expects;
  • kubelet: the Kubernetes node agent, responsible for talking to the Docker engine on its node;
  • kube-proxy: one per node; forwards traffic from service VIPs to endpoint pods, currently mainly by setting iptables rules.

High availability of a Kubernetes cluster is high availability of the master nodes, which ultimately comes down to high availability of the components above running on them. So our approach is to consider how to make each of these components highly available. From the official Kubernetes material and some proposal drafts, building everything from scratch the "hard way" does not seem wise; converting a kubeadm-created k8s cluster into an HA cluster looks more feasible. Here is my plan:

img{512x368}

As mentioned earlier, the idea is to start from a kubeadm-bootstrapped Kubernetes cluster and, through gradual configuration changes and replacements, arrive at the final HA cluster. The diagram above is that final picture, in which:

  • kube-apiserver: thanks to the apiserver's statelessness, the apiserver on every master node is active and handles the traffic the load balancer distributes to it;
  • etcd: the central store of state; the etcd instances on the master nodes form an etcd cluster, so the apiservers share cluster state and data;
  • kube-controller-manager: kcm has built-in leader election; the kcms on the masters form a group, but only the elected leader does work. Each master's kcm connects to the apiserver on its own node;
  • kube-scheduler: the scheduler also has built-in leader election; the schedulers on the masters form a group, but only the elected leader does work. Each master's scheduler connects to the apiserver on its own node;
  • kubelet: since the master components all run as containers, the kubelet on a master that carries no workloads mainly manages those component containers. Each master's kubelet connects to the apiserver on its own node;
  • kube-proxy: since the masters carry no workloads, kube-proxy on a master likewise only serves a few special services, such as kube-dns. Because kubeadm exposes no externally adjustable kube-proxy configuration, kube-proxy has to connect to the apiserver port exposed by the load balancer.

Next, following this plan step by step, we convert the kubeadm-bootstrapped single-master k8s cluster and evolve it gradually into the HA cluster we want.

3. Step 1: Install a single-master k8s cluster with kubeadm

It has been a while since I first installed a Kubernetes 1.5.1 cluster with kubeadm, and both kubernetes and kubeadm have changed since. The latest releases of both are currently 1.6.2:

root@wudang:~# kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

root@wudang:~# docker images
REPOSITORY                                               TAG                 IMAGE ID            CREATED             SIZE
gcr.io/google_containers/kube-proxy-amd64                v1.6.2              7a1b61b8f5d4        3 weeks ago         109 MB
gcr.io/google_containers/kube-controller-manager-amd64   v1.6.2              c7ad09fe3b82        3 weeks ago         133 MB
gcr.io/google_containers/kube-apiserver-amd64            v1.6.2              e14b1d5ee474        3 weeks ago         151 MB
gcr.io/google_containers/kube-scheduler-amd64            v1.6.2              b55f2a2481b9        3 weeks ago         76.8 MB
... ...

Although kubeadm has a newer version, the installation process has not changed much; only the key steps are listed here, with some detailed output omitted.

First, install the required packages on the shaolin node:

root@shaolin:~# apt-get update && apt-get install -y apt-transport-https

root@shaolin:~# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
OK

root@shaolin:~# cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
> deb http://apt.kubernetes.io/ kubernetes-xenial main
> EOF

root@shaolin:~# apt-get update

root@shaolin:~# apt-get install -y kubelet kubeadm kubectl kubernetes-cni

Next, bootstrap the cluster with kubeadm. Note: since the flannel network plugin has never worked well on Aliyun, we use Weave Net here again.

root@shaolin:~/k8s-install# kubeadm init --apiserver-advertise-address 10.27.53.32
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.2
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.03.1-ce. Max validated version: 1.12
[preflight] Starting the kubelet service
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [shaolin kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.27.53.32]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 17.045449 seconds
[apiclient] Waiting for at least one node to register
[apiclient] First node has registered after 5.008588 seconds
[token] Using token: a8dd42.afdb86eda4a8c987
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token abcdefghijklmn 10.27.53.32:6443

root@shaolin:~/k8s-install# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP            NODE
kube-system   etcd-shaolin                      1/1       Running   0          34s       10.27.53.32   shaolin
kube-system   kube-apiserver-shaolin            1/1       Running   0          35s       10.27.53.32   shaolin
kube-system   kube-controller-manager-shaolin   1/1       Running   0          23s       10.27.53.32   shaolin
kube-system   kube-dns-3913472980-tkr91         0/3       Pending   0          1m        <none>
kube-system   kube-proxy-bzvvk                  1/1       Running   0          1m        10.27.53.32   shaolin
kube-system   kube-scheduler-shaolin            1/1       Running   0          46s       10.27.53.32   shaolin

Installing Weave Net on k8s 1.6.2 differs slightly from before, because k8s 1.6 enables a more secure mechanism, using RBAC by default to grant limited permissions to workloads running on the cluster. The Weave Net plugin yaml we use is weave-daemonset-k8s-1.6.yaml:

root@shaolin:~/k8s-install# kubectl apply -f https://git.io/weave-kube-1.6
clusterrole "weave-net" created
serviceaccount "weave-net" created
clusterrolebinding "weave-net" created
daemonset "weave-net" created

If your weave pod fails to start with a reason resembling this log line:

Network 172.30.0.0/16 overlaps with existing route 172.16.0.0/12 on host.

you need to change Weave Net's IPALLOC_RANGE (here I used 172.32.0.0/16):

//weave-daemonset-k8s-1.6.yaml
... ...
spec:
  template:
    metadata:
      labels:
        name: weave-net
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: weave
          env:
            - name: IPALLOC_RANGE
              value: 172.32.0.0/16
... ...

With the master installed, we add wudang and emei as k8s minion nodes to test that the cluster is built correctly. This also installs kubelet and kube-proxy on them, two components we can reuse directly in the later "conversion":

Taking the emei node as an example:

root@emei:~# kubeadm join --token abcdefghijklmn 10.27.53.32:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.03.1-ce. Max validated version: 1.12
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "10.27.53.32:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.27.53.32:6443"
[discovery] Cluster info signature and contents are valid, will use API Server "https://10.27.53.32:6443"
[discovery] Successfully established connection with API Server "10.27.53.32:6443"
[bootstrap] Detected server version: v1.6.2
[bootstrap] The server supports the Certificates API (certificates.k8s.io/v1beta1)
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server, generating KubeConfig...
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
  received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.

Create an nginx service backed by multiple pods to verify that the cluster network works; details omitted here.

The installed single-master Kubernetes cluster now looks like the diagram below:

img{512x368}

4. Step 2: Build an etcd cluster for the HA k8s cluster

Kubernetes stores all cluster state and data in etcd, so a highly available k8s cluster cannot do without a highly available etcd cluster. We need to provide an HA etcd cluster for the final HA k8s cluster; how?

In the current cluster, the etcd on the shaolin master node stores all of the cluster's data and state. We need to bring up etcd instances on wudang and emei as well, and combine them with the existing etcd into a highly available cluster holding the cluster's data and state. We break this process down into several small steps:

0. Start the kubelet service on emei and wudang

The etcd cluster could be built in a way entirely independent of the k8s components. Here, however, I use the same approach as on the master: the etcd instances started by the kubelets on wudang and emei serve as the cluster's two new members. Right now wudang and emei play the role of k8s minion nodes, so we first need to clean up the data on these two nodes:

root@shaolin:~/k8s-install # kubectl drain wudang --delete-local-data --force --ignore-daemonsets
node "wudang" cordoned
WARNING: Ignoring DaemonSet-managed pods: kube-proxy-mxwp3, weave-net-03jbh; Deleting pods with local storage: weave-net-03jbh
pod "my-nginx-2267614806-fqzph" evicted
node "wudang" drained

root@wudang:~# kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] No etcd manifest found in "/etc/kubernetes/manifests/etcd.yaml", assuming external etcd.
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

root@shaolin:~/k8s-install # kubectl drain emei --delete-local-data --force --ignore-daemonsets
root@emei:~# kubeadm reset

root@shaolin:~/k8s-install# kubectl delete node/wudang
root@shaolin:~/k8s-install# kubectl delete node/emei
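The cleanup above boils down to three commands per node. A dry-run sketch (the `echo`s print the commands instead of running them; note that `kubeadm reset` runs on the node itself, so it is wrapped in ssh here, which assumes root ssh access):

```shell
# Dry-run of the per-node cleanup, driven from shaolin.
# Remove the `echo`s to execute; root ssh between nodes is assumed.
for node in wudang emei; do
  echo "kubectl drain $node --delete-local-data --force --ignore-daemonsets"
  echo "ssh root@$node kubeadm reset"   # kubeadm reset runs on the node itself
  echo "kubectl delete node/$node"
done
```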

Per our plan, the etcd cluster members will be started automatically by the kubelet on each node; kubelet itself is started by systemd at system init with the following configuration:

root@wudang:~# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS

First we need to get kubelet running on the wudang and emei nodes. Taking wudang as an example:

root@wudang:~# systemctl enable kubelet
root@wudang:~# systemctl start kubelet

Check the kubelet service logs:

root@wudang:~# journalctl -u kubelet -f

May 10 10:58:41 wudang systemd[1]: Started kubelet: The Kubernetes Node Agent.
May 10 10:58:41 wudang kubelet[27179]: I0510 10:58:41.798507   27179 feature_gate.go:144] feature gates: map[]
May 10 10:58:41 wudang kubelet[27179]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
May 10 10:58:41 wudang systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 10 10:58:41 wudang systemd[1]: kubelet.service: Unit entered failed state.
May 10 10:58:41 wudang systemd[1]: kubelet.service: Failed with result 'exit-code'.

kubelet failed to start because the configuration file /etc/kubernetes/kubelet.conf is missing. We have to turn to the shaolin node for help: copy the file of the same name from shaolin to wudang and emei, along with shaolin's /etc/kubernetes/pki directory:
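A sketch of the copy step, assuming root ssh access between the nodes; the `echo`s make it a dry run that only prints the scp commands, so drop them to actually copy:

```shell
# Dry-run: files to copy from shaolin to each new node.
# Remove the `echo`s to execute (assumes root ssh between nodes).
for node in wudang emei; do
  echo "scp /etc/kubernetes/kubelet.conf root@$node:/etc/kubernetes/"
  echo "scp -r /etc/kubernetes/pki root@$node:/etc/kubernetes/"
done
```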

root@wudang:~# kubectl --kubeconfig=/etc/kubernetes/kubelet.conf config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://10.27.53.32:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:shaolin
  name: system:node:shaolin@kubernetes
current-context: system:node:shaolin@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:shaolin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

root@wudang:~# ls /etc/kubernetes/pki
apiserver.crt  apiserver-kubelet-client.crt  ca.crt  ca.srl              front-proxy-ca.key      front-proxy-client.key  sa.pub
apiserver.key  apiserver-kubelet-client.key ca.key  front-proxy-ca.crt  front-proxy-client.crt  sa.key

After `systemctl daemon-reload; systemctl restart kubelet`, check the kubelet service logs again: kubelet is up!

Taking the wudang node as an example:

root@wudang:~# journalctl -u kubelet -f
-- Logs begin at Mon 2017-05-08 15:12:01 CST. --
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.213529   26907 factory.go:54] Registering systemd factory
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.213674   26907 factory.go:86] Registering Raw factory
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.213813   26907 manager.go:1106] Started watching for new ooms in manager
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.216383   26907 oomparser.go:185] oomparser using systemd
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.217415   26907 manager.go:288] Starting recovery of all containers
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.285428   26907 manager.go:293] Recovery completed
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.344425   26907 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
May 11 10:37:07 wudang kubelet[26907]: E0511 10:37:07.356188   26907 eviction_manager.go:214] eviction manager: unexpected err: failed GetNode: node 'wudang' not found
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.358402   26907 kubelet_node_status.go:77] Attempting to register node wudang
May 11 10:37:07 wudang kubelet[26907]: I0511 10:37:07.363083   26907 kubelet_node_status.go:80] Successfully registered node wudang

For now, let the kubelets on wudang and emei keep talking to the apiserver on the shaolin node.

1. Build an etcd cluster on the emei and wudang nodes

Using /etc/kubernetes/manifests/etcd.yaml on the shaolin node as a template, we derive the etcd.yaml for wudang and emei; the main changes are in the containers' command section:

/etc/kubernetes/manifests/etcd.yaml on wudang:

spec:
  containers:
  - command:
    - etcd
    - --name=etcd-wudang
    - --initial-advertise-peer-urls=http://10.24.138.208:2380
    - --listen-peer-urls=http://10.24.138.208:2380
    - --listen-client-urls=http://10.24.138.208:2379,http://127.0.0.1:2379
    - --advertise-client-urls=http://10.24.138.208:2379
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster=etcd-wudang=http://10.24.138.208:2380,etcd-emei=http://10.27.52.72:2380
    - --initial-cluster-state=new
    - --data-dir=/var/lib/etcd
    image: gcr.io/google_containers/etcd-amd64:3.0.17

/etc/kubernetes/manifests/etcd.yaml on emei:

spec:
  containers:
  - command:
    - etcd
    - --name=etcd-emei
    - --initial-advertise-peer-urls=http://10.27.52.72:2380
    - --listen-peer-urls=http://10.27.52.72:2380
    - --listen-client-urls=http://10.27.52.72:2379,http://127.0.0.1:2379
    - --advertise-client-urls=http://10.27.52.72:2379
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster=etcd-emei=http://10.27.52.72:2380,etcd-wudang=http://10.24.138.208:2380
    - --initial-cluster-state=new
    - --data-dir=/var/lib/etcd
    image: gcr.io/google_containers/etcd-amd64:3.0.17

Once these two files are dropped into each node's /etc/kubernetes/manifests directory, the kubelet on each node automatically starts the corresponding etcd pod!
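The two manifests above differ only in the member name and IPs. As a sanity check, a small helper (hypothetical, purely for illustration) can print the per-member etcd flags so the two files are easy to diff:

```shell
# Print the per-member etcd flags that vary between the two manifests.
# gen_etcd_args is a hypothetical helper, for illustration only.
gen_etcd_args() {
  name=$1; ip=$2
  cat <<EOF
- --name=$name
- --initial-advertise-peer-urls=http://$ip:2380
- --listen-peer-urls=http://$ip:2380
- --listen-client-urls=http://$ip:2379,http://127.0.0.1:2379
- --advertise-client-urls=http://$ip:2379
EOF
}
gen_etcd_args etcd-wudang 10.24.138.208
```

Everything else in the two manifests (cluster token, initial cluster list, data dir, image) is identical.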

root@shaolin:~# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
kube-system   etcd-emei                         1/1       Running   0          11s       10.27.52.72     emei
kube-system   etcd-shaolin                      1/1       Running   0          25m       10.27.53.32     shaolin
kube-system   etcd-wudang                       1/1       Running   0          24s       10.24.138.208   wudang

Let's check the current state of the etcd cluster:

# etcdctl endpoint status --endpoints=10.27.52.72:2379,10.24.138.208:2379
10.27.52.72:2379, 6e80adf8cd57f826, 3.0.17, 25 kB, false, 17, 660
10.24.138.208:2379, f3805d1ab19c110b, 3.0.17, 25 kB, true, 17, 660

Note: from left to right the columns are: endpoint URL, ID, version, database size, leadership status, raft term, and raft index.
So we can see that the etcd on wudang (10.24.138.208) has been elected cluster leader.

Let's test the etcd cluster by putting a few keys:

On the wudang node (note: `export ETCDCTL_API=3` first):

root@wudang:~# etcdctl put foo bar
OK
root@wudang:~# etcdctl put foo1 bar1
OK
root@wudang:~# etcdctl get foo
foo
bar

On the emei node:

root@emei:~# etcdctl get foo
foo
bar

At this point, the state of the current Kubernetes cluster looks like this:

img{512x368}

2. Sync the data in shaolin's etcd into the etcd cluster

Kubernetes 1.6.2 uses a 3.x etcd by default, and the 3.x etcdctl provides a make-mirror feature for syncing data between etcd clusters. We can therefore use `etcdctl make-mirror` to mirror the k8s cluster data from shaolin's etcd into the etcd cluster we just created. Run the following command on the emei node:

root@emei:~# etcdctl make-mirror --no-dest-prefix=true  127.0.0.1:2379  --endpoints=10.27.53.32:2379 --insecure-skip-tls-verify=true
... ...
261
302
341
380
420
459
498
537
577
616
655

... ...

etcdctl make-mirror prints a log line every 30 seconds, but the logs don't really show the progress of the sync. Moreover, make-mirror appears to sync as a stream: there is no end boundary. So you have to judge manually whether all the data has been synced, for example by fetching some key and comparing both sides:

# etcdctl get --from-key /api/v2/registry/clusterrolebindings/cluster-admin

.. ..
compact_rev_key
122912

Alternatively, use the `endpoint status` command to check the database sizes and compare whether the two sides roughly match. Once they are close enough, you can stop the make-mirror run!
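The size comparison can be scripted by pulling the DB SIZE column out of the plain `endpoint status` output. The status strings below are stand-in samples; in practice you would substitute live etcdctl output for the source and destination endpoints:

```shell
# Extract the DB SIZE column (4th field) from `etcdctl endpoint status` lines.
# src/dst below are sample stand-ins for live etcdctl output.
size_of() { echo "$1" | awk -F', ' '{print $4}'; }
src='10.27.53.32:2379, 3184cfa57d8ef00c, 3.0.17, 11 MB, true, 17, 660'
dst='10.27.52.72:2379, 6e80adf8cd57f826, 3.0.17, 11 MB, false, 17, 660'
if [ "$(size_of "$src")" = "$(size_of "$dst")" ]; then
  echo "sizes match: $(size_of "$src") - safe to stop make-mirror"
else
  echo "still syncing: src=$(size_of "$src") dst=$(size_of "$dst")"
fi
```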

3. Point shaolin's apiserver at the etcd cluster, then stop and remove the etcd on shaolin

Edit /etc/kubernetes/manifests/kube-apiserver.yaml on the shaolin node so that shaolin's kube-apiserver connects to the etcd on the emei node.

Change the following line:
- --etcd-servers=http://10.27.52.72:2379

After saving the change, kubelet automatically restarts kube-apiserver, and the restarted kube-apiserver works fine!
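The one-line edit can also be scripted with sed. In this sketch a temporary file stands in for the real /etc/kubernetes/manifests/kube-apiserver.yaml (back the real manifest up before editing it in place):

```shell
# Scripted version of the edit; the temp file is a stand-in for the real
# /etc/kubernetes/manifests/kube-apiserver.yaml. Back the real one up first.
manifest=$(mktemp)
printf -- '    - --etcd-servers=http://127.0.0.1:2379\n' > "$manifest"
sed -i 's|--etcd-servers=.*|--etcd-servers=http://10.27.52.72:2379|' "$manifest"
cat "$manifest"
```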

Next, stop and remove the etcd on shaolin (and delete its data directory):

root@shaolin:~# rm /etc/kubernetes/manifests/etcd.yaml
root@shaolin:~# rm -fr /var/lib/etcd

Listing the cluster's pods again, you'll find that etcd-shaolin is gone.

At this point, the state of the k8s cluster looks like this:

img{512x368}

4. Recreate the etcd on shaolin and join it to the etcd cluster as a member

First, add the etcd-shaolin member to the existing etcd cluster:

root@wudang:~/kubernetes-conf-shaolin/manifests# etcdctl member add etcd-shaolin --peer-urls=http://10.27.53.32:2380
Member 3184cfa57d8ef00c added to cluster 140cec6dd173ab61

Then, on the shaolin node, modify the original shaolin etcd.yaml as follows:

// /etc/kubernetes/manifests/etcd.yaml
... ...
spec:
  containers:
  - command:
    - etcd
    - --name=etcd-shaolin
    - --initial-advertise-peer-urls=http://10.27.53.32:2380
    - --listen-peer-urls=http://10.27.53.32:2380
    - --listen-client-urls=http://10.27.53.32:2379,http://127.0.0.1:2379
    - --advertise-client-urls=http://10.27.53.32:2379
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster=etcd-shaolin=http://10.27.53.32:2380,etcd-wudang=http://10.24.138.208:2380,etcd-emei=http://10.27.52.72:2380
    - --initial-cluster-state=existing
    - --data-dir=/var/lib/etcd
    image: gcr.io/google_containers/etcd-amd64:3.0.17

After saving the change, kubelet automatically brings up etcd-shaolin:

root@shaolin:~/k8s-install# pods
NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
kube-system   etcd-emei                         1/1       Running   0          3h        10.27.52.72     emei
kube-system   etcd-shaolin                      1/1       Running   0          8s        10.27.53.32     shaolin
kube-system   etcd-wudang                       1/1       Running   0          3h        10.24.138.208   wudang

Check the etcd cluster state:

root@shaolin:~# etcdctl endpoint status --endpoints=10.27.52.72:2379,10.24.138.208:2379,10.27.53.32:2379
10.27.52.72:2379, 6e80adf8cd57f826, 3.0.17, 11 MB, false, 17, 34941
10.24.138.208:2379, f3805d1ab19c110b, 3.0.17, 11 MB, true, 17, 34941
10.27.53.32:2379, 3184cfa57d8ef00c, 3.0.17, 11 MB, false, 17, 34941

We can see that the database sizes and raft indexes of the three etcd instances agree, and that the etcd on the wudang node is the leader!

5. Point the etcd-servers setting of shaolin's apiserver back to etcd-shaolin

// /etc/kubernetes/manifests/kube-apiserver.yaml

... ...
- --etcd-servers=http://127.0.0.1:2379
... ...

After the change takes effect and the apiserver restarts, the state of the Kubernetes cluster is as shown below:

img{512x368}

Part two is here.

Anomalies caused by changing node hostnames in a Kubernetes cluster

Besides the Kubernetes 1.3.7 cluster used in production, I also keep a Kubernetes 1.5.1 test environment, both for validating various technical approaches and for tracking the latest Kubernetes developments. The anomaly recorded in this post happened in that test cluster.

I. How it started

A couple of days ago I set up Ceph in the Kubernetes test environment. To make installing with ceph-deploy easier, I used the hostnamectl command to change the long, unwieldy default hostnames provided by Aliyun to short, more meaningful ones:

iZ25beglnhtZ -> yypdmaster
iz2ze39jeyizepdxhwqci6z -> yypdnode

Taking yypdmaster as an example, the change went like this:

# hostnamectl --static set-hostname yypdmaster
# hostnamectl status
Static hostname: yypdmaster
Transient hostname: iZ25beglnhtZ
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 91aa4b8f2556de49e743dc2f53e8a5c4
           Boot ID: 5d0e642ebafa460086388da4177e488e
    Virtualization: kvm
  Operating System: Ubuntu 16.04.1 LTS
            Kernel: Linux 4.4.0-58-generic
      Architecture: x86-64

# cat /etc/hostname
yypdmaster

hostnamectl does not touch /etc/hosts, so I manually added the IP for yypdmaster there:

xx.xx.xx.xx yypdmaster

After logging in again, the hostname status shows that the transient hostname is gone and only the static hostname remains:

# hostnamectl status
   Static hostname: yypdmaster
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 91aa4b8f2556de49e743dc2f53e8a5c4
           Boot ID: 5d0e642ebafa460086388da4177e488e
    Virtualization: kvm
  Operating System: Ubuntu 16.04.1 LTS
            Kernel: Linux 4.4.0-58-generic
      Architecture: x86-64

The other host was changed the same way. After the renames, the whole k8s cluster kept working normally, so at first I assumed that changing the hostnames had no effect on the running cluster.

II. The cluster "crashes"

Yesterday, while testing cross-node Cephfs mounts, I found that running kubectl exec from yypdmaster against a pod on the other node no longer worked; it timed out connecting to port 10250. Worse, the error log showed that the k8s components on yypdmaster were reaching port 10250 on yypdnode (the port its kubelet listens on) via the node's public IP. Aliyun security-group rules block external access to that port, so the timeout itself made sense. But why had the cluster been fine until now? Why did this problem suddenly appear? And why wasn't the internal IP being used?

I tried restarting the kubelet service on yypdnode, to no apparent effect. While I was puzzling over this, I noticed the cluster seemed to have "crashed"; here is the pod listing at the time:

# kubectl get pod --all-namespaces -o wide

NAMESPACE                    NAME                                    READY     STATUS             RESTARTS   AGE       IP             NODE
default                      ceph-pod2                               1/1       Unknown            0          26m       172.30.192.4   iz2ze39jeyizepdxhwqci6z
default                      ceph-pod2-with-secret                   1/1       Unknown            0          38m       172.30.192.2   iz2ze39jeyizepdxhwqci6z
default                      ceph-pod2-with-secret-on-master         1/1       Unknown            0          34m       172.30.0.51    iz25beglnhtz
default                      nginx-kit-3630450072-2c0jk              0/2       Pending            0          12m       <none>
default                      nginx-kit-3630450072-3n50m              2/2       Unknown            20         35d       172.30.0.44    iz25beglnhtz
default                      nginx-kit-3630450072-90v4q              0/2       Pending            0          12m       <none>
default                      nginx-kit-3630450072-j8qrk              2/2       Unknown            20         72d       172.30.0.47    iz25beglnhtz
kube-system                  dummy-2088944543-9382n                  1/1       Running            0          12m       xx.xx.xx.xx   yypdmaster
kube-system                  dummy-2088944543-93f4c                  1/1       Unknown            16         130d      xx.xx.xx.xx   iz25beglnhtz
kube-system                  elasticsearch-logging-v1-dhl35          1/1       Running            0          12m       172.30.192.6   yypdnode
kube-system                  elasticsearch-logging-v1-s3sbj          1/1       Unknown            9          35d       172.30.0.45    iz25beglnhtz
kube-system                  elasticsearch-logging-v1-t8wg0          1/1       Unknown            29         68d       172.30.0.43    iz25beglnhtz
kube-system                  elasticsearch-logging-v1-zdp19          1/1       Running            0          12m       172.30.0.3     yypdmaster
kube-system                  etcd-iz25beglnhtz                       1/1       Unknown            17         130d      xx.xx.xx.xx   iz25beglnhtz
kube-system                  etcd-yypdmaster                         1/1       Running            17         17m       xx.xx.xx.xx   yypdmaster
kube-system                  fluentd-es-v1.22-ggvv4                  1/1       NodeLost           24         68d       172.30.0.46    iz25beglnhtz
kube-system                  fluentd-es-v1.22-rj871                  1/1       Running            0          17m       172.30.0.1     yypdmaster
kube-system                  fluentd-es-v1.22-xn77x                  1/1       NodeLost           0          6d        172.30.192.0   iz2ze39jeyizepdxhwqci6z
kube-system                  fluentd-es-v1.22-z82rz                  1/1       Running            0          18m       172.30.192.5   yypdnode
kube-system                  kibana-logging-3746979809-dplzv         1/1       Running            0          12m       172.30.0.4     yypdmaster
kube-system                  kibana-logging-3746979809-lq9m3         1/1       Unknown            9          35d       172.30.0.49    iz25beglnhtz
kube-system                  kube-apiserver-iz25beglnhtz             1/1       Unknown            19         104d      xx.xx.xx.xx   iz25beglnhtz
kube-system                  kube-apiserver-yypdmaster               1/1       Running            19         17m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-controller-manager-iz25beglnhtz    1/1       Unknown            21         130d      xx.xx.xx.xx   iz25beglnhtz
kube-system                  kube-controller-manager-yypdmaster      1/1       Running            21         17m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-discovery-1769846148-wh1z4         1/1       Unknown            12         73d       xx.xx.xx.xx   iz25beglnhtz
kube-system                  kube-discovery-1769846148-z2v87         0/1       Pending            0          12m       <none>
kube-system                  kube-dns-2924299975-206tg               4/4       Unknown            129        130d      172.30.0.48    iz25beglnhtz
kube-system                  kube-dns-2924299975-g1kks               4/4       Running            0          12m       172.30.0.5     yypdmaster
kube-system                  kube-proxy-3z29k                        1/1       Running            0          18m       yy.yy.yy.yy    yypdnode
kube-system                  kube-proxy-kfzxv                        1/1       Running            0          17m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-proxy-n2xmf                        1/1       NodeLost           16         130d      xx.xx.xx.xx   iz25beglnhtz

Looking at this output, several anomalies stand out:

  • Unusual pod statuses: Unknown and NodeLost
  • The Node column shows four nodes: yypdmaster, yypdnode, iz25beglnhtz, and iz2ze39jeyizepdxhwqci6z

After waiting a while with no improvement, I restarted the kubelet on the master and the docker engines on both nodes, but the problem persisted.

Check the pods in the Running state:

# kubectl get pod --all-namespaces -o wide|grep Running
kube-system                  dummy-2088944543-9382n                  1/1       Running            0          18m       xx.xx.xx.xx   yypdmaster
kube-system                  elasticsearch-logging-v1-dhl35          1/1       Running            0          18m       172.30.192.6   yypdnode
kube-system                  elasticsearch-logging-v1-zdp19          1/1       Running            0          18m       172.30.0.3     yypdmaster
kube-system                  etcd-yypdmaster                         1/1       Running            17         23m       xx.xx.xx.xx   yypdmaster
kube-system                  fluentd-es-v1.22-rj871                  1/1       Running            0          23m       172.30.0.1     yypdmaster
kube-system                  fluentd-es-v1.22-z82rz                  1/1       Running            0          24m       172.30.192.5   yypdnode
kube-system                  kibana-logging-3746979809-dplzv         1/1       Running            0          18m       172.30.0.4     yypdmaster
kube-system                  kube-apiserver-yypdmaster               1/1       Running            19         23m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-controller-manager-yypdmaster      1/1       Running            21         23m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-dns-2924299975-g1kks               4/4       Running            0          18m       172.30.0.5     yypdmaster
kube-system                  kube-proxy-3z29k                        1/1       Running            0          24m       yy.yy.yy.yy    yypdnode
kube-system                  kube-proxy-kfzxv                        1/1       Running            0          23m       xx.xx.xx.xx   yypdmaster
kube-system                  kube-scheduler-yypdmaster               1/1       Running            22         23m       xx.xx.xx.xx   yypdmaster
kube-system                  kubernetes-dashboard-3109525988-cj74d   1/1       Running            0          18m       172.30.0.6     yypdmaster
mioss-namespace-s0fcvegcmw   console-sm7cg2-101699315-f3g55          1/1       Running            0          18m       172.30.0.7     yypdmaster

It seems the Kubernetes cluster hasn't truly "crashed"; judging from the Node column, the normal pods all belong to either yypdmaster or yypdnode, while iz25beglnhtz and iz2ze39jeyize




This is Tony Bai's personal blog; visits, subscriptions, and comments are welcome!
