
Installing Kubernetes with kubeadm

In 《当Docker遇到systemd》 (When Docker Meets systemd), I mentioned the task I have been working on for the past couple of days: using kubeadm to install and deploy the latest Kubernetes release, k8s 1.5.1, on Ubuntu 16.04.

In the middle of the year, Docker announced that the swarmkit toolkit would be integrated into the Docker engine, an announcement that caused quite a stir in the lightweight container world. Developers are lazy, after all ^0^: with docker swarmkit built in, what incentive is left to install yet another container orchestration tool, even if the docker engine is not quite the heavily used IE browser of its day? In response to this market move by Docker Inc., Kubernetes, the leader in container cluster management and service orchestration, released Kubernetes 1.4.0 three months later. That version added the kubeadm tool. kubeadm works much like the swarm kit tooling integrated into the docker engine: it aims to improve the developer experience of installing, debugging and using k8s, and to lower the barrier to entry. In theory, two commands, init and join, are all it takes to stand up a complete Kubernetes cluster.

However, like swarmkit when it first landed in the docker engine, kubeadm is still under active development and not all that stable. Even in the latest k8s 1.5.1 it remains in Alpha status, and the project advises against using it for production. Every kubeadm init run prints the following reminder:

[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.

Still, the k8s 1.3.7 cluster we deployed earlier has been running well, which gives us the confidence to keep going down the k8s road and to do it well. But the k8s deployment and management experience really is tedious, so we decided to test whether kubeadm could exceed our expectations. The lessons learned from installing kubernetes 1.3.7 on aliyun ubuntu 14.04 gave me a tiny bit of confidence, but the actual install was still full of twists and turns. That is as much down to kubeadm being unstable as to the quality of cni and the third-party network add-ons; a problem in any of them will make your install a rough ride.

I. Environment and Constraints

Of the three operating systems kubeadm supports, Ubuntu 16.04+, CentOS 7 or HypriotOS v1.0.1+, we chose Ubuntu 16.04. Since Aliyun does not yet offer an official 16.04 image, we created two new Ubuntu 14.04 ECS instances and manually upgraded them to Ubuntu 16.04.1 via apt-get; the exact version is: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-58-generic x86_64).

Ubuntu 16.04 uses systemd as its init system; for installing and configuring Docker you can refer to my post 《当Docker遇到systemd》. For Docker I picked the latest stable release available: 1.12.5.

# docker version
Client:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64

As for the Kubernetes version, as mentioned above we use the freshly released Kubernetes 1.5.1. 1.5.1 is an urgent fix on top of 1.5.0, mainly "to address default flag values which in isolation were not problematic, but in concert could result in an insecure cluster". The official advice is to skip 1.5.0 and go straight to 1.5.1.

Let me repeat this up front: installing, configuring and getting Kubernetes fully working is hard, doing it on Aliyun is even harder, and sometimes you need a bit of luck. Kubernetes, Docker, cni and the various network add-ons are all under active development; a step, tip or trick that works today may be outdated tomorrow, so please keep that in mind when borrowing the steps in this post ^0^.

II. Preparing the Installation Packages

We created two new ECS instances for this install, one as the master node and one as the minion node. In a default kubeadm install, the master node does not take part in Pod scheduling and carries no workload, i.e. no non-core Pods get created on it. That restriction can be lifted with the kubectl taint command, but more on that later.

Cluster topology:

master node: 10.47.217.91, hostname: iZ25beglnhtZ
minion node: 10.28.61.30, hostname: iZ2ze39jeyizepdxhwqci6Z

The main reference for this install is the official Kubernetes guide "Installing Kubernetes on Linux with kubeadm".

In this section we prepare the packages, i.e. download kubeadm and the k8s core components needed for this install onto both nodes. Note: if you have an accelerator (proxy), the steps below will go especially smoothly; otherwise, ... :( . The commands below must be run on both nodes.

1. Add the apt-key

# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
OK

2. Add the Kubernetes apt source and refresh the package index

Add the Kubernetes source under sources.list.d:

# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

# cat /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main

Refresh the package index:

# apt-get update
... ...
Hit:2 http://mirrors.aliyun.com/ubuntu xenial InRelease
Hit:3 https://apt.dockerproject.org/repo ubuntu-xenial InRelease
Get:4 http://mirrors.aliyun.com/ubuntu xenial-security InRelease [102 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [6,299 B]
Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 Packages [1,739 B]
Get:6 http://mirrors.aliyun.com/ubuntu xenial-updates InRelease [102 kB]
Get:7 http://mirrors.aliyun.com/ubuntu xenial-proposed InRelease [253 kB]
Get:8 http://mirrors.aliyun.com/ubuntu xenial-backports InRelease [102 kB]
Fetched 568 kB in 19s (28.4 kB/s)
Reading package lists... Done

3. Download the Kubernetes core components

For this install, the Kubernetes core components, including kubelet, kubeadm, kubectl and kubernetes-cni, can all be downloaded via apt-get:

# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libtimedate-perl
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  ebtables ethtool socat
The following NEW packages will be installed:
  ebtables ethtool kubeadm kubectl kubelet kubernetes-cni socat
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 37.6 MB of archives.
After this operation, 261 MB of additional disk space will be used.
Get:2 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ebtables amd64 2.0.10.4-3.4ubuntu1 [79.6 kB]
Get:6 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ethtool amd64 1:4.5-1 [97.5 kB]
Get:7 http://mirrors.aliyun.com/ubuntu xenial/universe amd64 socat amd64 1.7.3.1-1 [321 kB]
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.3.0.1-07a8a2-00 [6,877 kB]
Get:3 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.5.1-00 [15.1 MB]
Get:4 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.5.1-00 [7,954 kB]
Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.6.0-alpha.0-2074-a092d8e0f95f52-00 [7,120 kB]
Fetched 37.6 MB in 36s (1,026 kB/s)
... ...
Unpacking kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ...
Processing triggers for systemd (229-4ubuntu13) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up ebtables (2.0.10.4-3.4ubuntu1) ...
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Setting up ethtool (1:4.5-1) ...
Setting up kubernetes-cni (0.3.0.1-07a8a2-00) ...
Setting up socat (1.7.3.1-1) ...
Setting up kubelet (1.5.1-00) ...
Setting up kubectl (1.5.1-00) ...
Setting up kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ...
Processing triggers for systemd (229-4ubuntu13) ...
Processing triggers for ureadahead (0.100.0-19) ...
... ...

The downloaded kube components are not started automatically. Under /lib/systemd/system we can see kubelet.service:

# ls /lib/systemd/system|grep kube
kubelet.service

//kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target

The kubelet version:

# kubelet --version
Kubernetes v1.5.1

With the k8s core components in place, the next step is to bootstrap the kubernetes cluster. That is also where the problems begin, and those problems and how we solved them are the real subject of this post.

III. Initializing the Cluster

As noted above, in theory kubeadm can build a cluster with just the init and join commands; init is what initializes the cluster on the master node. Unlike the pre-1.4 deployment methods, the k8s core components installed by kubeadm all run as containers on the master node. So before kubeadm init, it is best to hook an accelerator/proxy up to the docker engine on the master node (a configuration sketch follows the list below), because kubeadm pulls quite a few core component images from the gcr.io/google_containers repository, roughly these:

gcr.io/google_containers/kube-controller-manager-amd64   v1.5.1                     cd5684031720        2 weeks ago         102.4 MB
gcr.io/google_containers/kube-apiserver-amd64            v1.5.1                     8c12509df629        2 weeks ago         124.1 MB
gcr.io/google_containers/kube-proxy-amd64                v1.5.1                     71d2b27b03f6        2 weeks ago         175.6 MB
gcr.io/google_containers/kube-scheduler-amd64            v1.5.1                     6506e7b74dac        2 weeks ago         53.97 MB
gcr.io/google_containers/etcd-amd64                      3.0.14-kubeadm             856e39ac7be3        5 weeks ago         174.9 MB
gcr.io/google_containers/kubedns-amd64                   1.9                        26cf1ed9b144        5 weeks ago         47 MB
gcr.io/google_containers/dnsmasq-metrics-amd64           1.0                        5271aabced07        7 weeks ago         14 MB
gcr.io/google_containers/kube-dnsmasq-amd64              1.4                        3ec65756a89b        3 months ago        5.13 MB
gcr.io/google_containers/kube-discovery-amd64            1.0                        c5e0c9a457fc        3 months ago        134.2 MB
gcr.io/google_containers/exechealthz-amd64               1.2                        93a43bfb39bf        3 months ago        8.375 MB
gcr.io/google_containers/pause-amd64                     3.0                        99e59f495ffa        7 months ago        746.9 kB
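If your accelerator is an HTTP proxy, one common way to let the docker engine pull through it on this systemd-based Ubuntu 16.04 (in the spirit of 《当Docker遇到systemd》) is a systemd drop-in. The snippet below is only a sketch; the proxy URL and the NO_PROXY list are placeholders you need to adapt:

# /etc/systemd/system/docker.service.d/http-proxy.conf  -- sketch, replace the proxy URL with your own
[Service]
Environment="HTTP_PROXY=http://your-proxy:port/" "HTTPS_PROXY=http://your-proxy:port/" "NO_PROXY=localhost,127.0.0.1,mirrors.aliyun.com"

# then reload systemd and restart docker so the engine picks up the proxy
# systemctl daemon-reload
# systemctl restart docker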

In the kubeadm docs, installing the Pod network is a separate step; kubeadm init does not pick and install a default Pod network for you. Flannel is our first choice for the Pod network, not only because our previous cluster used flannel and it performed stably, but even more because Flannel is the overlay network add-on coreos built specifically for k8s; the flannel repository's readme.md even says: "flannel is a network fabric for containers, designed for Kubernetes". To use Flannel, the kubeadm docs require us to pass the option --pod-network-cidr=10.244.0.0/16 to init.

1. Run kubeadm init

Run the kubeadm init command:

# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[init] Using Kubernetes version: v1.5.1
[tokens] Generated token: "2e7da9.7fc5668ff26430c7"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready // without an accelerator/proxy, init may hang here
[apiclient] All control plane components are healthy after 54.789750 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 1.003053 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 62.503441 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187

What does the master node look like after a successful init? The k8s core components are all up and running:

# ps -ef|grep kube
root      2477  2461  1 16:36 ?        00:00:04 kube-proxy --kubeconfig=/run/kubeconfig
root     30860     1 12 16:33 ?        00:01:09 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local
root     30952 30933  0 16:33 ?        00:00:01 kube-scheduler --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080
root     31128 31103  2 16:33 ?        00:00:11 kube-controller-manager --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080 --cluster-name=kubernetes --root-ca-file=/etc/kubernetes/pki/ca.pem --service-account-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem --insecure-experimental-approve-all-kubelet-csrs-for-group=system:kubelet-bootstrap --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16
root     31223 31207  2 16:34 ?        00:00:10 kube-apiserver --insecure-bind-address=127.0.0.1 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota --service-cluster-ip-range=10.96.0.0/12 --service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem --client-ca-file=/etc/kubernetes/pki/ca.pem --tls-cert-file=/etc/kubernetes/pki/apiserver.pem --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --token-auth-file=/etc/kubernetes/pki/tokens.csv --secure-port=6443 --allow-privileged --advertise-address=123.56.200.187 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --anonymous-auth=false --etcd-servers=http://127.0.0.1:2379
root     31491 31475  0 16:35 ?        00:00:00 /usr/local/bin/kube-discovery

and most of them run as containers:

# docker ps
CONTAINER ID        IMAGE                                                           COMMAND                  CREATED                  STATUS                  PORTS               NAMES
c16c442b7eca        gcr.io/google_containers/kube-proxy-amd64:v1.5.1                "kube-proxy --kubecon"   6 minutes ago            Up 6 minutes                                k8s_kube-proxy.36dab4e8_kube-proxy-sb4sm_kube-system_43fb1a2c-cb46-11e6-ad8f-00163e1001d7_2ba1648e
9f73998e01d7        gcr.io/google_containers/kube-discovery-amd64:1.0               "/usr/local/bin/kube-"   8 minutes ago            Up 8 minutes                                k8s_kube-discovery.7130cb0a_kube-discovery-1769846148-6z5pw_kube-system_1eb97044-cb46-11e6-ad8f-00163e1001d7_fd49c2e3
dd5412e5e15c        gcr.io/google_containers/kube-apiserver-amd64:v1.5.1            "kube-apiserver --ins"   9 minutes ago            Up 9 minutes                                k8s_kube-apiserver.1c5a91d9_kube-apiserver-iz25beglnhtz_kube-system_eea8df1717e9fea18d266103f9edfac3_8cae8485
60017f8819b2        gcr.io/google_containers/etcd-amd64:3.0.14-kubeadm              "etcd --listen-client"   9 minutes ago            Up 9 minutes                                k8s_etcd.c323986f_etcd-iz25beglnhtz_kube-system_3a26566bb004c61cd05382212e3f978f_06d517eb
03c2463aba9c        gcr.io/google_containers/kube-controller-manager-amd64:v1.5.1   "kube-controller-mana"   9 minutes ago            Up 9 minutes                                k8s_kube-controller-manager.d30350e1_kube-controller-manager-iz25beglnhtz_kube-system_9a40791dd1642ea35c8d95c9e610e6c1_3b05cb8a
fb9a724540a7        gcr.io/google_containers/kube-scheduler-amd64:v1.5.1            "kube-scheduler --add"   9 minutes ago            Up 9 minutes                                k8s_kube-scheduler.ef325714_kube-scheduler-iz25beglnhtz_kube-system_dc58861a0991f940b0834f8a110815cb_9b3ccda2
.... ...

Note, however, that these core components are not running on the pod network (indeed, the pod network has not been created yet); they use the host network. Take the kube-apiserver pod as an example:

kube-system   kube-apiserver-iz25beglnhtz            1/1       Running   0          1h        10.47.217.91   iz25beglnhtz

kube-apiserver's IP is the host IP, from which we can infer the container uses the host network; the network attribute of its corresponding pause container confirms it:

# docker ps |grep apiserver
a5a76bc59e38        gcr.io/google_containers/kube-apiserver-amd64:v1.5.1            "kube-apiserver --ins"   About an hour ago   Up About an hour                        k8s_kube-apiserver.2529402_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_ec0079fc
ef4d3bf057a6        gcr.io/google_containers/pause-amd64:3.0                        "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_bbfd8a31

Inspecting the pause container shows the value of its NetworkMode:

"NetworkMode": "host",

If something goes wrong partway through kubeadm init, for example it hangs because you forgot to set up the accelerator, you may hit ctrl+c to abort. After reconfiguring and running kubeadm init again, you may then see output like this:

# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
    Port 10250 is in use
    /etc/kubernetes/manifests is not empty
    /etc/kubernetes/pki is not empty
    /var/lib/kubelet is not empty
    /etc/kubernetes/admin.conf already exists
    /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`

kubeadm automatically checks whether the environment contains "leftovers" from a previous run; if it does, they must be cleaned up before init can run again. We can clean the environment with "kubeadm reset" and start over.

# kubeadm reset
[preflight] Running pre-flight checks
[reset] Draining node: "iz25beglnhtz"
[reset] Removing node: "iz25beglnhtz"
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf]

2. Install the flannel pod network

After kubeadm init, if you explore the cluster state or the core component logs you will notice some "anomalies". For example, the kubelet log keeps scrolling with this error:

Dec 26 16:36:48 iZ25beglnhtZ kubelet[30860]: E1226 16:36:48.365885   30860 docker_manager.go:2201] Failed to setup network for pod "kube-dns-2924299975-pddz5_kube-system(43fd7264-cb46-11e6-ad8f-00163e1001d7)" using network plugins "cni": cni config unintialized; Skipping pod

Running kubectl get pod --all-namespaces -o wide, you will also see the kube-dns pod stuck in the ContainerCreating state.

None of this matters yet, because we have not installed a Pod network for the cluster. As said before, we want to use the Flannel network, so we run the following install command:

#kubectl apply -f  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

A moment later, let's look at the cluster on the master node again:

# ps -ef|grep kube|grep flannel
root      6517  6501  0 17:20 ?        00:00:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root      6573  6546  0 17:20 ?        00:00:00 /bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done

# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
kube-system   dummy-2088944543-s0c5g                 1/1       Running   0          50m
kube-system   etcd-iz25beglnhtz                      1/1       Running   0          50m
kube-system   kube-apiserver-iz25beglnhtz            1/1       Running   0          50m
kube-system   kube-controller-manager-iz25beglnhtz   1/1       Running   0          50m
kube-system   kube-discovery-1769846148-6z5pw        1/1       Running   0          50m
kube-system   kube-dns-2924299975-pddz5              4/4       Running   0          49m
kube-system   kube-flannel-ds-5ww9k                  2/2       Running   0          4m
kube-system   kube-proxy-sb4sm                       1/1       Running   0          49m
kube-system   kube-scheduler-iz25beglnhtz            1/1       Running   0          49m

At least all the core components of the cluster are up and running now. It looks like a success.

3. Minion node: join the cluster

Next it is the minion node's turn to join the cluster, using kubeadm's second command: kubeadm join.

Run on the minion node (note: make sure port 9898 on the master node is open in the firewall; a quick check is sketched after the join output below):

# kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://123.56.200.187:9898/cluster-info/v1/?token-id=2e7da9"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://123.56.200.187:6443]
[bootstrap] Trying to connect to endpoint https://123.56.200.187:6443
[bootstrap] Detected server version: v1.5.1
[bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:iZ2ze39jeyizepdxhwqci6Z | CA: false
Not before: 2016-12-26 09:31:00 +0000 UTC Not After: 2017-12-26 09:31:00 +0000 UTC
[csr] Generating kubelet configuration
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
  received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.
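To sanity-check that firewall note, you can verify from the minion node that the master's discovery port is reachable before joining; this is just a sketch, and whether ufw is the right tool depends on how your firewall is actually managed:

# nc -zv 123.56.200.187 9898        # run on the minion node; should report the port as open
# ufw allow 9898/tcp                # run on the master node, only if ufw is in use there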

That went smoothly too. The k8s components we see on the minion node:

d85cf36c18ed        gcr.io/google_containers/kube-proxy-amd64:v1.5.1      "kube-proxy --kubecon"   About an hour ago   Up About an hour                        k8s_kube-proxy.36dab4e8_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_5826f32b
a60e373b48b8        gcr.io/google_containers/pause-amd64:3.0              "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_46bfcf67
a665145eb2b5        quay.io/coreos/flannel-git:v0.6.1-28-g5dde68d-amd64   "/bin/sh -c 'set -e -"   About an hour ago   Up About an hour                        k8s_install-cni.17d8cf2_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_01e12f61
5b46f2cb0ccf        gcr.io/google_containers/pause-amd64:3.0              "/pause"                 About an hour ago   Up About an hour                        k8s_POD.d8dbe16c_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_ac880d20

And on the master node, the current cluster state:

# kubectl get nodes
NAME                      STATUS         AGE
iz25beglnhtz              Ready,master   1h
iz2ze39jeyizepdxhwqci6z   Ready          21s

The k8s cluster has been created "successfully"! Or has it? The real "fun" is only beginning :( !

IV. Flannel Pod Network Problems

The afterglow of the successful join had barely faded when I ran into problems with the Flannel pod network, and the troubleshooting officially began :(.

1. flannel on the minion node errors out from time to time

Everything was fine right after the join, but before long an error showed up in kubectl get pod --all-namespaces:

kube-system   kube-flannel-ds-tr8zr                  1/2       CrashLoopBackOff   189        16h

It was caused by one container in the flannel pod on the minion node failing; the specific error we tracked down:

# docker logs bc0058a15969
E1227 06:17:50.605110       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-tr8zr': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-tr8zr: dial tcp 10.96.0.1:443: i/o timeout

10.96.0.1 is the cluster IP of the apiserver service on the pod network, and the flannel component on the minion node cannot reach that cluster IP! What makes it stranger is that sometimes, after the Pod has been restarted many times or deleted and recreated, it suddenly goes back to running; the behavior is quite erratic.
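To narrow down where the timeout happens, it can help to check on the minion node whether kube-proxy has programmed the service VIP into iptables and whether the apiserver endpoint is reachable at all; a sketch (with --anonymous-auth=false the curl may be rejected with an auth error, but it should not time out):

# iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1
# curl -k -m 5 https://10.96.0.1:443/version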

In flannel's github issues, at least two open issues are closely related to this problem:

https://github.com/coreos/flannel/issues/545

https://github.com/coreos/flannel/issues/535

There is no definitive fix for this yet. Whenever the flannel pod on the minion node recovered to running on its own, we carried on.

2. A workaround for the flannel pod failing to start on the minion node

In the issue below, a number of developers discuss one possible cause of the flannel pod failing to start on the minion node, along with a temporary workaround:

https://github.com/kubernetes/kubernetes/issues/34101

The gist is that kube-proxy on the minion node uses the wrong interface, and the problem can be fixed with the following command, run on the minion node:

#  kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=10.244.0.0/16"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
daemonset "kube-proxy" configured
pod "kube-proxy-lsn0t" deleted
pod "kube-proxy-sb4sm" deleted

After running it, the flannel pod status:

kube-system   kube-flannel-ds-qw291                  2/2       Running   8          17h
kube-system   kube-flannel-ds-x818z                  2/2       Running   17         1h

After 17 restarts, the flannel pod on the minion node finally came up ok. The startup log of its flannel container:

# docker logs 1f64bd9c0386
I1227 07:43:26.670620       1 main.go:132] Installing signal handlers
I1227 07:43:26.671006       1 manager.go:133] Determining IP address of default interface
I1227 07:43:26.670825       1 kube.go:233] starting kube subnet manager
I1227 07:43:26.671514       1 manager.go:163] Using 59.110.67.15 as external interface
I1227 07:43:26.671575       1 manager.go:164] Using 59.110.67.15 as external endpoint
I1227 07:43:26.746811       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 07:43:26.749785       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 07:43:26.752343       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 07:43:26.755126       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1227 07:43:26.755444       1 network.go:58] Watching for L3 misses
I1227 07:43:26.755475       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.755830       1 network.go:153] Handling initial subnet events
I1227 07:43:27.755905       1 device.go:163] calling GetL2List() dev.link.Index: 10
I1227 07:43:27.756099       1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67

The issue says that explicitly specifying the advertise address at kubeadm init time avoids this problem. However, do not put more than one IP after --api-advertise-addresses for now. The documentation claims multiple IPs are supported, but in practice, when you explicitly pass two or more IPs, like this:

#kubeadm init --api-advertise-addresses=10.47.217.91,123.56.200.187 --pod-network-cidr=10.244.0.0/16

the master initializes successfully, but the join command on the minion node panics:

# kubeadm join --token=92e977.f1d4d090906fc06a 10.47.217.91
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
... ...
[bootstrap] Successfully established connection with endpoint "https://10.47.217.91:6443"
[bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443"
E1228 10:14:05.405294   28378 runtime.go:64] Observed a panic: "close of closed channel" (close of closed channel)
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49
/usr/local/go/src/runtime/asm_amd64.s:479
/usr/local/go/src/runtime/panic.go:458
/usr/local/go/src/runtime/chan.go:311
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93
/usr/local/go/src/runtime/asm_amd64.s:2086
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
panic: close of closed channel [recovered]
    panic: close of closed channel

goroutine 29 [running]:
panic(0x1342de0, 0xc4203eebf0)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126
panic(0x1342de0, 0xc4203eebf0)
    /usr/local/go/src/runtime/panic.go:458 +0x243
k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1.1()
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85 +0x29d
k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc420563ee0)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96 +0x5e
k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc420563ee0, 0x12a05f200, 0x0, 0xc420022e01, 0xc4202c2060)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97 +0xad
k8s.io/kubernetes/pkg/util/wait.Until(0xc420563ee0, 0x12a05f200, 0xc4202c2060)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52 +0x4d
k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1(0xc4203a82f0, 0xc420269b90, 0xc4202c2060, 0xc4202c20c0, 0xc4203d8d80, 0x401, 0x480, 0xc4201e75e0, 0x17, 0xc4201e7560, ...)
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93 +0x100
created by k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:94 +0x3ed

The join panic is discussed in detail in this issue: https://github.com/kubernetes/kubernetes/issues/36988

3. open /run/flannel/subnet.env: no such file or directory

As mentioned earlier, for security reasons the master node does not carry workload by default and does not take part in pod scheduling. We have too few machines here, so the master node has to pitch in too. The following command lets the master node take part in pod scheduling as well:

# kubectl taint nodes --all dedicated-
node "iz25beglnhtz" tainted

Next we create a deployment, described by the following manifest:

//run-my-nginx.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.10.1
        ports:
        - containerPort: 80
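Creating it and watching where the pods land is straightforward (a minimal sketch):

# kubectl create -f run-my-nginx.yaml
# kubectl get pods -o wide -l run=my-nginx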

After creating it, we find that the my-nginx pod scheduled onto the master starts ok, but the pod on the minion node keeps failing, for the following reason:

Events:
  FirstSeen    LastSeen    Count    From                    SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  28s        28s        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned my-nginx-2560993602-0440x to iz2ze39jeyizepdxhwqci6z
  27s        1s        26    {kubelet iz2ze39jeyizepdxhwqci6z}            Warning        FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-2560993602-0440x_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-2560993602-0440x_default(ba5ce554-cbf1-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"

There is indeed no /run/flannel/subnet.env on the minion node, but the master node does have the file:

// /run/flannel/subnet.env

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

So we manually created /run/flannel/subnet.env on the minion node, pasted in the content of the master node's file of the same name, and saved it. A short while later, the my-nginx pod on the minion node went from error to running.
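What I did amounts to something like this (a sketch; 10.28.61.30 is the minion node's IP from the topology above). Note this is only a stopgap: judging by the flannel logs, the minion's own lease is 10.244.1.0/24, not the 10.244.0.1/24 recorded in the master's file:

# run on the master node, copying the file over verbatim
# scp /run/flannel/subnet.env root@10.28.61.30:/run/flannel/subnet.env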

4. no IP addresses available in network: cbr0

We change the replicas of the earlier my-nginx deployment to 3 and create a my-nginx service on top of the deployment's pods (the corresponding commands are sketched after the manifest):

//my-nginx-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30062
    protocol: TCP
  selector:
    run: my-nginx
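The scaling and service creation mentioned above boil down to roughly this (a sketch):

# kubectl scale deployment my-nginx --replicas=3
# kubectl create -f my-nginx-svc.yaml
# curl -m 5 localhost:30062     # the NodePort spreads requests across the pods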

After the change, we tested service connectivity with curl localhost:30062. Requests load-balanced via the VIP to the my-nginx pod on the master node all got responses, but requests balanced to the pod on the minion node just blocked until they timed out. Looking at the pod details revealed that the my-nginx pod newly scheduled onto the minion node had not started ok, for this reason:

Events:
  FirstSeen    LastSeen    Count    From                    SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  2m        2m        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned my-nginx-1948696469-ph11m to iz2ze39jeyizepdxhwqci6z
  2m        0s        177    {kubelet iz2ze39jeyizepdxhwqci6z}            Warning        FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-ph11m_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-ph11m_default(3700d74a-cc12-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"

Looking at /var/lib/cni/networks/cbr0 on the minion node, we find the directory contains the following files:

10.244.1.10   10.244.1.12   10.244.1.14   10.244.1.16   10.244.1.18   10.244.1.2    10.244.1.219  10.244.1.239  10.244.1.3   10.244.1.5   10.244.1.7   10.244.1.9
10.244.1.100  10.244.1.120  10.244.1.140  10.244.1.160  10.244.1.180  10.244.1.20   10.244.1.22   10.244.1.24   10.244.1.30  10.244.1.50  10.244.1.70  10.244.1.90
10.244.1.101  10.244.1.121  10.244.1.141  10.244.1.161  10.244.1.187  10.244.1.200  10.244.1.220  10.244.1.240  10.244.1.31  10.244.1.51  10.244.1.71  10.244.1.91
10.244.1.102  10.244.1.122  10.244.1.142  10.244.1.162  10.244.1.182  10.244.1.201  10.244.1.221  10.244.1.241  10.244.1.32  10.244.1.52  10.244.1.72  10.244.1.92
10.244.1.103  10.244.1.123  10.244.1.143  10.244.1.163  10.244.1.183  10.244.1.202  10.244.1.222  10.244.1.242  10.244.1.33  10.244.1.53  10.244.1.73  10.244.1.93
10.244.1.104  10.244.1.124  10.244.1.144  10.244.1.164  10.244.1.184  10.244.1.203  10.244.1.223  10.244.1.243  10.244.1.34  10.244.1.54  10.244.1.74  10.244.1.94
10.244.1.105  10.244.1.125  10.244.1.145  10.244.1.165  10.244.1.185  10.244.1.204  10.244.1.224  10.244.1.244  10.244.1.35  10.244.1.55  10.244.1.75  10.244.1.95
10.244.1.106  10.244.1.126  10.244.1.146  10.244.1.166  10.244.1.186  10.244.1.205  10.244.1.225  10.244.1.245  10.244.1.36  10.244.1.56  10.244.1.76  10.244.1.96
10.244.1.107  10.244.1.127  10.244.1.147  10.244.1.167  10.244.1.187  10.244.1.206  10.244.1.226  10.244.1.246  10.244.1.37  10.244.1.57  10.244.1.77  10.244.1.97
10.244.1.108  10.244.1.128  10.244.1.148  10.244.1.168  10.244.1.188  10.244.1.207  10.244.1.227  10.244.1.247  10.244.1.38  10.244.1.58  10.244.1.78  10.244.1.98
10.244.1.109  10.244.1.129  10.244.1.149  10.244.1.169  10.244.1.189  10.244.1.208  10.244.1.228  10.244.1.248  10.244.1.39  10.244.1.59  10.244.1.79  10.244.1.99
10.244.1.11   10.244.1.13   10.244.1.15   10.244.1.17   10.244.1.19   10.244.1.209  10.244.1.229  10.244.1.249  10.244.1.4   10.244.1.6   10.244.1.8   last_reserved_ip
10.244.1.110  10.244.1.130  10.244.1.150  10.244.1.170  10.244.1.190  10.244.1.21   10.244.1.23   10.244.1.25   10.244.1.40  10.244.1.60  10.244.1.80
10.244.1.111  10.244.1.131  10.244.1.151  10.244.1.171  10.244.1.191  10.244.1.210  10.244.1.230  10.244.1.250  10.244.1.41  10.244.1.61  10.244.1.81
10.244.1.112  10.244.1.132  10.244.1.152  10.244.1.172  10.244.1.192  10.244.1.211  10.244.1.231  10.244.1.251  10.244.1.42  10.244.1.62  10.244.1.82
10.244.1.113  10.244.1.133  10.244.1.153  10.244.1.173  10.244.1.193  10.244.1.212  10.244.1.232  10.244.1.252  10.244.1.43  10.244.1.63  10.244.1.83
10.244.1.114  10.244.1.134  10.244.1.154  10.244.1.174  10.244.1.194  10.244.1.213  10.244.1.233  10.244.1.253  10.244.1.44  10.244.1.64  10.244.1.84
10.244.1.115  10.244.1.135  10.244.1.155  10.244.1.175  10.244.1.195  10.244.1.214  10.244.1.234  10.244.1.254  10.244.1.45  10.244.1.65  10.244.1.85
10.244.1.116  10.244.1.136  10.244.1.156  10.244.1.176  10.244.1.196  10.244.1.215  10.244.1.235  10.244.1.26   10.244.1.46  10.244.1.66  10.244.1.86
10.244.1.117  10.244.1.137  10.244.1.157  10.244.1.177  10.244.1.197  10.244.1.216  10.244.1.236  10.244.1.27   10.244.1.47  10.244.1.67  10.244.1.87
10.244.1.118  10.244.1.138  10.244.1.158  10.244.1.178  10.244.1.198  10.244.1.217  10.244.1.237  10.244.1.28   10.244.1.48  10.244.1.68  10.244.1.88
10.244.1.119  10.244.1.139  10.244.1.159  10.244.1.179  10.244.1.199  10.244.1.218  10.244.1.238  10.244.1.29   10.244.1.49  10.244.1.69  10.244.1.89

Every IP in the 10.244.1.x range has been claimed, so naturally there are no available IPs left for new pods. Why the range filled up is still unclear. The following two open issues are related to this problem:

https://github.com/containernetworking/cni/issues/306

https://github.com/kubernetes/kubernetes/issues/21656

From inside /var/lib/cni/networks/cbr0, the following command releases the IP reservations that were presumably leaked by kubelet:

for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z $(docker ps -a | grep $hash | awk '{print $1}') ]; then grep -irl $hash ./; fi; done | xargs rm
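For readability, here is an annotated equivalent of that one-liner; it is only a sketch, run it inside /var/lib/cni/networks/cbr0 and double-check what it matches before deleting anything:

cd /var/lib/cni/networks/cbr0
# each reservation file stores the ID of the container that owns the IP
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do
  # no matching container (running or exited) means the reservation is leaked
  if [ -z "$(docker ps -a | grep $hash | awk '{print $1}')" ]; then
    grep -irl $hash ./          # print the file(s) recording the stale ID
  fi
done | xargs rm                 # and remove them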

After running it, the directory listing becomes:

ls -l
total 32
drw-r--r-- 2 root root 12288 Dec 27 17:11 ./
drw-r--r-- 3 root root  4096 Dec 27 13:52 ../
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.2
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.3
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.4
-rw-r--r-- 1 root root    10 Dec 27 17:11 last_reserved_ip

The pod still fails, but this time the failure reason has changed again:

Events:
  FirstSeen    LastSeen    Count    From                    SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  23s        23s        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned my-nginx-1948696469-7p4nn to iz2ze39jeyizepdxhwqci6z
  22s        1s        22    {kubelet iz2ze39jeyizepdxhwqci6z}            Warning        FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-7p4nn_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-7p4nn_default(a40fe652-cc14-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.244.1.1/24; Skipping pod"

And the files under /var/lib/cni/networks/cbr0 start multiplying rapidly again! The problem reached a stalemate.

5. flannel vxlan doesn't work, and switching the backend to udp doesn't help either

By this point we were pretty much exhausted, so we ran kubeadm reset on both nodes to start over.

After kubeadm reset, the bridge device cni0 and the flannel.1 interface created by flannel are still alive and well. To return the environment fully to its initial state, we can delete both devices with the following commands:

# ifconfig  cni0 down
# brctl delbr cni0
# ip link delete flannel.1

"Tempered" by the previous problems, the fresh init and join went remarkably smoothly this time, and the minion node showed no further anomalies.

#  kubectl get nodes -o wide
NAME                      STATUS         AGE       EXTERNAL-IP
iz25beglnhtz              Ready,master   5m        <none>
iz2ze39jeyizepdxhwqci6z   Ready          51s       <none>

# kubectl get pod --all-namespaces
NAMESPACE     NAME                                   READY     STATUS    RESTARTS   AGE
default       my-nginx-1948696469-71h1l              1/1       Running   0          3m
default       my-nginx-1948696469-zwt5g              1/1       Running   0          3m
default       my-ubuntu-2560993602-ftdm6             1/1       Running   0          3m
kube-system   dummy-2088944543-lmlbh                 1/1       Running   0          5m
kube-system   etcd-iz25beglnhtz                      1/1       Running   0          6m
kube-system   kube-apiserver-iz25beglnhtz            1/1       Running   0          6m
kube-system   kube-controller-manager-iz25beglnhtz   1/1       Running   0          6m
kube-system   kube-discovery-1769846148-l5lfw        1/1       Running   0          5m
kube-system   kube-dns-2924299975-mdq5r              4/4       Running   0          5m
kube-system   kube-flannel-ds-9zwr1                  2/2       Running   0          5m
kube-system   kube-flannel-ds-p7xh2                  2/2       Running   0          1m
kube-system   kube-proxy-dwt5f                       1/1       Running   0          5m
kube-system   kube-proxy-vm6v2                       1/1       Running   0          1m
kube-system   kube-scheduler-iz25beglnhtz            1/1       Running   0          6m

Next we created the my-nginx deployment and service again to test flannel connectivity. Curling the my-nginx service's nodeport reaches the two nginx pods on the master, but the pod on the minion node still cannot be reached.

The flannel docker log on the master:

I1228 02:52:22.097083       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:52:22.097169       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:52:22.097335       1 network.go:236] AddL3 succeeded
I1228 02:52:55.169952       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:00.801901       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:03.801923       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:04.801764       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:05.801848       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:06.888269       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:53:06.888340       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:53:06.888507       1 network.go:236] AddL3 succeeded
I1228 02:53:39.969791       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:45.153770       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:48.154822       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:49.153774       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:50.153734       1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2
I1228 02:53:52.154056       1 network.go:225] L3 miss: 10.244.1.2
I1228 02:53:52.154110       1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60
I1228 02:53:52.154256       1 network.go:236] AddL3 succeeded

The log is full of lines saying "Ignoring not a miss", which suggests something is wrong with the vxlan network. The problem looks very close to the one described in this issue:

https://github.com/coreos/flannel/issues/427

Flannel uses vxlan as its backend by default, on the kernel vxlan default UDP port 8472. Flannel also supports a udp backend, which uses UDP port 8285, so we tried switching the backend. The steps are:

  • Download https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml to the local machine;
  • Edit kube-flannel.yml: mainly the net-conf.json property, adding a "Backend" field:
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "udp",
        "Port": 8285
      }
    }
---
... ...
  • Remove and reinstall the pod network
# kubectl delete -f kube-flannel.yml
configmap "kube-flannel-cfg" deleted
daemonset "kube-flannel-ds" deleted

# kubectl apply -f kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

# netstat -an|grep 8285
udp        0      0 123.56.200.187:8285     0.0.0.0:*

Testing showed that the udp port is open: tcpdump -i flannel0 on both nodes shows udp packets being sent and received. But the pod network between the two nodes is still not connected.
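The connectivity test itself can be as simple as the following sketch (with the udp backend the traffic goes through the flannel0 TUN device, and the pod IP 10.244.1.x comes from kubectl get pod -o wide):

# tcpdump -i flannel0 -nn                 # run on both nodes to watch the tunnel traffic
# ping -c 3 10.244.1.2                    # from the master node, a pod IP living on the minion node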

6. failed to register network: failed to acquire lease: node "iz25beglnhtz" not found

Under normal circumstances, the startup logs of the flannel pods on the master node and minion node look like this:

flannel running on the master node:

I1227 04:56:16.577828       1 main.go:132] Installing signal handlers
I1227 04:56:16.578060       1 kube.go:233] starting kube subnet manager
I1227 04:56:16.578064       1 manager.go:133] Determining IP address of default interface
I1227 04:56:16.578576       1 manager.go:163] Using 123.56.200.187 as external interface
I1227 04:56:16.578616       1 manager.go:164] Using 123.56.200.187 as external endpoint
E1227 04:56:16.579079       1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found
I1227 04:56:17.583744       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 04:56:17.585367       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 04:56:17.587765       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 04:56:17.589943       1 manager.go:246] Lease acquired: 10.244.0.0/24
I1227 04:56:17.590203       1 network.go:58] Watching for L3 misses
I1227 04:56:17.590255       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.164103       1 network.go:153] Handling initial subnet events
I1227 07:43:27.164211       1 device.go:163] calling GetL2List() dev.link.Index: 5
I1227 07:43:27.164350       1 device.go:168] calling NeighAdd: 59.110.67.15, ca:50:97:1f:c2:ea

flannel running on the minion node:

# docker logs 1f64bd9c0386
I1227 07:43:26.670620       1 main.go:132] Installing signal handlers
I1227 07:43:26.671006       1 manager.go:133] Determining IP address of default interface
I1227 07:43:26.670825       1 kube.go:233] starting kube subnet manager
I1227 07:43:26.671514       1 manager.go:163] Using 59.110.67.15 as external interface
I1227 07:43:26.671575       1 manager.go:164] Using 59.110.67.15 as external endpoint
I1227 07:43:26.746811       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1227 07:43:26.749785       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1227 07:43:26.752343       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1227 07:43:26.755126       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1227 07:43:26.755444       1 network.go:58] Watching for L3 misses
I1227 07:43:26.755475       1 network.go:66] Watching for new subnet leases
I1227 07:43:27.755830       1 network.go:153] Handling initial subnet events
I1227 07:43:27.755905       1 device.go:163] calling GetL2List() dev.link.Index: 10
I1227 07:43:27.756099       1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67

But during the tests for problem 5 above, we found the following errors in the flannel containers' startup logs:

master node:

# docker logs c2d1cee3df3d
I1228 06:53:52.502571       1 main.go:132] Installing signal handlers
I1228 06:53:52.502735       1 manager.go:133] Determining IP address of default interface
I1228 06:53:52.503031       1 manager.go:163] Using 123.56.200.187 as external interface
I1228 06:53:52.503054       1 manager.go:164] Using 123.56.200.187 as external endpoint
E1228 06:53:52.503869       1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found
I1228 06:53:52.503899       1 kube.go:233] starting kube subnet manager
I1228 06:53:53.522892       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1228 06:53:53.524325       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1228 06:53:53.526622       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1228 06:53:53.528438       1 manager.go:246] Lease acquired: 10.244.0.0/24
I1228 06:53:53.528744       1 network.go:58] Watching for L3 misses
I1228 06:53:53.528777       1 network.go:66] Watching for new subnet leases

minion node:

# docker logs dcbfef45308b
I1228 05:28:05.012530       1 main.go:132] Installing signal handlers
I1228 05:28:05.012747       1 manager.go:133] Determining IP address of default interface
I1228 05:28:05.013011       1 manager.go:163] Using 59.110.67.15 as external interface
I1228 05:28:05.013031       1 manager.go:164] Using 59.110.67.15 as external endpoint
E1228 05:28:05.013204       1 network.go:106] failed to register network: failed to acquire lease: node "iz2ze39jeyizepdxhwqci6z" not found
I1228 05:28:05.013237       1 kube.go:233] starting kube subnet manager
I1228 05:28:06.041602       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I1228 05:28:06.042863       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1228 05:28:06.044896       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I1228 05:28:06.046497       1 manager.go:246] Lease acquired: 10.244.1.0/24
I1228 05:28:06.046780       1 network.go:98] Watching for new subnet leases
I1228 05:28:07.047052       1 network.go:191] Subnet added: 10.244.0.0/24

Both nodes show the "register network" failure: failed to register network: failed to acquire lease: node "xxxx" not found. It is hard to say for sure whether these errors are what breaks the network between the two nodes; over the whole course of testing, the problem came and went. A similar problem is discussed in this flannel issue:

https://github.com/coreos/flannel/issues/435

The many problems with the Flannel pod network made me decide, for now, to stop using Flannel in the kubeadm-created kubernetes cluster.

V. Calico pod network

Among the pod network add-ons Kubernetes supports there are, besides Flannel, calico, Weave net and others. Here we try Calico, a pod network built on the BGP border gateway protocol. The Calico Project has dedicated documentation for installing the Pod network in a kubeadm-built K8s cluster, and we satisfy all the requirements and constraints it describes, for example:

the master node carries the kubeadm.alpha.kubernetes.io/role: master label:

# kubectl get nodes -o wide --show-labels
NAME           STATUS         AGE       EXTERNAL-IP   LABELS
iz25beglnhtz   Ready,master   3m        <none>        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubeadm.alpha.kubernetes.io/role=master,kubernetes.io/hostname=iz25beglnhtz

Before installing calico, we again run kubeadm reset to reset the environment and remove the network devices flannel created; see the commands in the sections above.
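Gathered in one place, the cleanup before switching to calico looks roughly like this (a sketch; removing the cni reservation directory is my own optional addition):

# kubeadm reset
# ifconfig cni0 down && brctl delbr cni0
# ip link delete flannel.1
# rm -rf /var/lib/cni/networks/cbr0       # optional: drop stale IP reservations left behind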

1. Initialize the cluster

With calico, kubeadm init no longer needs the --pod-network-cidr=10.244.0.0/16 option:

# kubeadm init --api-advertise-addresses=10.47.217.91
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[init] Using Kubernetes version: v1.5.1
[tokens] Generated token: "531b3f.3bd900d61b78d6c9"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 13.527323 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 0.503814 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 1.503644 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91

2. Create the calico network

# kubectl apply -f http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/hosted/kubeadm/calico.yaml
configmap "calico-config" created
daemonset "calico-etcd" created
service "calico-etcd" created
daemonset "calico-node" created
deployment "calico-policy-controller" created
job "configure-calico" created

The actual creation takes a while, because calico needs to pull a few images:

# docker images
REPOSITORY                                               TAG                        IMAGE ID            CREATED             SIZE
quay.io/calico/node                                      v1.0.0                     74bff066bc6a        7 days ago          256.4 MB
calico/ctl                                               v1.0.0                     069830246cf3        8 days ago          43.35 MB
calico/cni                                               v1.5.5                     ada87b3276f3        12 days ago         67.13 MB
gcr.io/google_containers/etcd                            2.2.1                      a6cd91debed1        14 months ago       28.19 MB

calico created two network devices locally on the master node:

# ip a
... ...
47: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.91.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
48: califa32a09679f@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 62:39:10:55:44:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0

3. Minion node join

Run the following command to join the minion node to the cluster:

# kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91

calico also created a network device on the minion node:

57988: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.136.192/32 scope global tunl0
       valid_lft forever preferred_lft forever

After the successful join, let's check the cluster status:

# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system   calico-etcd-488qd                          1/1       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   calico-node-jcb3c                          2/2       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   calico-node-zthzp                          2/2       Running   0          4m        10.28.61.30    iz2ze39jeyizepdxhwqci6z
kube-system   calico-policy-controller-807063459-f21q4   1/1       Running   0          18m       10.47.217.91   iz25beglnhtz
kube-system   dummy-2088944543-rtsfk                     1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   etcd-iz25beglnhtz                          1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-apiserver-iz25beglnhtz                1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-controller-manager-iz25beglnhtz       1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-discovery-1769846148-51wdk            1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-dns-2924299975-fhf5f                  4/4       Running   0          23m       192.168.91.1   iz25beglnhtz
kube-system   kube-proxy-2s7qc                           1/1       Running   0          4m        10.28.61.30    iz2ze39jeyizepdxhwqci6z
kube-system   kube-proxy-h2qds                           1/1       Running   0          23m       10.47.217.91   iz25beglnhtz
kube-system   kube-scheduler-iz25beglnhtz                1/1       Running   0          23m       10.47.217.91   iz25beglnhtz

All components are ok, which looks like a good omen! But whether the cross-node pod network is actually connected still needs further investigation.

4. Probing cross-node pod network connectivity

We reuse the my-nginx-svc.yaml and run-my-nginx.yaml from the flannel tests above to create the my-nginx service and my-nginx deployment. Note: first run "kubectl taint nodes --all dedicated-" on the master node so that the master node also carries workload.
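In other words, roughly this (a sketch reusing the manifests from the flannel section):

# kubectl taint nodes --all dedicated-
# kubectl create -f run-my-nginx.yaml
# kubectl create -f my-nginx-svc.yaml
# curl -m 5 localhost:30062       # repeat a few times to hit pods on both nodes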

Unfortunately, the result is much like flannel's: http requests that land on the master node get nginx's response, but the pod on the minion node still cannot be reached.

I did not want to linger on calico at this point; I wanted to move quickly on to the next candidate and see whether weave net meets our needs.

Due to inexplicable wordpress problems this article could not be published in one piece, so it is split into two parts; this is part one, and part two can be read here.

An Article to Walk You Through Installing Kubernetes

Because the Docker 1.12.2 Swarm cluster we had deployed on Aliyun failed to deliver its advertised Routing mesh and VIP features, to meet the project's needs we had to turn to another container cluster management and service orchestration tool: Kubernetes.

Note: regarding the broken Routing mesh and VIP features of the earlier Docker 1.12 cluster, after discussing it with Docker developers on github the cause has been narrowed down to Aliyun's network; it currently looks like UDP port 4789, which carries the vxlan traffic between nodes, is not passing traffic. I am following up with Aliyun's support engineers and hope to find the root cause.

Kubernetes (k8s below) is an open-source container cluster management tool from Google, the "open-source edition" of Google's internal Borg. With Google as its illustrious parent, k8s drew plenty of attention from the moment it was born and won the support of many well-known IT companies. As for why Google open-sourced k8s, the flattering version is that Google wants to share its decade-plus of experience in containers to help developers and customers in the container space raise their development efficiency and the level of their container management. But any corporate action has short- or long-term commercial goals behind it, and Google as a commercial company is no exception. So why did Google really release k8s? Opinions vary. One view is that, by exporting the operating model, usage patterns and API standards of its container tooling, Google is warming the world's developers up for its public container offerings and giving them a "zero-barrier" first experience.

k8s is currently regarded as the most advanced container cluster management tool. Since the 1.0 release its development has accelerated further, and it has the full backing of the container ecosystem, including coreos, rancher and others; many public cloud providers also build their container services on top of k8s to provide the infrastructure layer, Huawei for example. It is fair to say k8s is Docker's strongest rival as Docker pushes into container cluster management and service orchestration.

However, compared with Docker, which now natively integrates the swarmkit cluster manager, k8s still has a lot of room to improve in documentation, installation and cluster management. The newly released 1.4 focuses on exactly these areas. For example, 1.4 lets users of the major Linux distributions Ubuntu Xenial and Red Hat centos7 install Kubernetes directly with the familiar apt-get and yum; and it introduces the kubeadm command, which reduces cluster bootstrap to two commands with no need for the complex kube-up scripts.

But for the 1.3.x versions that preceded 1.4, the install experience can only be described, in the internet slang of the moment, as "miserable, want to cry". Sometimes, though, we have no choice but to take on that process. This article walks you through installing k8s 1.3.7 on Ubuntu 14.04.4, on ECS hosts in Aliyun's China region.

0. Mental Preparation

Since k8s is a Google product, many of its components are still joined to google at the hip, so installing k8s from inside the Chinese network calls for some mental preparation first ^_^: compared with what the docs promise for installing k8s 1.4 or docker 1.12.x, installing k8s 1.3.7 is nothing short of a "disaster".

To make the process go even reasonably smoothly, you must have an "accelerator" (you know what I mean) ready. The accelerator deals with three things: slowness, interruptions, and things you cannot connect to at all.

  • Slow: downloading from github or other foreign public clouds inside China is painfully slow; anything slightly large routinely takes several hours, or more than ten.
  • Interrupted: slow would be bearable, but transfers keep breaking, and when resume is not supported everything starts over. For files measured in gigabytes, the cost of starting over is something we cannot afford.
  • Unreachable: you know this one; many things hosted under google's name simply cannot be downloaded.

All in all, installing k8s and building a container cluster is a "long" and possibly repetitive process; be mentally prepared.

BTW, during the install I used the 多态 (duotai) accelerator recommended by user noah_昨夜星辰; you only need to configure an http_proxy, which makes it especially convenient for accelerating things on a server, and the speed is good too.

I. Installation Model

k8s certainly does not lack documentation; for installation in particular, k8s provides guides for all kinds of cloud platforms, bare metal, various OSes and even different cluster network model implementations, and you really have to work to pick the one that best fits your situation.

Because Aliyun in China does not yet provide an Ubuntu 16.04 LTS image (on which the latest 1.4.x k8s can be installed directly via apt-get install), we can only install k8s 1.3.x on ubuntu 14.04.x; k8s 1.4 depends on systemd components, so hand-installing k8s 1.4 on ubuntu 14.04.x would probably be "hell-level" difficulty. For the network model I chose coreos's flannel, so the document we need to follow is the k8s install guide maintained by the team at Zhejiang University in China. That guide targets k8s 1.2+ and is rated only two and a half stars, from which you can infer that following its steps to the letter succeeds only with some luck ^_^. Note the document states it was tested on ubuntu 14.04, but since ubuntu 15.xx replaced upstart with systemd, success on ubuntu 15.xx cannot be guaranteed.

There is plenty of material online about installing k8s, but most of it dives straight into "download xxx, configure yyy, install zzz" and lacks an overall picture of the installation. Unlike installing a single docker engine with the built-in swarmkit orchestration engine, k8s is a set of core components that cooperate to deliver container cluster scheduling and service orchestration; installing k8s really means installing the different components onto the nodes that play the different roles.

k8s nodes have only two roles: master and minion. Compared with a Docker swarm cluster, the master corresponds to a manager in a docker swarm cluster and a minion to a worker.

The k8s core components that run on the master node are:

# ls /opt/bin|grep kube
kube-apiserver
kube-controller-manager
kubelet
kube-proxy
kube-scheduler

On a minion node the set of k8s core components is smaller:

# ls /opt/bin|grep kube
kubelet
kube-proxy

The k8s installation model can be summed up as: from an install machine, deploy the various k8s components onto the nodes of their respective roles (logging into each node via ssh) and start them up. The simple diagram below may make this more concrete:

install machine (holds the k8s binaries and install scripts) ----- install k8s core components to (via ssh) ---->  master and minion nodes

Before installing, let me spell out the environment I used once more:

Aliyun ECS: Ubuntu 14.04.4 LTS (GNU/Linux 3.19.0-70-generic x86_64)

root@iZ25cn4xxnvZ:~# docker version
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 17:00:50 2016
OS/Arch: linux/amd64

Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 17:00:50 2016
OS/Arch: linux/amd64

II. Prerequisites

According to the Zhejiang University team's article on installing k8s on Ubuntu, some prerequisites must be satisfied before installing the k8s components themselves:

1. Install Docker

Docker's documentation, it has to be said, is quite good. Docker has been developing for many years and its installation on Ubuntu has gradually matured; the official docs have detailed instructions for ubuntu 12.04, 14.04 and 16.04. If the docker version on your Ubuntu server is old, you can also use the one-click install service provided by Daocloud in China to install the latest Docker.

2. Install bridge-utils

Install the bridge management tool:

[sudo] apt-get install bridge-utils

After installing, you can check that it works:

root@iZ25cn4xxnvZ:~# brctl show
bridge name    bridge id        STP enabled    interfaces
docker0        8000.0242988b938c    no        veth901efcb
docker_gwbridge        8000.0242bffb02d5    no        veth21546ed
                            veth984b294
3. Make sure the master node can reach the Internet and download the required files

This is where the master node gets its "accelerator". If the master node also logically acts as a minion node, the Docker daemon on it needs the accelerator as well (when the accelerator is reached through a proxy), and the same applies to every minion node, for example:

/etc/default/docker

export http_proxy=http://duotai:xxxxx@sheraton.h.xduotai.com:24448
export https_proxy=$http_proxy
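
On Ubuntu 14.04 (upstart), /etc/default/docker is only read when the docker job starts, so the proxy settings take effect after a daemon restart. A minimal sketch (the environment check is optional and assumes the daemon runs as dockerd, falling back to docker on older packages):

# restart the Docker daemon so the proxy settings in /etc/default/docker take effect
service docker restart

# optional sanity check: the daemon's environment should now carry the proxy variables
pid=$(pidof dockerd || pidof docker); tr '\0' '\n' < /proc/${pid%% *}/environ | grep -i _proxy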

4. Configure passwordless ssh from the installer machine to every master and minion node

I created two ECS instances on Aliyun (call them node1 - 10.47.136.60 and node2 - 10.46.181.146). The k8s cluster runs on these two physical nodes, but logically node1 and node2 play multiple roles; logically this is a k8s cluster with one master node and two minion nodes:

installer machine: node1
master node: node1
minion nodes: node1 and node2

So to satisfy the prerequisite of passwordless ssh from the installer to every k8s node, I need passwordless ssh from the installer (node1) to the master node (node1) and to the minion nodes (node1 and node2).

On the installer node, run:

# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
... ...

Let the installer log into the logical master node without a password (which simply means logging into itself, i.e. node1):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Let the installer log into the minion node (node2) without a password:

Copy the public key to the server:
#scp ~/.ssh/id_rsa.pub root@10.46.181.146:/root/id_rsa.pub
The authenticity of host '10.46.181.146 (10.46.181.146)' can't be established.
ECDSA key fingerprint is b7:31:8d:33:f5:6e:ef:a4:a1:cc:72:5f:cf:68:c6:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.46.181.146' (ECDSA) to the list of known hosts.
root@10.46.181.146's password:
id_rsa.pub

On the minion node (node2), import the installer's public key and fix the permissions:

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
root@iZ25mjza4msZ:~# chmod 700 ~/.ssh
root@iZ25mjza4msZ:~#     chmod 600 ~/.ssh/authorized_keys

With that done, test passwordless login from the installer to itself (node1) and to node2; taking node2 as the example:

root@iZ25cn4xxnvZ:~/.ssh# ssh 10.46.181.146
Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 3.19.0-70-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Welcome to aliyun Elastic Compute Service!

Last login: Thu Oct 13 12:55:21 2016 from 218.25.32.210
5. Download the pause-amd64 image

Once the k8s cluster is up, starting any container pulls the pause-amd64 image from Google's gcr.io/google_containers. To avoid hard-to-trace failures later, pull that image onto every k8s node now through the "accelerator":

Edit /etc/default/docker, add the http_proxy/https_proxy pointing at the accelerator, and add --insecure-registry gcr.io:

# If you need Docker to use an HTTP proxy, it can also be specified here.
export http_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448
export https_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448

# This is also a handy place to tweak where Docker's temporary files go.
#export TMPDIR="/mnt/bigdrive/docker-tmp"
DOCKER_OPTS="$DOCKER_OPTS -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --insecure-registry gcr.io"

Restart the docker daemon, then pull the pause-amd64 image:

root@iZ25cn4xxnvZ:~# docker search gcr.io/google_containers/pause-amd64
NAME                            DESCRIPTION   STARS     OFFICIAL   AUTOMATED
google_containers/pause-amd64                 0
root@iZ25cn4xxnvZ:~# docker pull gcr.io/google_containers/pause-amd64
Using default tag: latest
Pulling repository gcr.io/google_containers/pause-amd64
Tag latest not found in repository gcr.io/google_containers/pause-amd64

There is not even a latest tag; try the 3.0 tag instead:

root@iZ25cn4xxnvZ:~# docker pull gcr.io/google_containers/pause-amd64:3.0
3.0: Pulling from google_containers/pause-amd64
a3ed95caeb02: Pull complete
f11233434377: Pull complete
Digest: sha256:163ac025575b775d1c0f9bf0bdd0f086883171eb475b5068e7defa4ca9e76516
Status: Downloaded newer image for gcr.io/google_containers/pause-amd64:3.0

3. Set up the working directory and do the pre-install configuration

So far every node, including the installer node, is still empty-handed. Now we start working on the installer node.

As the saying goes, even the cleverest cook cannot make a meal without rice. For the installer to install k8s components on the nodes it needs its "rice", which is the installation scripts from the k8s source tree or release package.

In the official document this step is cloning the k8s source repository. Since I had already downloaded the k8s 1.3.7 release package, I use the "rice" from the release package directly.

After extracting kubernetes.tar.gz, you will see the kubernetes directory:

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes# ls -F
cluster/  docs/  examples/  federation/  LICENSES  platforms/  README.md  server/  third_party/  Vagrantfile  version

This kubernetes directory is our working directory for the installation. Since we are installing on Ubuntu, the scripts we actually use live under cluster/ubuntu inside the working directory; more on that below.

On the installer machine we will ultimately run this single line:

KUBERNETES_PROVIDER=ubuntu ./cluster/kube-up.sh

With provider=ubuntu, ./cluster/kube-up.sh eventually calls the kube-up shell function in ./cluster/ubuntu/util.sh, which in turn calls ./cluster/ubuntu/download-release.sh to download everything the installation needs: the k8s package (kubernetes.tar.gz), etcd, flannel, and so on. Since we already downloaded the k8s 1.3.7 release, we tweak download-release.sh so it does not download it again and drag out the installation:

./cluster/ubuntu/download-release.sh

    # KUBE_VERSION=$(get_latest_version_number | sed 's/^v//')
    #curl -L https://github.com/kubernetes/kubernetes/releases/download/v${KUBE_VERSION}/kubernetes.tar.gz -o kubernetes.tar.gz

With this change you also need to copy the kubernetes.tar.gz you already downloaded into ./cluster/ubuntu.
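
A minimal sketch of this step, run from the kubernetes working directory; the tarball path ~/k8stest/1.3.7/kubernetes.tar.gz is just my assumed local location, adjust it to wherever you saved the release:

# reuse the release tarball we already downloaded instead of letting
# download-release.sh fetch it again
cp ~/k8stest/1.3.7/kubernetes.tar.gz ./cluster/ubuntu/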

If your access to overseas hosts is fast enough and you are patient enough, you can of course skip the script modification above.

Before actually running ./cluster/kube-up.sh, the installer also needs to know:

1. Which nodes make up the physical k8s cluster, and what role does each node play?
2. Which versions of the dependencies, such as etcd, should be used?

We pass this information to ./cluster/kube-up.sh by editing ./cluster/ubuntu/config-default.sh:

./cluster/ubuntu/config-default.sh

# node list; this cluster consists of two physical nodes, the first of which is both master and minion
export nodes=${nodes:-"root@10.47.136.60  root@10.46.181.146"}
roles=${roles:-"ai i"}

# number of minion nodes
export NUM_NODES=${NUM_NODES:-2}

# network proxy for the install scripts, mainly to route and speed up downloads through the accelerator
PROXY_SETTING=${PROXY_SETTING:-"http_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448 https_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448"}

Set the versions of the dependencies k8s should download via environment variables:

export KUBE_VERSION=1.3.7
export FLANNEL_VERSION=0.5.5
export ETCD_VERSION=3.0.12

If these environment variables are not set, the default versions in ./cluster/ubuntu/download-release.sh are:

k8s: latest
etcd: 2.3.1
flannel: 0.5.5

4. Run the installation

On the installer machine, enter the ./cluster directory and run the following command:

KUBERNETES_PROVIDER=ubuntu ./kube-up.sh

The output is as follows:

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes/cluster# KUBERNETES_PROVIDER=ubuntu ./kube-up.sh
... Starting cluster using provider: ubuntu
... calling verify-prereqs
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
... calling kube-up
~/k8stest/1.3.7/kubernetes/cluster/ubuntu ~/k8stest/1.3.7/kubernetes/cluster

Prepare flannel 0.5.5 release ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   608    0   608    0     0    410      0 --:--:--  0:00:01 --:--:--   409
100 3408k  100 3408k    0     0   284k      0  0:00:11  0:00:11 --:--:--  389k

Prepare etcd 3.0.12 release ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   607    0   607    0     0    388      0 --:--:--  0:00:01 --:--:--   388
  3  9.8M    3  322k    0     0  84238      0  0:02:02  0:00:03  0:01:59  173k
100  9.8M  100  9.8M    0     0   327k      0  0:00:30  0:00:30 --:--:--  344k

Prepare kubernetes 1.3.7 release ...

~/k8stest/1.3.7/kubernetes/cluster/ubuntu/kubernetes/server ~/k8stest/1.3.7/kubernetes/cluster/ubuntu ~/k8stest/1.3.7/kubernetes/cluster
~/k8stest/1.3.7/kubernetes/cluster/ubuntu ~/k8stest/1.3.7/kubernetes/cluster

Done! All your binaries locate in kubernetes/cluster/ubuntu/binaries directory
~/k8stest/1.3.7/kubernetes/cluster

Deploying master and node on machine 10.47.136.60
saltbase/salt/generate-cert/make-ca-cert.sh: No such file or directory
easy-rsa.tar.gz                                                                                                                               100%   42KB  42.4KB/s   00:00
config-default.sh                                                                                                                             100% 5610     5.5KB/s   00:00
util.sh                                                                                                                                       100%   29KB  28.6KB/s   00:00
kubelet.conf                                                                                                                                  100%  644     0.6KB/s   00:00
kube-proxy.conf                                                                                                                               100%  684     0.7KB/s   00:00
kubelet                                                                                                                                       100% 2158     2.1KB/s   00:00
kube-proxy                                                                                                                                    100% 2233     2.2KB/s   00:00
etcd.conf                                                                                                                                     100%  709     0.7KB/s   00:00
kube-scheduler.conf                                                                                                                           100%  674     0.7KB/s   00:00
kube-apiserver.conf                                                                                                                           100%  674     0.7KB/s   00:00
kube-controller-manager.conf                                                                                                                  100%  744     0.7KB/s   00:00
kube-scheduler                                                                                                                                100% 2360     2.3KB/s   00:00
kube-controller-manager                                                                                                                       100% 2672     2.6KB/s   00:00
kube-apiserver                                                                                                                                100% 2358     2.3KB/s   00:00
etcd                                                                                                                                          100% 2073     2.0KB/s   00:00
reconfDocker.sh                                                                                                                               100% 2074     2.0KB/s   00:00
kube-scheduler                                                                                                                                100%   56MB  56.2MB/s   00:01
kube-controller-manager                                                                                                                       100%   95MB  95.4MB/s   00:01
kube-apiserver                                                                                                                                100%  105MB 104.9MB/s   00:00
etcdctl                                                                                                                                       100%   18MB  17.6MB/s   00:00
flanneld                                                                                                                                      100%   16MB  15.8MB/s   00:01
etcd                                                                                                                                          100% 2074     2.0KB/s   00:00
kube-scheduler                                                                                                                                100%   56MB  56.2MB/s   00:01
kube-controller-manager                                                                                                                       100%   95MB  95.4MB/s   00:01
kube-apiserver                                                                                                                                100%  105MB 104.9MB/s   00:00
etcdctl                                                                                                                                       100%   18MB  17.6MB/s   00:00
flanneld                                                                                                                                      100%   16MB  15.8MB/s   00:01

... ...

The output does not contain the line that indicates a successful installation:

Cluster validation succeeded

Looking through the install log above, the following error appears while deploying components to the master node 10.47.136.60:

saltbase/salt/generate-cert/make-ca-cert.sh: No such file or directory

There is indeed no saltbase directory under ./cluster. The fix, found online, is as follows:

In the k8s release package, the salt tarball kubernetes-salt.tar.gz already exists under ./server/kubernetes; extract it and copy the whole saltbase directory into ./cluster/.
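
A hedged sketch of that fix, run from the kubernetes working directory; the exact location of saltbase inside the tarball may differ slightly between releases, hence the fallback:

# extract the salt tarball that ships with the release
tar -xzf ./server/kubernetes/kubernetes-salt.tar.gz -C /tmp

# copy saltbase into ./cluster, trying both common layouts of the tarball
cp -r /tmp/kubernetes/saltbase ./cluster/ 2>/dev/null || cp -r /tmp/saltbase ./cluster/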

Run KUBERNETES_PROVIDER=ubuntu ./kube-up.sh again, and the output becomes:

... ...

Deploying master and node on machine 10.47.136.60
make-ca-cert.sh                                                                                                                               100% 4028     3.9KB/s   00:00
easy-rsa.tar.gz                                                                                                                               100%   42KB  42.4KB/s   00:00
config-default.sh                                                                                                                             100% 5632     5.5KB/s   00:00
util.sh                                                                                                                                       100%   29KB  28.6KB/s   00:00
kubelet.conf                                                                                                                                  100%  644     0.6KB/s   00:00
kube-proxy.conf                                                                                                                               100%  684     0.7KB/s   00:00
kubelet                                                                                                                                       100% 2158     2.1KB/s   00:00
kube-proxy                                                                                                                                    100% 2233     2.2KB/s   00:00
etcd.conf                                                                                                                                     100%  709     0.7KB/s   00:00
kube-scheduler.conf                                                                                                                           100%  674     0.7KB/s   00:00
kube-apiserver.conf                                                                                                                           100%  674     0.7KB/s   00:00
kube-controller-manager.conf                                                                                                                  100%  744     0.7KB/s   00:00
kube-scheduler                                                                                                                                100% 2360     2.3KB/s   00:00
kube-controller-manager                                                                                                                       100% 2672     2.6KB/s   00:00
kube-apiserver                                                                                                                                100% 2358     2.3KB/s   00:00
etcd                                                                                                                                          100% 2073     2.0KB/s   00:00
reconfDocker.sh                                                                                                                               100% 2074     2.0KB/s   00:00
kube-scheduler                                                                                                                                100%   56MB  56.2MB/s   00:01
kube-controller-manager                                                                                                                       100%   95MB  95.4MB/s   00:00
kube-apiserver                                                                                                                                100%  105MB 104.9MB/s   00:01
etcdctl                                                                                                                                       100%   18MB  17.6MB/s   00:00
flanneld                                                                                                                                      100%   16MB  15.8MB/s   00:00
etcd                                                                                                                                          100%   19MB  19.3MB/s   00:00
flanneld                                                                                                                                      100%   16MB  15.8MB/s   00:00
kubelet                                                                                                                                       100%  103MB 103.1MB/s   00:01
kube-proxy                                                                                                                                    100%   48MB  48.4MB/s   00:00
flanneld.conf                                                                                                                                 100%  577     0.6KB/s   00:00
flanneld                                                                                                                                      100% 2121     2.1KB/s   00:00
flanneld.conf                                                                                                                                 100%  568     0.6KB/s   00:00
flanneld                                                                                                                                      100% 2131     2.1KB/s   00:00
etcd start/running, process 7997
Error:  dial tcp 127.0.0.1:2379: getsockopt: connection refused
{"Network":"172.16.0.0/16", "Backend": {"Type": "vxlan"}}
{"Network":"172.16.0.0/16", "Backend": {"Type": "vxlan"}}
docker stop/waiting
docker start/running, process 8220
Connection to 10.47.136.60 closed.

Deploying node on machine 10.46.181.146
config-default.sh                                                                                                                             100% 5632     5.5KB/s   00:00
util.sh                                                                                                                                       100%   29KB  28.6KB/s   00:00
reconfDocker.sh                                                                                                                               100% 2074     2.0KB/s   00:00
kubelet.conf                                                                                                                                  100%  644     0.6KB/s   00:00
kube-proxy.conf                                                                                                                               100%  684     0.7KB/s   00:00
kubelet                                                                                                                                       100% 2158     2.1KB/s   00:00
kube-proxy                                                                                                                                    100% 2233     2.2KB/s   00:00
flanneld                                                                                                                                      100%   16MB  15.8MB/s   00:00
kubelet                                                                                                                                       100%  103MB 103.1MB/s   00:01
kube-proxy                                                                                                                                    100%   48MB  48.4MB/s   00:00
flanneld.conf                                                                                                                                 100%  577     0.6KB/s   00:00
flanneld                                                                                                                                      100% 2121     2.1KB/s   00:00
flanneld start/running, process 2365
docker stop/waiting
docker start/running, process 2574
Connection to 10.46.181.146 closed.
Validating master
Validating root@10.47.136.60
Validating root@10.46.181.146
Using master 10.47.136.60
cluster "ubuntu" set.
user "ubuntu" set.
context "ubuntu" set.
switched to context "ubuntu".
Wrote config for ubuntu to /root/.kube/config
... calling validate-cluster

Error from server: an error on the server has prevented the request from succeeding
(kubectl failed, will retry 2 times)

Error from server: an error on the server has prevented the request from succeeding
(kubectl failed, will retry 1 times)

Error from server: an error on the server has prevented the request from succeeding
('kubectl get nodes' failed, giving up)

The installation still did not succeed; at least the validation step after "calling validate-cluster" did not.

Unlike the first failure, however, on both the master node and the minion node we can now see the k8s core components installed and running:

master node:

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes/cluster# ps -ef|grep kube
root      8006     1  0 16:39 ?        00:00:00 /opt/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080
root      8008     1  0 16:39 ?        00:00:01 /opt/bin/kube-apiserver --insecure-bind-address=0.0.0.0 --insecure-port=8080 --etcd-servers=http://127.0.0.1:4001 --logtostderr=true --service-cluster-ip-range=192.168.3.0/24 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,SecurityContextDeny,ResourceQuota --service-node-port-range=30000-32767 --advertise-address=10.47.136.60 --client-ca-file=/srv/kubernetes/ca.crt --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
root      8009     1  0 16:39 ?        00:00:02 /opt/bin/kube-controller-manager --master=127.0.0.1:8080 --root-ca-file=/srv/kubernetes/ca.crt --service-account-private-key-file=/srv/kubernetes/server.key --logtostderr=true
root      8021     1  0 16:39 ?        00:00:04 /opt/bin/kubelet --hostname-override=10.47.136.60 --api-servers=http://10.47.136.60:8080 --logtostderr=true --cluster-dns=192.168.3.10 --cluster-domain=cluster.local --config=
root      8023     1  0 16:39 ?        00:00:00 /opt/bin/kube-proxy --hostname-override=10.47.136.60 --master=http://10.47.136.60:8080 --logtostderr=true

minion node:

root@iZ25mjza4msZ:~# ps -ef|grep kube
root      2370     1  0 16:39 ?        00:00:04 /opt/bin/kubelet --hostname-override=10.46.181.146 --api-servers=http://10.47.136.60:8080 --logtostderr=true --cluster-dns=192.168.3.10 --cluster-domain=cluster.local --config=
root      2371     1  0 16:39 ?        00:00:00 /opt/bin/kube-proxy --hostname-override=10.46.181.146 --master=http://10.47.136.60:8080 --logtostderr=true

So why did the install script's cluster validation keep blocking until it finally timed out? On the installer node, which is also the master node, I ran kubectl get nodes:

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes/cluster# kubectl get nodes

Error from server: an error on the server ("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">\n<html><head>\n<meta type=\"copyright\" content=\"Copyright (C) 1996-2015 The Squid Software Foundation and contributors\">\n<meta http-equiv=\"Content-Type\" CONTENT=\"text/html; charset=utf-8\">\n<title>ERROR: The requested URL could not be retrieved</title>\n<style type=\"text/css\"><!-- \n /*\n * Copyright (C) 1996-2015 The Squid Software Foundation and contributors\n *\n * Squid software is distributed under GPLv2+ license and includes\n * contributions from numerous individuals and organizations.\n * Please see the COPYING and CONTRIBUTORS files for details.\n */\n\n/*\n Stylesheet for Squid Error pages\n Adapted from design by Free CSS Templates\n http://www.freecsstemplates.org\n Released for free under a Creative Commons Attribution 2.5 License\n*/\n\n/* Page basics */\n* {\n\tfont-family: verdana, sans-serif;\n}\n\nhtml body {\n\tmargin: 0;\n\tpadding: 0;\n\tbackground: #efefef;\n\tfont-size: 12px;\n\tcolor: #1e1e1e;\n}\n\n/* Page displayed title area */\n#titles {\n\tmargin-left: 15px;\n\tpadding: 10px;\n\tpadding-left: 100px;\n\tbackground: url('/squid-internal-static/icons/SN.png') no-repeat left;\n}\n\n/* initial title */\n#titles h1 {\n\tcolor: #000000;\n}\n#titles h2 {\n\tcolor: #000000;\n}\n\n/* special event: FTP success page titles */\n#titles ftpsuccess {\n\tbackground-color:#00ff00;\n\twidth:100%;\n}\n\n/* Page displayed body content area */\n#content {\n\tpadding: 10px;\n\tbackground: #ffffff;\n}\n\n/* General text */\np {\n}\n\n/* error brief description */\n#error p {\n}\n\n/* some data which may have caused the problem */\n#data {\n}\n\n/* the error message received from the system or other software */\n#sysmsg {\n}\n\npre {\n    font-family:sans-serif;\n}\n\n/* special event: FTP / Gopher directory listing */\n#dirmsg {\n    font-family: courier;\n    color: black;\n    font-size: 10pt;\n}\n#dirlisting {\n    margin-left: 2%;\n    margin-right: 2%;\n}\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\n    border-bottom: groove;\n}\n#dirlisting td.size {\n    width: 50px;\n    text-align: right;\n    padding-right: 5px;\n}\n\n/* horizontal lines */\nhr {\n\tmargin: 0;\n}\n\n/* page displayed footer area */\n#footer {\n\tfont-size: 9px;\n\tpadding-left: 10px;\n}\n\n\nbody\n:lang(fa) { direction: rtl; font-size: 100%; font-family: Tahoma, Roya, sans-serif; float: right; }\n:lang(he) { direction: rtl; }\n --></style>\n</head><body id=ERR_CONNECT_FAIL>\n<div id=\"titles\">\n<h1>ERROR</h1>\n<h2>The requested URL could not be retrieved</h2>\n</div>\n<hr>\n\n<div id=\"content\">\n<p>The following error was encountered while trying to retrieve the URL: <a href=\"http://10.47.136.60:8080/api\">http://10.47.136.60:8080/api</a></p>\n\n<blockquote id=\"error\">\n<p><b>Connection to 10.47.136.60 failed.</b></p>\n</blockquote>\n\n<p id=\"sysmsg\">The system returned: <i>(110) Connection timed out</i></p>\n\n<p>The remote host or network may be down. 
Please try the request again.</p>\n\n<p>Your cache administrator is <a href=\"mailto:webmaster?subject=CacheErrorInfo%20-%20ERR_CONNECT_FAIL&amp;body=CacheHost%3A%20192-241-236-182%0D%0AErrPage%3A%20ERR_CONNECT_FAIL%0D%0AErr%3A%20(110)%20Connection%20timed%20out%0D%0ATimeStamp%3A%20Thu,%2013%20Oct%202016%2008%3A49%3A35%20GMT%0D%0A%0D%0AClientIP%3A%20127.0.0.1%0D%0AServerIP%3A%2010.47.136.60%0D%0A%0D%0AHTTP%20Request%3A%0D%0AGET%20%2Fapi%20HTTP%2F1.1%0AUser-Agent%3A%20kubectl%2Fv1.4.0%20(linux%2Famd64)%20kubernetes%2F4b28af1%0D%0AAccept%3A%20application%2Fjson,%20*%2F*%0D%0AAccept-Encoding%3A%20gzip%0D%0AHost%3A%2010.47.136.60%3A8080%0D%0A%0D%0A%0D%0A\">webmaster</a>.</p>\n\n<br>\n</div>\n\n<hr>\n<div id=\"footer\">\n<p>Generated Thu, 13 Oct 2016 08:49:35 GMT by 192-241-236-182 (squid/3.5.12)</p>\n<!-- ERR_CONNECT_FAIL -->\n</div>\n</body></html>") has prevented the request from succeeding

kubectl got back a blob of data that is actually the content of an HTML page. Looking closely at the body, we see:

<body id=ERR_CONNECT_FAIL>\n<div id=\"titles\">\n<h1>ERROR</h1>\n<h2>The requested URL could not be retrieved</h2>\n</div>\n<hr>\n\n<div id=\"content\">\n<p>The following error was encountered while trying to retrieve the URL: <a href=\"http://10.47.136.60:8080/api\">http://10.47.136.60:8080/api</a></p>\n\n<blockquote id=\"error\">\n<p><b>Connection to 10.47.136.60 failed.</b></p>\n</blockquote>\n\n<p id=\"sysmsg\">The system returned: <i>(110) Connection timed out</i></p>\n\n<p>The remote host or network may be down. Please try the request again.</p>

kubectl timed out while accessing http://10.47.136.60:8080/api. Running curl http://10.47.136.60:8080/api directly on the master node gives the same error. I suspected the http_proxy in my .bashrc was the culprit, so I added a no_proxy entry to .bashrc:

export no_proxy='10.47.136.60,10.46.181.146,localhost,127.0.0.1'

After it took effect, curl on the master node again:

# curl http://10.47.136.60:8080/api
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.47.136.60:6443"
    }
  ]
}

So the root cause was that the installer's PROXY_SETTING had no no_proxy entry. Update the proxy setting in config-default.sh:

PROXY_SETTING=${PROXY_SETTING:-"http_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448 https_proxy=http://duotai:xxxx@sheraton.h.xduotai.com:24448 no_proxy=10.47.136.60,10.46.181.146,localhost,127.0.0.1"}

Then redeploy:

root@iZ25cn4xxnvZ:~/k8stest/1.3.7/kubernetes/cluster# KUBERNETES_PROVIDER=ubuntu ./kube-up.sh
... Starting cluster using provider: ubuntu
... calling verify-prereqs
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
... calling kube-up
~/k8stest/1.3.7/kubernetes/cluster/ubuntu ~/k8stest/1.3.7/kubernetes/cluster
Prepare flannel 0.5.5 release ...
Prepare etcd 3.0.12 release ...
Prepare kubernetes 1.3.7 release ...
Done! All your binaries locate in kubernetes/cluster/ubuntu/binaries directory
~/k8stest/1.3.7/kubernetes/cluster

Deploying master and node on machine 10.47.136.60
make-ca-cert.sh                                                                                                                               100% 4028     3.9KB/s   00:00
easy-rsa.tar.gz                                                                                                                               100%   42KB  42.4KB/s   00:00
config-default.sh                                                                                                                             100% 5678     5.5KB/s   00:00
... ...
cp: cannot create regular file ‘/opt/bin/etcd’: Text file busy
cp: cannot create regular file ‘/opt/bin/flanneld’: Text file busy
cp: cannot create regular file ‘/opt/bin/kube-apiserver’: Text file busy
cp: cannot create regular file ‘/opt/bin/kube-controller-manager’: Text file busy
cp: cannot create regular file ‘/opt/bin/kube-scheduler’: Text file busy
Connection to 10.47.136.60 closed.
Deploying master and node on machine 10.47.136.60 failed

The redeploy failed because the k8s components from the previous attempt were still running on the nodes. We need to run

KUBERNETES_PROVIDER=ubuntu kube-down.sh

to stop the k8s cluster before trying kube-up again. Alternatively, without the kube-down.sh script, you can manually shut down the k8s components on each node (five core components on the master, two on a minion node, and don't forget to stop the etcd and flanneld services as well); taking kube-controller-manager as an example:

service kube-controller-manager stop

and likewise for the other components.
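
For reference, a rough sketch of doing the manual shutdown on both kinds of node; it assumes the components are registered as upstart services under the names of the .conf files uploaded by the install script:

# on the master node (which is also a minion in our topology)
for s in kube-scheduler kube-controller-manager kube-apiserver kubelet kube-proxy flanneld etcd; do
    service "$s" stop
done

# on a pure minion node
for s in kubelet kube-proxy flanneld; do
    service "$s" stop
done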

Run kube-up.sh once more:

... ...
.. calling validate-cluster
Waiting for 2 ready nodes. 1 ready nodes, 2 registered. Retrying.
Found 2 node(s).
NAME            STATUS    AGE
10.46.181.146   Ready     4h
10.47.136.60    Ready     4h
Validate output:
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health": "true"}
Cluster validation succeeded
Done, listing cluster services:

Kubernetes master is running at http://10.47.136.60:8080

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

The line "Cluster validation succeeded" confirms that the k8s cluster has been installed successfully.

Running kubectl get node shows the nodes that make up the cluster:

# kubectl get node
NAME            STATUS    AGE
10.46.181.146   Ready     4h
10.47.136.60    Ready     4h

Running kubectl cluster-info dump shows far more detailed information about the cluster.
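
Two lighter-weight checks that are handy at this point (both are standard kubectl subcommands, shown here only as an extra sanity check):

# addresses of the cluster services
kubectl cluster-info

# health of scheduler, controller-manager and etcd -- the same data the validation step printed above
kubectl get componentstatuses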

5. Testing the k8s service features

The whole reason for adopting k8s was that the VIP and Routing mesh mechanisms of the Docker 1.12 swarm cluster built on Aliyun did not work. So with the k8s cluster deployed, we need to verify that both kinds of mechanism are supported on k8s.

I will not rehash k8s's cluster abstractions here, such as node, deployment, pod, and service; refer to the Concept guide if needed.

1. In-cluster load balancing

k8s has an equivalent of the docker swarm VIP, called the cluster IP. k8s assigns each service a cluster IP that does not change during the service's lifetime, and requests to the cluster IP are automatically load-balanced across the service's backend containers.

Let's start an nginx service with replicas=2. First we create a deployment from a descriptor file:

//run-my-nginx.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.10.1
        ports:
        - containerPort: 80

Start the deployment:

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl create -f ./run-my-nginx.yaml
deployment "my-nginx" created

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl get deployment
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-nginx   2         2         2            2           9s

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl get pods -l run=my-nginx -o wide
NAME                        READY     STATUS    RESTARTS   AGE       IP            NODE
my-nginx-2395715568-2t6xe   1/1       Running   0          50s       172.16.57.3   10.46.181.146
my-nginx-2395715568-gpljv   1/1       Running   0          50s       172.16.99.2   10.47.136.60

The my-nginx deployment started successfully and its pods were scheduled onto the two minion nodes.

Next, expose the deployment as a service:

# kubectl expose deployment/my-nginx
service "my-nginx" exposed

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl get svc my-nginx
NAME       CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
my-nginx   192.168.3.239   <none>        80/TCP    15s

# kubectl describe svc my-nginx
Name:            my-nginx
Namespace:        default
Labels:            run=my-nginx
Selector:        run=my-nginx
Type:            ClusterIP
IP:            192.168.3.239
Port:            <unset>    80/TCP
Endpoints:        172.16.57.3:80,172.16.99.2:80
Session Affinity:    None

The expose command turns the deployment into a service, and the my-nginx service is assigned a cluster IP: 192.168.3.239.

Start a client container to test the internal load balancing:

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl run myclient --image=registry.cn-hangzhou.aliyuncs.com/mioss/test --replicas=1 --command -- tail -f /var/log/bootstrap.log
deployment "myclient" created

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
my-nginx-2395715568-2t6xe   1/1       Running   0          24m
my-nginx-2395715568-gpljv   1/1       Running   0          24m
myclient-1460251692-g7rnl   1/1       Running   0          21s

Enter the myclient container with docker exec -it containerid /bin/bash and use curl to send HTTP requests to the cluster IP above:

root@myclient-1460251692-g7rnl:/# curl -v 192.168.3.239:80

Meanwhile, on the two minion nodes, watch the logs of the two nginx containers behind the my-nginx service with docker logs -f; the two containers receive the HTTP requests in round-robin fashion:

root@iZ25cn4xxnvZ:~/k8stest/demo# docker logs -f  ccc2f9bb814a
172.16.57.0 - - [17/Oct/2016:06:35:57 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.0 - - [17/Oct/2016:06:36:13 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.0 - - [17/Oct/2016:06:37:06 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.0 - - [17/Oct/2016:06:37:45 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.0 - - [17/Oct/2016:06:37:46 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.0 - - [17/Oct/2016:06:37:50 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"

root@iZ25mjza4msZ:~# docker logs -f 0e533ec2dc71
172.16.57.4 - - [17/Oct/2016:06:33:14 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.4 - - [17/Oct/2016:06:33:18 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.4 - - [17/Oct/2016:06:34:06 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.4 - - [17/Oct/2016:06:34:09 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.4 - - [17/Oct/2016:06:35:45 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.4 - - [17/Oct/2016:06:36:59 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"

The cluster IP mechanism works.
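
If you want to peek at how kube-proxy implements the cluster IP, a hedged check on any node (assuming the default iptables proxy mode of k8s 1.2+; in userspace mode the chain names differ) is:

# rules that match traffic for the my-nginx cluster IP and DNAT it to the pod endpoints
iptables -t nat -L KUBE-SERVICES -n | grep 192.168.3.239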

2. The NodePort mechanism

k8s uses the NodePort mechanism to achieve something similar to Docker's routing mesh, though the underlying mechanism and principle are different.

The idea behind NodePort is that every node in the cluster opens the same port; traffic hitting that port is handed to the kube-proxy on that node, which then forwards it to a live pod of the service that owns the NodePort.

Let's first delete the my-nginx service started earlier and recreate it as a NodePort-enabled service. Deleting a service in k8s takes a little care: the goal is not only to remove the service "index" but also to stop and remove all docker containers in the service's pods. In k8s, deleting only the service or only the pods will not stop and remove the containers; you have to delete the service and delete the deployment, two steps, to remove the service completely.

root@iZ25cn4xxnvZ:~# kubectl delete svc my-nginx
service "my-nginx" deleted

root@iZ25cn4xxnvZ:~# kubectl get service my-nginx
Error from server: services "my-nginx" not found

// the containers are still running
root@iZ25cn4xxnvZ:~# kubectl get deployment my-nginx
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-nginx   2         2         2            2           20h

root@iZ25cn4xxnvZ:~# kubectl delete deployment my-nginx
deployment "my-nginx" deleted

Run docker ps again and the corresponding containers should be gone.

To recreate my-nginx with an exposed NodePort, first write a new service file:

//my-nginx-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30062
    protocol: TCP
  selector:
    run: my-nginx

Create the service:

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl create -f ./my-nginx-svc.yaml
deployment "my-nginx" created

Inspect the service:

root@iZ25cn4xxnvZ:~/k8stest/demo# kubectl describe service my-nginx
Name:            my-nginx
Namespace:        default
Labels:            run=my-nginx
Selector:        run=my-nginx
Type:            NodePort
IP:            192.168.3.179
Port:            <unset>    80/TCP
NodePort:        <unset>    30062/TCP
Endpoints:        172.16.57.3:80,172.16.99.2:80
Session Affinity:    None

Compared with the previous service description, there is one extra attribute: NodePort 30062/TCP, which is the port the service exposes outside the cluster.

Next, access the exposed NodePort via the public IP addresses of the two nodes and see whether the two nginx containers behind the service receive the requests:

curl port 30062 via the public IPs:

curl -v x.x.x.x:30062
curl -v  y.y.y.y:30062

Again, monitoring the log output of the two nginx containers with docker logs -f, we see:

nginx1:

172.16.57.4 - - [17/Oct/2016:08:19:56 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
172.16.57.1 - - [17/Oct/2016:08:21:55 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.1 - - [17/Oct/2016:08:21:56 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.1 - - [17/Oct/2016:08:21:59 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.1 - - [17/Oct/2016:08:22:07 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.1 - - [17/Oct/2016:08:22:09 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"

nginx2:

172.16.57.0 - - [17/Oct/2016:08:22:05 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.0 - - [17/Oct/2016:08:22:06 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.0 - - [17/Oct/2016:08:22:08 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.0 - - [17/Oct/2016:08:22:09 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"

The two containers receive the externally forwarded HTTP requests in turn.

Now scale the my-nginx service down from 2 replicas to 1:

root@iZ25cn4xxnvZ:~# kubectl scale --replicas=1 deployment/my-nginx
deployment "my-nginx" scaled

Test the NodePort mechanism again:

curl -v x.x.x.x:30062
curl -v  y.y.y.y:30062

After scaling down, only the my-nginx pod on the master survives. Thanks to the NodePort mechanism, when the node without a my-nginx pod receives a request, it hands the request to kube-proxy, which forwards it via the internal cluster IP mechanism to a container that does run my-nginx.

The nginx container on the master:

172.16.99.1 - - [18/Oct/2016:00:55:04 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"
172.16.57.0 - - [18/Oct/2016:00:55:10 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.30.0" "-"

The NodePort mechanism checks out. netstat shows that port 30062 is held by kube-proxy on the node, so even if no nginx container is running on that node, kube-proxy still forwards the request.

root@iZ25cn4xxnvZ:~# netstat -tnlp|grep 30062
tcp6       0      0 :::30062                :::*                    LISTEN      22076/kube-proxy

6. Closing remarks

At this point the k8s cluster is usable. But to make good use of k8s, with its fifteen years of container experience behind it, there is still a long way to go: installing add-ons (the DNS plugin, etc.), installing the Dashboard, and so on. This article is already long enough, so those topics may get their own posts later.
