docker - Tony Bai

七月 20, 2017

前一段时间将之前采用kubeadm安装的Kubernetes 1.5.1环境升级到了1.6.4版本，升级过程较为顺利。由于该k8s cluster是一个测试环境，当时并没有过于关注，就忙别的事情了。最近项目组打算在这个环境下做一些事情，而当我们重新“捡起”这个环境时，发现Kubernetes Dashboard无法访问了。

Kubernetes的dashboard可以有很多种访问方式，比如：可以通过暴露nodeport的方式(无身份验证，不安全)、可以通过访问apiserver的api服务的方式等。我们的Dashboard通过APIServer进行访问：

https://apiserver_ip:secure_port/ui

正常情况下通过浏览器访问：https://apiserver_ip:secure_port/ui，浏览器会弹出身份验证对话框，待输入正确的用户名和密码后，便可成功进入Dashboard了。但当前，我们得到的结果却是：

User "system:anonymous" cannot proxy services in the namespace "kube-system".

而访问apiserver(https://apiserver_ip:secure_port/)得到的结果如下：

User "system:anonymous" cannot get  at the cluster scope.

一、问题原因分析

k8s 1.6.x版本与1.5.x版本的一个很大不同在于1.6.x版本启用了RBAC的Authorization mode(授权模型)，这点在K8s master init的日志中可以得到证实：

# kubeadm init --apiserver-advertise-address xx.xx.xx
... ...
[init] Using Kubernetes version: v1.6.4
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key
.... ...
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!
... ...

在《Kubernetes集群的安全配置》一文中我们提到过Kubernetes API server的访问方法：

Authentication(身份验证) -> Authorization（授权）-> Admission Control(入口条件控制)

只不过在Kubernetes 1.5.x及以前的版本中，Authorization的环节都采用了默认的配置，即”AlwaysAllow”，对访问APIServer并不产生什么影响：

# kube-apiserver -h
... ...
--authorization-mode="AlwaysAllow": Ordered list of plug-ins to do authorization on secure port. Comma-delimited list of: AlwaysAllow,AlwaysDeny,ABAC,Webhook,RBAC
... ...

但K8s 1.6.x版本中，–authorization-mode的值发生了变化：

# cat /etc/kubernetes/manifests/kube-apiserver.yaml

spec:
  containers:
  - command:
    - kube-apiserver
    - --allow-privileged=true
    ... ...
    - --basic-auth-file=/etc/kubernetes/basic_auth_file
    - --authorization-mode=RBAC
    ... ...

注：这里我们依旧通过basic auth方式进行apiserver的Authentication，而不是用客户端数字证书校验等其他方式。

显然问题的原因就在于这里RBAC授权方式的使用，让我们无法正常访问Dashboard了。

二、Kubernetes RBAC Authorization简介

RBAC Authorization的基本概念是Role和RoleBinding。Role是一些permission的集合；而RoleBinding则是将Role授权给某些User、某些Group或某些ServiceAccount。K8s官方博客《RBAC Support in Kubernetes》一文的中的配图对此做了很生动的诠释：

img{512x368}

从上图中我们可以看到：

Role: pod-reader 拥有Pod的get和list permissions；
RoleBinding: pod-reader 将Role: pod-reader授权给右边的User、Group和ServiceAccount。

和Role和RoleBinding对应的是，K8s还有ClusterRole和ClusterRoleBinding的概念，它们不同之处在于：ClusterRole和ClusterRoleBinding是针对整个Cluster范围内有效的，无论用户或资源所在的namespace是什么；而Role和RoleBinding的作用范围是局限在某个k8s namespace中的。

Kubernetes 1.6.4安装时内建了许多Role/ClusterRole和RoleBinds/ClusterRoleBindings：

# kubectl get role -n kube-system
NAME                                        AGE
extension-apiserver-authentication-reader   50d
system:controller:bootstrap-signer          50d
system:controller:token-cleaner             50d

# kubectl get rolebinding -n kube-system
NAME                                 AGE
system:controller:bootstrap-signer   50d
system:controller:token-cleaner      50d

# kubectl get clusterrole
NAME                                           AGE
admin                                          50d
cluster-admin                                  50d
edit                                           50d
system:auth-delegator                          50d
system:basic-user                              50d
system:controller:attachdetach-controller      50d
... ...
system:discovery                               50d
system:heapster                                50d
system:kube-aggregator                         50d
system:kube-controller-manager                 50d
system:kube-dns                                50d
system:kube-scheduler                          50d
system:node                                    50d
system:node-bootstrapper                       50d
system:node-problem-detector                   50d
system:node-proxier                            50d
system:persistent-volume-provisioner           50d
view                                           50d
weave-net                                      50d

# kubectl get clusterrolebinding
NAME                                           AGE
cluster-admin                                  50d
kubeadm:kubelet-bootstrap                      50d
kubeadm:node-proxier                           50d
kubernetes-dashboard                           50d
system:basic-user                              50d
system:controller:attachdetach-controller      50d
... ...
system:controller:statefulset-controller       50d
system:controller:ttl-controller               50d
system:discovery                               50d
system:kube-controller-manager                 50d
system:kube-dns                                50d
system:kube-scheduler                          50d
system:node                                    50d
system:node-proxier                            50d
weave-net                                      50d

三、Dashboard的role和rolebinding

Kubernetes 1.6.x启用RBAC后，诸多周边插件也都推出了适合K8s 1.6.x的manifest描述文件，比如：weave-net等。Dashboard的manifest文件中也增加了关于rolebinding的描述，我当初用的是1.6.1版本，文件内容摘录如下：

// kubernetes-dashboard.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
... ...

我们看到在kubernetes-dashboard.yaml中，描述文件新建了一个ClusterRoleBinding：kubernetes-dashboard。该binding将ClusterRole: cluster-admin授权给了一个ServiceAccount: kubernetes-dashboard。我们看看ClusterRole: cluster-admin都包含了哪些permission:

# kubectl get clusterrole/cluster-admin -o yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: 2017-05-30T14:06:39Z
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: cluster-admin
  resourceVersion: "11"
  selfLink: /apis/rbac.authorization.k8s.io/v1beta1/clusterrolescluster-admin
  uid: 331c79dc-4541-11e7-bc9a-12584ec3a8c9
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'

可以看到，在rules设定中，cluster-admin似乎拥有了“无限”权限。不过注意：这里仅仅授权给了一个service account，并没有授权给user或group。并且这里的kubernetes-dashboard是dashboard访问apiserver时使用的(下图右侧流程)，并不是user访问APIServer时使用的。

img{512x368}

我们需要给登录dashboard或者说apiserver的user(图左侧)进行授权。

四、为user: admin进行授权

我们的kube-apiserver的启动参数中包含：

    - --basic-auth-file=/etc/kubernetes/basic_auth_file

也就是说我们访问apiserver使用的是basic auth的身份验证方式，而user恰为admin。而从本文开头的错误现象来看，admin这个user并未得到足够的授权。这里我们要做的就是给admin选择一个合适的clusterrole。但kubectl并不支持查看user的信息，初始的clusterrolebinding又那么多，一一查看十分麻烦。我们知道cluster-admin这个clusterrole是全权限的，我们就来将admin这个user与clusterrole: cluster-admin bind到一起：

# kubectl create clusterrolebinding login-on-dashboard-with-cluster-admin --clusterrole=cluster-admin --user=admin
clusterrolebinding "login-on-dashboard-with-cluster-admin" created

# kubectl get clusterrolebinding/login-on-dashboard-with-cluster-admin -o yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2017-07-20T08:57:07Z
  name: login-on-dashboard-with-cluster-admin
  resourceVersion: "5363564"
  selfLink: /apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindingslogin-on-dashboard-with-cluster-admin
  uid: 686a3f36-6d29-11e7-8f69-00163e1001d7
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: admin

binding后，我们再来访问一下dashboard UI，不出意外的话，熟悉的dashboard界面就会出现在你的眼前。

注：Kubernetes API Server新增了–anonymous-auth选项，允许匿名请求访问secure port。没有被其他authentication方法拒绝的请求即Anonymous requests，这样的匿名请求的username为”system:anonymous”, 归属的组为”system:unauthenticated”。并且该选线是默认的。这样一来，当采用chrome浏览器访问dashboard UI时很可能无法弹出用户名、密码输入对话框，导致后续authorization失败。为了保证用户名、密码输入对话框的弹出，需要将–anonymous-auth设置为false：

// /etc/kubernetes/manifests/kube-apiserver.yaml
    - --anonymous-auth=false

用curl测试结果如下：

$curl -u admin:YOUR_PASSWORD -k https://apiserver_ip:secure_port/
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/apps",
    "/apis/apps/v1beta1",
    "/apis/authentication.k8s.io",
    "/apis/authentication.k8s.io/v1",
    "/apis/authentication.k8s.io/v1beta1",
    "/apis/authorization.k8s.io",
    "/apis/authorization.k8s.io/v1",
    "/apis/authorization.k8s.io/v1beta1",
    "/apis/autoscaling",
    "/apis/autoscaling/v1",
    "/apis/autoscaling/v2alpha1",
    "/apis/batch",
    "/apis/batch/v1",
    "/apis/batch/v2alpha1",
    "/apis/certificates.k8s.io",
    "/apis/certificates.k8s.io/v1beta1",
    "/apis/extensions",
    "/apis/extensions/v1beta1",
    "/apis/policy",
    "/apis/policy/v1beta1",
    "/apis/rbac.authorization.k8s.io",
    "/apis/rbac.authorization.k8s.io/v1alpha1",
    "/apis/rbac.authorization.k8s.io/v1beta1",
    "/apis/settings.k8s.io",
    "/apis/settings.k8s.io/v1alpha1",
    "/apis/storage.k8s.io",
    "/apis/storage.k8s.io/v1",
    "/apis/storage.k8s.io/v1beta1",
    "/healthz",
    "/healthz/ping",
    "/healthz/poststarthook/bootstrap-controller",
    "/healthz/poststarthook/ca-registration",
    "/healthz/poststarthook/extensions/third-party-resources",
    "/healthz/poststarthook/rbac/bootstrap-roles",
    "/logs",
    "/metrics",
    "/swaggerapi/",
    "/ui/",
    "/version"
  ]
}

微博：@tonybai_cn
微信公众号：iamtonybai
github.com: https://github.com/bigwhite

解决登录Harbor Registry时鉴权失败的问题

六月 15, 2017

0 条评论

今天在测试之前搭建好的高可用Harbor时，发现了一个问题：使用docker login harbor时，有时成功，有时失败：

# docker login -u user -p passwd http://hub.my-domain.com:36666
Login Succeeded

# docker login -u user -p passwd http://hub.my-domain.com:36666
Error response from daemon: login attempt to http://hub.my-domain.com:36666/v2/ failed with status: 401 Unauthorized

我们在DNS中将hub.my-domain.com这个域名解析成两个IP，分别是两个Harbor节点的public IP，这可能是问题的诱发原因，但我还不知道问题根源在哪里。以下是问题的查找过程记录。

1、保证每个Harbor node都是可以login ok的

我在client端通过修改/etc/hosts将hub.my-domain.com分别解析成上述说到的两个node IP并测试。测试结果表明：无论单独解析成哪个IP，docker login http://hub.my-domain.com:36666都会100%的成功。

2、查看两个Harbor node上的registry log，弄清问题现象

将/etc/hosts中hub.my-domain.com的硬解析删除，恢复DNS解析。打开两个terminal tab分别监视连个Harbor node上的registry的日志。经过几次测试，发现一个现象：当docker login成功时，都是一个node上的日志出现更新；而当docker login fail时，我们会看到两个Node上的registry日志都有变化，似乎请求发给了两个node。

node1:
Jun 15 14:40:01 172.19.0.1 registry[30242]: time="2017-06-15T06:40:01.245822446Z" level=debug msg="authorizing request" go.version=go1.7.3 http.request.host="hub.my-domain.com:36666" http.request.id=62add46e-e176-4eb8-b36a-84a9fbe7ac9c http.request.method=GET http.request.remoteaddr=xx.xx.xx.xx http.request.uri="/v2/" http.request.useragent="docker/1.12.5 go/go1.6.4 git-commit/7392c3b kernel/4.4.0-58-generic os/linux arch/amd64 UpstreamClient(Docker-Client/1.12.5 \\(linux\\))" instance.id=43380207-7b61-4d45-b06a-a017c9a075af service=registry version="v2.4.1+unknown"

Jun 15 14:40:01 172.19.0.1 registry[30242]: time="2017-06-15T06:40:01.246002519Z" level=error msg="token signed by untrusted key with ID: \"BASH:RNPJ:PEBU:7THG:2NAR:OSFV:CG6U:ANV4:CCNB:ODZR:4BL6:TMD6\""

node2:

Jun 15 14:40:01 172.18.0.1 registry[28674]: time="2017-06-15T06:40:01.213604228Z" level=debug msg="authorizing request" go.version=go1.7.3 http.request.host="hub.my-domain.com:36666" http.request.id=bb6eeb8f-99f1-47a0-8cae-dae9b402b758 http.request.method=GET http.request.remoteaddr=xx.xx.xx.xx http.request.uri="/v2/" http.request.useragent="docker/1.12.5 go/go1.6.4 git-commit/7392c3b kernel/4.4.0-58-generic os/linux arch/amd64 UpstreamClient(Docker-Client/1.12.5 \\(linux\\))" instance.id=2a364e0c-425f-47a9-b144-887d439243ba service=registry version="v2.4.1+unknown"

Jun 15 14:40:01 172.18.0.1 registry[28674]: time="2017-06-15T06:40:01.21374491Z" level=warning msg="error authorizing context: authorization token required" go.version=go1.7.3 http.request.host="hub.my-domain.com:36666" http.request.id=bb6eeb8f-99f1-47a0-8cae-dae9b402b758 http.request.method=GET http.request.remoteaddr=xx.xx.xx.xx http.request.uri="/v2/" http.request.useragent="docker/1.12.5 go/go1.6.4 git-commit/7392c3b kernel/4.4.0-58-generic os/linux arch/amd64 UpstreamClient(Docker-Client/1.12.5 \\(linux\\))" instance.id=2a364e0c-425f-47a9-b144-887d439243ba service=registry version="v2.4.1+unknown"

Jun 15 14:40:01 172.18.0.1 registry[28674]: 172.18.0.3 - - [15/Jun/2017:06:40:01 +0000] "GET /v2/ HTTP/1.1" 401 87 "" "docker/1.12.5 go/go1.6.4 git-commit/7392c3b kernel/4.4.0-58-generic os/linux arch/amd64 UpstreamClient(Docker-Client/1.12.5 \\(linux\\))"

3、探寻Harbor原理，弄清问题根源

打开harbor在github.com的wiki页，在”Architecture Overview of Harbor“中我找到了docker login的流程：

img{512x368}

从图片上，我一眼就看到了从docker client发出的*”两个请求: a和c流程”，看来docker client的确不止一次向Harbor发起了请求。wiki上对docker login流程给了简明扼要的解释。大致的流程是:

docker向registry发起请求，由于registry是基于token auth的，因此registry回复应答，告诉docker client去哪个URL去获取token；
docker client根据应答中的URL向token service(ui)发起请求，通过user和passwd获取token；如果user和passwd在db中通过了验证，那么token service将用自己的私钥(harbor/common/config/ui/private_key.pem)生成一个token，返回给docker client端；
docker client获得token后再向registry发起login请求，registry用自己的证书(harbor/common/config/registry/root.crt)对token进行校验。通过则返回成功，否则返回失败。

从这个原理，我们可以知道问题就出在docker client多次向Harbor发起请求这个环节：对于每次请求，DNS会将域名可能解析为不同IP，因此不同请求可能落到不同的node上。这样当docker client拿着node1上token service分配的token去到node2的registry上鉴权时，就会出现鉴权失败的情况。

4、统一私钥和证书，问题得以解决

token service的私钥(harbor/common/config/ui/private_key.pem)和registry的证书(harbor/common/config/registry/root.crt)都是在prepare时生成的，两个节点都独立prepare过，因此两个node上的private_key.pem和root.crt是不同的，这就是问题根源。

解决这个问题很简单，就是统一私钥和证书。比如：将node1上的private_key.pem和root.crt复制到node2上，并重新创建node2上的container：

// node2上

将node1上的harbor/common/config/ui/private_key.pem复制到node2上的harbor/common/config/ui/private_key.pem；
将node1上的harbor/common/config/registry/root.crt复制到harbor/common/config/registry/root.crt；

$ docker-compose down -v
$ docker-compose up -d

更换了private_key.pem和root.crt的node2上的Harbor启动后，再进行login测试，就会100%成功了！

# docker login -u admin -p passwd http://hub.my-domain.com:36666
Login Succeeded

微博：@tonybai_cn
微信公众号：iamtonybai
github.com: https://github.com/bigwhite

标签 docker 下的文章

解决Kubernetes 1.6.4 Dashboard无法访问的问题

一、问题原因分析

二、Kubernetes RBAC Authorization简介

三、Dashboard的role和rolebinding

四、为user: admin进行授权

解决登录Harbor Registry时鉴权失败的问题

1、保证每个Harbor node都是可以login ok的

2、查看两个Harbor node上的registry log，弄清问题现象

3、探寻Harbor原理，弄清问题根源

4、统一私钥和证书，问题得以解决

文章

评论

分类

归档

链接

开源项目

翻译项目