Handling Certificate Expiry in Rancher 2.5.0

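The Rancher UI makes it easy to view and manage Kubernetes resources. It helps people who are less familiar with k8s (developers, for example) troubleshoot application problems; for operations staff, using it is a matter of personal preference.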

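Pitfall, encountered with Rancher 2.5.0 started via Docker: Rancher's certificates are valid for only one year. Once they expire, you can no longer log in to the web UI. Logging in to the host and checking the container's logs shows the following errors: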

2023/05/10 02:43:21 [INFO] Waiting for k3s to start

2023/05/10 02:43:22 [INFO] Starting API controllers

2023/05/10 02:45:05 [FATAL] Get "https://127.0.0.1:6443/api?timeout=15m0s": x509: certificate signed by unknown authority

2023/05/10 02:45:06 [INFO] Rancher version v2.5.0 (65f3525cd) is starting

2023/05/10 02:45:06 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:true Embedded:false BindHost: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Agent:false Features:}

2023/05/10 02:45:06 [INFO] Listening on /tmp/log.sock

2023/05/10 02:45:06 [INFO] Running etcd --data-dir=management-state/etcd --heartbeat-interval=500 --election-timeout=5000

2023/05/10 02:45:10 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused

2023/05/10 02:45:12 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority

2023/05/10 02:45:14 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority

2023/05/10 02:45:16 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority

2023/05/10 02:45:18 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority

2023/05/10 02:45:20 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority

Start by checking the expiry dates of all certificates (7f773c5d1d2a is the ID of the Rancher container):

docker exec -it 7f773c5d1d2a /bin/bash

root@7f773c5d1d2a:/var/lib/rancher# cd k3s/server/tls

root@7f773c5d1d2a:/var/lib/rancher/k3s/server/tls# for i in *.crt; do openssl x509 -in "$i" -noout -dates; done

notBefore=May 10 02:23:23 2023 GMT

notAfter=Dec 31 16:00:20 2023 GMT

notBefore=May 10 02:23:23 2023 GMT

notAfter=Dec 31 16:00:20 2023 GMT

notBefore=May 10 02:23:23 2023 GMT

notAfter=May 7 02:23:23 2033 GMT

notBefore=May 10 02:23:23 2023 GMT

notAfter=Dec 31 16:00:20 2023 GMT

notBefore=May 10 02:23:23 2023 GMT
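The loop above prints the dates without saying which file they belong to. As a rough sketch, the helpers below label each certificate and report how many days remain. The function names `days_left` and `report_certs` are hypothetical (they are not from Rancher or k3s), and `date -d` assumes GNU date, which may not be available in every image.

```shell
#!/usr/bin/env bash
# Days remaining until a certificate's notAfter date.
# $1: date string as printed by `openssl x509 -enddate`,
#     e.g. "Dec 31 16:00:20 2023 GMT"
# $2: reference time in epoch seconds (defaults to the current time)
# Assumes GNU date (`date -d`). A negative result means the certificate
# has already expired (integer division truncates toward zero).
days_left() {
  local not_after="$1"
  local now="${2:-$(date +%s)}"
  local end
  end=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Print "<file>: <n> days left" for every .crt file in a directory.
# $1: directory to scan (defaults to the current directory)
report_certs() {
  local dir="${1:-.}"
  local crt end
  for crt in "$dir"/*.crt; do
    [ -e "$crt" ] || continue   # no .crt files matched the glob
    end=$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)
    echo "$crt: $(days_left "$end") days left"
  done
}
```

Inside the container, this could be run as `report_certs /var/lib/rancher/k3s/server/tls`, the directory shown in the prompt above.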

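Tested and working: completely empty the following directory, including the *.key files, the *.crt files, and the temporary-certs directory: /var/lib/rancher//rancher/k3s/server/tls/. Then restart the container, open the web UI, and check whether anything is abnormal.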
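Emptying the directory cannot be undone, so it may be worth archiving it first. The sketch below does that and then clears the directory. The backup step and the function name `backup_and_clear` are additions for illustration, not part of the procedure tested in this post, and it assumes `tar` and a `find` that supports `-mindepth` and `-delete` are available where it runs.

```shell
#!/usr/bin/env bash
# Archive a directory, then delete everything inside it.
# $1: directory to clear (the directory itself is kept)
# $2: path of the .tgz archive to create; must be outside $1
backup_and_clear() {
  local dir="$1"
  local archive="$2"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # Stop here if the backup fails, so nothing is deleted without a copy.
  tar -czf "$archive" -C "$dir" . || return 1
  # Remove all files and subdirectories (temporary-certs included).
  find "$dir" -mindepth 1 -delete
}
```

For example, inside the container: `backup_and_clear /var/lib/rancher/k3s/server/tls /tmp/tls-backup.tgz`, followed by a restart of the container from the host.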

If that does not work, try the following method (applies to Rancher 2.4.x and 2.5.x).

Enter the container (docker exec -it rancher-id /bin/bash), then delete the certificates from inside it and trigger their regeneration:

docker exec -it 7f773c5d1d2a /bin/bash

kubectl --insecure-skip-tls-verify -n kube-system delete secrets k3s-serving

kubectl --insecure-skip-tls-verify delete secret serving-cert -n cattle-system

rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json

curl --insecure -sfL https://172.16.11.115/v3

Restart the Rancher server container (here the container is named rancher; use your own container name or ID): docker restart rancher
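After the restart, Rancher needs some time before it answers again (the log above shows it waiting for the embedded k3s server). Rather than refreshing the browser by hand, a small retry loop can poll until the server responds. This is a minimal sketch; `wait_until` is a hypothetical helper name, and the attempt count and delay are arbitrary choices.

```shell
#!/usr/bin/env bash
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the command to run (its output is discarded)
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local n=1
  while :; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `wait_until 30 10 curl --insecure -sL -o /dev/null https://172.16.11.115/v3` retries every 10 seconds, up to 30 times, against the address used above. Without `-f`, curl treats any HTTP response as success, so this only confirms that the server is answering over TLS again, not that login works.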

Reportedly, Rancher 2.5.8 and later ship with k3s 1.20 and no longer have this bug; test this yourself before relying on it.

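If the agents of the managed clusters misbehave afterwards, they may need to be restarted. Their pods are in the cattle-system namespace; simply delete them and they will be recreated.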

Last login: Wed May 10 10:32:21 CST 2023 on pts/5

[root@prod-com-k8master1 ~]# kubectl get po -n cattle-system

NAME READY STATUS RESTARTS AGE

cattle-cluster-agent-6bb48b9896-ndlw2 1/1 Running 1 161m

[root@prod-com-k8master1 ~]# kubectl delete po -n cattle-system cattle-cluster-agent-6bb48b9896-ndlw2
