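Handling certificate expiry in Rancher 2.5.0
The Rancher UI makes it easy to view and manage resources in Kubernetes, which helps people who are less familiar with k8s (developers, for example) troubleshoot application issues. Whether operations staff use it is a matter of personal preference.
Pitfall: this came up with Rancher 2.5.0 started through docker. Rancher's certificates are valid for only one year, and once that year is up the web UI can no longer be logged in to. Logging in to the host and checking the logs of the Rancher container shows the following errors: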
2023/05/10 02:43:21 [INFO] Waiting for k3s to start
2023/05/10 02:43:22 [INFO] Starting API controllers
2023/05/10 02:45:05 [FATAL] Get "https://127.0.0.1:6443/api?timeout=15m0s": x509: certificate signed by unknown authority
2023/05/10 02:45:06 [INFO] Rancher version v2.5.0 (65f3525cd) is starting
2023/05/10 02:45:06 [INFO] Rancher arguments {ACMEDomains:[] AddLocal:true Embedded:false BindHost: HTTPListenPort:80 HTTPSListenPort:443 K8sMode:auto Debug:false Trace:false NoCACerts:false AuditLogPath:/var/log/auditlog/rancher-api-audit.log AuditLogMaxage:10 AuditLogMaxsize:100 AuditLogMaxbackup:10 AuditLevel:0 Agent:false Features:}
2023/05/10 02:45:06 [INFO] Listening on /tmp/log.sock
2023/05/10 02:45:06 [INFO] Running etcd --data-dir=management-state/etcd --heartbeat-interval=500 --election-timeout=5000
2023/05/10 02:45:10 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": dial tcp 127.0.0.1:6443: connect: connection refused
2023/05/10 02:45:12 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority
2023/05/10 02:45:14 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority
2023/05/10 02:45:16 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority
2023/05/10 02:45:18 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority
2023/05/10 02:45:20 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate signed by unknown authority
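To pull only the certificate errors out of a noisy log like the one above, a small filter can help. This is a minimal sketch: the function name rancher_x509_errors and the default of 200 lines are made up for illustration, and the container id is whatever your own Rancher container uses.

```shell
# Show recent x509-related lines from the Rancher container log.
# Usage: rancher_x509_errors <container-id-or-name> [number-of-lines]
rancher_x509_errors() {
    container="$1"
    lines="${2:-200}"   # hypothetical default, adjust as needed
    # docker logs replays the container's stderr on stderr, so merge both streams
    # before filtering.
    docker logs --tail "$lines" "$container" 2>&1 | grep -i 'x509'
}
```

For example, rancher_x509_errors 7f773c5d1d2a 500 would scan the last 500 log lines of the container used in this post.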
Start by checking the expiry dates of all the certificates (7f773c5d1d2a is the id of the Rancher container):
docker exec -it 7f773c5d1d2a /bin/bash
root@7f773c5d1d2a:/var/lib/rancher# cd k3s/server/tls
root@7f773c5d1d2a:/var/lib/rancher/k3s/server/tls# for i in *.crt; do openssl x509 -in "$i" -noout -dates; done
notBefore=May 10 02:23:23 2023 GMT
notAfter=Dec 31 16:00:20 2023 GMT
notBefore=May 10 02:23:23 2023 GMT
notAfter=Dec 31 16:00:20 2023 GMT
notBefore=May 10 02:23:23 2023 GMT
notAfter=May 7 02:23:23 2033 GMT
notBefore=May 10 02:23:23 2023 GMT
notAfter=Dec 31 16:00:20 2023 GMT
notBefore=May 10 02:23:23 2023 GMT
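The listing above does not say which file each date belongs to, or which certificates have actually expired. A possible variant, meant to be run inside the container, prints a status, the expiry date, and the file name on one line per certificate. The function name check_certs is hypothetical. The sketch relies on openssl x509 -checkend 0, which exits non-zero when a certificate is already past its notAfter date.

```shell
# Print "<status> <notAfter> <file>" for every .crt file in a directory.
# Usage: check_certs [directory]   (defaults to the current directory)
check_certs() {
    cert_dir="${1:-.}"
    for crt in "$cert_dir"/*.crt; do
        [ -e "$crt" ] || continue   # the glob matched nothing
        # -enddate prints "notAfter=<date>"; keep only the date part.
        end=$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)
        # -checkend 0 fails if the certificate has expired (or cannot be read).
        if openssl x509 -in "$crt" -noout -checkend 0 >/dev/null 2>&1; then
            status="valid"
        else
            status="EXPIRED"
        fi
        printf '%s\t%s\t%s\n' "$status" "$end" "$crt"
    done
}
```

Inside the container, check_certs /var/lib/rancher/k3s/server/tls would cover the directory inspected above.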
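Tested and confirmed to work: empty the following directory completely, including the *.key files, the *.crt files, and the temporary-certs directory: /var/lib/rancher//rancher/k3s/server/tls/. Then restart the Rancher container, open the web UI, and check that everything behaves normally.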
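Since emptying the directory throws away every key and certificate in it, taking a backup first seems prudent. The sketch below is one way to do the backup, the cleanup, and the restart from the host. It is built on assumptions: the in-container path /var/lib/rancher/k3s/server/tls is taken from the shell session earlier in this post, the backup file name and the function name reset_k3s_tls are invented, and the container image is assumed to ship tar and GNU find.

```shell
# Back up the k3s TLS directory inside the Rancher container, empty it, and
# restart the container so the certificates are regenerated on startup.
# Usage: reset_k3s_tls <container-id-or-name>
reset_k3s_tls() {
    container="$1"
    tls_dir="/var/lib/rancher/k3s/server/tls"   # assumed in-container path
    # Hypothetical backup location, one level above the directory being emptied.
    backup="/var/lib/rancher/k3s/server/tls-backup-$(date +%Y%m%d%H%M%S).tgz"

    # Archive first so the old keys and certificates can be restored if needed.
    docker exec "$container" tar -czf "$backup" -C "$tls_dir" . || return 1
    # Delete everything below the directory (temporary-certs included) but keep
    # the directory itself.
    docker exec "$container" find "$tls_dir" -mindepth 1 -delete || return 1
    docker restart "$container"
}
```

The function stops before deleting anything if the backup step fails.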
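If that does not work, try the following method (applies to Rancher 2.4.x and 2.5.x).
Enter the container with docker exec -it rancher-id /bin/bash, then delete the serving certificates and trigger a refresh from inside the container: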
docker exec -it 7f773c5d1d2a /bin/bash
kubectl --insecure-skip-tls-verify -n kube-system delete secrets k3s-serving
kubectl --insecure-skip-tls-verify delete secret serving-cert -n cattle-system
rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
curl --insecure -sfL https://172.16.11.115/v3
Restart the Rancher server container: docker restart rancher
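After the restart, one way to confirm that a fresh certificate is being served is to read its validity window straight from the HTTPS port. This is only a sketch: the function name served_cert_dates is hypothetical, and it assumes openssl is installed on the machine you run it from.

```shell
# Print notBefore/notAfter of the certificate served at host:port.
# Usage: served_cert_dates <host> [port]   (port defaults to 443)
served_cert_dates() {
    host="$1"
    port="${2:-443}"
    # s_client fetches the server certificate; reading stdin from /dev/null
    # makes it exit as soon as the handshake is done.
    openssl s_client -connect "$host:$port" </dev/null 2>/dev/null \
        | openssl x509 -noout -dates
}
```

With the address used in this post, that would be served_cert_dates 172.16.11.115.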
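Reportedly, Rancher 2.5.8 and later bundle k3s 1.20 and no longer have this bug. I have not verified this, so test it yourself.
The agent on a managed cluster may stop working properly and need a restart. Its pod lives in the cattle-system namespace; simply delete it and it will be recreated.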
Last login: Wed May 10 10:32:21 CST 2023 on pts/5
[root@prod-com-k8master1 ~]# kubectl get po -n cattle-system
NAME READY STATUS RESTARTS AGE
cattle-cluster-agent-6bb48b9896-ndlw2 1/1 Running 1 161m
[root@prod-com-k8master1 ~]# kubectl delete po -n cattle-system cattle-cluster-agent-6bb48b9896-ndlw2
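The same restart can be done without copying the generated pod name, and followed by a wait until the replacement pod is ready. A minimal sketch, with two assumptions: the pod carries the label app=cattle-cluster-agent, and the owning Deployment is named cattle-cluster-agent (inferred from the pod name above). The function name and the 120s timeout are made up for illustration.

```shell
# Delete the cluster agent pod by label and wait for its replacement.
restart_cluster_agent() {
    # Assumed label; check with: kubectl -n cattle-system get po --show-labels
    kubectl -n cattle-system delete pod -l app=cattle-cluster-agent || return 1
    # Block until the Deployment reports the new pod as ready (or time out).
    kubectl -n cattle-system rollout status deployment/cattle-cluster-agent --timeout=120s
}
```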