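Today I attended a Red Hat key-customer exchange session, where one of the talks was a Red Hat engineer's introduction to OpenShift troubleshooting techniques. It is well worth referencing, so I am posting the content here so that more people can benefit from it. What the Red Hat engineer listed is of course only part of the picture, and the slides do not spell out some of the troubleshooting details; I will not expand on them in this post. The content of the slides follows.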
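OpenShift Troubleshooting Tips
- Collecting basic environment information
- Log levels
- Applications
- oc client troubleshooting
- Image registry
- Networking
- Routing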
- Installer
- DNS
- Etcd
Log Levels
OpenShift service logs:
/etc/origin/master/master.env # applies to both the API and the controllers
DEBUG_LOGLEVEL=4
/etc/sysconfig/atomic-openshift-node
OPTIONS=--loglevel=4
Log level values
- 0 - Errors and warnings only
- 2 - Normal information
- 4 - Debugging-level information
- 6 - API-level debugging information (request / response)
- 8 - Body-level API debugging information
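As a rough illustration of applying one of these levels, the sketch below updates a KEY=VALUE line in an environment file such as master.env. The helper name set_env_value is hypothetical (it is not part of OpenShift), and it assumes the file consists of plain KEY=VALUE lines; the affected services still have to be restarted afterwards for the new level to take effect.

```shell
# set_env_value FILE KEY VALUE  (hypothetical helper, not an OpenShift tool)
# Replaces an existing KEY=... line, or appends one if the key is absent.
set_env_value() {
    file=$1; key=$2; value=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite through a temporary file, then copy back so that the
        # original file keeps its ownership and permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
            cat "${file}.tmp" > "$file" &&
            rm -f "${file}.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root on a master; restart the API and controllers afterwards):
#   set_env_value /etc/origin/master/master.env DEBUG_LOGLEVEL 4
# Example (as root on a node):
#   set_env_value /etc/sysconfig/atomic-openshift-node OPTIONS --loglevel=4
```

Remember to set the level back to its previous value once the investigation is done, since levels 6 and 8 produce a large volume of log output.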
Docker log level
/etc/sysconfig/docker: add --log-level=debug to OPTIONS
Etcd Log level
# source /etc/etcd/etcd.conf
# curl --cert $ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE $ETCD_ADVERTISE_CLIENT_URLS/config/local/log -XPUT -d '{"Level":"DEBUG"}'
Get Log
# /usr/local/bin/master-logs etcd etcd > $(hostname)-etcd.log 2>&1
OpenShift Builder Pod Logs
BUILD_LOGLEVEL in BC/env ## set it in the BuildConfig's environment variables
Application Logs
Errors fall into three stages: build errors, deployment errors, and application errors
Build Errors
# oc logs bc/<build_config>
Deployment Errors
# oc status -n <project>
# oc get events -o wide -n <project>
Application Errors
# oc logs pod/<pod> -p
# oc debug pod/<POD_NAME>
This deploys a copy of the pod without liveness and readiness probes, and with the entrypoint set to a shell.
oc Client
oc client log level
# oc whoami --loglevel=8
Setting this value between 6 and 8 provides extensive logging:
API requests being sent (loglevel 6)
headers (loglevel 7)
responses received (loglevel 8)
OpenShift Registry
Health check
A basic health check confirms that the registry is running and responding at its service address
# RegistryAddr=$(oc get svc docker-registry -n default -o jsonpath='{.spec.clusterIP}:{.spec.ports[0].port}')
# curl -vk https://$RegistryAddr/healthz
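The two commands above can be wrapped into a single check that reports only the result. This is a minimal sketch: the function name registry_health is hypothetical, and it assumes the registry service is named docker-registry in the default project (as in the commands above) and that a healthy registry answers /healthz with HTTP 200.

```shell
# registry_health  (hypothetical helper)
# Resolves the registry service address and probes its /healthz endpoint.
registry_health() {
    # <clusterIP>:<port> of the docker-registry service in the default project
    addr=$(oc get svc docker-registry -n default \
        -o jsonpath='{.spec.clusterIP}:{.spec.ports[0].port}') || return 2
    # -s: silent, -k: skip TLS verification, -w: print only the HTTP status
    code=$(curl -sk -o /dev/null -w '%{http_code}' "https://${addr}/healthz") || true
    if [ "$code" = "200" ]; then
        echo "registry healthy at ${addr}"
    else
        echo "registry unhealthy at ${addr} (HTTP ${code})"
        return 1
    fi
}

# Example:
#   registry_health || echo "check the registry pod logs"
```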
Testing the image registry
docker login -u openshift -p $(oc whoami -t) <registry_ip>:<port>
docker pull/tag/push
If the registry uses storage, open a shell in the registry pod to inspect it
# oc rsh $(oc get pods -o name -l docker-registry -n default)
OpenShift Networking
Debugging External Access to an HTTP Service
Debugging Node to Node Networking
Debugging Local Networking
Use the network diagnostics tool to check the state of the network
https://docs.openshift.com/container-platform/3.11/admin_guide/sdn_troubleshooting.html
OpenShift Routing
Check segment by segment to locate the problem: curl the pod, then the service
$ oc logs dc/router -n default
$ oc get dc/router -o yaml -n default
$ oc get route <NAME_OF_ROUTE> -n <PROJECT>
$ oc get endpoints --all-namespaces
$ oc exec -it $ROUTER_POD -- ls -la
$ oc exec -it $ROUTER_POD -- find /var/lib/haproxy -regex ".*\(.map\|config.*\|.json\)" -print -exec cat {} \; > haproxy_configs_and_maps
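The segment-by-segment idea (pod first, then service, then route) can be sketched as a helper that probes a list of URLs in order and stops at the first one that fails, which is where the problem most likely sits. The function name check_segments and the example addresses are hypothetical; substitute the pod IP, service IP, and route host of the application being debugged.

```shell
# check_segments URL...  (hypothetical helper)
# Probes each URL in order and stops at the first one that does not answer
# with a 2xx or 3xx status, so the failing segment is the last line printed.
check_segments() {
    for url in "$@"; do
        # -m 5: give up after 5 seconds; curl reports 000 if it cannot connect
        code=$(curl -sk -o /dev/null -m 5 -w '%{http_code}' "$url") || true
        case "$code" in
            2??|3??) echo "OK   $url ($code)" ;;
            *)       echo "FAIL $url ($code)"; return 1 ;;
        esac
    done
}

# Example (placeholder addresses, innermost segment first):
#   check_segments http://<pod_ip>:8080/ http://<service_ip>:8080/ http://<route_host>/
```

Run it from a node in the cluster so that pod and service IPs are reachable; a failure on the pod URL points at the application, a failure on the service URL at endpoints or the SDN, and a failure only on the route at the router.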
router log
Checking router health
http://admin:<stats-password>@<master-ip>:1936/haproxy_stats
enable access log to syslog server
OpenShift Installer
OpenShift Ansible Playbooks
# ansible-playbook <PLAYBOOK> -vvv | tee ansible.logs
If the run fails on a particular task, look up that task in the source on GitHub to see exactly what it does:
GitHub install repo: openshift/openshift-ansible
OpenShift DNS
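Dnsmasq is a small caching DNS server. It answers DNS queries from its cache or forwards them to the real external DNS servers, and it is installed on every node.
SkyDNS is a DNS server built on top of etcd. It is embedded in the node process and is mainly responsible for resolving internal services.
NetworkManager runs the origin dispatcher script /etc/NetworkManager/dispatcher.d/99-origin-dns.sh to configure /etc/resolv.conf and some other files.
NetworkManager
Make sure the NetworkManager service is running properly
Check that /etc/NetworkManager/dispatcher.d/99-origin-dns.sh is executable
Make sure /etc/resolv.conf contains the host's private IP and the correct search domains. /etc/resolv.conf is generated by the NetworkManager service
Check that the dnsmasq service is healthy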
systemctl status dnsmasq -l
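The /etc/resolv.conf check described above can be scripted. The following is a minimal sketch with a hypothetical helper name, check_resolv_conf; it only verifies that a nameserver line matches the node's private IP and that a search line lists at least one domain, and it does not judge whether the search domains themselves are the right ones.

```shell
# check_resolv_conf FILE NODE_IP  (hypothetical helper)
check_resolv_conf() {
    file=$1; node_ip=$2
    # A nameserver line must point at the node's own private IP
    if ! awk -v ip="$node_ip" \
        '$1 == "nameserver" && $2 == ip { ok = 1 } END { exit !ok }' "$file"; then
        echo "no nameserver entry for ${node_ip}"
        return 1
    fi
    # A search line with at least one domain must be present
    if ! awk '$1 == "search" && NF > 1 { ok = 1 } END { exit !ok }' "$file"; then
        echo "no search domain"
        return 1
    fi
    echo "resolv.conf ok"
}

# Example (replace the placeholder with the node's private IP):
#   check_resolv_conf /etc/resolv.conf <node_private_ip>
```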
OpenShift Etcd
Set the etcd variables
# source /etc/etcd/etcd.conf
# export ETCDCTL_API=3
Set endpoint variable to include all etcd endpoints
# ETCD_ALL_ENDPOINTS=$(etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS --write-out=fields member list | awk '/ClientURL/{printf"%s%s", sep, $3; sep=","}')
Check the health of etcd across all endpoints
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_ALL_ENDPOINTS --write-out=table endpoint status
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_ALL_ENDPOINTS --write-out=table endpoint health
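To see what the awk step that builds the endpoint list does, the sketch below applies the same joining logic to text on standard input. It assumes that member list --write-out=fields prints one line per field in the form "ClientURL" : "https://host:2379"; the function name join_client_urls is hypothetical. Unlike the one-liner above, this variant also strips the surrounding double quotes from each URL, which may or may not be needed depending on the etcdctl version.

```shell
# join_client_urls  (hypothetical helper)
# Reads `etcdctl member list --write-out=fields` output on stdin and prints
# the client URLs as one comma-separated list.
join_client_urls() {
    awk '/ClientURL/ {
             gsub(/"/, "", $3)            # drop the quotes around the URL
             printf "%s%s", sep, $3       # sep is empty for the first URL
             sep = ","
         }
         END { print "" }'
}

# Example (reusing the certificate variables from /etc/etcd/etcd.conf):
#   ETCD_ALL_ENDPOINTS=$(etcdctl --cert=$ETCD_PEER_CERT_FILE --key $ETCD_PEER_KEY_FILE \
#       --cacert $ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS \
#       --write-out=fields member list | join_client_urls)
```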
Best Practices
Recommended
Red Hat OpenShift Container Platform Life Cycle Policy
https://access.redhat.com/support/policy/updates/openshift
OpenShift Container Platform Tested Integrations supported configuration
https://access.redhat.com/articles/2176281
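Not recommended
- Running the master and infra roles on the same nodes
- Running the external load balancer on OpenShift nodes
- Upgrading the version of a single component on its own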
- service ip
Recommended troubleshooting guides
Troubleshooting OpenShift Container Platform: Cluster Metrics
https://access.redhat.com/articles/2448341
Troubleshooting OpenShift Container Platform 3.x: Aggregating Container Logging
https://access.redhat.com/articles/3136551
Troubleshooting OpenShift Container Platform: Middleware Containers
https://access.redhat.com/articles/3135421