Cluster planning
Operating system requirements
IBM POWER9: RHEL-ALT 7.5 with the "Minimal" installation option and the latest packages from the Extras channel.
IBM POWER8: RHEL 7.5 with the "Minimal" installation option and the latest packages from the Extras channel.
Masters:
Minimum 4 vCPU (additional are strongly recommended).
Minimum 16 GB RAM (additional memory is strongly recommended, especially if etcd is co-located on masters).
Minimum 40 GB hard disk space for the file system containing /var/.
Minimum 1 GB hard disk space for the file system containing /usr/local/bin/.
Minimum 1 GB hard disk space for the file system containing the system’s temporary directory.
Masters with a co-located etcd require a minimum of 4 cores. Two-core systems do not work.
Nodes:
NetworkManager 1.0 or later.
1 vCPU.
Minimum 8 GB RAM.
Minimum 15 GB hard disk space for the file system containing /var/.
Minimum 1 GB hard disk space for the file system containing /usr/local/bin/.
Minimum 1 GB hard disk space for the file system containing the system’s temporary directory.
An additional minimum 15 GB unallocated space per system running containers for Docker’s storage back end; see Configuring Docker Storage. Additional space might be required, depending on the size and number of containers that run on the node.
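As a rough sketch of how that unallocated space can be handed to Docker (assuming the spare space is a separate disk such as /dev/sdb and that the docker-storage-setup tool shipped with the docker package is used; adjust device and volume group names to your environment, and run this before Docker is started for the first time):
# cat > /etc/sysconfig/docker-storage-setup <<EOF
DEVS=/dev/sdb
VG=docker-vg
EOF
# docker-storage-setup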
Lab cluster
Master 172.XX.XX.175
Node   172.XX.XX.182
Node   172.XX.XX.183
Procedure
1 Enable Security-Enhanced Linux (SELinux) on all of the nodes
a. vi /etc/selinux/config
set SELINUX=enforcing and SELINUXTYPE=targeted
b. touch /.autorelabel; reboot
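The same edit can be scripted on each node (a small sketch; sed rewrites the two keys in place, and getenforce after the relabel reboot should print "Enforcing"):
# sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
# sed -i 's/^SELINUXTYPE=.*/SELINUXTYPE=targeted/' /etc/selinux/config
# touch /.autorelabel; reboot
# getenforce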
2 Ensuring host access
Set up passwordless SSH login from the master to each node.
2.1 Generate an SSH key on the host you run the installation playbook on:
# ssh-keygen
2.2 Distribute the key to the other cluster hosts. You can use a bash loop:
# for host in master.openshift.example.com \
    node1.openshift.example.com \
    node2.openshift.example.com; \
    do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; \
    done
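To confirm that passwordless access works before running any playbook, each host can be reached once from the master (same hostnames as above); every host should print its name without asking for a password:
# for host in master.openshift.example.com node1.openshift.example.com node2.openshift.example.com; do ssh -o BatchMode=yes $host hostname; done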
3 Update the network interface configuration
In /etc/sysconfig/network-scripts/ifcfg-ethxx
a. Make sure that: NM_CONTROLLED=yes
b. Add the following entries:
DNS1=
DNS2=
DOMAIN=
(You can get the DNS values from /etc/sysconfig/network-scripts/ifcfg-bootnet and /etc/resolv.conf.)
If neither file provides a value, set DNS1= to the host's own IP address.
(You can get the DOMAIN value with this command: domainname -d)
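For reference, a sketch of what the resulting ifcfg file can look like (the interface name, DNS addresses and domain below are placeholders; keep the existing BOOTPROTO/IPADDR lines of your environment, then reload the connection, e.g. with nmcli connection reload or by restarting NetworkManager):
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
NM_CONTROLLED=yes
DNS1=172.XX.XX.1
DNS2=172.XX.XX.2
DOMAIN=openshift.example.com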
4 Configure /etc/hosts on every machine
[root@node1 network-scripts]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.xx.xx.175 master.openshift.example.com
172.xx.xx.182 node1.openshift.example.com
172.xx.xx.183 node2.openshift.example.com
5 Configure a yum proxy
If the machines cannot reach the Internet directly, configure a proxy server:
vi /etc/yum.conf
set proxy=http://xx.xx.xx.xx:xxxx
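If the proxy requires authentication, yum.conf also accepts credentials; a sketch with placeholder values (only add the last two lines when they are actually needed):
proxy=http://xx.xx.xx.xx:xxxx
proxy_username=<user>
proxy_password=<password>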
6 Registering hosts (a Red Hat subscription is required)
6.1 Run on every machine:
# subscription-manager register --username=<user_name> --password=<password>
# subscription-manager refresh
# subscription-manager list --available --matches '*OpenShift*'
# subscription-manager attach --pool=<pool_id>
6.2 Enable the OpenShift yum repositories
For on-premise installations on IBM POWER8 servers, run the following command:
# subscription-manager repos \
--enable="rhel-7-for-power-le-rpms" \
--enable="rhel-7-for-power-le-extras-rpms" \
--enable="rhel-7-for-power-le-optional-rpms" \
--enable="rhel-7-server-ansible-2.6-for-power-le-rpms" \
--enable="rhel-7-for-power-le-ose-3.11-rpms" \
--enable="rhel-7-for-power-le-fast-datapath-rpms" \
--enable="rhel-7-server-for-power-le-rhscl-rpms"
For on-premise installations on IBM POWER9 servers, run the following command:
# subscription-manager repos \
--enable="rhel-7-for-power-9-rpms" \
--enable="rhel-7-for-power-9-extras-rpms" \
--enable="rhel-7-for-power-9-optional-rpms" \
--enable="rhel-7-server-ansible-2.6-for-power-9-rpms" \
--enable="rhel-7-server-for-power-9-rhscl-rpms" \
--enable="rhel-7-for-power-9-ose-3.11-rpms"
7 Install the base packages
7.1 Run on every machine:
# yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
# yum -y update
# reboot
# yum install atomic-openshift-excluder-3.11.141*
Now install a container engine:
To install CRI-O:
# yum -y install cri-o
To install Docker:
# yum -y install docker
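If Docker is the chosen container engine, it can be enabled and started right away (optional; the prerequisites playbook normally handles the container runtime as well); the last command checks which storage back end is in use:
# systemctl enable docker
# systemctl start docker
# docker info | grep 'Storage Driver'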
7.2 Run on the master:
# yum -y install openshift-ansible
# yum install atomic-openshift atomic-openshift-clients atomic-openshift-hyperkube atomic-openshift-node flannel glusterfs-fuse (optional; this command can be skipped)
# yum install cockpit-docker cockpit-kubernetes
7.3 Run on the nodes:
# yum install atomic-openshift atomic-openshift-node flannel glusterfs-fuse (optional; this command can be skipped)
8 Install OpenShift (run on the master node)
8.1 Pre-installation checks
$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook -i <inventory_file> playbooks/prerequisites.yml
8.2 Run the installation
$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook -i <inventory_file> playbooks/deploy_cluster.yml
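Once deploy_cluster.yml finishes, a quick sanity check from the master (all nodes should be Ready, and apart from Completed jobs all pods should be Running):
# oc get nodes
# oc get pods --all-namespaces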
9 Example inventory_file (1 master + 2 nodes)
[root@master openshift-ansible]# ls
ansible.cfg host.311 inventory playbooks roles
[root@master openshift-ansible]# cat host.311
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd
# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
openshift_deployment_type=openshift-enterprise
# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true
openshift_master_default_subdomain=master.openshift.example.com
debug_level=2
# default selectors for router and registry services
# openshift_router_selector='node-role.kubernetes.io/infra=true'
# openshift_registry_selector='node-role.kubernetes.io/infra=true'
# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'my-rhel-icp-admin': '$apr1$6eO/grkf$9jRafb0tw/2KQEAejT8Lc.'}
# supposedly encrypted password of: S3cure-icp-wordP*s?
openshift_disable_check=memory_availability,disk_availability,docker_image_availability
openshift_master_cluster_hostname=master.openshift.example.com
openshift_master_cluster_public_hostname=master.openshift.example.com
# false
#ansible_service_broker_install=false
#openshift_enable_service_catalog=false
#template_service_broker_install=false
#openshift_logging_install_logging=false
# registry passwd
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=****@xxx
oreg_auth_password=*******
openshift_http_proxy=http://xxx.xxx.xxx.xxx:3130
#openshift_https_proxy=https://xx.xxx.xxx.xxx:3130
openshift_no_proxy=".openshift.example.com"
# docker config
openshift_docker_additional_registries=registry.redhat.io
#openshift_docker_insecure_registries
#openshift_docker_blocked_registries
openshift_docker_options="--log-driver json-file --log-opt max-size=1M --log-opt max-file=3"
# openshift_cluster_monitoring_operator_install=false
# openshift_metrics_install_metrics=true
# openshift_enable_unsupported_configurations=True
#openshift_logging_es_nodeselector='node-role.kubernetes.io/infra: "true"'
#openshift_logging_kibana_nodeselector='node-role.kubernetes.io/infra: "true"'
# host group for masters
[masters]
master.openshift.example.com openshift_public_hostname="master.openshift.example.com"
# host group for etcd
[etcd]
master.openshift.example.com openshift_public_hostname="master.openshift.example.com"
# host group for nodes, includes region info
[nodes]
master.openshift.example.com openshift_public_hostname="master.openshift.example.com" openshift_node_group_name='node-config-master-infra'
node[1:2].openshift.example.com openshift_public_hostname="node[1:2].openshift.example.com" openshift_node_group_name='node-config-compute'
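Before running the playbooks against this file, connectivity to every host in it can be verified with Ansible's ping module (a quick check using the host.311 file shown above):
[root@master openshift-ansible]# ansible all -i host.311 -m ping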
10 Errors that may occur during installation
10.1 If openshift_cluster_monitoring_operator_install is enabled, the master must use openshift_node_group_name='node-config-master-infra'.
See https://github.com/vorburger/opendaylight-coe-kubernetes-openshift/issues/5
10.2 When a proxy is configured, openshift_no_proxy must also be set.
See https://github.com/openshift/openshift-ansible/issues/11365
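For reference, a sketch of the proxy-related inventory variables with openshift_no_proxy covering the cluster's own hosts (addresses are placeholders; recent openshift-ansible versions can also append the cluster hosts automatically):
openshift_http_proxy=http://xxx.xxx.xxx.xxx:3130
openshift_https_proxy=http://xxx.xxx.xxx.xxx:3130
openshift_no_proxy=".openshift.example.com,172.XX.XX.175,172.XX.XX.182,172.XX.XX.183"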
10.3 https://github.com/openshift/openshift-ansible/issues/10427
10.3.1 FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created #10427
In /etc/sysconfig/network-scripts/ifcfg-eth0 (on CentOS) the flag NM_CONTROLLED=no was set; it must be yes (see step 3).
10.3.2 FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created #10427
Another reporter with the same issue fixed it as follows:
Add NM_CONTROLLED=yes to ifcfg-eth0 on all nodes.
Verify the pods with $ oc get pods --all-namespaces
$ oc describe pod cluster-monitoring-operator-WXYZ-ASDF -n openshift-monitoring ==> the end of this output shows why the pod did not start; in this case the message was:
Warning FailedCreatePodSandBox 1h kubelet, infra-openshift-nuuptech Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "70719b9ee2bb9c54fc1d866a6134b229b3c1c151148c9558ea0a4ef8cb66526a" network for pod "cluster-monitoring-operator-67579f5cb5-gxmwc": NetworkPlugin cni failed to set up pod "cluster-monitoring-operator-67579f5cb5-gxmwc_openshift-monitoring" network:failed to find plugin "bridge" in path [/opt/cni/bin], failed to clean up sandbox container "70719b9ee2bb9c54fc1d866a6134b229b3c1c151148c9558ea0a4ef8cb66526a" network for pod "cluster-monitoring-operator-67579f5cb5-gxmwc": NetworkPlugin cni failed to teardown pod "cluster-monitoring-operator-67579f5cb5-gxmwc_openshift-monitoring" network: failed to find plugin "bridge" in path [/opt/cni/bin]]
Searching for the failed to find plugin "bridge" part of that error led to the following solution. Normally the only file in /etc/cni/net.d should be 80-openshift-network.conf, but there were three files:
$ ls -l /etc/cni/net.d
-rw-r--r--. 1 root root 294 Mar 12 16:46 100-crio-bridge.conf
-rw-r--r--. 1 root root 54 Mar 12 16:46 200-loopback.conf
-rw-r--r--. 1 root root 83 May 15 16:15 80-openshift-network.conf
Red Hat suggests deleting the extra files and keeping only 80-openshift-network.conf; here 100-crio-bridge.conf and 200-loopback.conf were instead moved to another directory. After that, reboot all nodes and, on the master, run playbooks/openshift-monitoring/config.yml again (see the command sketch below); this resolved the issue.
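The fix described above, written out as commands (the backup directory is an arbitrary choice):
# mkdir -p /root/cni-backup
# mv /etc/cni/net.d/100-crio-bridge.conf /etc/cni/net.d/200-loopback.conf /root/cni-backup/
# reboot
Then, on the master:
# cd /usr/share/ansible/openshift-ansible
# ansible-playbook -i <inventory_file> playbooks/openshift-monitoring/config.yml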
11 Create a login user after installation
Because admin cannot log in directly, create a user.
11.1 Create a dev/dev user with htpasswd
htpasswd -b /etc/origin/master/htpasswd dev dev
11.2 Grant the dev user cluster-admin rights so it can access every project in the cluster
# oc login -u system:admin
# htpasswd -b /etc/origin/master/htpasswd dev dev
# oc adm policy add-cluster-role-to-user cluster-admin dev
[root@master openshift-ansible]# oc get clusterrolebindings |grep dev
cluster-admin-0 /cluster-admin dev
11.3 Visit https://master.openshift.example.com:8443
Log in with username dev and password dev.
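The same account can be checked from the command line (same URL and credentials as above):
# oc login -u dev -p dev https://master.openshift.example.com:8443
# oc get projects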
12 Uninstall OpenShift
ansible-playbook -i host.311 /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml