Kubernetes Series: Deploying etcd

念宗, 2019-04-27


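etcd is a distributed, consistent key-value store that can be used for service registration and discovery and for shared configuration.

For an offline or lab environment, a single etcd node is enough. Production needs high availability, so deploy a cluster of 3 or 5 nodes. The examples below use 3: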
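
The preference for 3 or 5 members follows from the majority rule of the Raft protocol that etcd uses: a cluster of n members needs n/2+1 of them (integer division) to agree, so it tolerates (n-1)/2 failures. A minimal sketch of that arithmetic; the function names are made up for illustration:

```shell
#!/bin/bash
# quorum N: number of members that must be reachable for writes to succeed
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: number of members that may fail without losing quorum
tolerated() { echo $(( ($1 - 1) / 2 )); }

for n in 1 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

A 4-member cluster tolerates no more failures than a 3-member one, which is why odd sizes are the usual choice.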

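Download: https://github.com/etcd-io/etcd/releases

There are two ways to install. One is to download the full binary release, which ships some documentation but no standard configuration template. The other is yum; CentOS 7 already provides 3.2.22.

For the configuration file and the systemd unit, use the files installed by yum as a reference, then download the binary release of the version you need.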

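etcd read/write performance

According to the official [Benchmark], on a machine with 2 CPUs, 1.8 GB of memory, and an SSD, a single node reaches about 16K QPS for writes and about 12K QPS for write-then-read. That is quite respectable.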

[root@node01 ~]# grep -v '^#' /etc/etcd/etcd.conf 
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379"
ETCD_NAME="default"
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"
Change the values to match your environment [these parameters are required for a single-node setup]:
ETCD_NAME="etcd01"
ETCD_DATA_DIR="/var/lib/etcd/"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"

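By default, 2379 is the client-facing service port and 2380 is the port for internal communication between cluster members, for example during leader election.

In a single-node deployment, 127.0.0.1:2379 can be left out of ETCD_LISTEN_CLIENT_URLS, but then etcdctl member list fails, because etcdctl connects to 127.0.0.1:2379 by default.

Systemd unit: the configuration file path and the data directory must match the ones used in the unit file.

Mind the user that runs the service. If it is etcd, change the ownership of /var/lib/etcd/ accordingly.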

[root@node01 ~]# cat /usr/lib/systemd/system/etcd.service 
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
# ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd --name=\"${ETCD_NAME}\" --data-dir=\"${ETCD_DATA_DIR}\" --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\""
ExecStart=/usr/bin/etcd
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

useradd -s /sbin/nologin etcd
mkdir -p /var/lib/etcd
chown -R etcd:etcd /var/lib/etcd
systemctl daemon-reload
systemctl start etcd
systemctl enable etcd
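
Since the unit runs as User=etcd, a data directory owned by another user is a common reason for the service failing to start. A small pre-flight check could look like this; the function names are hypothetical and GNU stat (as shipped with CentOS) is assumed:

```shell
#!/bin/bash
# dir_owner DIR: print the name of the user that owns DIR (GNU stat)
dir_owner() { stat -c '%U' "$1"; }

# check_data_dir DIR USER: succeed only if DIR exists and is owned by USER
check_data_dir() {
    local dir=$1 user=$2
    [ -d "$dir" ] || { echo "$dir does not exist" >&2; return 1; }
    [ "$(dir_owner "$dir")" = "$user" ] || {
        echo "$dir is owned by $(dir_owner "$dir"), expected $user" >&2
        return 1
    }
}

# Example: check_data_dir /var/lib/etcd etcd && systemctl start etcd
```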

Test:

List a directory: etcdctl ls

Create a directory: etcdctl mkdir /nianzong

Add a key-value pair: etcdctl mk /nianzong/foo 'bar'

Read a key: etcdctl get /nianzong/foo

Remote access with etcdctl

etcdctl --endpoints=http://192.168.10.100:2379 ls /

[root@etcd01 ~]# etcdctl mkdir /nianzong
[root@etcd01 ~]# etcdctl mk /nianzong/foo 'bar'
bar
[root@etcd01 ~]# etcdctl get /nianzong/foo
bar

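For the full set of create, read, update, and delete commands, see etcdctl --help.

2. etcd cluster deployment:

When you can estimate the cluster's usage and know its members in advance, you can bootstrap the cluster statically. In most cases, where neither is known, bootstrap the cluster dynamically instead.

etcd.conf configuration: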

[root@etcd03 ~]# cat /etc/etcd/etcd.conf 
ETCD_NAME="etcd03"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_CLIENT_URLS="http://192.168.10.107:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.10.107:2379"
# cluster :
ETCD_LISTEN_PEER_URLS="http://192.168.10.107:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.10.107:2380"
ETCD_INITIAL_CLUSTER="etcd01=http://192.168.10.105:2380,etcd02=http://192.168.10.106:2380,etcd03=http://192.168.10.107:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="nianzong-etcd-cluster"

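Notes on the cluster-related parameters:

initial-cluster-state describes the cluster a node joins at startup: set it to new when bootstrapping a cluster for the first time, and to existing when the node joins a cluster that is already running. Once a node has a data directory, the initial-cluster* settings are ignored on restart;

initial-cluster lists the initial members of the cluster. After the cluster is up, membership is changed with etcdctl member add and etcdctl member remove, and a member's peer URLs with etcdctl member update;

initial-cluster-token identifies the cluster during bootstrap, and initial-cluster gives the name and peer address of each member of the static cluster.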
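
The ETCD_INITIAL_CLUSTER value must be identical on every node, so it is less error-prone to generate it than to type it three times. A minimal sketch, assuming every member uses peer port 2380; the helper name initial_cluster is made up for illustration:

```shell
#!/bin/bash
# initial_cluster SCHEME NAME=IP...: join the members into the
# comma-separated name=scheme://ip:2380 list used by ETCD_INITIAL_CLUSTER
initial_cluster() {
    local scheme=$1 out="" member
    shift
    for member in "$@"; do
        # ${member%%=*} is the name, ${member#*=} is the IP
        out="${out:+$out,}${member%%=*}=${scheme}://${member#*=}:2380"
    done
    echo "$out"
}

initial_cluster http etcd01=192.168.10.105 etcd02=192.168.10.106 etcd03=192.168.10.107
```

Passing https instead of http yields the value needed for a TLS-enabled cluster.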

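Roles involved in an election:

leader - equivalent to the primary node

candidate - the transitional role a node takes while it campaigns to become leader

follower - every node that does not become leader ends up as a follower

The systemd unit is in fact the same as for the single-node deployment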

[root@etcd03 ~]# cat /usr/lib/systemd/system/etcd.service 
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/usr/bin/etcd
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

# Cluster membership can be checked from any node:
[root@etcd01 ~]# etcdctl member list
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
71c371de4823cb8a: name=etcd03 peerURLs=http://192.168.10.107:2380 clientURLs=http://192.168.10.107:2379 isLeader=false
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true

# Check cluster health:
[root@etcd01 ~]# etcdctl cluster-health
member 2693e19032c4f545 is healthy: got healthy result from http://192.168.10.106:2379
member 71c371de4823cb8a is healthy: got healthy result from http://192.168.10.107:2379
member 924dd8784841a270 is healthy: got healthy result from http://192.168.10.105:2379
cluster is healthy
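
In scripts it is handy to pull the current leader out of the member list output. A small sketch that assumes the v2 output format shown above (fields separated by spaces, leader marked with isLeader=true); the function name is hypothetical:

```shell
#!/bin/bash
# leader_name: read `etcdctl member list` (v2 format) on stdin and print
# the name of the member whose line carries isLeader=true
leader_name() {
    awk '/isLeader=true/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
    }'
}

# Example: etcdctl member list | leader_name
```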

etcdctl member list
etcdctl cluster-health    # check the health of the whole cluster
curl http://192.168.10.106:2379/health    # check the health of a single member

[root@node01 ~]# for h in etcd01 etcd02 etcd03;do curl -s http://$h:2379/health|xargs printf "$h %s %s\n";done
etcd01 {health: true}
etcd02 {health: true}
etcd03 {health: true}
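
For monitoring, the same check is more useful when it sets an exit code. A sketch under the assumption that /health answers with a JSON body containing a health field, as in the output above; the function names are made up:

```shell
#!/bin/bash
# health_ok: succeed if the /health response body on stdin reports true.
# Accepts both {"health": "true"} and {"health":true}.
health_ok() { grep -Eq '"health"[[:space:]]*:[[:space:]]*"?true'; }

# check_members HOST...: query each member, return the number of failures
check_members() {
    local failed=0 h
    for h in "$@"; do
        if curl -s --max-time 3 "http://$h:2379/health" | health_ok; then
            echo "$h healthy"
        else
            echo "$h UNHEALTHY"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example: check_members etcd01 etcd02 etcd03 || echo "some members are down"
```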

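Typical error: Jan 27 01:10:52 etcd01 etcd: request cluster ID mismatch (got 1cfa861e7e5adb72 want cdf818194e3a8c32)

This means a node still holds data from a different cluster. Delete the etcd data directory and restart the etcd service on every node. Keep the interval between the three restarts short, otherwise the nodes may fail to find each other. [When starting with state new, keep the gap between service starts as short as possible; otherwise you have to change the state and add the node to the existing cluster instead.]

Read/write test:

1) First write on etcd01, which was elected leader: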

[root@etcd01 ~]# etcdctl ls /
[root@etcd01 ~]# etcdctl mk /www 'pyops.net'
pyops.net
[root@etcd01 ~]# etcdctl ls /
/www
[root@etcd01 ~]# etcdctl get www
pyops.net

Then query on etcd02/etcd03 to verify that the data has been replicated:

# etcd02:
[root@etcd02 ~]# etcdctl ls /
[root@etcd02 ~]# etcdctl ls /    
/www
[root@etcd02 ~]# etcdctl get /www
pyops.net
# etcd03:
[root@etcd03 ~]# etcdctl get /www
pyops.net

2) Write on the non-leader nodes:

# Add a k/v on etcd02:
[root@etcd02 ~]# etcdctl mk /etcd02 'write in etcd02'   
write in etcd02
[root@etcd01 ~]# etcdctl get /etcd02
write in etcd02
[root@etcd03 ~]# etcdctl get /etcd02
write in etcd02
# Add a k/v on etcd03:
[root@etcd03 ~]# etcdctl mk /etcd03 'write in etcd03'   
write in etcd03
[root@etcd02 ~]# etcdctl get /etcd03
write in etcd03
[root@etcd01 ~]# etcdctl get /etcd03
write in etcd03
The experiment suggests that replication completes within milliseconds, which is sufficient for Kubernetes workloads.
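
The manual check above can be wrapped into a function that reads one key through every member. This is only a sketch: the function name is hypothetical, and it relies on the v2 etcdctl get command and --endpoints flag used earlier in this article:

```shell
#!/bin/bash
# verify_replicated KEY EXPECTED ENDPOINT...: read KEY through every
# endpoint and fail if any member returns a different value
verify_replicated() {
    local key=$1 expected=$2 ep value rc=0
    shift 2
    for ep in "$@"; do
        value=$(etcdctl --endpoints="$ep" get "$key" 2>/dev/null) || value=""
        if [ "$value" = "$expected" ]; then
            echo "$ep ok"
        else
            echo "$ep MISMATCH (got '$value')"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# verify_replicated /www pyops.net http://192.168.10.105:2379 \
#     http://192.168.10.106:2379 http://192.168.10.107:2379
```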

# Cluster membership can be checked from any node:
[root@etcd01 ~]# etcdctl member list
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
71c371de4823cb8a: name=etcd03 peerURLs=http://192.168.10.107:2380 clientURLs=http://192.168.10.107:2379 isLeader=false
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true
[root@etcd02 ~]# etcdctl member list    
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
71c371de4823cb8a: name=etcd03 peerURLs=http://192.168.10.107:2380 clientURLs=http://192.168.10.107:2379 isLeader=false
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true
[root@etcd03 ~]# etcdctl member list   
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
71c371de4823cb8a: name=etcd03 peerURLs=http://192.168.10.107:2380 clientURLs=http://192.168.10.107:2379 isLeader=false
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true
# Check cluster health:
[root@etcd01 ~]# etcdctl cluster-health
member 2693e19032c4f545 is healthy: got healthy result from http://192.168.10.106:2379
member 71c371de4823cb8a is healthy: got healthy result from http://192.168.10.107:2379
member 924dd8784841a270 is healthy: got healthy result from http://192.168.10.105:2379
cluster is healthy
[root@etcd02 ~]# etcdctl cluster-health
member 2693e19032c4f545 is healthy: got healthy result from http://192.168.10.106:2379
member 71c371de4823cb8a is healthy: got healthy result from http://192.168.10.107:2379
member 924dd8784841a270 is healthy: got healthy result from http://192.168.10.105:2379
cluster is healthy
[root@etcd03 ~]# etcdctl cluster-health
member 2693e19032c4f545 is healthy: got healthy result from http://192.168.10.106:2379
member 71c371de4823cb8a is healthy: got healthy result from http://192.168.10.107:2379
member 924dd8784841a270 is healthy: got healthy result from http://192.168.10.105:2379
cluster is healthy

Removing a node

[root@etcd01 ~]# etcdctl member remove 71c371de4823cb8a
Removed member 71c371de4823cb8a from cluster
[root@etcd01 ~]# etcdctl member list
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true
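Running remove in effect tells that node to stop its service and drops it from the member list. Checking the service status on that node at this point shows it as stopped.

Adding a node: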

[root@etcd01 ~]# etcdctl member add etcd03 http://192.168.10.107:2380
Added member named etcd03 with ID 608f768e1268b434 to cluster

ETCD_NAME="etcd03"
ETCD_INITIAL_CLUSTER="etcd02=http://192.168.10.106:2380,etcd03=http://192.168.10.107:2380,etcd01=http://192.168.10.105:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
[root@etcd01 ~]# etcdctl member list
2693e19032c4f545: name=etcd02 peerURLs=http://192.168.10.106:2380 clientURLs=http://192.168.10.106:2379 isLeader=false
608f768e1268b434[unstarted]: peerURLs=http://192.168.10.107:2380
924dd8784841a270: name=etcd01 peerURLs=http://192.168.10.105:2380 clientURLs=http://192.168.10.105:2379 isLeader=true
[root@etcd03 ~]# rm -fr /var/lib/etcd/member/*
[root@etcd03 ~]# systemctl start etcd   
# If the node has been started before, delete its data before starting it again; otherwise it will not come up
rm -fr /var/lib/etcd/member/*
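
Before starting the re-added node, its etcd.conf has to carry ETCD_INITIAL_CLUSTER_STATE="existing". A small helper for that edit might look like the following; the function name is made up and GNU sed -i is assumed:

```shell
#!/bin/bash
# set_cluster_state FILE STATE: rewrite ETCD_INITIAL_CLUSTER_STATE in an
# etcd.conf-style file, leaving every other line untouched
set_cluster_state() {
    local file=$1 state=$2
    case $state in
        new|existing) ;;
        *) echo "state must be new or existing" >&2; return 1 ;;
    esac
    sed -i "s|^ETCD_INITIAL_CLUSTER_STATE=.*|ETCD_INITIAL_CLUSTER_STATE=\"$state\"|" "$file"
}

# Example, on the node being re-added:
# set_cluster_state /etc/etcd/etcd.conf existing
```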

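Note: if etcd is running without mutual certificate authentication, ops/DBAs must strictly control login access to the etcd servers.

Deploying etcd with certificate authentication

Certificate creation script: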

[root@master01 ssl]# cat create-etcd.sh 
#!/bin/bash
# Author：daihaijun

cd /root/k8s/ssl || exit 1
# Create the certificate signing request
cat > etcd-csr.json << EOF
{
    "CN": "etcd",
    "hosts": [
      "127.0.0.1",
      "192.168.10.105",
      "192.168.10.106",
      "192.168.10.107"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
        {
            "C": "CN",
            "L": "Hangzhou",
            "ST": "ZJ",
            "O": "k8s",
            "OU": "System"
        }
    ]
}
EOF

# Generate the certificate and private key
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd

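# check pem (the hosts above are IPs, so they appear as "IP Address:" entries)
echo "===== Verify the certificate attributes ====="
openssl x509 -noout -text -in etcd.pem | grep -E "Issuer:|Subject:|DNS:|IP Address:"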

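The hosts field lists the IPs or domain names of the etcd nodes that are authorized to use this certificate. Here all three etcd node IPs are listed; 127.0.0.1 is included for local testing.
Apart from the CN, keep the other fields the same as in the kubectl/master certificates.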
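
A node whose IP is missing from hosts will be rejected during the TLS handshake, so it is worth checking the generated certificate against the node list. A sketch that parses the text output of openssl x509; the function names are hypothetical and only IPv4 entries are handled:

```shell
#!/bin/bash
# san_ips: read `openssl x509 -noout -text` output on stdin and print each
# IPv4 address from the Subject Alternative Name extension, one per line
san_ips() { grep -o 'IP Address:[0-9.]*' | cut -d: -f2; }

# missing_ips CERT IP...: print the IPs that are absent from the certificate
missing_ips() {
    local cert=$1 ip sans
    shift
    sans=$(openssl x509 -noout -text -in "$cert" | san_ips)
    for ip in "$@"; do
        echo "$sans" | grep -qxF "$ip" || echo "$ip"
    done
}

# Example: missing_ips etcd.pem 127.0.0.1 192.168.10.105 192.168.10.106 192.168.10.107
```

Empty output from missing_ips means every listed address is covered by the certificate.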

Distribute the certificate and private key to every etcd node [don't forget the CA certificate]:

[root@master01 ssl]# for host in etcd01 etcd02 etcd03;do scp ca.pem etcd-key.pem etcd.pem $host:/etc/kubernetes/ssl/;done

For the configuration file and startup script, see kubernetes/cluster/centos/master/scripts/etcd.sh

Create the etcd configuration file:

[root@etcd03 system]# cat /etc/etcd/etcd.conf
#[member]
ETCD_NAME="etcd03"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_CLIENT_URLS="https://192.168.10.107:2379,https://127.0.0.1:2379"
#[cluster]
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.10.107:2379"
ETCD_LISTEN_PEER_URLS="https://192.168.10.107:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.10.107:2380"
ETCD_INITIAL_CLUSTER="etcd01=https://192.168.10.105:2380,etcd02=https://192.168.10.106:2380,etcd03=https://192.168.10.107:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="nianzong-etcd-cluster"
#[security]
ETCD_CLIENT_CERT_AUTH="true"
ETCD_CERT_FILE="/etc/kubernetes/ssl/etcd.pem"
ETCD_KEY_FILE="/etc/kubernetes/ssl/etcd-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/kubernetes/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/etc/kubernetes/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/etc/kubernetes/ssl/etcd-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_TRUSTED_CA_FILE="/etc/kubernetes/ssl/ca.pem"

--client-cert-auth: When this is set etcd will check all incoming HTTPS requests for a client certificate signed by the trusted CA, requests that don't supply a valid client certificate will fail. If authentication is enabled, the certificate provides credentials for the user name given by the Common Name field.
--peer-client-cert-auth: When set, etcd will check all incoming peer requests from the cluster for valid client certificates signed by the supplied CA.

On the command line these two flags take no value; their presence alone means true.
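
In the environment file the same options are written as variables. etcd derives the variable name from the flag by adding the ETCD_ prefix, upper-casing the name, and turning dashes into underscores, so a variable without the prefix is silently ignored. A short sketch of that mapping; the helper name flag_to_env is made up:

```shell
#!/bin/bash
# flag_to_env FLAG: print the environment variable etcd reads for FLAG
flag_to_env() {
    printf 'ETCD_%s\n' "$(printf '%s' "${1#--}" | tr 'a-z-' 'A-Z_')"
}

flag_to_env --client-cert-auth
flag_to_env --peer-client-cert-auth
```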

Sync the configuration to the other etcd nodes and change ETCD_NAME and the IP addresses on each. Reference: https://coreos.com/etcd/docs/latest/v2/security.html

systemd unit file:

[root@etcd03 system]# cat etcd.service 
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=root
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd"
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

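The unit file is identical on all etcd nodes, so sync it as-is without changes.

Start the service:

The etcd nodes have to be started at roughly the same time, because during startup each node checks the health of the other members and takes part in the election. You can run the following from a jump host: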

for etcd_svr in etcd01 etcd02 etcd03;do
    (ssh $etcd_svr "systemctl daemon-reload && systemctl enable etcd && systemctl start etcd") &
done
wait
for etcd_svr in etcd01 etcd02 etcd03;do
    echo "===$etcd_svr==="
    ssh $etcd_svr "systemctl status etcd"
done

Check the cluster status:

[root@etcd03 system]# etcdctl --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/kubernetes/ssl/etcd.pem --key-file=/etc/kubernetes/ssl/etcd-key.pem --endpoints=https://127.0.0.1:2379/ cluster-health
member 12b8881fd61b4ada is healthy: got healthy result from https://192.168.10.105:2379
member 5dd60f65dbd70740 is healthy: got healthy result from https://192.168.10.107:2379
member f1c6a878c78e49eb is healthy: got healthy result from https://192.168.10.106:2379
cluster is healthy
[root@etcd03 system]# etcdctl --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/kubernetes/ssl/etcd.pem --key-file=/etc/kubernetes/ssl/etcd-key.pem --endpoints=https://127.0.0.1:2379/ member list
12b8881fd61b4ada: name=etcd01 peerURLs=https://192.168.10.105:2380 clientURLs=https://192.168.10.105:2379 isLeader=true
5dd60f65dbd70740: name=etcd03 peerURLs=https://192.168.10.107:2380 clientURLs=https://192.168.10.107:2379 isLeader=false
f1c6a878c78e49eb: name=etcd02 peerURLs=https://192.168.10.106:2380 clientURLs=https://192.168.10.106:2379 isLeader=false

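Once certificate authentication is on, every client call has to pass the certificates, which is tedious. A later article will share a small trick for this.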
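
One possible way to cut the typing (not necessarily the trick the author has in mind) is a shell function that fills in the TLS flags from the commands above. The names SSL_DIR and etcdctl_tls are made up for this sketch:

```shell
#!/bin/bash
# Certificate directory, copied from the cluster-health command above
SSL_DIR=/etc/kubernetes/ssl

# etcdctl_tls ARGS...: run the v2 etcdctl with the TLS flags filled in
etcdctl_tls() {
    etcdctl --ca-file="$SSL_DIR/ca.pem" \
            --cert-file="$SSL_DIR/etcd.pem" \
            --key-file="$SSL_DIR/etcd-key.pem" \
            --endpoints=https://127.0.0.1:2379 "$@"
}

# Example: etcdctl_tls cluster-health
```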

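Querying the data that the apiserver writes to the etcd cluster:

Note that etcd currently has two API versions, v2 and v3, and data written through one is not visible through the other. Kubernetes uses v3, so etcdctl ls / will not show a /registry directory.

Switch to v3 [v3 has no etcdctl ls, which actually makes browsing less convenient]:

export ETCDCTL_API=3
etcdctl get / --prefix --keys-only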

etcdctl3 version

[root@etcd01 ~]# etcdctl3 member list

List everything under the root:

[root@etcd01 ~]# etcdctl3 get / --prefix --keys-only
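
etcdctl3 is not a binary shipped with etcd, so it is presumably a local wrapper. A sketch of one plausible definition, assuming the certificate paths used earlier and the v3 flag names --cacert, --cert, and --key; list_keys is an extra hypothetical helper:

```shell
#!/bin/bash
# etcdctl3 ARGS...: run etcdctl against the v3 API with TLS flags filled in
etcdctl3() {
    ETCDCTL_API=3 etcdctl \
        --cacert=/etc/kubernetes/ssl/ca.pem \
        --cert=/etc/kubernetes/ssl/etcd.pem \
        --key=/etc/kubernetes/ssl/etcd-key.pem \
        --endpoints=https://127.0.0.1:2379 "$@"
}

# list_keys PREFIX: print the keys under PREFIX, dropping any blank lines
list_keys() { etcdctl3 get "$1" --prefix --keys-only | grep -v '^$'; }

# Example: list_keys /registry | head
```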

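Copyright and repost notice:

Author: 念宗. Article URL: http://pyops.net/?id=52. Published 2019-04-27.
When reposting or copying this article, please credit the source, 运维之道, with a hyperlink.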
