etcd is designed as a highly available, strongly consistent key-value store for small amounts of data. It is a critical component of a Kubernetes cluster: it holds the cluster's network configuration and the metadata of all Kubernetes objects, so if etcd goes down, the entire cluster becomes unusable. In production we therefore deploy multiple etcd nodes to form a highly available etcd cluster. An etcd cluster of n members tolerates the failure of (n-1)/2 of them; for example, a 3-node etcd cluster can lose at most (3-1)/2 = 1 node. Even with an HA cluster, it is still recommended to back up the etcd data regularly in production so that it can be restored when something goes wrong.
Environment versions:
OS: Debian 9
Docker: 18.09.6
Kubernetes: v1.14.6
etcd: v3.4
192.168.2.14 etcd-01
192.168.2.15 etcd-02
192.168.2.16 etcd-03
Backing up an etcd cluster can be accomplished in two ways: etcd's built-in snapshot and a volume snapshot.
etcd supports built-in snapshots, so backing up an etcd cluster is easy. A snapshot may either be taken from a live member with the etcdctl snapshot save command, or by copying the member/snap/db file from an etcd data directory that is not currently in use by an etcd process; the file is located at $DATA_DIR/member/snap/db. Taking the snapshot will normally not affect the performance of the member.
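For the file-copy approach, a minimal sketch (assuming $DATA_DIR points at a data directory that no running etcd process is using, e.g. that of a stopped member):

# copy the raw backend database out of an unused data directory
cp $DATA_DIR/member/snap/db /var/lib/etcd_backup/etcd_copy_$(date "+%Y%m%d%H%M%S").db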
root@debian:~# mkdir -p /var/lib/etcd_backup/
root@debian:~#
root@debian:/var/lib# ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd_backup/etcd_$(date "+%Y%m%d%H%M%S").db
{"level":"info","ts":1568093500.2938907,"caller":"snapshot/v3_snapshot.go:109","msg":"created temporary db file","path":"/var/lib/etcd_backup/etcd_20190910133139.db.part"}
{"level":"warn","ts":"2019-09-10T13:31:40.448+0800","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
{"level":"info","ts":1568093500.4698584,"caller":"snapshot/v3_snapshot.go:120","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1568093501.8322825,"caller":"snapshot/v3_snapshot.go:133","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","took":1.535686665}
{"level":"info","ts":1568093501.833143,"caller":"snapshot/v3_snapshot.go:142","msg":"saved","path":"/var/lib/etcd_backup/etcd_20190910133139.db"}
Snapshot saved at /var/lib/etcd_backup/etcd_20190910133139.db
root@debian:/var/lib# cd etcd_backup/
root@debian:/var/lib/etcd_backup# ETCDCTL_API=3 etcdctl --write-out=table snapshot status etcd_20190910133139.db
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| dd8d9bc7 | 1325029 | 1101 | 7.3 MB |
+----------+----------+------------+------------+
etcd_backup.sh
#!/bin/bash
# etcd v3.4 backup script
# created: 2019/09/10 11:00
BACKUP_DIR="/var/lib/etcd_backup"
ENDPOINTS='https://127.0.0.1:2379'
DATE=$(date +%Y%m%d%H%M)
export ETCDCTL_API=3
etcdctl --endpoints=${ENDPOINTS} \
    --cert=/etc/etcd/ssl/etcd.pem \
    --key=/etc/etcd/ssl/etcd-key.pem \
    --cacert=/etc/etcd/ssl/etcd-ca.pem snapshot save ${BACKUP_DIR}/etcd_snapshot_${DATE}.db
# keep only the snapshot files from the last 15 days
if [ -d "${BACKUP_DIR}" ]; then
    find "${BACKUP_DIR}" -name "etcd_snapshot_*.db" -mtime +15 -exec rm -f {} \;
fi
Add the script to the crontab so that it runs every 2 hours, as shown below.
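A minimal sketch, assuming the script is saved as /usr/local/bin/etcd_backup.sh (the path is an assumption):

chmod +x /usr/local/bin/etcd_backup.sh
# crontab -e: run the backup at minute 0 of every 2nd hour
0 */2 * * * /usr/local/bin/etcd_backup.sh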
In Kubernetes we can also run the backup as a CronJob inside the cluster and mount external storage into the pod to persist the snapshots.
Before creating the CronJob, the etcd certificates must be made available to the pod as a Secret:
D="$(mktemp -d)"
cp /etc/kubernetes/ssl/{ca,kubernetes,kubernetes-key}.pem $D
kubectl -n kube-system create secret generic etcd-certs --from-file="$D"
rm -fr "$D"
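A quick check that the Secret was created with the three certificate files:

kubectl -n kube-system describe secret etcd-certs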
etcd-backup-cronjob.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: etcd-backup-config
  namespace: kube-system
data:
  endpoints: "https://192.168.2.14:2379,https://192.168.2.15:2379,https://192.168.2.16:2379"
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "*/30 * * * *"
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 3
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcd-backup
            imagePullPolicy: IfNotPresent
            image: lwieske/etcdctl:v3.4.0
            env:
            - name: ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: etcd-backup-config
                  key: endpoints
            command:
            - /bin/sh
            - -c
            - |
              set -ex
              ETCDCTL_API=3 /usr/local/bin/etcdctl \
                --endpoints=$ENDPOINTS \
                --cacert=/certs/ca.pem \
                --cert=/certs/kubernetes.pem \
                --key=/certs/kubernetes-key.pem \
                snapshot save /data/backup/snapshot-$(date +"%Y%m%d-%H%M").db
            volumeMounts:
            - name: etcd-backup
              mountPath: /data/backup
            - name: etcd-certs
              mountPath: /certs
          restartPolicy: OnFailure
          volumes:
          - name: etcd-backup
            persistentVolumeClaim:
              claimName: etcd-backup
          - name: etcd-certs
            secret:
              secretName: etcd-certs
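The CronJob above mounts a PersistentVolumeClaim named etcd-backup that the manifest does not define. A minimal sketch of such a claim (the storageClassName and size are assumptions; use whatever external storage your cluster provides):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  # assumption: a storage class backed by external storage exists
  storageClassName: nfs-client
  resources:
    requests:
      storage: 10Gi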
For public cloud environments, snapshots can be stored in Amazon S3; the etcd operator provides this capability.
Set up basic RBAC rules for etcd operator:
$ example/rbac/create_role.sh
Create a deployment for etcd operator:
$ kubectl create -f example/deployment.yaml
etcd operator will automatically create a Kubernetes Custom Resource Definition (CRD):
$ kubectl get customresourcedefinitions
NAME KIND
etcdclusters.etcd.database.coreos.com CustomResourceDefinition.v1beta1.apiextensions.k8s.io
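With the operator running (the companion etcd-backup-operator is also required for backups), an S3 backup can then be requested declaratively. A sketch of an EtcdBackup custom resource based on the etcd-operator examples; the endpoint, bucket path, and AWS secret name are assumptions:

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdBackup"
metadata:
  name: example-etcd-cluster-backup
spec:
  etcdEndpoints:
  - "https://example-etcd-cluster-client:2379"
  storageType: S3
  s3:
    # path format: "<s3-bucket-name>/<path-to-backup-file>"
    path: "mybucket/etcd.backup"
    awsSecret: "aws-credentials"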
Note that the etcd clusters managed by etcd operator will NOT be deleted even if the operator is uninstalled.
This is an intentional design to prevent accidental operator failure from killing all the etcd clusters.
To delete all clusters, delete all cluster CR objects before uninstalling the operator.
Clean up etcd operator:
kubectl delete -f example/deployment.yaml
kubectl delete endpoints etcd-operator
kubectl delete crd etcdclusters.etcd.database.coreos.com
kubectl delete clusterrole etcd-operator
kubectl delete clusterrolebinding etcd-operator
etcd operator is also available as a Helm chart; follow the instructions on the chart to install it on a cluster.
To restore, first check the etcd cluster membership:
root@debian:~# export ETCDCTL_API=3
root@debian:~# etcdctl member list
610581aec980ff7, started, etcd1, https://192.168.2.14:2380, https://192.168.2.14:2379, false
d250eba9d73e634f, started, etcd2, https://192.168.2.15:2380, https://192.168.2.15:2379, false
de4be0d409c6cd2d, started, etcd3, https://192.168.2.16:2380, https://192.168.2.16:2379, false
Stop the kube-apiserver and related control-plane components, as well as the etcd service, in the Kubernetes cluster:
systemctl stop kube-apiserver
systemctl stop kube-scheduler
systemctl stop kube-controller-manager
systemctl stop etcd
Rename the existing etcd data directory with mv:
mv /var/lib/etcd/data.etcd /var/lib/etcd/data.etcd_bak
Restore the data on each node. First copy the backup file to every etcd node; here we assume the backup is stored at /var/lib/etcd_backup/etcd_20190910133139.db:
scp /var/lib/etcd_backup/etcd_20190910133139.db root@etcd01:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/etcd_20190910133139.db root@etcd02:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/etcd_20190910133139.db root@etcd03:/var/lib/etcd_backup/
Run the restore command on every etcd instance that needs to be restored:
# ETCDCTL_API=3 etcdctl snapshot restore <backup file> --name=<ETCD_NAME> --data-dir=<data directory> --initial-cluster=<ETCD_CLUSTER> --initial-cluster-token=<ETCD_INITIAL_CLUSTER_TOKEN> --initial-advertise-peer-urls=<ETCD_PEER_URL>
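A concrete sketch for the first node, using the hosts and paths from this environment (the member names etcd1/etcd2/etcd3 match the member list above; the cluster token etcd-k8s is an assumption and must be identical on all three nodes):

ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_backup/etcd_20190910133139.db \
    --name=etcd1 \
    --data-dir=/var/lib/etcd/data.etcd \
    --initial-cluster=etcd1=https://192.168.2.14:2380,etcd2=https://192.168.2.15:2380,etcd3=https://192.168.2.16:2380 \
    --initial-cluster-token=etcd-k8s \
    --initial-advertise-peer-urls=https://192.168.2.14:2380

Repeat on the other two nodes, changing --name and --initial-advertise-peer-urls accordingly.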
Then start all etcd instances in the cluster at the same time:
# systemctl start etcd
Check the etcd cluster members and health status:
# ETCDCTL_API=3 etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem member list
# ETCDCTL_API=3 etcdctl --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem --endpoints=https://192.168.2.14:2379,https://192.168.2.15:2379,https://192.168.2.16:2379 endpoint health
Finally, start the kube-apiserver, kube-scheduler, and kube-controller-manager services on all master nodes:
systemctl start kube-apiserver
systemctl start kube-scheduler
systemctl start kube-controller-manager
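Once the control-plane components are running again, a quick sanity check (assuming kubectl is configured on the master node):

kubectl get componentstatuses
kubectl get nodes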