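Background

Our company runs on Huawei Cloud's CCE (Cloud Container Engine). One of our older clusters was at version 1.0 and its availability zones had been planned in a rather scattered way, so we created a new Turbo cluster and set out to migrate the services from the old cluster to the new one. The migration approach described below also applies to k8s clusters in most other environments.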

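Introduction:

Velero is a cloud-native disaster recovery solution for Kubernetes. It supports standard K8S clusters, whether on a private or a public cloud platform. Beyond disaster recovery it can also migrate resources, moving containerized applications from one cluster to another. It is an open-source tool for safe backup and restore, disaster recovery, and migration of Kubernetes cluster resources and persistent volumes.
The approach is powerful but somewhat involved to operate, and it requires installing both a client and a server component. If you only need to migrate simple deployments, configmaps, and secrets, with nothing as complex as PVs, see the article "Simple k8s cluster migration" instead.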

Velero workflow:

  1. Flow diagram
    image.png

    image.png

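  2. Backup process
    The local Velero client sends a backup command.
    A Backup object is created in the Kubernetes cluster.
    The BackupController notices the Backup object and starts the backup process.
    The BackupController queries the API Server for the relevant data.
    The BackupController writes the queried data to remote object storage.

    As the steps above show, backing up a k8s cluster with velero requires a store for the backup data, which is where velero-plugin comes in. Plugins exist for AWS and Alibaba Cloud, but none targets Huawei Cloud directly. After checking with Huawei Cloud, we were told that OBS also speaks the S3 protocol, so the AWS plugin can be used as-is.

    Enough preamble, on to the hands-on part. One more remark first: be wary of the velero migration write-ups that turn up on Baidu, such as those that use minio or create the server side from YAML. They are copied from one another over and over and can easily mislead you.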

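Installation

velero consists of two parts, a server and a client:
  • Server: runs inside both the source and the target k8s clusters
  • Client: a command-line tool that runs locally; kubectl must already be configured on that machine.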
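Since the client depends on a working kubectl on the same machine, a quick pre-flight check can save some confusion later. The following is a minimal sketch; the function names require_cmd and check_tools are made up for this example and are not part of velero or kubectl.

```shell
# Return 0 if the named command can be found by the shell, non-zero otherwise.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check several tools and report every missing one instead of
# stopping at the first failure. Returns 1 if anything is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! require_cmd "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the workstation (shows which cluster kubectl points at):
#   check_tools kubectl velero && kubectl config current-context
```

Printing the current context before each velero command is a cheap way to confirm whether kubectl is pointed at the source or the target cluster.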

The version compatibility between velero and velero-plugin is as follows:
image.png

Client installation

[root@ecs-prod-abite-dm-0001 ~]# wget https://github.com/vmware-tanzu/velero/releases/download/v1.6.3/velero-v1.6.3-linux-amd64.tar.gz
[root@ecs-prod-abite-dm-0001 ~]# tar zxvf velero-v1.6.3-linux-amd64.tar.gz
[root@ecs-prod-abite-dm-0001 ~]# cp velero-v1.6.3-linux-amd64/velero /usr/local/bin/
Check the installed version with: velero version
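If the client has to be installed on more than one machine, the download URL can be assembled from its parts. This sketch only mirrors the URL pattern of the v1.6.3 release used above; the helper name velero_release_url is hypothetical, and other versions or platforms are assumed to follow the same naming.

```shell
# Build the GitHub release URL for a velero client tarball.
# Arguments: version (e.g. v1.6.3), OS (e.g. linux), architecture (e.g. amd64).
# Assumption: other releases follow the same file naming as v1.6.3.
velero_release_url() {
    version=$1
    os=$2
    arch=$3
    printf 'https://github.com/vmware-tanzu/velero/releases/download/%s/velero-%s-%s-%s.tar.gz\n' \
        "$version" "$version" "$os" "$arch"
}

# Usage:
#   wget "$(velero_release_url v1.6.3 linux amd64)"
```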

Create the Huawei Cloud credentials file

Add the Huawei Cloud AK and SK to this file; the installation step later reads it.
[root@ecs-prod-abite-dm-0001 ~]# cat abite/velero-credentials

[default]
aws_access_key_id = <your Huawei Cloud account AK>
aws_secret_access_key = <your Huawei Cloud account SK>
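The credentials file holds long-lived keys, so it may be worth generating it with owner-only permissions rather than editing it by hand. A minimal sketch follows; the environment variable names HW_AK and HW_SK and the function name are assumptions made for this example.

```shell
# Write an AWS-style credentials file in the format shown above.
# HW_AK / HW_SK are hypothetical variable names holding the Huawei Cloud keys.
write_velero_credentials() {
    target=$1
    if [ -z "${HW_AK:-}" ] || [ -z "${HW_SK:-}" ]; then
        echo "HW_AK and HW_SK must both be set" >&2
        return 1
    fi
    (
        # The umask only affects a newly created file; an existing
        # file keeps whatever permissions it already has.
        umask 077
        printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
            "$HW_AK" "$HW_SK" > "$target"
    )
}

# Usage:
#   HW_AK=... HW_SK=... ; write_velero_credentials ./velero-credentials
```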

Create the Huawei Cloud OBS bucket

The values in the red box are used later during installation.
image.png

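Installing on the source k8s cluster

Three components are needed: velero, velero-plugin, and velero-restic-helper (used for migrating PVCs).
PVCs are migrated using restic.

Install velero and the plugin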

velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.2.1 \
--bucket abite-velero-backup \
--secret-file ./velero-credentials \
--use-restic \
--use-volume-snapshots=false \
--backup-location-config region=cn-north-4,s3ForcePathStyle="true",s3Url=http://obs.cn-north-4.myhuaweicloud.com \
--image=swr.cn-north-4.myhuaweicloud.com/prom/velero:v1.6.3 \
--plugins=swr.cn-north-4.myhuaweicloud.com/prom/velero-plugin-for-aws:v1.2.1
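The --backup-location-config value is easy to mistype, so here is a small sketch that builds it from the region. It assumes the OBS endpoint for other regions follows the same obs.<region>.myhuaweicloud.com pattern as cn-north-4 above; the function name is hypothetical.

```shell
# Build the value for --backup-location-config for a Huawei Cloud OBS region.
# Assumption: the endpoint pattern seen for cn-north-4 holds for other regions.
# Note: in the command above the shell strips the quotes around "true"
# before velero sees them, so the value is emitted unquoted here.
obs_backup_location_config() {
    region=$1
    printf 'region=%s,s3ForcePathStyle=true,s3Url=http://obs.%s.myhuaweicloud.com\n' \
        "$region" "$region"
}

# Usage:
#   velero install ... --backup-location-config "$(obs_backup_location_config cn-north-4)" ...
```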

Output of the command (source cluster):

[root@ecs-prod-abite-dm-0001 abite]# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.2.1 --bucket abite-velero-backup --secret-file ./velero-credentials --use-restic --use-volume-snapshots=false --backup-location-config region=cn-north-4,s3ForcePathStyle="true",s3Url=http://obs.cn-north-4.myhuaweicloud.com --image=swr.cn-north-4.myhuaweicloud.com/prom/velero:v1.6.3 --plugins=swr.cn-north-4.myhuaweicloud.com/prom/velero-plugin-for-aws:v1.2.1
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource client
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: attempting to create resource client
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: attempting to create resource client
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
DaemonSet/restic: attempting to create resource
DaemonSet/restic: attempting to create resource client
DaemonSet/restic: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

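Minor issues during installation

During installation the images hosted overseas could not be pulled. We worked around this by downloading them locally and pushing them to Huawei Cloud SWR.
Once the restic, velero, and restic-helper containers have all started successfully, backups can begin.
The target cluster only needs velero and restic. With that, the client and server installation is complete.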
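To confirm that the containers have actually started before kicking off a backup, the pod list in the velero namespace can be checked. The sketch below only inspects the STATUS column of kubectl's default table output; the function name all_pods_running is made up for this example.

```shell
# Read "kubectl get pods --no-headers" style lines on stdin
# (NAME READY STATUS RESTARTS AGE) and succeed only if there is at
# least one pod and every pod's STATUS column is "Running".
# This looks at STATUS only; it does not check the READY column.
all_pods_running() {
    awk '
        NF >= 3 { total++; if ($3 != "Running") bad++ }
        END     { exit ((total > 0 && bad == 0) ? 0 : 1) }
    '
}

# Usage:
#   kubectl get pods -n velero --no-headers | all_pods_running && echo "velero is up"
```

An image pull failure like the one described above would show up here as a status such as ImagePullBackOff, which makes the check fail.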

Backup

Back up the source cluster data.

Back up only the resources in the default, gitlab-runner, and monitoring namespaces.

[root@ecs-prod-abite-dm-0001 abite]# velero backup create abite-backup --include-namespaces default,gitlab-runner,monitoring --default-volumes-to-restic

Backup request "abite-backup" submitted successfully.
Run `velero backup describe abite-backup` or `velero backup logs abite-backup` for more details.

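Check the backup status.

Use the commands below to follow the backup's progress. The backup is finished when the phase shows Completed; any errors can be found in the logs stored in OBS.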

[root@ecs-prod-abite-dm-0001 abite]# velero backup describe abite-backup
[root@ecs-prod-abite-dm-0001 abite]# kubectl get backup -n velero abite-backup -o yaml
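Rather than re-running the describe command by hand, the phase can be polled from the Backup object. This is a sketch under the assumption that velero is installed in the velero namespace, as above; backup_phase and wait_for_backup are hypothetical helper names.

```shell
# Print the current phase of a Backup object (e.g. InProgress, Completed).
backup_phase() {
    kubectl get backup "$1" -n velero -o jsonpath='{.status.phase}'
}

# Poll until the backup reaches a final phase.
# Returns 0 on Completed, 1 on a failed phase, 2 if kubectl itself fails.
# Arguments: backup name, polling interval in seconds (default 10).
wait_for_backup() {
    name=$1
    interval=${2:-10}
    while :; do
        phase=$(backup_phase "$name") || return 2
        case $phase in
            Completed)
                return 0 ;;
            PartiallyFailed|Failed|FailedValidation)
                echo "backup $name ended in phase $phase" >&2
                return 1 ;;
        esac
        sleep "$interval"
    done
}

# Usage:
#   wait_for_backup abite-backup && echo "backup finished"
```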

Inspect errors from the backup.

Decompress the files in the directory shown below to see the errors that occurred during the backup.
image.png
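The backup log is long, and most of it is informational. A small filter like the one below keeps only the lines that matter; it assumes the log uses the key=value format with a level= field, and the file name in the usage comment is an assumption about how the log object is named in the bucket.

```shell
# Keep only error and warning lines from a velero log read on stdin.
# Assumption: each line carries a level=<severity> field.
# Always returns 0, even when nothing matches.
velero_log_problems() {
    grep -E 'level=(error|warning)' || true
}

# Usage (file name is hypothetical; use the one downloaded from OBS):
#   gunzip -c abite-backup-logs.gz | velero_log_problems
# or, without downloading anything:
#   velero backup logs abite-backup | velero_log_problems
```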

Restore

Installing on the target cluster

Only velero and restic need to be installed.
Install velero (with kubectl switched to the target cluster)

[root@ecs-prod-abite-dm-0001 abite]# velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.2.1 --bucket abite-velero-backup --secret-file ./velero-credentials --use-restic --use-volume-snapshots=false --backup-location-config region=cn-north-4,s3ForcePathStyle="true",s3Url=http://obs.cn-north-4.myhuaweicloud.com
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource client
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: attempting to create resource client
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: attempting to create resource client
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: already exists, proceeding
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
DaemonSet/restic: attempting to create resource
DaemonSet/restic: attempting to create resource client
DaemonSet/restic: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

Restore the k8s resources

Switch kubectl to the target cluster.

[root@ecs-prod-abite-dm-0001 abite]# velero restore create abite-restore --from-backup=abite-backup

Check the restore status

[root@ecs-prod-abite-dm-0001 abite]# velero restore describe abite-restore
[root@ecs-prod-abite-dm-0001 abite]# kubectl get restore -n velero abite-restore -o yaml

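Restore principle.

velero restores in mirror fashion: whatever the source cluster looks like is what the target cluster ends up looking like. That covers every resource, services and PVCs included, and it leads to some pitfalls afterwards.

Pitfalls

  1. pvc
    Problem: the disks behind the source cluster's PVCs are in availability zone 1, and after migration to the target cluster they are still in availability zone 1. The target cluster's nodes, however, are in availability zone 7, so the PVCs cannot be mounted by the pods.
    Solution: delete the PVC and recreate it from a new YAML file. After recreating it, run the velero restore again; resources that have already been restored are skipped.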

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-prometheus-server-0
      namespace: monitoring
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: csi-disk-topology
      volumeMode: Filesystem
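  2. service (LoadBalancer type only)
    Problem: after a LoadBalancer-type service is migrated from the source cluster to the target cluster, a single ELB address effectively has two ports attached. Does this leave the domain unable to reach the backend server group after the migration?
    Solution: the service in the target cluster is only a mirror of the one in the source cluster; it gets created but does not serve any traffic. Simply delete the LoadBalancer-type service in the source cluster, and after 3 to 5 minutes the service registers itself on the new cluster automatically.
  3. Network
    Problem: Beijing, Xiamen, and the VPN cannot reach pod resources in the development environment, for example dubbo services.
    Solution: when Huawei Cloud creates a Turbo cluster it automatically creates three security groups. Of these, the cni security group controls inbound access to pods; add the network ranges that need access to its inbound rules.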
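One practical detail when re-running the restore as in pitfall 1: each run creates a Restore object, so a name that already exists (such as abite-restore) will most likely be rejected on the second attempt. A simple way around this is to append a timestamp to the name. The helper name restore_name below is made up for this sketch.

```shell
# Produce a restore name that is unlikely to clash with an earlier run
# by appending the current time (YYYYmmddHHMMSS) to a base name.
restore_name() {
    printf '%s-%s\n' "$1" "$(date +%Y%m%d%H%M%S)"
}

# Usage (kubectl pointed at the target cluster):
#   velero restore create "$(restore_name abite-restore)" --from-backup=abite-backup
```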
