I. Overview
1. Why KubeVirt
During technology selection we evaluated several options:
The key reasons we ultimately chose KubeVirt:
2. KubeVirt Architecture
A quick overview of KubeVirt's core components:
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ virt-api │ │virt-controller│ │ virt-handler (DaemonSet)│
│ │ (Deployment)│ │ (Deployment) │ │ (per node) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────┐│
│ │ libvirt + QEMU/KVM ││
│ └─────────────────────────────────────────────────────────┘│
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Node (Linux with KVM support) ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
3. Environment Requirements
This is our production configuration, for reference:
Hardware:
Software versions:
Node scale:
II. Step-by-Step Setup
1. Cluster Preparation
Check hardware virtualization support
Run the following on every compute node:
# Check if CPU supports virtualization
cat /proc/cpuinfo | grep -E "(vmx|svm)"
# Check if KVM module is loaded
lsmod | grep kvm
# If not loaded, load it manually
modprobe kvm
modprobe kvm_intel # For Intel CPU
# modprobe kvm_amd # For AMD CPU
# Make it persistent
echo "kvm" >> /etc/modules-load.d/kvm.conf
echo "kvm_intel" >> /etc/modules-load.d/kvm.conf
Node labels and taints
We isolate VM workloads from container workloads to avoid resource contention:
# Label nodes for VM workloads
kubectl label node node-vm-{01..18} node-role.kubernetes.io/virtualization=true
kubectl label node node-vm-{01..18} kubevirt.io/schedulable=true
# Add taint to prevent regular pods from scheduling
kubectl taint node node-vm-{01..18} virtualization=true:NoSchedule
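To confirm the labels and taints landed where expected, a quick spot check like the following helps (the node names are simply the ones used above):
# List the nodes dedicated to VM workloads
kubectl get nodes -l node-role.kubernetes.io/virtualization=true
# Spot-check the taint on one node
kubectl describe node node-vm-01 | grep -A1 Taints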
Deploy the KubeVirt Operator
# Set KubeVirt version
export KUBEVIRT_VERSION=v1.1.1
# Deploy the KubeVirt operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-operator.yaml
# Wait for operator to be ready
kubectl wait --for=condition=available --timeout=300s deployment/virt-operator -n kubevirt
# Create KubeVirt CR to deploy the components
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-cr.yaml
# Verify all components are running
kubectl get pods -n kubevirt
Expected output:
NAME READY STATUS RESTARTS AGE
virt-api-7d5c9b8c8b-4x7k9 1/1 Running 0 5m
virt-api-7d5c9b8c8b-8j2m3 1/1 Running 0 5m
virt-controller-6c7d8f9b7-2k4n5 1/1 Running 0 5m
virt-controller-6c7d8f9b7-9x8m2 1/1 Running 0 5m
virt-handler-4k2j8 1/1 Running 0 4m
virt-handler-7m3n9 1/1 Running 0 4m
... (one virt-handler per node)
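The virtctl CLI is used throughout the rest of this article but is not bundled with kubectl. A minimal install sketch, assuming a Linux amd64 machine and the same KUBEVIRT_VERSION exported above:
# Download the virtctl binary matching the deployed KubeVirt version
curl -L -o virtctl \
  "https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/virtctl-${KUBEVIRT_VERSION}-linux-amd64"
chmod +x virtctl && sudo mv virtctl /usr/local/bin/
virtctl version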
KubeVirt configuration tuning
This is the configuration we run in production, tuned specifically for a large VM fleet:
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
name: kubevirt
namespace: kubevirt
spec:
certificateRotateStrategy: {}
configuration:
developerConfiguration:
featureGates:
- LiveMigration
- HotplugVolumes
- HotplugNICs
- Snapshot
- VMExport
- ExpandDisks
- GPU
- HostDevices
- Macvtap
- Passt
migrations:
parallelMigrationsPerCluster: 10
parallelOutboundMigrationsPerNode: 4
bandwidthPerMigration: 1Gi
completionTimeoutPerGiB: 800
progressTimeout: 300
allowAutoConverge: true
allowPostCopy: true
network:
defaultNetworkInterface: bridge
permitBridgeInterfaceOnPodNetwork: true
permitSlirpInterface: false
smbios:
manufacturer: "KubeVirt"
product: "None"
version: "1.1.1"
supportedGuestAgentVersions:
- "4.*"
- "5.*"
permittedHostDevices:
pciHostDevices:
- pciVendorSelector: "10DE:1EB8"
resourceName: "nvidia.com/T4"
externalResourceProvider: true
customizeComponents:
patches:
- resourceType: Deployment
resourceName: virt-controller
patch: '{"spec":{"replicas":3}}'
type: strategic
- resourceType: Deployment
resourceName: virt-api
patch: '{"spec":{"replicas":3}}'
type: strategic
imagePullPolicy: IfNotPresent
workloadUpdateStrategy:
workloadUpdateMethods:
- LiveMigrate
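After applying the tuned CR, it is worth confirming that KubeVirt has reconciled it and reports the Deployed phase, and that the feature gates were actually picked up; a quick check:
# The KubeVirt CR should report phase "Deployed" once components are reconciled
kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.phase}{"\n"}'
# Confirm the feature gates in effect
kubectl get kubevirt kubevirt -n kubevirt \
  -o jsonpath='{.spec.configuration.developerConfiguration.featureGates}{"\n"}'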
2. Storage Configuration
Storage was the most painful part of the whole migration. We started with Longhorn, but after two months it could not keep up with the performance demands, so we switched to Rook-Ceph.
Deploying the Rook-Ceph cluster
# Clone Rook repository
git clone --single-branch --branch v1.12.9 https://github.com/rook/rook.git
cd rook/deploy/examples
# Create Rook operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# Wait for operator to be ready
kubectl -n rook-ceph wait --for=condition=available --timeout=600s deployment/rook-ceph-operator
Ceph cluster configuration (cluster.yaml):
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
dataDirHostPath: /var/lib/rook
cephVersion:
image: quay.io/ceph/ceph:v18.2.1
allowUnsupported: false
mon:
count: 3
allowMultiplePerNode: false
volumeClaimTemplate:
spec:
storageClassName: local-storage
resources:
requests:
storage: 50Gi
mgr:
count: 2
allowMultiplePerNode: false
modules:
- name: pg_autoscaler
enabled: true
- name: rook
enabled: true
- name: prometheus
enabled: true
dashboard:
enabled: true
ssl: true
crashCollector:
disable: false
storage:
useAllNodes: false
useAllDevices: false
config:
osdsPerDevice: "1"
encryptedDevice: "false"
nodes:
- name: "storage-node-01"
devices:
- name: "nvme0n1"
- name: "nvme1n1"
- name: "nvme2n1"
- name: "nvme3n1"
- name: "storage-node-02"
devices:
- name: "nvme0n1"
- name: "nvme1n1"
- name: "nvme2n1"
- name: "nvme3n1"
- name: "storage-node-03"
devices:
- name: "nvme0n1"
- name: "nvme1n1"
- name: "nvme2n1"
- name: "nvme3n1"
resources:
mgr:
limits:
cpu: "2"
memory: "2Gi"
requests:
cpu: "1"
memory: "1Gi"
mon:
limits:
cpu: "2"
memory: "2Gi"
requests:
cpu: "1"
memory: "1Gi"
osd:
limits:
cpu: "4"
memory: "8Gi"
requests:
cpu: "2"
memory: "4Gi"
priorityClassNames:
mon: system-node-critical
osd: system-node-critical
mgr: system-cluster-critical
disruptionManagement:
managePodBudgets: true
osdMaintenanceTimeout: 30
pgHealthCheckTimeout: 0
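Once cluster.yaml is applied, the Rook toolbox (toolbox.yaml in the same examples directory) is the easiest way to confirm Ceph health before creating any pools; a sketch:
# Apply the cluster definition, then deploy the toolbox pod
kubectl create -f cluster.yaml
kubectl create -f toolbox.yaml
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
# Cluster should report HEALTH_OK with all OSDs up
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree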
Create an RBD StorageClass for VM disks:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: replicapool-vm
namespace: rook-ceph
spec:
failureDomain: host
replicated:
size: 3
requireSafeReplicaSize: true
parameters:
compression_mode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block-vm
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
clusterID: rook-ceph
pool: replicapool-vm
imageFormat: "2"
imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
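Before pointing CDI and the VMs at this StorageClass, a throwaway PVC is a cheap way to confirm that RBD provisioning works end to end; a minimal sketch (the PVC name is arbitrary):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-provision-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block-vm
EOF
# With volumeBindingMode Immediate the PVC should reach Bound within seconds
kubectl get pvc rbd-provision-test
kubectl delete pvc rbd-provision-test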
3. CDI Deployment and Image Import
CDI (Containerized Data Importer) handles importing and managing VM disk images.
# Deploy CDI
export CDI_VERSION=v1.58.1
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-operator.yaml
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-cr.yaml
# Wait for CDI to be ready
kubectl wait --for=condition=available --timeout=300s deployment/cdi-deployment -n cdi
CDI configuration tuning:
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
name: cdi
spec:
config:
    uploadProxyURLOverride: "https://cdi-uploadproxy.cdi.svc:443"
scratchSpaceStorageClass: "rook-ceph-block-vm"
podResourceRequirements:
limits:
cpu: "4"
memory: "4Gi"
requests:
cpu: "1"
memory: "1Gi"
filesystemOverhead:
global: "0.1"
preallocation: true
honorWaitForFirstConsumer: true
importProxy:
HTTPProxy: ""
HTTPSProxy: ""
noProxy: "*.cluster.local,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
workload:
nodeSelector:
node-role.kubernetes.io/virtualization: "true"
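To validate CDI, the scratch space, and the StorageClass together, a small test DataVolume that imports a public cloud image works well. A sketch; the image URL is only an example and should be replaced with something reachable from your cluster:
cat <<EOF | kubectl apply -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: import-test
  namespace: default
spec:
  source:
    http:
      url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
    storageClassName: rook-ceph-block-vm
EOF
# Watch the import; the phase should end at Succeeded
kubectl get dv import-test -n default -w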
4. Network Configuration
Networking is another complicated area. The VMs need to keep using the VLAN networks from the original VMware environment, which requires Multus and OVN.
Install Multus CNI
# Deploy Multus
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset-thick.yml
# Verify Multus is running
kubectl get pods -n kube-system -l app=multus
Configure OVN-Kubernetes
We use OVN-Kubernetes to provide layer-2 networks for the VMs. Here is the core configuration:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: vlan100-production
namespace: vm-production
spec:
config: |
{
"cniVersion": "0.3.1",
"name": "vlan100-production",
"type": "ovn-k8s-cni-overlay",
"topology": "localnet",
"netAttachDefName": "vm-production/vlan100-production",
"vlanID": 100,
"mtu": 9000,
"subnets": "10.100.0.0/16"
}
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: vlan200-database
namespace: vm-production
spec:
config: |
{
"cniVersion": "0.3.1",
"name": "vlan200-database",
"type": "ovn-k8s-cni-overlay",
"topology": "localnet",
"netAttachDefName": "vm-production/vlan200-database",
"vlanID": 200,
"mtu": 9000,
"subnets": "10.200.0.0/16"
}
Configure the corresponding bridge mappings in OVS:
# On each node, configure OVS bridge mapping
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="physnet1:br-ex,physnet-vlan100:br-vlan100,physnet-vlan200:br-vlan200"
# Create VLAN bridges
ovs-vsctl add-br br-vlan100
ovs-vsctl add-br br-vlan200
# Add physical VLAN subinterfaces to the bridges
# (the subinterface already carries the VLAN tag, so no extra OVS access tag is set)
ovs-vsctl add-port br-vlan100 ens2f0.100
ovs-vsctl add-port br-vlan200 ens2f0.200
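A quick way to confirm the mappings and ports on each node (output formatting differs slightly between OVS versions):
# Show the configured bridge mappings
ovs-vsctl get Open_vSwitch . external-ids:ovn-bridge-mappings
# List bridges and their ports
ovs-vsctl show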
III. Migrating VMware VMs in Practice
This is the core of this article. Migrating 2,000 VMs is far more than a simple export and import; we built a complete set of migration tooling and processes.
1. Choosing a Migration Tool
We evaluated several migration approaches:
In the end we settled on a combination of virt-v2v plus in-house batch scripts.
2. Pre-Migration Preparation
Collecting VMware environment information
First, collect information about every VM. I wrote a script to export it:
#!/usr/bin/env python3
# vm_inventory_export.py
# Export VMware VM inventory for migration planning
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl
import csv
import argparse
from datetime import datetime
def get_vm_info(vm):
"""Extract detailed VM information"""
summary = vm.summary
config = vm.config
# Get disk information
disks = []
total_disk_size = 0
for device in config.hardware.device:
if isinstance(device, vim.vm.device.VirtualDisk):
disk_size_gb = device.capacityInKB / 1024 / 1024
disks.append({
'label': device.deviceInfo.label,
'size_gb': round(disk_size_gb, 2),
'thin': getattr(device.backing, 'thinProvisioned', False)
})
total_disk_size += disk_size_gb
# Get network information
networks = []
for device in config.hardware.device:
if isinstance(device, vim.vm.device.VirtualEthernetCard):
network_name = ""
if hasattr(device.backing, 'network'):
network_name = device.backing.network.name if device.backing.network else ""
elif hasattr(device.backing, 'port'):
network_name = device.backing.port.portgroupKey
networks.append({
'label': device.deviceInfo.label,
'network': network_name,
'mac': device.macAddress
})
return {
'name': summary.config.name,
'power_state': summary.runtime.powerState,
'cpu': summary.config.numCpu,
'memory_gb': summary.config.memorySizeMB / 1024,
'guest_os': summary.config.guestFullName,
'guest_id': summary.config.guestId,
'vmware_tools': summary.guest.toolsStatus if summary.guest else 'unknown',
'ip_address': summary.guest.ipAddress if summary.guest else '',
'hostname': summary.guest.hostName if summary.guest else '',
'total_disk_gb': round(total_disk_size, 2),
'disks': disks,
'networks': networks,
'folder': get_folder_path(vm),
'resource_pool': vm.resourcePool.name if vm.resourcePool else '',
'cluster': get_cluster_name(vm),
'datastore': summary.config.vmPathName.split()[0].strip('[]'),
'uuid': summary.config.instanceUuid,
'annotation': config.annotation if config.annotation else ''
}
def get_folder_path(vm):
"""Get full folder path of VM"""
path = []
parent = vm.parent
while parent:
if hasattr(parent, 'name'):
path.insert(0, parent.name)
parent = getattr(parent, 'parent', None)
return '/'.join(path)
def get_cluster_name(vm):
"""Get cluster name where VM resides"""
host = vm.runtime.host
if host and host.parent:
return host.parent.name
return ''
def main():
parser = argparse.ArgumentParser(description='Export VMware VM inventory')
parser.add_argument('--host', required=True, help='vCenter hostname')
parser.add_argument('--user', required=True, help='vCenter username')
parser.add_argument('--password', required=True, help='vCenter password')
parser.add_argument('--output', default='vm_inventory.csv', help='Output CSV file')
args = parser.parse_args()
# Disable SSL certificate verification (for lab environments)
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE
# Connect to vCenter
si = SmartConnect(host=args.host, user=args.user, pwd=args.password, sslContext=context)
try:
content = si.RetrieveContent()
container = content.viewManager.CreateContainerView(
content.rootFolder, [vim.VirtualMachine], True
)
vms = []
for vm in container.view:
try:
vm_info = get_vm_info(vm)
vms.append(vm_info)
print(f"Collected: {vm_info['name']}")
except Exception as e:
print(f"Error collecting {vm.name}: {str(e)}")
container.Destroy()
# Write to CSV
with open(args.output, 'w', newline='', encoding='utf-8') as f:
fieldnames = ['name', 'power_state', 'cpu', 'memory_gb', 'total_disk_gb',
'guest_os', 'guest_id', 'ip_address', 'hostname', 'folder',
'cluster', 'datastore', 'uuid', 'vmware_tools', 'networks',
'disks', 'annotation']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for vm in vms:
vm['networks'] = str(vm['networks'])
vm['disks'] = str(vm['disks'])
writer.writerow(vm)
print(f"\nExported {len(vms)} VMs to {args.output}")
finally:
Disconnect(si)
if __name__ == '__main__':
main()
The resulting CSV contains detailed information about every VM, and we used it to build the migration plan.
VM classification and priorities
Based on the collected data, we grouped the VMs into several categories:
# migration_priority.yaml
priority_1_low_risk:
criteria:
- power_state: poweredOff
- no_shared_disks: true
- disk_size: < 100GB
estimated_count: 342
migration_method: batch_offline
priority_2_stateless:
criteria:
- tags: ["web-frontend", "api-server"]
- can_recreate: true
estimated_count: 567
migration_method: recreate_from_template
priority_3_standard:
criteria:
- power_state: poweredOn
- disk_size: 100GB - 500GB
- sla: standard
estimated_count: 845
migration_method: live_migration_with_downtime
priority_4_critical:
criteria:
- tags: ["database", "critical-app"]
- sla: high
estimated_count: 198
migration_method: carefully_planned_window
priority_5_complex:
criteria:
- shared_disks: true
- raw_device_mapping: true
- gpu_passthrough: true
estimated_count: 48
migration_method: manual_with_verification
3. Batch Migration Script
This is the core of the migration script we actually used:
#!/bin/bash
# vm_migration.sh
# Batch migration script for VMware to KubeVirt
set -euo pipefail
# Configuration
VCENTER_HOST="vcenter.internal.company.com"
VCENTER_USER="administrator@vsphere.local"
VCENTER_PASSWORD="${VCENTER_PASSWORD:?VCENTER_PASSWORD not set}"
ESXI_HOSTS=("esxi01.internal" "esxi02.internal" "esxi03.internal")
NFS_EXPORT_PATH="/mnt/migration-staging"
K8S_NAMESPACE="vm-production"
STORAGE_CLASS="rook-ceph-block-vm"
PARALLEL_JOBS=4
LOG_DIR="/var/log/vm-migration"
# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
# Create log directory
mkdir -p "${LOG_DIR}"
# Export VM from VMware using virt-v2v
export_vm() {
local vm_name=$1
local esxi_host=$2
local output_dir="${NFS_EXPORT_PATH}/${vm_name}"
log_info "Starting export of ${vm_name} from ${esxi_host}"
mkdir -p "${output_dir}"
# Run virt-v2v to convert VMware VM to KVM format
virt-v2v -ic "vpx://${VCENTER_USER}@${VCENTER_HOST}/${esxi_host}?no_verify=1" \
"${vm_name}" \
-o local -os "${output_dir}" \
-of qcow2 \
--password-file <(echo "${VCENTER_PASSWORD}") \
2>&1 | tee "${LOG_DIR}/${vm_name}_export.log"
if [ $? -eq 0 ]; then
log_info "Successfully exported ${vm_name}"
return 0
else
log_error "Failed to export ${vm_name}"
return 1
fi
}
# Generate KubeVirt VM manifest
generate_vm_manifest() {
local vm_name=$1
local cpu=$2
local memory_gb=$3
local disk_path=$4
local network_name=$5
local mac_address=$6
local manifest_file="${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"
cat > "${manifest_file}" << EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: ${vm_name}
namespace: ${K8S_NAMESPACE}
labels:
app: ${vm_name}
migration-source: vmware
migration-date: $(date +%Y-%m-%d)
spec:
running: false
template:
metadata:
labels:
kubevirt.io/vm: ${vm_name}
spec:
nodeSelector:
node-role.kubernetes.io/virtualization: "true"
tolerations:
- key: "virtualization"
operator: "Equal"
value: "true"
effect: "NoSchedule"
domain:
cpu:
cores: ${cpu}
sockets: 1
threads: 1
memory:
guest: ${memory_gb}Gi
resources:
requests:
memory: ${memory_gb}Gi
limits:
memory: $((memory_gb + 1))Gi
devices:
disks:
- name: rootdisk
disk:
bus: virtio
bootOrder: 1
- name: cloudinitdisk
disk:
bus: virtio
interfaces:
- name: default
masquerade: {}
- name: production-net
bridge: {}
macAddress: "${mac_address}"
networkInterfaceMultiqueue: true
rng: {}
machine:
type: q35
features:
acpi: {}
smm:
enabled: true
firmware:
bootloader:
efi:
secureBoot: false
networks:
- name: default
pod: {}
- name: production-net
multus:
networkName: ${network_name}
terminationGracePeriodSeconds: 180
volumes:
- name: rootdisk
dataVolume:
name: ${vm_name}-rootdisk
- name: cloudinitdisk
cloudInitNoCloud:
userData: |
            #cloud-config
            preserve_hostname: true
            manage_etc_hosts: false
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
name: ${vm_name}-rootdisk
namespace: ${K8S_NAMESPACE}
spec:
source:
upload: {}
pvc:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: $(get_disk_size "${disk_path}")
storageClassName: ${STORAGE_CLASS}
EOF
log_info "Generated manifest: ${manifest_file}"
}
# Get disk size from qcow2 file
get_disk_size() {
local disk_path=$1
local size_bytes=$(qemu-img info --output json "${disk_path}" | jq '.["virtual-size"]')
local size_gb=$((size_bytes / 1024 / 1024 / 1024))
# Add 10% buffer
echo "$((size_gb * 110 / 100))Gi"
}
# Upload disk image to DataVolume
upload_disk_image() {
local vm_name=$1
local disk_path=$2
log_info "Uploading disk image for ${vm_name}"
    # Wait for the DataVolume to reach the UploadReady phase
    kubectl wait --for=jsonpath='{.status.phase}'=UploadReady datavolume/${vm_name}-rootdisk \
        -n ${K8S_NAMESPACE} --timeout=300s
# Upload using virtctl
    virtctl image-upload dv ${vm_name}-rootdisk \
        --no-create \
        --namespace=${K8S_NAMESPACE} \
--image-path="${disk_path}" \
--insecure \
--uploadproxy-url="https://cdi-uploadproxy.cdi.svc:443" \
2>&1 | tee -a "${LOG_DIR}/${vm_name}_upload.log"
if [ $? -eq 0 ]; then
log_info "Successfully uploaded disk for ${vm_name}"
return 0
else
log_error "Failed to upload disk for ${vm_name}"
return 1
fi
}
# Full migration workflow for a single VM
migrate_single_vm() {
local vm_name=$1
local vm_config=$2
# Parse VM configuration
local cpu=$(echo "${vm_config}" | jq -r '.cpu')
local memory=$(echo "${vm_config}" | jq -r '.memory_gb')
local esxi_host=$(echo "${vm_config}" | jq -r '.esxi_host')
local network=$(echo "${vm_config}" | jq -r '.network')
local mac=$(echo "${vm_config}" | jq -r '.mac_address')
log_info "Starting migration of ${vm_name}"
# Step 1: Export from VMware
if ! export_vm "${vm_name}" "${esxi_host}"; then
return 1
fi
# Step 2: Find the converted disk
local disk_path=$(find "${NFS_EXPORT_PATH}/${vm_name}" -name "*.qcow2" | head -1)
if [ -z "${disk_path}" ]; then
log_error "No qcow2 disk found for ${vm_name}"
return 1
fi
# Step 3: Generate manifest
generate_vm_manifest "${vm_name}" "${cpu}" "${memory}" "${disk_path}" "${network}" "${mac}"
# Step 4: Apply manifest
kubectl apply -f "${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"
# Step 5: Upload disk
if ! upload_disk_image "${vm_name}" "${disk_path}"; then
return 1
fi
    # Step 6: Start the VM and wait for it to become Ready
    virtctl start ${vm_name} -n ${K8S_NAMESPACE}
    kubectl wait --for=condition=Ready vm/${vm_name} -n ${K8S_NAMESPACE} --timeout=600s
log_info "Migration completed for ${vm_name}"
return 0
}
# Batch migration with parallel execution
batch_migrate() {
local vm_list_file=$1
log_info "Starting batch migration from ${vm_list_file}"
    # Read VM list (one JSON object per line) and run migrations in parallel.
    # Avoid piping into the loop so background jobs stay in this shell
    # and the final `wait` can see them.
    while read -r line; do
        vm_name=$(echo "${line}" | jq -r '.name')
        migrate_single_vm "${vm_name}" "${line}" &

        # Control parallel jobs
        while [ "$(jobs -r | wc -l)" -ge "${PARALLEL_JOBS}" ]; do
            sleep 10
        done
    done < "${vm_list_file}"
# Wait for all jobs to complete
wait
log_info "Batch migration completed"
}
# Main entry point
main() {
case "${1:-}" in
single)
shift
migrate_single_vm "$@"
;;
batch)
shift
batch_migrate "$@"
;;
*)
echo "Usage: $0 {single|batch} [args...]"
exit 1
;;
esac
}
main "$@"
4. Special Handling for Windows VMs
Migrating Windows VMs is considerably more involved than Linux, and the main issue is drivers. VMware guests use the PVSCSI and VMXNET3 drivers shipped with VMware Tools; after moving to KubeVirt they need to switch to VirtIO drivers.
Install the VirtIO drivers inside the Windows VM before migration:
# Download VirtIO drivers ISO
# https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso
# Mount ISO and install drivers
# Install via Device Manager or use these PowerShell commands:
# Install VirtIO storage driver
pnputil.exe /add-driver E:\vioscsi\w10\amd64\*.inf /install
# Install VirtIO network driver
pnputil.exe /add-driver E:\NetKVM\w10\amd64\*.inf /install
# Install VirtIO balloon driver
pnputil.exe /add-driver E:\Balloon\w10\amd64\*.inf /install
# Install QEMU guest agent
msiexec.exe /i E:\guest-agent\qemu-ga-x86_64.msi /qn
# Verify drivers are installed
Get-WindowsDriver -Online | Where-Object { $_.ProviderName -like "*Red Hat*" }
Configuration of the Windows VM after migration:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
name: windows-server-2022
namespace: vm-production
spec:
running: true
template:
spec:
domain:
clock:
timer:
hpet:
present: false
hyperv: {}
pit:
tickPolicy: delay
rtc:
tickPolicy: catchup
utc: {}
cpu:
cores: 4
sockets: 1
threads: 2
devices:
disks:
- bootOrder: 1
disk:
bus: sata
name: rootdisk
inputs:
- bus: usb
name: tablet
type: tablet
interfaces:
- masquerade: {}
model: e1000e
name: default
tpm: {}
features:
acpi: {}
apic: {}
hyperv:
frequencies: {}
ipi: {}
relaxed: {}
reset: {}
runtime: {}
spinlocks:
spinlocks: 8191
synic: {}
synictimer:
direct: {}
tlbflush: {}
vapic: {}
vpindex: {}
smm:
enabled: true
firmware:
bootloader:
efi:
secureBoot: true
machine:
type: q35
memory:
guest: 8Gi
resources:
requests:
memory: 8Gi
networks:
- name: default
pod: {}
volumes:
- dataVolume:
name: windows-server-2022-rootdisk
name: rootdisk
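After the Windows VM boots on KubeVirt, it is worth checking that the VirtIO drivers and the guest agent actually took effect; a quick sketch:
# Guest agent connectivity shows up as the AgentConnected condition on the VMI
kubectl get vmi windows-server-2022 -n vm-production \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
# If the agent is connected, guest OS details are reported
virtctl guestosinfo windows-server-2022 -n vm-production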
IV. Best Practices and Caveats
1. Performance Tuning
CPU tuning
# Enable CPU pinning for latency-sensitive workloads
spec:
template:
spec:
domain:
cpu:
cores: 4
sockets: 1
threads: 1
dedicatedCpuPlacement: true
isolateEmulatorThread: true
model: host-passthrough
numa:
guestMappingPassthrough: {}
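One prerequisite that is easy to miss: dedicatedCpuPlacement only works on nodes whose kubelet runs the CPU manager with the static policy. A minimal sketch of the node-side change; the config file path and the reserved CPU set are assumptions and should be adapted to your distro's kubelet configuration mechanism:
# Enable the static CPU manager policy in the kubelet config (path varies by distro)
cat >> /var/lib/kubelet/config.yaml <<EOF
cpuManagerPolicy: static
reservedSystemCPUs: "0,1"
EOF
# Switching the policy requires removing the old CPU manager state before restarting kubelet
rm -f /var/lib/kubelet/cpu_manager_state
systemctl restart kubelet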
In our tests, enabling CPU pinning reduced P99 latency on database VMs by 40%.
Memory tuning
# Enable hugepages for memory-intensive workloads
spec:
template:
spec:
domain:
memory:
guest: 32Gi
hugepages:
pageSize: 1Gi
Hugepages must be pre-configured on the node:
# Configure 1GB hugepages on node
echo 34 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# Persist across reboots: 1GiB pages must be reserved via kernel boot parameters
# (vm.nr_hugepages in sysctl only controls the default 2MiB page size)
grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=34"
# Reboot the node for the change to take effect
# Label nodes with hugepages
kubectl label node node-vm-01 kubevirt.io/hugepages-1Gi=true
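Verify that the pages were actually reserved and that the node advertises them before scheduling hugepage-backed VMs:
# On the node: reserved 1GiB pages
grep -i hugepages /proc/meminfo
# From the API: the node should advertise hugepages-1Gi as allocatable
kubectl describe node node-vm-01 | grep hugepages-1Gi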
Storage I/O tuning
# Tune IO for database workloads
spec:
template:
spec:
domain:
devices:
disks:
- name: datadisk
disk:
bus: virtio
io: native
cache: none
dedicatedIOThread: true
On SSD-backed storage, cache: none combined with io: native gives the best random I/O performance. With this configuration our MySQL VMs reached 85,000 IOPS (4K random read).
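For reference, numbers like that can be reproduced inside the guest with fio; a sketch of a 4K random-read test, where the device name and runtime are assumptions:
# 4K random read against the raw data disk (read-only, non-destructive)
fio --name=randread-4k --filename=/dev/vdb --direct=1 --rw=randread \
    --bs=4k --iodepth=64 --numjobs=4 --runtime=120 --time_based --group_reporting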
2. High Availability
VM anti-affinity
spec:
template:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: mysql-cluster
topologyKey: kubernetes.io/hostname
Live migration configuration
# Global migration settings in KubeVirt CR
spec:
configuration:
migrations:
parallelMigrationsPerCluster: 10
parallelOutboundMigrationsPerNode: 4
bandwidthPerMigration: 1Gi
completionTimeoutPerGiB: 800
progressTimeout: 300
allowAutoConverge: true
allowPostCopy: false # Disable post-copy for safety
Tune the migration bandwidth to your network. On our 100 Gbps network, a 1Gi per-migration bandwidth limit lets migrations finish quickly without affecting production traffic.
3. Security Hardening
Enable SELinux
Make sure SELinux is in enforcing mode on the nodes:
# Check SELinux status
getenforce
# Set to enforcing
setenforce 1
# Make permanent
sed -i 's/SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config
Network policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: vm-production-isolation
namespace: vm-production
spec:
podSelector:
matchLabels:
kubevirt.io/domain: database-server
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: backend
ports:
- protocol: TCP
port: 3306
egress:
- to:
- podSelector:
matchLabels:
app: backend
ports:
- protocol: TCP
port: 3306
- to:
- namespaceSelector:
matchLabels:
name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
Resource quotas
apiVersion: v1
kind: ResourceQuota
metadata:
name: vm-production-quota
namespace: vm-production
spec:
hard:
requests.cpu: "200"
requests.memory: "800Gi"
limits.cpu: "400"
limits.memory: "1Ti"
persistentvolumeclaims: "100"
requests.storage: "50Ti"
4. Common Errors and Fixes
Problem 1: VM fails to become ready and reports "Guest agent is not connected"
Cause: the QEMU Guest Agent is not installed, or the service is not running.
Fix:
# Linux
yum install qemu-guest-agent -y
systemctl enable --now qemu-guest-agent
# Windows
# Install qemu-ga from virtio-win ISO
# Start service
sc start QEMU-GA
Problem 2: No network connectivity; the VM cannot obtain an IP
Cause: Multus misconfiguration, or the VLAN is not trunked through correctly.
Troubleshooting steps:
# Check Multus pod logs
kubectl logs -n kube-system -l app=multus
# Check network attachment
kubectl get net-attach-def -n vm-production
# Enter virt-launcher pod to check
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- ip addr
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- bridge link
Problem 3: Poor disk performance
Cause: wrong cache policy or I/O mode.
Optimization:
# Change from default to optimized settings
devices:
disks:
- name: rootdisk
disk:
bus: virtio
cache: none # Was: writethrough
io: native # Was: threads
Problem 4: Live migration fails
Common causes and fixes:
# 1. Trigger the migration manually to reproduce the failure (check source/destination node connectivity first)
virtctl migrate myvm -n vm-production
# 2. Check migration status
kubectl get vmim -n vm-production
# 3. Common fixes:
# - Increase migration bandwidth
# - Enable allowAutoConverge for busy VMs
# - Check storage is accessible from both nodes
V. Troubleshooting and Monitoring
1. Viewing Logs
# KubeVirt operator logs
kubectl logs -n kubevirt -l kubevirt.io=virt-operator
# virt-controller logs (VM scheduling issues)
kubectl logs -n kubevirt -l kubevirt.io=virt-controller
# virt-handler logs (VM lifecycle on node)
kubectl logs -n kubevirt -l kubevirt.io=virt-handler --all-containers
# virt-launcher logs (specific VM)
kubectl logs -n vm-production virt-launcher-myvm-xxxxx -c compute
# libvirt logs inside virt-launcher
kubectl exec -n vm-production virt-launcher-myvm-xxxxx -- cat /var/log/libvirt/qemu/vm-production_myvm.log
2. Common Troubleshooting Commands
# Check VM status
kubectl get vm,vmi -n vm-production
# Describe VM for events
kubectl describe vm myvm -n vm-production
# Check VMI conditions
kubectl get vmi myvm -n vm-production -o jsonpath='{.status.conditions}'
# Enter VM console
virtctl console myvm -n vm-production
# SSH to VM (if SSH is configured)
virtctl ssh user@myvm -n vm-production
# VNC access
virtctl vnc myvm -n vm-production
# Stop/Start VM
virtctl stop myvm -n vm-production
virtctl start myvm -n vm-production
# Restart VM
virtctl restart myvm -n vm-production
# Force stop (like pulling power cord)
virtctl stop myvm -n vm-production --grace-period=0 --force
3. Monitoring
We monitor KubeVirt with Prometheus + Grafana.
ServiceMonitor configuration:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kubevirt
namespace: monitoring
spec:
namespaceSelector:
matchNames:
- kubevirt
selector:
matchLabels:
prometheus.kubevirt.io: "true"
endpoints:
- port: metrics
interval: 15s
scrapeTimeout: 10s
Key metrics:
# VM CPU usage
kubevirt_vmi_cpu_system_usage_seconds_total
kubevirt_vmi_cpu_user_usage_seconds_total
# VM memory usage
kubevirt_vmi_memory_available_bytes
kubevirt_vmi_memory_used_bytes
# VM network IO
kubevirt_vmi_network_receive_bytes_total
kubevirt_vmi_network_transmit_bytes_total
# VM disk IO
kubevirt_vmi_storage_read_traffic_bytes_total
kubevirt_vmi_storage_write_traffic_bytes_total
kubevirt_vmi_storage_iops_read_total
kubevirt_vmi_storage_iops_write_total
# Migration metrics
kubevirt_vmi_migration_data_processed_bytes
kubevirt_vmi_migration_data_remaining_bytes
kubevirt_vmi_migration_phase_transition_time_seconds
I won't paste the Grafana dashboard JSON here; you can import the community dashboard (ID 11748) directly.
4. Backup and Restore
Use KubeVirt's snapshot feature for backups:
# Create snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
name: myvm-snapshot-20241219
namespace: vm-production
spec:
source:
apiGroup: kubevirt.io
kind: VirtualMachine
name: myvm
---
# Restore from snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
name: myvm-restore
namespace: vm-production
spec:
target:
apiGroup: kubevirt.io
kind: VirtualMachine
name: myvm
virtualMachineSnapshotName: myvm-snapshot-20241219
Scheduled backup script:
#!/bin/bash
# vm_backup.sh - Automated VM snapshot script
NAMESPACE="vm-production"
RETENTION_DAYS=7
# Create snapshots for all VMs
for vm in $(kubectl get vm -n ${NAMESPACE} -o jsonpath='{.items[*].metadata.name}'); do
snapshot_name="${vm}-snapshot-$(date +%Y%m%d-%H%M%S)"
  cat <<EOF | kubectl apply -f -
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: ${snapshot_name}
  namespace: ${NAMESPACE}
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm}
EOF
  echo "Created snapshot ${snapshot_name}"
done
# Pruning snapshots older than RETENTION_DAYS is handled by a separate cleanup job
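We trigger this from cron on a management host; a crontab entry like the following (paths are illustrative) runs it nightly:
# Snapshot all production VMs every night at 01:30 and keep a log
30 1 * * * /opt/scripts/vm_backup.sh >> /var/log/vm-backup.log 2>&1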
VI. Summary
1. Migration Results
After six months of work, we completed the migration of 2,000 VMs from VMware to KubeVirt:
2. Lessons Learned
3. Further Learning
4. References
Appendix
Command cheat sheet
# KubeVirt Management
virtctl start <vm-name>                                  # Start VM
virtctl stop <vm-name>                                   # Stop VM
virtctl restart <vm-name>                                # Restart VM
virtctl pause vm <vm-name>                               # Pause VM
virtctl unpause vm <vm-name>                             # Unpause VM
virtctl migrate <vm-name>                                # Trigger live migration
virtctl console <vm-name>                                # Serial console access
virtctl vnc <vm-name>                                    # VNC access
virtctl ssh <user>@<vm-name>                             # SSH access
virtctl guestfs <pvc-name>                               # Access VM filesystem
# Disk Management
virtctl image-upload dv <dv-name> --image-path=<path>    # Upload disk image
virtctl addvolume <vm-name> --volume-name=<name>         # Hotplug volume
virtctl removevolume <vm-name> --volume-name=<name>      # Hot-unplug volume
# Snapshot
kubectl get vmsnapshot                                   # List snapshots
kubectl get vmrestore                                    # List restores
# Troubleshooting
kubectl get vm,vmi                                       # VM status overview
kubectl describe vmi <vm-name>                           # VM instance details
kubectl logs virt-launcher-<vm-name>-xxxxx               # VM launcher logs
Configuration parameter reference
Glossary