Oracle Goldengate高可用测试

梁铭图 2020-03-12 11:54:00

作者介绍

梁铭图,新炬网络首席架构师,十多年数据库运维、数据库设计、数据治理以及系统规划建设经验,拥有Oracle OCM、Togaf企业架构师(鉴定级)、IBM CATE等认证,曾获dbaplus年度MVP以及华为云MVP等荣誉,并参与数据资产管理国家标准的编写工作。在数据库运维管理和架构设计、运维体系规划、数据资产管理方面有深入研究。

 

一、 Oracle GoldenGate介绍

 


GoldenGate软件是一种基于日志的结构化数据复制软件。GoldenGate 能够实现大量交易数据的实时捕捉、变换和投递,实现源数据库与目标数据库的数据同步,保持亚秒级的数据延迟。

 

在我们的客户里,GoldenGate使用场景非常广泛,从数据迁移割接、容灾应急系统构建、数据同步到大数据平台实时数据流的建设,都会看到GoldenGate的身影。但是随着应用的广泛,其可用性的问题也随之而来,如何保障GoldenGate数据同步的持续有效,建设GoldenGate同可用成为要解决的问题。

 

二、 GoldenGate高可用测试

 

在Oracle RAC集群中,是通过集群软件统一管理多实例及其他资源的方式实现Oracle数据库的高可用。OGG的高可用原理也与此类似,只不过同时运行的OGG进程只有一套。在当前运行的节点发生故障时,可以自动切换到其他节点。

 

利用acfs共享文件系统配置ogg,使用集群中的每个节点都能访问到ogg目录,并未ogg在集群中配置vip地址,保证pump进程与目标主机正常通信。最后将OGG作为资源注册到Oracle集群中,使用集群软件管理OGG。这样在一个节点发生故障时,就能通过集群的relocate resource功能自动切换到其他节点。

 

测试环境

源端: rhel 5.5 + oracle 11.2.0.3 rac

目标端:rhel 5.5 + oracle 11.2.0.3 单机

ogg版本:11.2.1.0.1

 

搭建步骤

前期的数据库准备以及ogg详细安装步骤在此文不与讨论,下面进入正式ogg高可用配置环节。

 

步骤1:建立acfs共享文件系统

 

使用grid用户,asmca命令进入asm配置助手

 

创建卷组:

 


 

创建集群文件系统

 

 

创建成功之后,检查在集群两个节点都能看到新建的共享文件系统

 

[root@yyh1 app]# ls -l /oracle/app/ggs

total 64

drwx------ 2 root root 65536 Mar 30 17:44 lost+found

[root@yyh2 ~]# ls -l /oracle/app/ggs

total 64

drwx------ 2 root root 65536 Mar 30 17:44 lost+found

 

步骤2:安装ogg
 

过程略

 

步骤3:源端添加ogg vip

 

$crsctl stat res -p | grep SUBNET

USR_ORA_SUBNET=10.10.10.0

# /oracle/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=10.10.10.106 -vipname=oggvip -user=root

Production Copyright 2007, 2008, Oracle.All rights reserved

2018-03-30 19:22:58: Creating Resource Type

2018-03-30 19:22:58: Executing /oracle/app/11.2.0/grid/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /oracle/app/11.2.0/grid/crs/template/appvip.type

2018-03-30 19:22:58: Executing cmd: /oracle/app/11.2.0/grid/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /oracle/app/11.2.0/grid/crs/template/appvip.type

2018-03-30 19:22:58: Create the Resource

2018-03-30 19:22:58: Executing /oracle/app/11.2.0/grid/bin/crsctl add resource oggvip -type app.appvip_net1.type -attr "USR_ORA_VIP=10.10.10.106,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=yyh1,APPSVIP_FAILBACK="

2018-03-30 19:22:58: Executing cmd: /oracle/app/11.2.0/grid/bin/crsctl add resource oggvip -type app.appvip_net1.type -attr "USR_ORA_VIP=10.10.10.106,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=yyh1,APPSVIP_FAILBACK="

可以使用下面命令修改资源属性:

 

crsctl modify resource ggsvip -attr"RESTART_ATTEMPTS=3,START_TIMEOUT=300,STOP_TIMEOUT=300, CHECK_INTERVAL=10"

给grid用户授权

 
# /oracle/app/11.2.0/grid/bin/crsctl setperm resource oggvip -u user:grid:r-x

在集群中查看新添加的资源

 

$crsctl stat res -t

--------------------------------------------------------------------------------

NAME TARGET STATE SERVER STATE_DETAILS

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATADG.dg

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.LISTENER.lsnr

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.LISTENER_1522.lsnr

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.OCR_VOTE.dg

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.asm

ONLINE ONLINE yyh1 Started

ONLINE ONLINE yyh2 Started

ora.gsd

OFFLINE OFFLINE yyh1

OFFLINE OFFLINE yyh2

ora.net1.network

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.ons

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.registry.acfs

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

oggvip

1 OFFLINE OFFLINE

ora.LISTENER_SCAN1.lsnr

1 ONLINE ONLINE yyh1

ora.cvu

1 ONLINE ONLINE yyh2

ora.oc4j

1 ONLINE ONLINE yyh2

ora.scan1.vip

1 ONLINE ONLINE yyh1

ora.yyh.db

1 ONLINE ONLINE yyh1 Open

2 ONLINE ONLINE yyh2 Open

ora.yyh.yyh1.svc

1 ONLINE ONLINE yyh1

ora.yyh1.vip

1 ONLINE ONLINE yyh1

ora.yyh2.vip

1 ONLINE ONLINE yyh2

查看资源的状态

 

$crsctl status resource oggvip

NAME=oggvip

TYPE=app.appvip_net1.type

TARGET=OFFLINE

STATE=OFFLINE

启动资源

 

$crsctl start resource oggvip

CRS-2672: Attempting to start 'oggvip' on 'yyh2'

CRS-2676: Start of 'oggvip' on 'yyh2' succeeded

再次查看资源状态

 

# /oracle/app/11.2.0/grid/bin/crsctl status resource oggvip

NAME=oggvip

TYPE=app.appvip_net1.type

TARGET=ONLINE

STATE=ONLINE on yyh2

在集群资源中可以看到新添加的资源已经启动

 

$crsctl stat res -t

--------------------------------------------------------------------------------

NAME TARGET STATE SERVER STATE_DETAILS

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATADG.dg

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.LISTENER.lsnr

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.LISTENER_1522.lsnr

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.OCR_VOTE.dg

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.asm

ONLINE ONLINE yyh1 Started

ONLINE ONLINE yyh2 Started

ora.gsd

OFFLINE OFFLINE yyh1

OFFLINE OFFLINE yyh2

ora.net1.network

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.ons

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

ora.registry.acfs

ONLINE ONLINE yyh1

ONLINE ONLINE yyh2

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

oggvip

1 ONLINE ONLINE yyh2

ora.LISTENER_SCAN1.lsnr

1 ONLINE ONLINE yyh1

ora.cvu

1 ONLINE ONLINE yyh2

ora.oc4j

1 ONLINE ONLINE yyh2

ora.scan1.vip

1 ONLINE ONLINE yyh1

ora.yyh.db

1 ONLINE ONLINE yyh1 Open

2 ONLINE ONLINE yyh2 Open

ora.yyh.yyh1.svc

1 ONLINE ONLINE yyh1

ora.yyh1.vip

1 ONLINE ONLINE yyh1

ora.yyh2.vip

1 ONLINE ONLINE yyh2

步骤4:创建代理ACTION脚本

Oracle集群通过agent代理运行与资源相关的命令。通过创建代理ACTION脚本,用于后续集群的调用。该脚本包含start,stop,check,clean,abort几个函数,可以被grid infrastructure调用,这个脚本建议放到acfs共享文件系统中。

 

cat > /oracle/app/ggs/11gr2_ogg_action.scr

#!/bin/sh

#set the Oracle Goldengate installation directory

export OGG_HOME=/oracle/app/ggs

#set the oracle home to the database to ensure GoldenGate will get the

#right environment settings to be able to connect to the database

export ORACLE_HOME=/oracle/app/oracle/product/11.2.0/db_1

#specify delay after start before checking for successful start

start_delay_secs=5

#Include the GoldenGate home in the library path to start GGSCI

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/oracle/app/ggs:${LD_LIBRARY_PATH}

#check_process validates that a manager process is running at the PID

#that GoldenGate specifies.

check_process () {

if ( [ -f "${OGG_HOME}/dirpcs/MGR.pcm" ] )

then

pid=`cut -f8 "${OGG_HOME}/dirpcs/MGR.pcm"`

if [ ${pid} = `ps -e |grep ${pid} |grep mgr |cut -d " " -f2` ]

then

#manager process is running on the PID exit success

exit 0

else

if [ ${pid} = `ps -e |grep ${pid} |grep mgr |cut -d " " -f1` ]

then

#manager process is running on the PID exit success

exit 0

else

#manager process is not running on the PID

exit 1

fi

fi

else

#manager is not running because there is no PID file

exit 1

fi

}

#call_ggsci is a generic routine that executes a ggsci command

call_ggsci () {

ggsci_command=$1

ggsci_output=`${OGG_HOME}/ggsci<

${ggsci_command}

exit

EOF`

}

case $1 in

'start')

#start manager

call_ggsci 'start manager'

#there is a small delay between issuing the start manager command

#and the process being spawned on the OS. wait before checking

sleep ${start_delay_secs}

#check whether manager is running and exit accordingly

check_process

;;

'stop')

#attempt a clean stop for all non-manager processes

#call_ggsci 'stop er *'

#ensure everything is stopped

call_ggsci 'stop er *!'

#call_ggsci 'kill er *'

#stop manager without (y/n) confirmation

call_ggsci 'stop manager!'

#exit success

exit 0

;;

'check')

check_process

;;

'clean')

#attempt a clean stop for all non-manager processes

#call_ggsci 'stop er *'

#ensure everything is stopped

#call_ggsci 'stop er *!'

#in case there are lingering processes

call_ggsci 'kill er *'

#stop manager without (y/n) confirmation

call_ggsci 'stop manager!'

#exit success

exit 0

;;

'abort')

#ensure everything is stopped

call_ggsci 'stop er *!'

#in case there are lingering processes

call_ggsci 'kill er *'

#stop manager without (y/n) confirmation

call_ggsci 'stop manager!'

#exit success

exit 0

;;

esac

注:脚本只启动manager进程,所以务必在manager进程中配置auto start参数。

赋权:

 

$chmod a+x /oracle/app/ggs/11gr2_ogg_action.scr

$chown grid:oinstall /oracle/app/ggs/11gr2_ogg_action.scr

步骤5:添加oggapp资源

 

$crsctl add resource oggapp -type cluster_resource \

> -attr "ACTION_SCRIPT=/oracle/app/ggs/11gr2_ogg_action.scr, \

> CHECK_INTERVAL=30, START_DEPENDENCIES='hard(oggvip,ora.asm) \

> pullup(oggvip)', STOP_DEPENDENCIES='hard(oggvip)'"

注:START_DEPENDENCIES:启动oggapp时与oggvip、ora.asm设置为强关联关系;

STOP_DEPENDENCIES:停止oggapp时与oggvip设置为强关联关系。
 

如果不使用VIP(源端与目标端在同一主机),则以以下方式注册:

 

$GRID_HOME/bin/crsctl add resource oggapp\

-type cluster_resource\

-attr"ACTION_=/u01/app/11.2.0/grid/crs//11gr2_ogg_action.scr, CHECK_INTERVAL=30, START_DEPENDENCIES='hard(ora.asm)' _TIMEOUT=300"

#/oracle/app/11.2.0/grid/bin/crsctl status resource oggapp

NAME=oggapp

TYPE=cluster_resource

TARGET=OFFLINE

STATE=OFFLINE


步骤6:启动资源
 

启动oggapp resource,检查源端的ogg进程是否成功启动

启动前源端ogg进程状态

 

GGSCI (yyh2) 8> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER STOPPED

EXTRACT STOPPED DP_HR 00:00:00 00:00:10

EXTRACT STOPPED EXT_HR 00:00:00 00:00:15

[grid@yyh2:/home/grid]$crsctl start resource oggapp

CRS-2672: Attempting to start 'oggapp' on 'yyh2'

CRS-2676:Start of'oggapp'on'yyh2'succeeded

启动后源端ogg进程状态

 

GGSCI (yyh2) 18> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING DP_HR 00:00:00 00:55:59

EXTRACT RUNNING EXT_HR 00:00:01 00:00:03

 

步骤7:管理OGG资源
 

要手工切换OGG到其他节点上,可使用带force选项的relocate resource命令。可以在集群中任何节点上以grid用户身份运行此命令。
 

1)执行前查看资源目前运行的节点:

 

$crsctl status resource oggapp

NAME=oggapp

TYPE=cluster_resource

TARGET=ONLINE

STATE=ONLINE on yyh2

2)执行relocate操作:

 

# crsctl relocate resource oggapp -f

CRS-2673:Attempting to stop'oggapp'on'yyh2'

CRS-2677:Stop of'oggapp'on'yyh2'succeeded

CRS-2673:Attempting to stop'oggvip'on'yyh2'

CRS-2677:Stop of'oggvip'on'yyh2'succeeded

CRS-2672:Attempting to start'oggvip'on'yyh1'

CRS-2676:Start of'oggvip'on'yyh1'succeeded

CRS-2672:Attempting to start'oggapp'on'yyh1'

CRS-2676:Start of'oggapp'on'yyh1'succeeded

3)在节点1查看ogg状态:

 

 

GGSCI (yyh1) 41> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING DP_HR 00:00:00 00:00:18

EXTRACT RUNNING EXT_HR 00:00:01 00:00:03

 

三、 故障模拟

 

ogg运行在1节点,模拟故障关闭1节点主机,然后观察ogg是否会自动切换到2节点。

 

关闭1节点主机,查看oggapp状态:

 

 

# crsctl status resource oggapp

NAME=oggapp

TYPE=cluster_resource

TARGET=ONLINE

STATE=ONLINE on yyh2

 

发现已经自动relocate至2节点。进入ggsci命令行查看进程状态:

 

GGSCI (yyh2) 12> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

EXTRACT RUNNING DP_HR 00:00:00 00:06:51

EXTRACT RUNNING EXT_HR 00:00:02 00:00:00

 

测试完成。

最新评论
访客 2023年08月20日

230721

访客 2023年08月16日

1、导入Mongo Monitor监控工具表结构(mongo_monitor…

访客 2023年08月04日

上面提到: 在问题描述的架构图中我们可以看到,Click…

访客 2023年07月19日

PMM不香吗?

访客 2023年06月20日

如今看都很棒

活动预告