oracle data guard Failover切换(官方文档方案)
步骤1:将主库中所有未发送的归档日志和redo日志刷新到目标备库。
如果可以mount主库,则可以将主库中任何未发送的存档日志和current redo日志刷新到备库。
如果此操作成功,即使主库没有使用零数据丢失数据保护模式,也可以进行零数据丢失故障转移。
确保Redo Apply在目标备库中处于活动状态。
mount,但不要open主库。
如果无法mount主库,请转到步骤2。
在主库中执行以下SQL语句:
SQL> ALTER SYSTEM FLUSH REDO TO ‘target_db_name’;
(注:target_db_name备库的db_unique_name,需要用引号引起。)
此SQL语句将所有未发送的redo日志从主库刷新到备库,并等待该redo应用到备库。
如果此语句完成时没有错误,请转到步骤5。
如果此语句完成时出现错误,或者由于无法再等待语句完成而必须停止该语句,请继续执行步骤2。
步骤2:验证备库是否有每个主库redo线程最新的存档日志文件。
查询备库上的VKaTeX parse error: Expected \’EOF\’, got \’#\’ at position 62: …T UNIQUE THREAD#̲ AS THREAD, MAX…ARCHIVED_LOG;
从主库复制归档日志到备库,并注册
SQL> ALTER DATABASE REGISTER PHYSICAL LOGFILE ‘归档日志’;
步骤3:检查,并解决存在的归档日志GAP。
查询备库上的V$ARCHIVE_GAP视图,以检查备库上是否存在归档日志GAP。
SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;
如果存在归档日志GAP,则从主库复制归档日志到备库,并注册
SQL> ALTER DATABASE REGISTER PHYSICAL LOGFILE ‘归档日志’;
步骤4:重复步骤3,直到备库上所有的归档日志GAP都消除。
在解决归档日志GAP后,必须重复查询,直到返回no rows selected。(no rows selected,表示没有归档日志GAP。)
如果在执行步骤2~步骤4之后,无法解决归档日志GAP(例如,因为您无权访问失败的主数据库的系统),则在failover切换期间将发生某些数据丢失。
步骤5:停止redo日志应用。
在备库中执行以下SQL语句:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
步骤6:完成应用所有接收到的redo日志数据。
在备库中执行以下SQL语句:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
如果此语句完成时没有任何错误,请继续执行步骤7。
如果发生错误,则是备库没有应用某些接收到的redo数据。在继续下一步骤之前,请尝试解决错误并重新执行该语句。
请注意,如果在步骤3和步骤4中存在没有解决的归档日志GAP问题,您将收到一个错误,提示存在归档日志GAP。
如果该错误无法解决,仍可以通过在备库上执行以下SQL语句来进行failover故障转移(但是会丢失一些数据):
SQL> ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;
当ACTIVATE语句完成时,请继续执行步骤9。
步骤7:验证备库是否已准备好成为主库。
查询备库上V$DATABASE视图中的SWITCHOVER_STATUS。
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
如果SWITCHOVER_STATUS的值为TO PRIMARY或SESSIONS ACTIVE,表示备库已准备好切换到主角色。
如果这两个值都没有返回,请验证备库中Redo Apply是否处于活动状态,并继续查询该视图,直到返回TO PRIMARY或SESSIONS ACTIVE。
步骤8:将备库切换到主角色。
在备库中执行以下SQL语句:
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
注:如果在步骤7中执行的SWITCHOVER_STATUS查询返回值为TO PRIMARY,则可以省略switchover语句中的WITH SESSION SHUTDOWN。
步骤9:打开新的主数据库。
SQL> ALTER DATABASE OPEN;
步骤10:备份新的主数据库。
对新的主数据库进行完全备份。
————————–以下为在vmware测试步骤———————————-
实践步骤:
1、初始状态
master库:
SYS@master> select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
PRIMARY TO STANDBY MAXIMUM PERFORMANCE
初始数据:
SYS@master> select * from wen;
NAME TIME
king 16-MAR-20
slave库:
SYS@slave> select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
PHYSICAL STANDBY NOT ALLOWED MAXIMUM PERFORMANCE
初始数据:
SYS@slave> select * from wen;
NAME TIME
king 16-MAR-20
2、制造数据差异
2.1、关闭slave网卡
[root@slave ~]# systemctl stop network
[root@slave ~]# ping master
connect: Network is unreachable
master的日志报错:
Mon Mar 16 02:49:36 2020
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 3 to ‘slave’
ORA-16198: LGWR received timedout error from KSR
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host ‘slave’ —-网络已断开
2.2、master库插入数据,并切redo日志,最后shutdown abort
SYS@master> insert into wen values(‘before-1’,sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> alter system switch logfile;
System altered.
SYS@master> insert into wen values(‘before-2’,sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> alter system switch logfile;
System altered.
SYS@master> insert into wen values(‘after-1’,sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> shu abort;
ORACLE instance shut down.
master库的alter日志:
Mon Mar 16 02:53:08 2020
Shutting down instance (abort)
License high water mark = 9
USER (ospid: 49190): terminating the instance
Instance terminated by USER, pid = 49190
Mon Mar 16 02:53:09 2020
Instance shutdown complete —-master库已shutdown abort
2.3、开启slave的网卡,并确认表 wen 的数据
–开启网卡
[root@slave ~]# systemctl start network
[root@slave ~]# ping master
PING master (192.168.189.39) 56(84) bytes of data.
64 bytes from master (192.168.189.39): icmp_seq=1 ttl=64 time=1.24 ms
64 bytes from master (192.168.189.39): icmp_seq=2 ttl=64 time=0.664 ms
^C
— master ping statistics —
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.664/0.954/1.245/0.292 ms
–确认表wen的数据
SYS@slave> select * from wen;
NAME TIME
king 16-MAR-20
—-slave库和master库相差数据;
—-相差的三条数据:两条数据在归档日志,一条数据在current redo日志。
3、failover切换,并使用flush功能
3.1、使用flush将未发送的归档日志、redo日志同步到slave库
–master开启mount状态
SYS@master> startup mount;
ORACLE instance started.
Total System Global Area 605450240 bytes
Fixed Size 2255632 bytes
Variable Size 398460144 bytes
Database Buffers 197132288 bytes
Redo Buffers 7602176 bytes
Database mounted.
–使用flush
SYS@master> alter system flush redo to ‘slave’;
System altered.
master的alert日志:
Completed: ALTER DATABASE MOUNT
Mon Mar 16 02:59:57 2020
ALTER SYSTEM FLUSH REDO TO ‘slave’ CONFIRM APPLY
ALTER SYSTEM FLUSH REDO TO slave CONFIRM APPLY [Process Id: 50824] (master)
ARCH: STARTING ARCH PROCESSES
Mon Mar 16 02:59:57 2020
ARC0 started with pid=22, OS id=50860
ARC0: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Flush redo: No wait for non-current ORLs to be archived
Waiting for all FAL entries to be archived…
All FAL entries have been archived.
Waiting for dest_id 2 to become synchronized…
Mon Mar 16 02:59:58 2020
ARC1 started with pid=23, OS id=50863
Mon Mar 16 02:59:58 2020
ARC2 started with pid=24, OS id=50865
Mon Mar 16 02:59:58 2020
ARC3 started with pid=25, OS id=50867
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the ‘no FAL’ ARCH
ARC1: Becoming the ‘no SRL’ ARCH
ARC2: Becoming the heartbeat ARCH
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Standby redo logfile selected for thread 1 sequence 41 for destination LOG_ARCHIVE_DEST_2
Active, synchronized flush redo target has been identified
Managed Real Time Apply recovery running at physical standby ‘LOG_ARCHIVE_DEST_2’
Flush End-Of-Redo Log thread 1 sequence 43 has been fixed
Flush Redo: Primary highest seen SCN set to 0x0.0x3472a9
ARCH: Noswitch archival of thread 1, sequence 43
ARCH: End-Of-Redo Branch archival of thread 1 sequence 43
ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
ARCH: Standby redo logfile selected for thread 1 sequence 43 for destination LOG_ARCHIVE_DEST_2
Flush End-Of-Redo Log thread 1 sequence 43
Archived Log entry 64 added for thread 1 sequence 43 ID 0xb887939c dest 1:
ARCH: Archiving is disabled due to current logfile archival
Primary will wait for slave standby to have applied all redo
Final check for a target standby that has recovered all redo. Check will be made a few times.
LOG_ARCHIVE_DEST_2 is a potential flush redo target
LOG_ARCHIVE_DEST_2 is a potential flush redo target
LOG_ARCHIVE_DEST_2 has also applied all redo from primary
Active, synchronized target has been identified that has applied all the redo from the primary. —-slave库已应用所有redo日志
Flush Redo: Primary redo moved to standby
Mon Mar 16 03:00:58 2020
ARC1: Archiving disabled
slave的alert日志:
Archived Log entry 103 added for thread 1 sequence 40 ID 0xb887939c dest 1:
Mon Mar 16 02:41:32 2020
Media Recovery Log /u01/app/arch/slave/1_39_1034744850.dbf
Media Recovery Log /u01/app/arch/slave/1_40_1034744850.dbf
Media Recovery Waiting for thread 1 sequence 41 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 41 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/slavelog/sla_redo04.log
Mon Mar 16 02:59:59 2020
Killing 1 processes with pids 57050 (idle RFS by thread/sequence) in order to allow the in-transit gap to be queued. Requested by OS process 58635
Mon Mar 16 03:00:01 2020
RFS[4]: Assigned to RFS process 58664
RFS[4]: Selected log 4 for thread 1 sequence 41 dbid -1211978201 branch 1034744850
Mon Mar 16 03:00:01 2020
RFS[5]: Assigned to RFS process 58662
RFS[5]: Opened log for thread 1 sequence 42 dbid -1211978201 branch 1034744850
Archived Log entry 104 added for thread 1 sequence 42 rlc 1034744850 ID 0xb887939c dest 2:
Mon Mar 16 03:00:02 2020
Archived Log entry 105 added for thread 1 sequence 41 ID 0xb887939c dest 1:
Mon Mar 16 03:00:02 2020
Media Recovery Log /u01/app/arch/slave/1_42_1034744850.dbf
Media Recovery Waiting for thread 1 sequence 43
Mon Mar 16 03:00:03 2020
RFS[6]: Assigned to RFS process 58667
RFS[6]: Selected log 4 for thread 1 sequence 43 dbid -1211978201 branch 1034744850
Mon Mar 16 03:00:03 2020
Archived Log entry 106 added for thread 1 sequence 43 ID 0xb887939c dest 1:
Mon Mar 16 03:00:03 2020
Standby switchover readiness check: Checking whether recoveryapplied all redo…
Database not available for switchover
End-Of-REDO archived log file has not been recovered
Incomplete recovery SCN:0:3417156 archive SCN:0:3437225
Physical Standby did not apply all the redo from the primary.
Media Recovery Log /u01/app/arch/slave/1_43_1034744850.dbf
Identified End-Of-Redo (move redo) for thread 1 sequence 43 at SCN 0x0.3472a9
Resetting standby activation ID 3095892892 (0xb887939c)
Media Recovery Waiting for thread 1 sequence 44
Mon Mar 16 03:00:04 2020
Standby switchover readiness check: Checking whether recoveryapplied all redo…
Physical Standby applied all the redo from the primary. —-slave库已应用所有redo日志
3.2、slave库停止redo日志应用
SYS@slave> alter database recover managed standby database cancel;
Database altered.
3.3、slave库完成所有redo日志应用
SYS@slave> alter database recover managed standby database finish;
Database altered.
3.4、验证slave库是否已准备好成为主库
SYS@slave> select switchover_status from v$database;
SWITCHOVER_STATUS
TO PRIMARY
3.5、将slave库切换到主角色
SYS@slave> alter database commit to switchover to primary;
Database altered.
3.6、打开新的主数据库
SYS@slave> alter database open;
Database altered.
4、确认状态及数据
–确认状态
SYS@slave> select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
PRIMARY FAILED DESTINATION MAXIMUM PERFORMANCE
–确认数据
SYS@slave> select * from wen;
NAME TIME
king 16-MAR-20
before-1 16-MAR-20
before-2 16-MAR-20
after-1 16-MAR-20
–使用flush功能的failover切换完成–
总结:
1、官方文档对failover切换操作描述很细致,能应对日常的数据库维护。
2、缺少对主库中redo数据的操作描述,(虽然有flush功能,但主库需要mount)
3、后续将讨论在failover切换时,如何正确处理主库中的redo数据。
谨记:心存敬畏,行有所止。