头条焦点:问题:Duplicate报错RMAN-03009, ORA-17628, ORA-19505
2023-05-15 06:15:36 来源: 博客园
前面文章提到,这周末帮一个客户测试报错场景:客户通过duplicate生产备库的方式创建cascade备库。发现每次都会遇到两个文件报错,ORA-17628: Oracle error 19505错误,且每一次跑,报错文件不一样。 现在想帮客户验证,这属于是正常现象还是bug。
(资料图片)
- 一、问题具体分析
- 二、解决方案验证
一、问题具体分析
最开始查看相关MOS文档,锁定的是这两篇文章:- ORA-17628, ORA-19505 during RMAN DUPLICATE FROM ACTIVE (Doc ID 1331986.1)
One data file is not using OMF name while the rest of the data files are using OMF name.怀疑是对应数据文件名字的问题,但后来排除了此原因。因为比对报错数据文件和正常的数据文件并无特殊之处,且每次报错文件并不一样。
另外一篇文档是:
- Known issues on ORA-17628: Oracle error 19505 returned by remote Oracle server Reported in Rman Active Duplicate (Doc ID 1401333.1)
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/28/2011 19:17:43ORA-17628: Oracle error 19505 returned by remote Oracle server
这篇文档比较全,把已知问题枚举了下,可作为参考。得到的启示是,不但要看duplicate的日志,还要去看auxiliary数据库的alert日志,有可能会有更详细的日志描述。
另外我重新审视,增加关键字,还匹配到更精确的一篇文档:
- Duplicate from Active Database Failing with: RMAN-03009, ORA-17628, and ORA-19505 (Doc ID 1439632.1)
结论是说目标路径有重复文件,这...其实上篇文档也有提到。另外,1401333.1也说了,
- ORA-17628 / ORA-19505 is a very generic error and we need more debugging to get the exact cause .
这个错误其实是很通用的一个错误号,如果已知问题都不能匹配,那么就需要更多的诊断来找到确切的原因。也就是又强调了下,我们还要去看比如auxiliary数据库的alert日志等详细信息。
二、解决方案验证
下面按客户要求,想测试当只有部分几个文件报错,是否可以单独备份过去?这其实算workaround的范畴,我们通常不建议这样来解决一个实际问题。但客户实在是不想反复重跑这个duplicate,因为库太大了,老这么重跑时间上确实有些伤不起...所以本着客户关怀角度,也就协助测试下这个可能性:因为这种问题确实不好直接回答是或否,只能构造场景实际测试下:
首先从一级备库duplicate到二级备库会报错:
RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of Duplicate Db command at 05/14/2023 23:54:15RMAN-05501: aborting duplication of target databaseRMAN-05531: a mounted database cannot be duplicated while datafiles are fuzzy
先改为从主库duplicate到二级备库,重新跑脚本。
RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of Duplicate Db command at 05/14/2023 23:56:57RMAN-05501: aborting duplication of target databaseRMAN-03015: error occurred in stored script Memory ScriptRMAN-03009: failure of backup command on prmy1 channel at 05/14/2023 23:56:57ORA-17628: Oracle error 19505 returned by remote Oracle server
看看!这里有同样的错误号,但是并不是复现问题,因为这个错误号有太多的可能性。此时正好按上面文档习得的经验,去查看二级备库(auxiliary)的alert日志有没有更详细的信息:
Sun May 14 23:56:56 2023Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_ora_6028.trc:ORA-19505: failed to identify file "/u01/oradata/jingyu/control01.ctl"ORA-27040: file create error, unable to create fileLinux-x86_64 Error: 2: No such file or directoryAdditional information: 1Sun May 14 23:57:41 2023destination database instance is "started" not "mounted"
果然,是因为一个简单的文件创建失败的错误,应该是对应的文件夹不存在,直接创建即可:
[oracle@db11gcas ~]$ cd /u01/oradata/jingyu-bash: cd: /u01/oradata/jingyu: No such file or directory[oracle@db11gcas ~]$ cd /u01/oradata[oracle@db11gcas oradata]$ ls -lddrwxr-xr-x. 2 oracle oinstall 6 May 14 23:49 .[oracle@db11gcas oradata]$ mkdir jingyu[oracle@db11gcas oradata]$ cd /u01/oradata/jingyu[oracle@db11gcas jingyu]$
创建对应的文件夹后再次跑脚本成功。
但是为了构造客户的问题,我这里将成功的文件故意删除掉1个。
...channel prmy1: starting datafile copyinput datafile file number=00004 name=/u01/oradata/jingyu/users01.dbfoutput file name=/u01/oradata/JYCAS/datafile/o1_mf_dbs_i_ji_1r1s5uap_.dbf tag=TAG20230515T000118channel prmy2: datafile copy complete, elapsed time: 00:00:01output file name=/u01/oradata/JYCAS/datafile/o1_mf_t2_1s1s5uap_.dbf tag=TAG20230515T000118channel prmy3: datafile copy complete, elapsed time: 00:00:01output file name=/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf tag=TAG20230515T000118channel prmy1: datafile copy complete, elapsed time: 00:00:01Finished backup at 2023-05-15 00:01:31...
就删除上面这个4号数据文件吧!删除之后肯定数据库是无法open的。
/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf[oracle@db11gcas datafile]$ rm o1_mf_users_1t1s5uaq_.dbfSQL> alter database open;alter database open*ERROR at line 1:ORA-10458: standby database requires recoveryORA-01157: cannot identify/lock data file 4 - see DBWR trace fileORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"
而且,这个时候开启mrp进程,也是不成功的:
SQL> recover managed standby database disconnect;Media recovery complete.SQL> !ps -ef|grep mrporacle 7079 6943 0 00:15 pts/1 00:00:00 /bin/bash -c ps -ef|grep mrporacle 7081 7079 0 00:15 pts/1 00:00:00 grep mrp
没有成功启动mrp进程,去看alert日志,清楚的告诉你是因为4号文件找不到:
Mon May 15 00:15:04 2023ALTER DATABASE RECOVER managed standby database disconnect Attempt to start background Managed Standby Recovery process (jingyu)Mon May 15 00:15:04 2023MRP0 started with pid=26, OS id=7056 MRP0: Background Managed Standby Recovery process started (jingyu) started logmerger processMon May 15 00:15:09 2023Managed Standby Recovery not using Real Time ApplyMon May 15 00:15:09 2023Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_dbw0_5520.trc:ORA-01157: cannot identify/lock data file 4 - see DBWR trace fileORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"ORA-27037: unable to obtain file statusLinux-x86_64 Error: 2: No such file or directoryAdditional information: 3MRP0: Background Media Recovery terminated with error 1110Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_pr00_7062.trc:ORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"ORA-01157: cannot identify/lock data file 4 - see DBWR trace fileORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"Slave exiting with ORA-1110 exceptionErrors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_pr00_7062.trc:ORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"ORA-01157: cannot identify/lock data file 4 - see DBWR trace fileORA-01110: data file 4: "/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf"Completed: ALTER DATABASE RECOVER managed standby database disconnect Recovery Slave PR00 previously exited with exception 1110MRP0: Background Media Recovery process shutdown (jingyu)
此时尝试去一级备库备份这个4号数据文件,然后传输到二级备库:
RMAN> backup datafile 4 format "/tmp/dbfile4.bak"; Starting backup at 2023-05-15 00:14:27using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: SID=12 device type=DISKchannel ORA_DISK_1: starting full datafile backup setchannel ORA_DISK_1: specifying datafile(s) in backup setinput datafile file number=00004 name=/u01/oradata/JYADG/datafile/o1_mf_users_0r1s27bm_.dbfchannel ORA_DISK_1: starting piece 1 at 2023-05-15 00:14:27channel ORA_DISK_1: finished piece 1 at 2023-05-15 00:14:28piece handle=/tmp/dbfile4.bak tag=TAG20230515T001427 comment=NONEchannel ORA_DISK_1: backup set complete, elapsed time: 00:00:01Finished backup at 2023-05-15 00:14:28[oracle@db11gadg ~]$ scp /tmp/dbfile4.bak 192.168.1.20:/tmporacle@192.168.1.20"s password: dbfile4.bak 100% 1344KB 80.2MB/s 00:00 [oracle@db11gadg ~]$
然后在二级备库恢复这个缺失的4号数据文件:
RMAN> list backup of datafile 4;using target database control file instead of recovery catalogspecification does not match any backup in the repositoryRMAN> catalog start with "/tmp/dbfile4.bak";searching for all files that match the pattern /tmp/dbfile4.bakList of Files Unknown to the Database=====================================File Name: /tmp/dbfile4.bakDo you really want to catalog the above files (enter YES or NO)? yescataloging files...cataloging doneList of Cataloged Files=======================File Name: /tmp/dbfile4.bakRMAN> restore datafile 4;Starting restore at 2023-05-15 00:22:12using channel ORA_DISK_1channel ORA_DISK_1: starting datafile backup set restorechannel ORA_DISK_1: specifying datafile(s) to restore from backup setchannel ORA_DISK_1: restoring datafile 00004 to /u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbfchannel ORA_DISK_1: reading from backup piece /tmp/dbfile4.bakchannel ORA_DISK_1: piece handle=/tmp/dbfile4.bak tag=TAG20230515T001427channel ORA_DISK_1: restored backup piece 1channel ORA_DISK_1: restore complete, elapsed time: 00:00:01Finished restore at 2023-05-15 00:22:14RMAN>
此时查看alert日志,也是正常恢复:
Mon May 15 00:22:13 2023Full restore complete of datafile 4 /u01/oradata/JYCAS/datafile/o1_mf_users_l622onyr_.dbf. Elapsed time: 0:00:00 checkpoint is 5941162 last deallocation scn is 3
需要注意下因为使用OMF,恢复的4号数据文件跟之前的名字不一样了!但是不影响。现在再尝试启动mrp进程恢复就OK了。
SQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile; FILE# CHECKPOINT_CHANGE# STATUS---------- ------------------ ------- 1 5941162 SYSTEM 2 5941162 ONLINE 3 5941162 ONLINE 4 5941162 ONLINE 5 5941162 ONLINE 6 5941162 ONLINE 7 5541098 ONLINE 8 5541108 ONLINE8 rows selected.SQL> recover managed standby database disconnect;Media recovery complete.SQL> !ps -ef|grep mrporacle 7687 1 0 00:26 ? 00:00:00 ora_mrp0_jingyuoracle 7713 7606 0 00:27 pts/1 00:00:00 /bin/bash -c ps -ef|grep mrporacle 7715 7713 0 00:27 pts/1 00:00:00 grep mrpSQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile; FILE# CHECKPOINT_CHANGE# STATUS---------- ------------------ ------- 1 5942030 SYSTEM 2 5942030 ONLINE 3 5942030 ONLINE 4 5942030 ONLINE 5 5942030 ONLINE 6 5942030 ONLINE 7 5541098 ONLINE 8 5541108 ONLINE8 rows selected.SQL> @dgNAME VALUE UNIT TIME_COMPUTED DATUM_TIME---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------transport lag +00 00:17:22 day(2) to second(0) interval 05/15/2023 00:28:48 05/15/2023 00:27:53apply lag day(2) to second(0) interval 05/15/2023 00:28:48apply finish time day(2) to second(3) interval 05/15/2023 00:28:48estimated startup time 9 second 05/15/2023 00:28:48
此时主库切换一下日志,再次查询级联备库状态:
SQL> @dgNAME VALUE UNIT TIME_COMPUTED DATUM_TIME---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------transport lag +00 00:00:00 day(2) to second(0) interval 05/15/2023 00:29:14 05/15/2023 00:29:05apply lag +00 00:00:00 day(2) to second(0) interval 05/15/2023 00:29:14 05/15/2023 00:29:05apply finish time day(2) to second(3) interval 05/15/2023 00:29:14estimated startup time 9 second 05/15/2023 00:29:14SQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile; FILE# CHECKPOINT_CHANGE# STATUS---------- ------------------ ------- 1 5943460 SYSTEM 2 5943460 ONLINE 3 5943460 ONLINE 4 5943460 ONLINE 5 5943460 ONLINE 6 5943460 ONLINE 7 5541098 ONLINE 8 5541108 ONLINE8 rows selected.SQL> @dgNAME VALUE UNIT TIME_COMPUTED DATUM_TIME---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------transport lag +00 00:00:48 day(2) to second(0) interval 05/15/2023 00:29:56 05/15/2023 00:29:54apply lag +00 00:00:48 day(2) to second(0) interval 05/15/2023 00:29:56 05/15/2023 00:29:54apply finish time day(2) to second(3) interval 05/15/2023 00:29:56estimated startup time 9 second 05/15/2023 00:29:56
说明这个恢复缺失数据文件再开启mrp进程的方案可行。但出于从根本解决问题的初衷,我们还是会建议找到具体原因重新跑的方式来恢复。
标签:
[责任编辑:]
猜你喜欢
- (2023-05-15)中国春节预制菜行业数据分析:56.0%消费者购买预制年菜原因是美味可口|当前速读
- (2023-05-15)全球快播:刑法修正案八全文逐条精解_刑法修正案八全文
- (2023-05-15)AI孙燕姿攻占华语乐坛? 天天快消息
- (2023-05-14)讯息:激战2生活技能学什么好(神佑之城工艺制作台附近的背景布置)
- (2023-05-14)阿森纳本赛季英超35场已取得25场胜利 速看
- (2023-05-14)负债400亿!高99.9米,破3项纪录,中国最壕烂尾楼,终于动工了
- (2023-05-14)唯品金融逾期_唯品金融 每日报道