Resolving GoldenGate errors using the logdump utility
So far we have learned in previous recipes how to use various GoldenGate commands to manage GoldenGate processes and scan the logs. If there is an issue in the replication and the processes have abended, you can always restart them. But if the processes abended due to some data error, then a restart will not resolve the issue and you would then need to drill down further to find the exact cause of the data error. GoldenGate uses a proprietary format to write data to the trail files. Because of this you cannot just open the trail file in an editor and see its contents.
In this recipe we will learn how to use a tool called logdump using which you can view the contents of a trail file. If the replication abends due to a particular record, you can evaluate the values of the failing record and then understand why it is failing. This will help you to decide the further course of action to fix the replication.
In this recipe we will go through a simple replication setup between the tables of a
SCOTT
schema which is failing due to a missing row in the target table. GoldenGate extract has mined a DELETE
record from the source environment, but when the replicat is trying to apply this to the target environment, it is failing. Because of this missing row in the target environment the replication is unable to continue further.
The steps to resolve this issue and resume the replication are as follows:
- The last statements that ran in the source environment were:
SQL> DELETE SCOTT.EMP WHERE EMPNO=7369; 1 row deleted. SQL> INSERT INTO SCOTT.EMP VALUES (8800,'ROGER','ANALYST',7934,'23-DEC-1972',2300,0,10); 1 row created. SQL> COMMIT;
- The replicat is abended in the target environment. Let's verify its status:
GGSCI (stdby1-ol6-112.localdomain) 2> status * REPLICAT RGGTEST1: ABENDED
- The error from the replicat report file is:
2013-05-01 00:41:11 WARNING OGG-01431 Aborted grouped transaction on 'SCOTT.EMP', Mapping error. 2013-05-01 00:41:11 WARNING OGG-01003 Repositioning to rba 7874 in seqno 65. 2013-05-01 00:41:11 WARNING OGG-01151 Error mapping from SCOTT.EMP to SCOTT.EMP. 2013-05-01 00:41:11 WARNING OGG-01003 Repositioning to rba 7874 in seqno 65. Process Abending : 2013-04-30 22:57:07
- Further details of the error can be checked from the discard file as follows:
Oracle GoldenGate Delivery for Oracle process started, group RGGTEST1 discard file opened: 2013-04-30 22:59:57 Key column ENAME (1) is missing from delete on table SCOTT.EMP Key column JOB (2) is missing from delete on table SCOTT.EMP Key column MGR (3) is missing from delete on table SCOTT.EMP Key column HIREDATE (4) is missing from delete on table SCOTT.EMP Key column SAL (5) is missing from delete on table SCOTT.EMP Key column COMM (6) is missing from delete on table SCOTT.EMP Key column DEPTNO (7) is missing from delete on table SCOTT.EMP Missing 7 key columns in delete for table SCOTT.EMP. Current time: 2013-04-30 23:24:10 Discarded record from action ABEND on error 0 Aborting transaction on ./dirdat/rt beginning at seqno 63 rba 315017 error at seqno 63 rba 315017 Problem replicating SCOTT.EMP to SCOTT.EMP Mapping problem with delete record (target format)... * EMPNO = 7369 * Process Abending : 2013-04-30 23:24:10
- Let's check whether this record exists in the target database or not:
SQL> SELECT * FROM SCOTT.EMP WHERE EMPNO=7369; no rows selected
- The current position of the Replicat process is:
GGSCI (stdby1-ol6-112.localdomain) 2> INFO RGGTEST1 REPLICAT RGGTEST1 Last Started 2013-05-01 00:39 Status ABENDED Checkpoint Lag 00:00:00 (updated 00:00:23 ago) Log Read Checkpoint File ./dirdat/rt000065 2013-05-01 00:40:17.764177 RBA 7874
- Start the logdump utility:
[ggate@stdby1-ol6-112 ggate]$ cd /u01/app/ggate [ggate@stdby1-ol6-112 ggate]$ ./logdump
- Open the trail file sequence 65:
Logdump 10 >open ./dirdat/rt000065 Current LogTrail is /u01/app/ggate/dirdat/rt000063
- Set up the logdump utility to show the additional useful information:
Logdump 11 >ghdr on Logdump 12 >detail on Logdump 13 >detail data
- Position the logdump to the position of the failing record in the trail file:
Logdump 16 >pos 7874 Reading forward from RBA 7874
- Now list the failing record:
Logdump 17 >n _________________________________________________________ Hdr-Ind : E (x45) Partition : . (x04) UndoFlag : . (x00) BeforeAfter: B (x42) RecLength : 14 (x000e) IO Time : 2013/05/01 00:41:06.681.212 IOType : 3 (x03) OrigNode : 255 (xff) TransInd : . (x00) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 356 AuditPos : 23233552 Continued : N (x00) RecCount : 1 (x01) 2013/05/01 00:41:06.681.212 Delete Len 14 RBA 7874 Name: SCOTT.EMP Before Image: Partition 4 G b 0000 000a 0000 0000 0000 0000 1cc9 | .............. Column 0 (x0000), Len 10 (x000a)
- We want to skip this record and find the position of the next one:
Logdump 164 >n _________________________________________________________Hdr-Ind : E (x45) Partition : . (x04) UndoFlag : . (x00) BeforeAfter: A (x41) RecLength : 123 (x007b) IO Time : 2013/05/01 00:41:06.681.212 IOType : 5 (x05) OrigNode : 255 (xff) TransInd : . (x02) FormatType : R (x52) SyskeyLen : 0 (x00) Incomplete : . (x00) AuditRBA : 356 AuditPos : 23234576 Continued : N (x00) RecCount : 1 (x01) 2013/05/01 00:41:06.681.212 Insert Len 123 RBA 8002 Name: SCOTT.EMP After Image: Partition 4 G e 0000 000a 0000 0000 0000 0000 2260 0001 0009 0000 | ............"`...... 0005 524f 4745 5200 0200 0b00 0000 0741 4e41 4c59 | ..ROGER........ANALY 5354 0003 000a 0000 0000 0000 0000 1efe 0004 0015 | ST.................. 0000 3139 3732 2d31 322d 3233 3a30 303a 3030 3a30 | ..1972-12-23:00:00:0 3000 0500 0a00 0000 0000 0000 0382 7000 0600 0a00 | 0.............p..... 0000 0000 0000 0000 0000 0700 0a00 0000 0000 0000 | .................... 0000 0a | ... Column 0 (x0000), Len 10 (x000a) Column 1 (x0001), Len 9 (x0009) Column 2 (x0002), Len 11 (x000b) Column 3 (x0003), Len 10 (x000a) Column 4 (x0004), Len 21 (x0015) Column 5 (x0005), Len 10 (x000a) Column 6 (x0006), Len 10 (x000a) Column 7 (x0007), Len 10 (x000a)
- Position the replicat to the RBA of this record that is 8002:
GGSCI (stdby1-ol6-112.localdomain) 1> alter replicat rggtest1, extrba 8002 REPLICAT altered.
- Start the replicat:
GGSCI (stdby1-ol6-112.localdomain) 2> start replicat rggtest1 Sending START request to MANAGER ... REPLICAT RGGTEST1 starting
- Verify the newly inserted record in the target environment:
SQL> SELECT * FROM EMP WHERE EMPNO=8800; EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ---------- ------ ---- --------- ------ ---- ----------- 8800 ROGER ANALYST 7934 23-DEC-72 2300 0 10
This is a very simple example of a typical data mismatch issue in a GoldenGate environment. A record was deleted from the
EMP
table in the source database and this record was extracted by the GoldenGate processes. When the Replicat process tried to apply it to the target environment, it failed as the record does not exist in the target environment.
While troubleshooting such issues, the first thing that one should check is why is the data different in the target database. There could be something else modifying the data in the target database. Once you have identified the source of the second modification and taken care of that, you need to fix this data anamoly and continue the replication. For this we use a GoldenGate tool called logdump. Using logdump you can scan through the trail files to see their contents and you can move through various records in the trail file.
In the earlier example, we first check the status of the Replicat process and verify that it is abended. Since the process is abended, we check the reason from the report/discard file. From steps 3 and 4, it is quite clear that the process abended as it could not apply the delete record. Next we note the current sequence number and RBA of the Replicat process. We will then use logdump to dump the contents of this trail file and this RBA. In steps 7 and 8, we start the logdump and open this trail file. By default, logdump does not display the header record when you view a data record. We instruct logdump in step 9 using
ghdr on
and detail the data command to output the metadata information as well. After this we move the current pointer in the trail file to where the Replicat process was failing and then we dump the next record using the n
command. This is the failing record and we want to skip this, so we run then
command again to move to the next record. You can see in step 12 that this is the Insert record that was executed in the source database after this delete. This is the point from which we want the Replicat process to continue. So we alter the Replicat process to start from this RBA and resume it. Once the process is resumed it continues working fine. In step 15, we verify that the Insert record was applied to the target database by checking the Insert record from the database.
Logdump is a very handy tool when it comes to troubleshooting various data issues in a GoldenGate replication. There are a few other important commands in the logdump which we will discuss here:
Logdump 166 >count LogTrail /u01/app/ggate/dirdat/rt000065 has 33 records Total Data Bytes 5363 Avg Bytes/Record 162 Delete 1 Insert 1 FieldComp 28 RestartOK 2 Others 1 Before Images 1 After Images 31 Average of 6 Transactions Bytes/Trans ..... 1157 Records/Trans ... 5 Files/Trans ..... 1
Scan for timestamp is useful when you want to find a record that was modified at a particular timestamp in a trail file:
Logdump 167 >sfts 2013-05-01 00:33:00 Scan for timestamp >= 2013/04/30 23:33:00.000.000 GMT 2013/05/01 00:35:14.710.374 FieldComp Len 121 RBA 1920 Name: SCOTT.EMP After Image: Partition 4 G b 0000 000a 0000 0000 0000 0000 1cc9 0001 0009 0000 | .................... 0005 534d 4954 4800 0200 0900 0000 0543 4c45 524b | ..SMITH........CLERK 0003 000a 0000 0000 0000 0000 1ede 0004 0015 0000 | .................... 3139 3830 2d31 322d 3137 3a30 303a 3030 3a30 3000 | 1980-12-17:00:00:00. 0500 0a00 0000 0000 0000 0138 8000 0600 0aff ff00 | ...........8........ 0000 0000 0000 0000 0700 0a00 0000 0000 0000 0000 | .................... 14 | .
If you know the SCN from the source database, you can scan the trail file for the SCN using the logdump
filter on csn
command as follows:Logdump 210 >filter on csn 11910911 Logdump 211 >n 2013/05/01 00:11:46.633.724 FileHeader Len 1786 RBA 0 Name: *FileHeader* 3000 047d 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0..}0...GG..TL..1... 0003 3200 0004 2000 0000 3300 0008 02f2 01cf 94d4 | ..2... ...3......... 4bfc 3400 0035 0033 7572 693a 7072 696d 312d 6f6c | K.4..5.3uri:prim1-ol 362d 3131 323a 6c6f 6361 6c64 6f6d 6169 6e3a 3a75 | 6-112:localdomain::u 3031 3a61 7070 3a67 6761 7465 3a31 3132 3130 3135 | 01:app:ggate:1121015 0002 bb35 0000 3500 3375 7269 3a70 7269 6d31 2d6f | ...5..5.3uri:prim1-o 6c36 2d31 3132 3a6c 6f63 616c 646f 6d61 696e 3a3a | l6-112:localdomain::
No comments:
Post a Comment