Friday, April 22, 2016

Resolving GoldenGate errors using the logdump utility

Resolving GoldenGate errors using the logdump utility

So far we have learned in previous recipes how to use various GoldenGate commands to manage GoldenGate processes and scan the logs. If there is an issue in the replication and the processes have abended, you can always restart them. But if the processes abended due to some data error, then a restart will not resolve the issue and you would then need to drill down further to find the exact cause of the data error. GoldenGate uses a proprietary format to write data to the trail files. Because of this you cannot just open the trail file in an editor and see its contents.
In this recipe we will learn how to use a tool called logdump using which you can view the contents of a trail file. If the replication abends due to a particular record, you can evaluate the values of the failing record and then understand why it is failing. This will help you to decide the further course of action to fix the replication.

Getting ready

In this recipe we will go through a simple replication setup between the tables of a SCOTT schema which is failing due to a missing row in the target table. GoldenGate extract has mined a DELETErecord from the source environment, but when the replicat is trying to apply this to the target environment, it is failing. Because of this missing row in the target environment the replication is unable to continue further.

How to do it...

The steps to resolve this issue and resume the replication are as follows:
  1. The last statements that ran in the source environment were:
    SQL> DELETE SCOTT.EMP WHERE EMPNO=7369;
    
    1 row deleted.
    
    SQL> INSERT INTO SCOTT.EMP VALUES (8800,'ROGER','ANALYST',7934,'23-DEC-1972',2300,0,10);
    
    1 row created.
    
    SQL> COMMIT;
    
  2. The replicat is abended in the target environment. Let's verify its status:
    GGSCI (stdby1-ol6-112.localdomain) 2> status *
    REPLICAT RGGTEST1: ABENDED
    
  3. The error from the replicat report file is:
    2013-05-01 00:41:11  WARNING OGG-01431  Aborted grouped transaction on 'SCOTT.EMP', Mapping error.
    
    2013-05-01 00:41:11  WARNING OGG-01003  Repositioning to rba 7874 in seqno 65.
    
    2013-05-01 00:41:11  WARNING OGG-01151  Error mapping from SCOTT.EMP to SCOTT.EMP.
    
    2013-05-01 00:41:11  WARNING OGG-01003  Repositioning to rba 7874 in seqno 65.
    
    Process Abending : 2013-04-30 22:57:07
    
  4. Further details of the error can be checked from the discard file as follows:
    Oracle GoldenGate Delivery for Oracle process started, group RGGTEST1 discard file opened: 2013-04-30 22:59:57
    
    Key column ENAME (1) is missing from delete on table SCOTT.EMP
    Key column JOB (2) is missing from delete on table SCOTT.EMP
    Key column MGR (3) is missing from delete on table SCOTT.EMP
    Key column HIREDATE (4) is missing from delete on table SCOTT.EMP
    Key column SAL (5) is missing from delete on table SCOTT.EMP
    Key column COMM (6) is missing from delete on table SCOTT.EMP
    Key column DEPTNO (7) is missing from delete on table SCOTT.EMP
    Missing 7 key columns in delete for table SCOTT.EMP.
    Current time: 2013-04-30 23:24:10
    Discarded record from action ABEND on error 0
    
    Aborting transaction on ./dirdat/rt beginning at seqno 63 rba 315017
                             error at seqno 63 rba 315017
    Problem replicating SCOTT.EMP to SCOTT.EMP
    Mapping problem with delete record (target format)...
    *
    EMPNO = 7369
    *
    Process Abending : 2013-04-30 23:24:10
    
  5. Let's check whether this record exists in the target database or not:
    SQL> SELECT * FROM SCOTT.EMP WHERE EMPNO=7369;
    
    no rows selected
    
  6. The current position of the Replicat process is:
    GGSCI (stdby1-ol6-112.localdomain) 2> INFO RGGTEST1
    
    REPLICAT   RGGTEST1  Last Started 2013-05-01 00:39   Status ABENDED
    Checkpoint Lag       00:00:00 (updated 00:00:23 ago)
    Log Read Checkpoint  File ./dirdat/rt000065
                    2013-05-01 00:40:17.764177  RBA 7874
    
  7. Start the logdump utility:
    [ggate@stdby1-ol6-112 ggate]$ cd /u01/app/ggate
    [ggate@stdby1-ol6-112 ggate]$ ./logdump
    
  8. Open the trail file sequence 65:
    Logdump 10 >open ./dirdat/rt000065
    Current LogTrail is /u01/app/ggate/dirdat/rt000063
    
  9. Set up the logdump utility to show the additional useful information:
    Logdump 11 >ghdr on
    Logdump 12 >detail on
    Logdump 13 >detail data
    
  10. Position the logdump to the position of the failing record in the trail file:
    Logdump 16 >pos 7874
    Reading forward from RBA 7874 
    
  11. Now list the failing record:
    Logdump 17 >n
    _________________________________________________________ 
    Hdr-Ind    :     E  (x45)     Partition  :     .  (x04)  
    UndoFlag   :     .  (x00)     BeforeAfter:     B  (x42)  
    RecLength  :    14  (x000e)   IO Time    : 2013/05/01 00:41:06.681.212   
    IOType     :     3  (x03)     OrigNode   :   255  (xff) 
    TransInd   :     .  (x00)     FormatType :     R  (x52) 
    SyskeyLen  :     0  (x00)     Incomplete :     .  (x00) 
    AuditRBA   :        356       AuditPos   : 23233552 
    Continued  :     N  (x00)     RecCount   :     1  (x01) 
    
    2013/05/01 00:41:06.681.212 Delete               Len    14 RBA 7874 
    Name: SCOTT.EMP 
    Before Image:                                             Partition 4   G  b   
     0000 000a 0000 0000 0000 0000 1cc9                | ..............  
    Column     0 (x0000), Len    10 (x000a)  
    
  12. We want to skip this record and find the position of the next one:
    Logdump 164 >n      
    _________________________________________________________Hdr-Ind    :     E  (x45)     Partition  :     .  (x04)  
    UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41)  
    RecLength  :   123  (x007b)   IO Time    : 2013/05/01 00:41:06.681.212   
    IOType     :     5  (x05)     OrigNode   :   255  (xff) 
    TransInd   :     .  (x02)     FormatType :     R  (x52) 
    SyskeyLen  :     0  (x00)     Incomplete :     .  (x00) 
    AuditRBA   :        356       AuditPos   : 23234576 
    Continued  :     N  (x00)     RecCount   :     1  (x01) 
    
    2013/05/01 00:41:06.681.212 Insert               Len   123 RBA 8002 
    Name: SCOTT.EMP 
    After  Image:                                             Partition 4   G  e   
     0000 000a 0000 0000 0000 0000 2260 0001 0009 0000 | ............"`......  
     0005 524f 4745 5200 0200 0b00 0000 0741 4e41 4c59 | ..ROGER........ANALY  
     5354 0003 000a 0000 0000 0000 0000 1efe 0004 0015 | ST..................  
     0000 3139 3732 2d31 322d 3233 3a30 303a 3030 3a30 | ..1972-12-23:00:00:0  
     3000 0500 0a00 0000 0000 0000 0382 7000 0600 0a00 | 0.............p.....  
     0000 0000 0000 0000 0000 0700 0a00 0000 0000 0000 | ....................  
     0000 0a                                           | ...  
    Column     0 (x0000), Len    10 (x000a)  
    Column     1 (x0001), Len     9 (x0009)  
    Column     2 (x0002), Len    11 (x000b)  
    Column     3 (x0003), Len    10 (x000a)  
    Column     4 (x0004), Len    21 (x0015)  
    Column     5 (x0005), Len    10 (x000a)  
    Column     6 (x0006), Len    10 (x000a)  
    Column     7 (x0007), Len    10 (x000a)  
    
  13. Position the replicat to the RBA of this record that is 8002:
    GGSCI (stdby1-ol6-112.localdomain) 1> alter replicat rggtest1, extrba 8002
    REPLICAT altered.
    
  14. Start the replicat:
    GGSCI (stdby1-ol6-112.localdomain) 2> start replicat rggtest1
    
    Sending START request to MANAGER ...
    REPLICAT RGGTEST1 starting
    
  15. Verify the newly inserted record in the target environment:
    SQL> SELECT * FROM EMP WHERE EMPNO=8800;
    
    EMPNO ENAME  JOB    MGR HIREDATE    SAL  COMM     DEPTNO
    ---------- ------  ---- --------- ------ ---- -----------
    8800 ROGER ANALYST  7934 23-DEC-72  2300  0       10
    

How it works...

This is a very simple example of a typical data mismatch issue in a GoldenGate environment. A record was deleted from the EMP table in the source database and this record was extracted by the GoldenGate processes. When the Replicat process tried to apply it to the target environment, it failed as the record does not exist in the target environment.
While troubleshooting such issues, the first thing that one should check is why is the data different in the target database. There could be something else modifying the data in the target database. Once you have identified the source of the second modification and taken care of that, you need to fix this data anamoly and continue the replication. For this we use a GoldenGate tool called logdump. Using logdump you can scan through the trail files to see their contents and you can move through various records in the trail file.
In the earlier example, we first check the status of the Replicat process and verify that it is abended. Since the process is abended, we check the reason from the report/discard file. From steps 3 and 4, it is quite clear that the process abended as it could not apply the delete record. Next we note the current sequence number and RBA of the Replicat process. We will then use logdump to dump the contents of this trail file and this RBA. In steps 7 and 8, we start the logdump and open this trail file. By default, logdump does not display the header record when you view a data record. We instruct logdump in step 9 using ghdr on and detail the data command to output the metadata information as well. After this we move the current pointer in the trail file to where the Replicat process was failing and then we dump the next record using the n command. This is the failing record and we want to skip this, so we run then command again to move to the next record. You can see in step 12 that this is the Insert record that was executed in the source database after this delete. This is the point from which we want the Replicat process to continue. So we alter the Replicat process to start from this RBA and resume it. Once the process is resumed it continues working fine. In step 15, we verify that the Insert record was applied to the target database by checking the Insert record from the database.

There's more...

Logdump is a very handy tool when it comes to troubleshooting various data issues in a GoldenGate replication. There are a few other important commands in the logdump which we will discuss here:

COUNT

With the count command you can count the number of records in a GoldenGate trail file:
Logdump 166 >count
LogTrail /u01/app/ggate/dirdat/rt000065 has 33 records 
Total Data Bytes              5363 
  Avg Bytes/Record             162 
Delete                           1 
Insert                           1 
FieldComp                       28 
RestartOK                        2 
Others                           1 
Before Images                    1 
After Images                    31 

Average of 6 Transactions 
    Bytes/Trans .....       1157 
    Records/Trans ...          5 
    Files/Trans .....          1

SCAN FOR TIMESTAMP

Scan for timestamp is useful when you want to find a record that was modified at a particular timestamp in a trail file:
Logdump 167 >sfts 2013-05-01 00:33:00
Scan for timestamp >= 2013/04/30 23:33:00.000.000 GMT 

2013/05/01 00:35:14.710.374 FieldComp            Len   121 RBA 1920 
Name: SCOTT.EMP 
After  Image:                                             Partition 4   G  b   
 0000 000a 0000 0000 0000 0000 1cc9 0001 0009 0000 | ....................  
 0005 534d 4954 4800 0200 0900 0000 0543 4c45 524b | ..SMITH........CLERK  
 0003 000a 0000 0000 0000 0000 1ede 0004 0015 0000 | ....................  
 3139 3830 2d31 322d 3137 3a30 303a 3030 3a30 3000 | 1980-12-17:00:00:00.  
 0500 0a00 0000 0000 0000 0138 8000 0600 0aff ff00 | ...........8........  
 0000 0000 0000 0000 0700 0a00 0000 0000 0000 0000 | ....................  
 14                                                | .  

FILTER ON SCN

If you know the SCN from the source database, you can scan the trail file for the SCN using the logdump filter on csn command as follows:
Logdump 210 >filter on csn 11910911
Logdump 211 >n

2013/05/01 00:11:46.633.724 FileHeader           Len  1786 RBA 0 
Name: *FileHeader* 
 3000 047d 3000 0008 4747 0d0a 544c 0a0d 3100 0002 | 0..}0...GG..TL..1...  
 0003 3200 0004 2000 0000 3300 0008 02f2 01cf 94d4 | ..2... ...3.........  
 4bfc 3400 0035 0033 7572 693a 7072 696d 312d 6f6c | K.4..5.3uri:prim1-ol  
 362d 3131 323a 6c6f 6361 6c64 6f6d 6169 6e3a 3a75 | 6-112:localdomain::u  
 3031 3a61 7070 3a67 6761 7465 3a31 3132 3130 3135 | 01:app:ggate:1121015  
 0002 bb35 0000 3500 3375 7269 3a70 7269 6d31 2d6f | ...5..5.3uri:prim1-o  
 6c36 2d31 3132 3a6c 6f63 616c 646f 6d61 696e 3a3a | l6-112:localdomain::     

See also

No comments:

Post a Comment