Recovery after killing a large transaction

Summary
After a large running transaction is killed, either by killing the process or by aborting the database, the database may appear to hang and the SMON background process may consume all available CPU. SMON is in the process of rolling back the terminated transaction, which can be time-consuming. Users often respond by trying to restart the database, and because a shutdown immediate hangs, they end up aborting the instance. Please note that stopping the database actually makes things worse: it does not reduce the amount of work SMON must perform to complete the rollback.

The message ‘Waiting for SMON to disable tx recovery’ will be posted in the alert log as well.

First we need to determine the progress SMON is making. It is quite possible that SMON is working on a single rollback (undo) segment. You can find it using the following query:

SELECT r.NAME "RB Segment Name", dba_seg.size_mb,
DECODE(TRUNC(SYSDATE - LOGON_TIME), 0, NULL, TRUNC(SYSDATE - LOGON_TIME) || ' Days' || ' + ') ||
TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), 'SSSSS'), 'HH24:MI:SS') LOGON,
v$session.SID, v$session.SERIAL#, p.SPID, v$session.process,
v$session.USERNAME, v$session.STATUS, v$session.OSUSER, v$session.MACHINE,
v$session.PROGRAM, v$session.module, action
FROM v$lock l, v$process p, v$rollname r, v$session,
(SELECT segment_name, ROUND(bytes/(1024*1024),2) size_mb FROM dba_segments
WHERE segment_type = 'TYPE2 UNDO' ORDER BY bytes DESC) dba_seg
-- match each session to its server process
WHERE v$session.paddr = p.addr(+) AND
v$session.SID = l.SID AND
-- the undo segment number is stored in the high-order part of the TX lock's id1
TRUNC (l.id1(+)/65536)=r.usn AND
l.TYPE(+) = 'TX' AND
l.lmode(+) = 6
AND r.NAME = dba_seg.segment_name
--AND v$session.username = 'SYSTEM'
--AND status = 'INACTIVE'
ORDER BY size_mb DESC;

Output

RB Segment Name SIZE_MB LOGON SID SERIAL# SPID PROCESS USERNAME STATUS OSUSER MACHINE PROGRAM MODULE ACTION
--------------- ------- ----- --- ------- ---- ------- -------- ------ ------ ------- ------- ------ ------
_SYSSMU23_3928487448$ 44559.25 21:18:01 567 41203 1234 TEST KILLED rajdmin ****0123 JDBC Thin Client JDBC Thin Client

After the shadow process is killed or the database is aborted, the v$transaction entry is lost, so you can no longer estimate how the rollback is proceeding by examining v$transaction.used_ublk.

The following queries are available to monitor the progress of the transaction recovery:

SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

SQL> SELECT usn, state, undoblockstotal "Total", undoblocksdone "Done",
undoblockstotal-undoblocksdone "ToDo",
DECODE(cputime,0,'unknown',SYSDATE+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400))
"Finish at"
FROM v$fast_start_transactions;

USN STATE                 Total       Done       ToDo Finish at
---------- ---------------- ---------- ---------- ---------- --------------------
        88 RECOVERED             36591      36591          0 17-NOV-2008 17:02:44
        36 RECOVERING              464        100        364 10-DEC-2008 03:32:06
        71 RECOVERING              309        193        116 20-NOV-2008 19:19:23
         8 RECOVERING          2350785     189363    2161422 18-NOV-2008 00:51:51

I prefer to use the following query, which reports the undo still held by each active transaction:

SELECT TO_CHAR(s.sid)||','||TO_CHAR(s.serial#) sid_serial,
NVL(s.username, 'None') orauser,
s.program,
r.name undoseg,
t.used_ublk * TO_NUMBER(x.value)/1024||'K' "Undo"
FROM sys.v_$rollname r,
sys.v_$session s,
sys.v_$transaction t,
sys.v_$parameter x
WHERE s.taddr = t.addr
AND r.usn = t.xidusn(+)
AND x.name = 'db_block_size';

SID_SERIAL ORAUSER PROGRAM          UNDOSEG               Undo
---------- ------- ---------------- --------------------- ---------
567,41203  TEST    JDBC Thin Client _SYSSMU23_3928487448$ 17346752K

Run the above query several times in a row. The change in the "Undo" value between runs will give you a good idea of how the rollback is progressing.
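To turn two successive readings into a rough time estimate, divide the amount still to be rolled back by the rate observed between the readings. The sketch below only illustrates the arithmetic: the two block counts and the 600-second interval are hypothetical sample values, not measurements, and the estimate assumes the rollback rate stays roughly constant.

```sql
-- Hypothetical readings of the remaining undo blocks, taken 600 seconds apart.
-- Replace the three literals with your own observations.
WITH samples AS (
  SELECT 2161422 AS blocks_first,   -- remaining blocks at the first reading
         2100000 AS blocks_second,  -- remaining blocks at the second reading
         600     AS elapsed_sec     -- seconds between the two readings
  FROM dual
)
SELECT ROUND((blocks_first - blocks_second) / elapsed_sec, 1) AS blocks_per_sec,
       -- remaining work divided by the observed rate, converted to hours
       ROUND(blocks_second
             / ((blocks_first - blocks_second) / elapsed_sec)
             / 3600, 1)                                       AS est_hours_left
FROM samples;
```

If the two readings are identical, the division fails with a divide-by-zero error. In that case take the readings further apart.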

- In some versions, the cputime column of v$fast_start_transactions is not populated (it is always 0), so the estimated completion time will not be displayed.
- In some cases, the v$fast_start_transactions view does not work at all. If so, you can query the internal view x$ktuxe instead; its ktuxesiz column holds the number of undo blocks that remain to be rolled back.
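A query along the following lines is commonly used against x$ktuxe. Treat it as a sketch rather than a supported interface: x$ tables are undocumented, usually visible only when connected as SYS, and their columns can differ between versions. The filter on ktuxecfl = 'DEAD' is an assumption that restricts the output to transactions currently marked for recovery.

```sql
-- Run as SYS. x$ktuxe is an undocumented fixed table, so verify the
-- column names on your version before relying on this.
SELECT ktuxeusn                                   AS usn,              -- undo segment number
       TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') AS sample_time,      -- when this reading was taken
       ktuxesiz                                   AS undo_blocks_left, -- blocks still to roll back
       ktuxesta                                   AS status
FROM   x$ktuxe
WHERE  ktuxecfl = 'DEAD';                          -- assumed flag for dead transactions
```

Running it at intervals and comparing undo_blocks_left between readings shows how quickly the rollback is advancing.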
