Oracle

Troubleshooting dbms_scheduler with event 27402. Bug 17981015 still exists in 19c: scheduler jobs can be run manually but will not run automatically, only in a specific PDB. “Exiting jsksDBObjLock with status 942” at LAST_OBSERVED_EVENT

A user reported that their dbms_scheduler jobs could be run manually, but would not run automatically according to their schedules.
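(For context, a manual run here means something like the call below; the job name is made up for illustration. dbms_scheduler.run_job defaults to use_current_session => TRUE, so the job executes inside the caller's own session and never goes through the job coordinator CJQ0, which is why manual runs can succeed even while scheduled runs are stuck.)

SQL> -- hypothetical job name, for illustration only
SQL> exec dbms_scheduler.run_job('APP_USER.NIGHTLY_BATCH_JOB');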

I checked and found that none of the scheduler jobs in that particular PDB had run for 2 days (UAT environment). The issue happened only in this one PDB, not in the other PDBs of the same CDB.
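One quick way to confirm this (connected to the affected PDB) is to compare the last and next run times of the enabled jobs, for example:

SQL> -- run inside the affected PDB
SQL> select owner, job_name, state, last_start_date, next_run_date
       from dba_scheduler_jobs
      where enabled = 'TRUE'
      order by last_start_date;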

Searching MOS, I found a helpful document with diagnostic details: Scheduler Stuck Executing Jobs And CJQ0 Locking SYS.LAST_OBSERVED_EVENT (Doc ID 2341386.1)

Set event 27402 as below in the CDB root container to spool scheduler debug traces.

alter system set events '27402 trace name context forever, level 65355';

I tested that this event can be set online and takes effect immediately. It can easily be disabled at any time with the command below:

alter system set events '27402 trace name context off';
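As an optional sanity check (not part of the MOS doc), oradebug can dump the events currently set at the system level to confirm the event is active; the exact output format varies by version:

SQL> oradebug setmypid
SQL> oradebug eventdump system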

Check the trace file path of the scheduler job coordinator process CJQ0:

SQL> select pname,tracefile from v$process where pname ='CJQ0';
PNAME   TRACEFILE                                                                           
CJQ0     /u01/app/oracle/diag/rdbms/c19_apollo_x6e3/C19APOLL/trace/C19APOLL_cjq0_56770.trc    

If you tail -f the trace file, you will see a large amount of debug logging flooding in, so let's grab only the abnormal statuses (status <> 0):

$ tail -f C19APOLL_cjq0_381904.trc|grep status|grep -v "status 0"
SCHED 12-23 17:28:25.414 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Exiting jsksDBObjLock with status 942
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Exiting jsksDBObjLock with status 942
...

Let’s focus on those 942 errors:

$ tail -f C19APOLL_cjq0_381904.trc|grep -A6 -B4 "status 942"
--
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksKglGetObj):Exiting with status 0
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):KGL handle: 0x1dce069820, Lock: 0x1eb4df33c0, Pin: 0x1eba418400
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Object does not exist
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Releasing lock 0x1eb4df33c0 and pin 0x1eba418400
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Exiting jsksDBObjLock with status 942
*** 2020-12-23T17:28:25.671841+08:00 (CDB$ROOT(1))
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Entering jsksDBObjLock
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Name: SYS.LAST_OBSERVED_EVENT
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Type: 77, NSpace: 51, Mode: 2, Dur: 0
SCHED 12-23 17:28:25.671 1 00 381904 CJQ0 1(jsksDBObjLock_CDB):Calling lock routine

We can see it is exactly the same symptom as described in MOS Doc ID 2341386.1. The doc, bug 17981015 and the related patches date back to 11.2.0.3 in 2013 and only cover versions up to 12.1, stating that the issue was fixed in 12.2. However, the issue still occurs in our 19.6 database in 2020.

$ head -18 C19APOLL_cjq0_381904.trc
Trace file /u01/app/oracle/diag/rdbms/c19_apollo_x6e3/C19APOLL/trace/C19APOLL_cjq0_381904.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0
Build label:    RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
ORACLE_HOME:    /u01/app/oracle/product/19.6.0/dbhome_1
System name:	Linux
Node name:	hkx6e3db01.lfinfra.net
Release:	4.1.12-124.26.12.el7uek.x86_64
Version:	#2 SMP Wed May 8 22:25:03 PDT 2019
Machine:	x86_64
Storage:	Exadata
Instance name: C19APOLL
Redo thread mounted by this instance: 1
Oracle process number: 880
Unix process pid: 381904, image: oracle@hkx6e3db01.lfinfra.net (CJQ0)
*** 2020-12-23T17:10:21.323093+08:00 (CDB$ROOT(1))

Solution:

I tested another workaround described in the doc, flushing the shared pool in both the CDB and the PDB, and it indeed could not resolve the issue.
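For reference, that workaround is essentially the following (the PDB name is a placeholder):

SQL> alter session set container = CDB$ROOT;
SQL> alter system flush shared_pool;
SQL> alter session set container = YOUR_PDB_NAME;
SQL> alter system flush shared_pool;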

So the only option left was to arrange a restart of the CDB.
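For completeness, the restart is just a normal bounce of the CDB. Since this system sits on Exadata it is presumably registered with Grid Infrastructure, so something like the commands below (db_unique_name taken from the trace path) would do it; a plain shutdown immediate / startup from sqlplus works just as well:

$ srvctl stop database -d c19_apollo_x6e3
$ srvctl start database -d c19_apollo_x6e3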

Tested it: after the restart the jobs started running automatically again. The issue is fixed for now.
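One way to verify (again connected to the affected PDB) is to check that job runs are being logged again after the restart:

SQL> select job_name, status, actual_start_date
       from dba_scheduler_job_run_details
      where actual_start_date > systimestamp - interval '1' day
      order by actual_start_date desc;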

I filed SR #3-24805039671 : Bug 17981015 occurs in 19.6 DB, and will follow up with Oracle support; hopefully I can get a solution and an update to this blog post.