Resource Manager – CPU allocation math – Part 3

This is the last post of this mini-series on CPU allocation in Resource Manager. The idea behind it is very simple: trace the same test case we’ve used before and analyze the trace files. This will let us understand how Oracle’s instrumentation works when DBRM is active and managing the CPU.
Please note that we are going to trace only one service, which is perfectly enough for our testing.

We change our cpu_alloc_burn.sql to enable tracing with the 10046 event, using ‘DBRM_TRACE’ as the identifier for our trace files:

SET TERMOUT OFF
alter session set tracefile_identifier='DBRM_TRACE';
alter session set events '10046 trace name context forever, level 12';
select distinct t1.N2 from t1, t2
where t1.N1 <> t2.N2
and t1.N3 <> t2.N1
and t1.N2 <> t2.N1
and t2.N2 is not null;

[oracle@phoenix resource_manager]$ ./run_adhoc.sh
Starting 20 new executions for S_ADHOC service with tracing...

Now we have 20 new sessions connected to the service S_ADHOC and mapped to the consumer group ADHOC_QUERYS. The first thing we notice, before digging into the trace files, is the wait event resmgr:cpu quantum:


       SID STATUS   RESOURCE_CONSUMER_GROUP          SERVICE_NA EVENT
---------- -------- -------------------------------- ---------- ------------------------------
        22 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        24 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        26 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        28 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        29 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        32 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        34 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        35 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
        38 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       134 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       136 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       143 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       148 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       150 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       151 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       152 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       156 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       157 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       159 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum
       162 ACTIVE   ADHOC_QUERYS                     S_ADHOC    resmgr:cpu quantum

This wait event means that a session is ready to run but is waiting to be allocated a quantum of CPU. It is DBRM doing its job: throttling CPU usage until it matches the plan directives we have defined. It follows that if you want to reduce the time spent on this wait event (AWR will help you check that), you have to increase the CPU allocation in your plan directives for that consumer group.
Another way to look at it (and the best one, since it gives you a lot of information) is to check the trace file that we generated earlier:

*** 2014-06-13 17:06:39.844
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 807849 location=2 consumer group id=88620  =0 obj#=88623 tim=1402675599844408
WAIT #140096016814088: nam='Disk file operations I/O' ela= 5589 FileOperation=2 fileno=0 filetype=15 obj#=88623 tim=1402675599854817

*** 2014-06-13 17:06:40.778
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 821271 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675600778500

*** 2014-06-13 17:06:41.736
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 917063 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675601736754

*** 2014-06-13 17:06:42.605
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 859088 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675602605611

*** 2014-06-13 17:06:43.612
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 905964 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675603612339
WAIT #140096016814088: nam='direct path read' ela= 1332 file number=4 first dba=16130 block cnt=62 obj#=88623 tim=1402675603682243

Some interesting info here:

ela – The amount of time, in microseconds, that the session spent waiting for a CPU quantum allocation. If we sum all of these values, we get the total time the session spent “out of CPU”;
consumer group id – The ID of the consumer group; it maps to the DBA_RSRC_CONSUMER_GROUPS view;
obj# – The object that is part of the wait itself. In our case, it is a table. It maps directly to the DBA_OBJECTS view.
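
The summing of ela values described above can be sketched with a few lines of Python. This is only a minimal illustration, not part of the original test case: the names WAIT_RE, summarize_waits and SAMPLE are made up for this example, and the regular expression assumes the WAIT line layout shown in the trace excerpt above.

```python
import re
from collections import defaultdict

# Matches 10046 WAIT lines of the form seen in the excerpt, e.g.
# WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 807849 location=2 ...
# (assumption: the cursor number is purely numeric, as in this trace)
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']+)' ela= *(?P<ela>\d+)")


def summarize_waits(lines):
    """Return {event: (wait_count, total_ela_us, max_ela_us)} for all WAIT lines."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = WAIT_RE.match(line.strip())
        if not match:
            continue  # timestamps, blank lines, PARSE/EXEC/FETCH lines, ...
        ela = int(match.group("ela"))
        entry = stats[match.group("event")]
        entry[0] += 1                  # number of waits
        entry[1] += ela                # total microseconds waited
        entry[2] = max(entry[2], ela)  # longest single wait
    return {event: tuple(values) for event, values in stats.items()}


# The trace excerpt from this post, used as sample input.
SAMPLE = """\
*** 2014-06-13 17:06:39.844
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 807849 location=2 consumer group id=88620  =0 obj#=88623 tim=1402675599844408
WAIT #140096016814088: nam='Disk file operations I/O' ela= 5589 FileOperation=2 fileno=0 filetype=15 obj#=88623 tim=1402675599854817

*** 2014-06-13 17:06:40.778
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 821271 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675600778500

*** 2014-06-13 17:06:41.736
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 917063 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675601736754

*** 2014-06-13 17:06:42.605
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 859088 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675602605611

*** 2014-06-13 17:06:43.612
WAIT #140096016814088: nam='resmgr:cpu quantum' ela= 905964 location=3 consumer group id=88620  =0 obj#=88623 tim=1402675603612339
WAIT #140096016814088: nam='direct path read' ela= 1332 file number=4 first dba=16130 block cnt=62 obj#=88623 tim=1402675603682243
""".splitlines()

if __name__ == "__main__":
    for event, (count, total_us, max_us) in summarize_waits(SAMPLE).items():
        print(f"{event:<28} waits={count:>3} "
              f"total={total_us / 1e6:.3f}s max={max_us / 1e6:.3f}s")
```

On the five-second excerpt above, the five resmgr:cpu quantum waits add up to 4,311,235 microseconds, so the session spent roughly 4.3 of those seconds waiting for CPU. To run it against a real trace, pass an open file object instead of SAMPLE.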

Of course, if we use tkprof to help us, we get a broader picture: one of our 20 sessions waited a total of 391.34 seconds on resmgr:cpu quantum during its lifetime, across 511 waits, and the longest single wait for a CPU quantum was 1.10 seconds.

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
  cursor: pin S wait on X                         1        0.14          0.14
  resmgr:cpu quantum                            511        1.10        391.34
  Disk file operations I/O                        4        0.00          0.01
  direct path read                              105        0.30          0.96
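
To put these tkprof figures in perspective, the wait summary can be parsed to get the average wait per event and each event's share of the total waited time. The following Python sketch is only an illustration: ROW_RE, parse_tkprof_waits, average_wait and TKPROF_WAITS are names invented for this example, and the pattern assumes the three-column layout shown above.

```python
import re

# One data row: event name, then times waited, max wait (s), total waited (s).
# Assumption: the event name is separated from the numbers by 2+ spaces,
# as in the tkprof output shown in this post.
ROW_RE = re.compile(
    r"^\s*(?P<event>\S.*?)\s{2,}(?P<times>\d+)\s+"
    r"(?P<max>\d+\.\d+)\s+(?P<total>\d+\.\d+)\s*$"
)


def parse_tkprof_waits(text):
    """Return {event: (times_waited, max_wait_s, total_waited_s)}."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:  # header and separator lines do not match and are skipped
            rows[match.group("event")] = (
                int(match.group("times")),
                float(match.group("max")),
                float(match.group("total")),
            )
    return rows


def average_wait(rows, event):
    """Average seconds per wait for one event."""
    times, _max_wait, total = rows[event]
    return total / times


# The tkprof wait summary from this post, used as sample input.
TKPROF_WAITS = """\
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
  cursor: pin S wait on X                         1        0.14          0.14
  resmgr:cpu quantum                            511        1.10        391.34
  Disk file operations I/O                        4        0.00          0.01
  direct path read                              105        0.30          0.96
"""

if __name__ == "__main__":
    rows = parse_tkprof_waits(TKPROF_WAITS)
    all_waited = sum(total for _times, _max, total in rows.values())
    for event, (times, _max_wait, total) in rows.items():
        share = total / all_waited if all_waited else 0.0
        print(f"{event:<30} avg={average_wait(rows, event):.4f}s "
              f"share={share:.1%}")
```

For this session, the average resmgr:cpu quantum wait works out to roughly 0.77 seconds, and that single event accounts for more than 99% of the waited time listed in the summary.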

Conclusions:

– Use math to define your CPU allocation in DBRM plans correctly, and be careful with over- and under-allocation, as both impact your database performance.
– Always try to test your DBRM implementation before going live. Complex plans can be tricky to test, and if you can’t measure the impact you can be in trouble. Trial and error is not a problem when you are not live.
– Understand how DBRM works! DBRM is a complex beast, and I hope that this mini-series helps with that.

One thought on “Resource Manager – CPU allocation math – Part 3”

  1. Assume you have 6 CPUs and you’re trying to run 20 concurrent sessions without any resource_manager_plan; you will still see the wait event “CPU + CPU wait” in the Top Activity page. Could you please run the block below and verify whether any of the 20 sessions runs for more than 2 minutes even though they’re waiting on the CPU?

    declare
      l_date date := sysdate + 2/1440;  -- stop time: two minutes from now
      l_num  number;
    begin
      -- burn CPU in a tight loop until the stop time is reached
      loop
        exit when sysdate > l_date;
        l_num := ln(42);
      end loop;
    end;
    /
