The incomplete analysis of Virtual circuit waits

I have a special case here, when a database (11.2.0.3.0 – Linux x86-64) is spending between 75% and 80% of DB Time on virtual circuit wait as you can see in this awr report for 1 day of snapshots. Since DB is using shared servers, the clients connect to one or more dispatchers processes that will create a virtual circuit for each connection. As soon as client sends any data, the dispatcher receives the data and place it into the virtual circuit. These circuit will go to a common queue to be picked up by a shared server that reads the data from the recently created virtual circuit and delivers the work to database. For the opposite operation, when shared server send data to the client, it writes into the virtual circuit and the dispatcher sends the data to the client.
As soon everything finishes, the server releases the virtual circuit back to dispatcher to handle other clients.

Image

These description regarding what is a virtual circuit will help to understand what is “virtual circuit wait” and since 11.2.0.2 (See bug 6653834) it is just how much time the shared server is blocked waiting for a specified circuit or message.

One problem that can occur is the lack of enough dispatchers to handle all client/shared server requests on send/receiving data from and back to client. Let’s look at dispatchers performance:

SQL> column PROTOCOL format a25;
SQL> SELECT NAME "NAME", SUBSTR(NETWORK,1,23) "PROTOCOL", 
OWNED, STATUS "STATUS", (BUSY/(BUSY + IDLE)) * 100 "%TIME BUSY" FROM V$dispatcher;

NAME PROTOCOL OWNED STATUS %TIME BUSY
---- ------------------------- ---------- ---------------- ----------
D000 (ADDRESS=(PROTOCOL=tcp) 184 WAIT .908543578
D001 (ADDRESS=(PROTOCOL=tcp) 183 WAIT .850628352

Or in AWR:

Image

As you can see, dispatcher performance is good, the avg queue wait time is around 0 seconds per queued request. We can also notice that %Idle is 99% on both dispatchers so for now, a dispatcher problem is outside of equation. Following the same line, let’s look at performance of Shared Servers:

Image

Once again, 97.35% of shared servers utilization is in idle mode, so no real problem here.
Now that obvious things are out of the way, let’s check for another topic that might be causing this issue: long inactive shared server connections.

I know that for its nature, the database opens lot of connections and the application leave it  running and idle for an extended period of time:

STATUS SERVER COUNT(1)
-------- --------- ----------
ACTIVE DEDICATED 38
INACTIVE NONE 383
INACTIVE DEDICATED 1

According with ID 601605.1 IDLE/ABANDONED OR INACTIVE sessions for OS processes will not be cleaned up even DCD (Dead Connection Detection) or Database Resource Limits are used, so they need to be cleaned up manually. But in _my case_ as you see all 383 connections released the shared server (SERVER = NONE).

Since i’m not satisfied with some possible conclusions i’ve decided to trace shared servers based on information given in bug 11895446 [Excessive ‘virtual circuit wait’ ]:

SQL> alter system set events='10256 trace name contex forever, level 2';

The trace file regarding one shared server shows the following

[Note: The history is displayed in reverse chronological order]

sample interval: 1 sec, max history 120 sec
---------------------------------------------------
 [30 samples, 11:33:36 - 11:34:05]
 waited for 'virtual circuit wait', seq_num: 1965
 p1: 'circuit#'=0x174
 p2: 'type'=0x2
 p3: ''=0x0
 time_waited: >= 29 sec (still in wait)
 [30 samples, 11:33:06 - 11:33:35]
 waited for 'virtual circuit wait', seq_num: 1963
 p1: 'circuit#'=0x174
 p2: 'type'=0x2
 p3: ''=0x0
 time_waited: 30.009979 sec (sample interval: 29 sec)
 [30 samples, 11:32:36 - 11:33:05]
 waited for 'virtual circuit wait', seq_num: 1961
 p1: 'circuit#'=0x174
 p2: 'type'=0x2
 p3: ''=0x0
 time_waited: 30.009597 sec (sample interval: 29 sec)
 [30 samples, 11:32:06 - 11:32:35]
 waited for 'virtual circuit wait', seq_num: 1959
 p1: 'circuit#'=0x174
 p2: 'type'=0x2
 p3: ''=0x0
 time_waited: 30.009995 sec (sample interval: 29 sec)
---------------------------------------------------
Sampled Session History Summary:
 longest_non_idle_wait: 'virtual circuit wait'
 [30 samples, 11:32:06 - 11:32:35]
 time_waited: 30.009995 sec (sample interval: 29 sec)

Each sample is around 1s and the the collection of samples is 30 which corresponds to a ~30 seconds period. As you can see, all of the wait time is spent on a virtual circuit wait (non idle event). If we look to Session Wait Stack, we can also reach the same conclusion, that most of the time on shared server is spend in virtual circuit event:

Current Wait Stack:
 0: waiting for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=2002 snap_id=100
 wait times: snap=0.011873 sec, exc=0.749202 sec, total=49 min 1 sec
 wait times: max=infinite, heur=49 min 1 sec
 wait counts: calls=0 os=0
 in_wait=1 iflags=0x11a0
Wait State:
 fixed_waits=0 flags=0x23 boundary=(nil)/-1
Session Wait History:
 elapsed time of 0.000000 sec since current wait
 0: waited for 'virtual circuit wait'
 circuit#=0x174, type=0x2, =0x0
 wait_id=1398 seq_num=2001 snap_id=1
 wait times: snap=30.015382 sec, exc=30.015382 sec, total=30.015382 sec
 wait times: max=30.000000 sec
 wait counts: calls=1 os=9
 occurred after 0.000000 sec of elapsed time
 1: waited for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=2000 snap_id=99
 wait times: snap=0.000018 sec, exc=0.737329 sec, total=48 min 31 sec
 wait times: max=infinite
 wait counts: calls=0 os=0
 occurred after 0.000000 sec of elapsed time
 2: waited for 'virtual circuit wait'
 circuit#=0x174, type=0x2, =0x0
 wait_id=1397 seq_num=1999 snap_id=1
 wait times: snap=30.008837 sec, exc=30.008837 sec, total=30.008837 sec
 wait times: max=30.000000 sec
 wait counts: calls=1 os=9
 occurred after 0.000000 sec of elapsed time
 3: waited for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=1998 snap_id=98
 wait times: snap=0.014144 sec, exc=0.737311 sec, total=48 min 1 sec
 wait times: max=infinite
 wait counts: calls=0 os=0
 occurred after 0.000000 sec of elapsed time
 4: waited for 'virtual circuit wait'
 circuit#=0x174, type=0x2, =0x0
 wait_id=1396 seq_num=1997 snap_id=1
 wait times: snap=30.005295 sec, exc=30.005295 sec, total=30.005295 sec
 wait times: max=30.000000 sec
 wait counts: calls=1 os=9
 occurred after 0.000000 sec of elapsed time
 5: waited for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=1996 snap_id=97
 wait times: snap=0.000017 sec, exc=0.723167 sec, total=47 min 31 sec
 wait times: max=infinite
 wait counts: calls=0 os=0
 occurred after 0.000000 sec of elapsed time
 6: waited for 'virtual circuit wait'
 circuit#=0x174, type=0x2, =0x0
 wait_id=1395 seq_num=1995 snap_id=1
 wait times: snap=30.007699 sec, exc=30.007699 sec, total=30.007699 sec
 wait times: max=30.000000 sec
 wait counts: calls=1 os=9
 occurred after 0.000000 sec of elapsed time
 7: waited for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=1994 snap_id=96
 wait times: snap=0.015263 sec, exc=0.723150 sec, total=47 min 1 sec
 wait times: max=infinite
 wait counts: calls=0 os=0
 occurred after 0.000000 sec of elapsed time
 8: waited for 'virtual circuit wait'
 circuit#=0x174, type=0x2, =0x0
 wait_id=1394 seq_num=1993 snap_id=1
 wait times: snap=30.010335 sec, exc=30.010335 sec, total=30.010335 sec
 wait times: max=30.000000 sec
 wait counts: calls=1 os=9
 occurred after 0.000000 sec of elapsed time
 9: waited for 'SQL*Net message from client'
 driver id=0x4d545300, #bytes=0x1, =0x0
 wait_id=1299 seq_num=1992 snap_id=95
 wait times: snap=0.000016 sec, exc=0.707887 sec, total=46 min 31 sec
 wait times: max=infinite
 wait counts: calls=0 os=0
 occurred after 0.000000 sec of elapsed time

So far, all i know is that shared servers are “really waiting” for the availability of a virtual circuit and that means that i have a real impact of my system, probably not in current state, since most of shared servers and dispatchers are idle but these waits will prevent shared servers to work faster waiting for virtual circuits to be available for send/receive data from database. Another conclusion is that i can’t reproduce the results expected on  bug 11895446, since the expected output from 10256 trace is “Shared Server Held – holding instance locks”.  Maybe i’m missing something to reproduce the bug, so i’m expecting MOS help 🙂

Feel free to disagree with my incomplete analysis 🙂

Advertisements

3 thoughts on “The incomplete analysis of Virtual circuit waits

  1. Pingback: Virtual Circuits – The essay part 1 « Luís Marques

  2. I am using shared server for the first time in a production environment, it was set up before I got here. It is routinely abused since it is meant for fast queries, I often see sessions running for hours. We have seen hangs on the application side that show up as virtual circuit waits, which really skews the event. I generally regard it as an idle event as a result. In your case it may not be. We run 20 dispatchers on each instance. We don’t need to run it and would be happy if we could turn it off.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s