[Voiceglue] Voiceglue hang - debugging so far

James Green james.green at mjog.com
Fri Jun 19 16:45:59 UTC 2009


As you will have noted, we have been experiencing apparently random hangs
of the voiceglue process, during which it consumes 100% (or close to it)
of CPU time.
 
The suspicion was that each time, the remote end (the party we call) was
hanging up before we did.
 
Test systems:
 
Machine A : Fires off up to 40 calls at a time, each using VoiceXML to
act as an IVR
Machine B : Accepts calls via SIP from Machine A, uses a dial plan to
send DTMF tones following silence between prompts.
 
This effectively replicates an IVR calling many people and asking them
to dial digits in response to questions.
 
Repeatedly fire calls off. Eventually voiceglue will consume 100% CPU.
It may take just a few seconds, sometimes up to five minutes.
 
Voiceglue, for its part, loops here:
 
libvglue/vglue_ipc.cc Line 167 (the read()):
    /*  Read the data  */
    while ((bytes_read == 0) || (buf[bytes_read-1] != '\n'))
    {
        /*  See if we have exhausted the buffer  */
        if (bytes_read == bufsize)
        {
            /*  Store this buffer contents into msg, and start clean  */
            msg.append (buf, bytes_read);
            bytes_read = 0;
        };

        /*  Perform the read to fill up buf  */
        r = read (fd, buf + bytes_read, bufsize - bytes_read);
        if (r == -1)
        {
            if (errno != EINTR)
            {
                printf ("FATAL voiceglue error: thread %d failed reading from fd=%d, errno=%d\n", (int) myThreadID, fd, errno);
                return ("");
            };
        }
        else
        {
            bytes_read += r;
        };
    };


Why? We logged before and after the read(). When the looping occurs, r
is 0 (EOF), so bytes_read never advances, the while condition never
becomes false, and the thread spins.
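
For illustration only, here is a minimal sketch of a read loop that treats
r == 0 as end-of-file instead of retrying. It is not voiceglue's code and
not a proposed patch: the helper name read_line_or_eof and the eof
out-parameter are hypothetical, and the sketch simplifies the buffering by
appending straight into a std::string.

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical helper (not part of voiceglue): reads from fd until the
// collected data ends in '\n'. If the peer closes first, sets eof to
// true and returns whatever had been collected so far.
std::string read_line_or_eof(int fd, bool &eof)
{
    std::string msg;
    char buf[1024];
    eof = false;
    while (msg.empty() || msg.back() != '\n')
    {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r == -1)
        {
            if (errno == EINTR)
                continue;   // interrupted by a signal: retry the read
            return "";      // genuine error: give up, as the original does
        }
        if (r == 0)
        {
            eof = true;     // EOF: stop here instead of looping forever
            break;
        }
        msg.append(buf, static_cast<size_t>(r));
    }
    return msg;
}
```

A caller would check eof after each call and, when it is set, stop using
that descriptor rather than calling read() on it again.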

I have posted our copy of this file; the only modifications are our own
debugging output: http://pastebin.com/m21230283
Please be aware of a debugging error: Line 175 which ends "The message
is to be read." is wrongly prefixed "voiceglue_sendipcmsg" (it should
read "voiceglue_getipcmsg" of course).
 
Now, we are working on the assumption that we should never be attempting
to read from a file descriptor that is at EOF, so the theory is that the
thread has somehow used a file descriptor that should have been removed.
 
Evidence of such an occasion is here: http://pastebin.com/m6124f760
 
You are looking for thread id D4FF9950. We appear to have sent "Play" to
a file descriptor associated with Apache. This is clearly wrong!
 
Should the file descriptor be dropped each time
voiceglue_unregisteripcfd() is called, or can fds be re-used?
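
As background to that question: at the operating-system level, descriptor
numbers are certainly re-used, because POSIX hands out the lowest-numbered
free descriptor on each allocation. The sketch below demonstrates only
that kernel behaviour; the function name fd_number_is_reused is made up
for the example, and it says nothing about how voiceglue's own fd
registry behaves.

```cpp
#include <unistd.h>

// Illustrative only: returns true if a descriptor number that was just
// closed is handed straight back by the next allocation.
bool fd_number_is_reused()
{
    int first[2];
    if (pipe(first) != 0)
        return false;

    int stale = first[0];   // a number another thread might still remember
    close(first[0]);        // the number is now free again

    // dup() returns the lowest-numbered free descriptor, which is the
    // one closed above (assuming no other thread opens an fd meanwhile).
    int fresh = dup(first[1]);

    close(first[1]);
    if (fresh != -1)
        close(fresh);
    return fresh == stale;
}
```

If a thread kept using a number after it had been closed and re-allocated,
its writes would land on whatever now owns that number. That might be
consistent with "Play" arriving on an Apache descriptor, though it is not
proven by the logs.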

FWIW this is my #1 priority, so if I can provide anything further to
help solve the problem please let me know!

Thanks,

James


