[Voiceglue] Voiceglue hang - debugging so far

Doug Campbell voiceglue at campbellcastle.com
Tue Jul 7 12:52:59 UTC 2009


Your work is wonderful!

Yes, you have uncovered and diagnosed a race condition
within voiceglue.

I will fix this and the Asterisk call termination bug
in the next release, coming soon.

Doug Campbell


James Green wrote:
> As always, things are more complicated than they first look.
> 
> So we followed the idea that the r=0 loop within the IPC read() was
> being caused by something disconnecting the remote end's socket.
> Interestingly sometimes we see a write() failure on the socket with
> errno 32 (broken pipe) supporting the theory.
> 
> We added logging anywhere that sockets / fds were being created or
> closed by voiceglue, Scom or VXI (via SWI). This had to include
> microsecond timings to follow the sequence of events truly.
> 
> Of specific interest we noticed the following:
> 1. Within the main VG getevents() loop:
> 2. A call would cease and VG calls IPC unregister which close()s the
> VXML FD (say 19). This we verified as close returns 0.
> 3. SWIsocket.cpp created a new FD 0.01s later and gets the now closed FD
> (19)
> 4. Scom sends a stop on the PERL_VXML_callid handle which VG now
> processes and close(19) it within fh_stopped. At this point the call is
> closed down including file descriptors however another thread's VXI
> obviously has the FD number (19) registered itself.
> 5. Another incoming is processed, Scom registers the FD (19) again. IPC
> gets a new FD (24).
> 6. VG finishes getevents() and starts again
> 7. SWI then closes the FD (19) created in step 3. The VXI instance
> created in part 5 is now read() looping on r=0 on FD 24.
> 8. Scom notes that the FD (19) is now closed and stops the pipe.
> 9. VG attempts to send_bytes to the call createfd in step 5 but write()
> on FD 19 returns that the FD is not open for writing.
> 
> So, should we be close()ing the IPC FD both within the unregister AND
> fh_stopped? There's a clearly a race condition here.
> 
> Second, after a write() comes with with a broken pipe, we return -1 but
> clearly a read() follows which doesn't sound right.
> 
> Third, within fh_stopped(), just before we close() a VXML IPC FD. At
> this point have we not unregistered ourselves at the IPC layer? I ask
> because a dup() of the FD succeeds and the unregister code clearly
> closes the FD.
> 
> It is worth noting that we repeated the same test shortly after the
> above, and exactly the same problem occurred. Given enough simultaneous
> calls and sufficient time, the race condition is repeatable.
> 
> We continue the debugging but we're confident that if the above isn't
> always the precise cause it's clearly not helping matters either... All
> evidence collected so far points to corruption of the FD maps through
> race conditions between VG/Scom and IPC/VXI and VXI/SWI.
> 
> We're going to remove the close() within the IPC unregister on Monday on
> our test kit (there being no need to close() twice and as VG opens the
> FD it might as well close it exclusively too) and hope everything does
> not blow up. Would appreciate your thoughts on that one!
> 
> Look forward to hearing from you :-)
> 
> James
> 
> -----Original Message-----
> From: voiceglue-bounces at voiceglue.org
> [mailto:voiceglue-bounces at voiceglue.org] On Behalf Of Doug Campbell
> Sent: 01 July 2009 15:11
> To: General discussion about voiceglue
> Subject: Re: [Voiceglue] Voiceglue hang - debugging so far
> 
>> Current theory is that Scom is not catching a closed file descriptor 
>> which is then re-allocated.
> 
> That's great.  If true, I should be able to replicate it.
> 
>> How's your debugging coming along, Doug?
> 
> I am currently held hostage by a 15-hour-per-day assignment.
> Hopefully I will get a reprieve soon.
> 
> Doug Campbell
> _______________________________________________
> Voiceglue mailing list
> Voiceglue at voiceglue.org
> http://www.voiceglue.org/mailman/listinfo/voiceglue
> 
> Checked by AVG - www.avg.com
> Version: 8.5.375 / Virus Database: 270.13.1/2212 - Release Date:
> 07/01/09 05:53:00
> 
> Checked by AVG - www.avg.com 
> Version: 8.5.375 / Virus Database: 270.13.2/2215 - Release Date:
> 07/03/09 05:53:00
> 
> 
> ------------------------------------------------------------------------
> 
> 
> No virus found in this outgoing message.
> Checked by AVG - www.avg.com 
> Version: 8.5.375 / Virus Database: 270.13.3/2216 - Release Date: 07/03/09 05:53:00
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Voiceglue mailing list
> Voiceglue at voiceglue.org
> http://www.voiceglue.org/mailman/listinfo/voiceglue



More information about the Voiceglue mailing list