I see you have added a mixer under "Output - Active Audio Mixer". I can check the box, but I do not see what it actually does.
The main problem is syncing the voice with the picture. I believe the answer is JACK.
JACK is a low-latency audio server, written for any operating system that is reasonably POSIX compliant. It currently exists for Linux, OS X, Solaris, FreeBSD and Windows. It can connect several client applications to an audio device, and allow them to share audio with each other. Clients can run as separate processes like normal applications, or within the JACK server as "plugins".
LinuxMint 9 - 64, Kernel 2.6.32-21-generic
Intel i7 - 930
Seattle, WA USA