After a bit of research, my best guess is that VTO and VTH are terms used to refer to the outside and the inside parts of a door entry intercom (“Entryphone”) in a block of flats etc.
Something is very wrong, and I think you will need to do a Wireshark capture to work out where. About the only thing I can think of that would produce anything like this is two audio streams going to the same device from the PBX, e.g. a direct media feed and a music on hold feed. You’d really need a lot of detailed logging to see whether this is happening and to understand why.
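If it helps, here is a rough sketch of how you might check a capture for that two-streams case. It assumes you have already pulled the UDP packets out of the capture as (destination IP, destination port, payload) tuples by whatever means you prefer; the function names are made up for illustration. It reads the SSRC from the fixed RTP header (RFC 3550) and reports any destination that receives more than one SSRC. Treat a hit as a hint only: an SSRC can legitimately change mid-call, so you would still need to look at the timing in Wireshark to see whether the streams actually overlap.

```python
import struct
from collections import defaultdict


def rtp_ssrc(payload: bytes):
    """Return the SSRC if the payload looks like RTP version 2, else None."""
    # The fixed RTP header is 12 bytes; the top two bits of byte 0 are the version.
    if len(payload) < 12 or payload[0] >> 6 != 2:
        return None
    # The SSRC is the 32-bit big-endian field at bytes 8..11.
    return struct.unpack("!I", payload[8:12])[0]


def find_overlapping_streams(packets):
    """Map each (dst_ip, dst_port) that sees more than one SSRC to those SSRCs.

    packets: iterable of (dst_ip, dst_port, udp_payload) tuples (assumed input shape).
    """
    ssrcs = defaultdict(set)
    for dst_ip, dst_port, payload in packets:
        ssrc = rtp_ssrc(payload)
        if ssrc is not None:
            ssrcs[(dst_ip, dst_port)].add(ssrc)
    return {dst: seen for dst, seen in ssrcs.items() if len(seen) > 1}
```

Anything this returns is a destination worth inspecting in Wireshark's RTP stream analysis, alongside the PBX logs for the same call.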