Hi,
When one party calls another, is it possible to intercept both the caller’s and the callee’s audio streams, feed them to an ASR engine for transcription (vosk or whisper) and then to a TTS engine, so that the synthesized audio replaces the original audio in both directions?
Maybe with Stasis()? Any ideas or pointers would be welcome.
Thanks
UPDATE: the transcription and TTS would need to run continuously, a bit like Record() in an infinite loop with silence detection. On silence: run ASR + TTS on the segment just recorded, then play back the result.
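To make the loop in the update concrete, here is a rough Python sketch of what I have in mind. It only covers the "buffer until silence, then ASR + TTS + playback" part and does not touch Asterisk at all. Everything in it is an assumption on my side: the audio is assumed to arrive as 16-bit signed linear PCM frames (however they are obtained from the channel), the threshold and frame counts are made-up values, and `transcribe`, `synthesize` and `play` are hypothetical callbacks standing in for vosk/whisper, the TTS engine, and whatever injects audio back into the call.

```python
import array
import math
from typing import Callable, Iterable, Iterator, List


def rms(frame: bytes) -> float:
    """Root-mean-square level of one 16-bit signed PCM frame (native byte order)."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_on_silence(
    frames: Iterable[bytes],
    threshold: float = 500.0,   # assumed energy level separating speech from silence
    silence_frames: int = 25,   # assumed: 25 quiet frames in a row end an utterance
) -> Iterator[bytes]:
    """Yield one bytes object per utterance, cut after a run of quiet frames."""
    buffered: List[bytes] = []
    heard_speech = False
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            heard_speech = True
            quiet = 0
            buffered.append(frame)
        elif heard_speech:
            # Keep trailing silence until the run is long enough to cut.
            quiet += 1
            buffered.append(frame)
            if quiet >= silence_frames:
                yield b"".join(buffered)
                buffered, heard_speech, quiet = [], False, 0
        # Silence before any speech is simply dropped.
    if heard_speech:
        # Flush whatever is left when the stream ends mid-utterance.
        yield b"".join(buffered)


def relay(
    frames: Iterable[bytes],
    transcribe: Callable[[bytes], str],   # hypothetical ASR hook (vosk, whisper, ...)
    synthesize: Callable[[str], bytes],   # hypothetical TTS hook
    play: Callable[[bytes], None],        # hypothetical hook sending audio to the other leg
    **split_options,
) -> None:
    """For each detected utterance: ASR, then TTS, then playback."""
    for utterance in split_on_silence(frames, **split_options):
        text = transcribe(utterance)
        if text:
            play(synthesize(text))
```

One `relay()` would run per direction (caller to callee and callee to caller), each reading one leg's audio and playing the synthesized result to the other leg. Note that this adds at least the silence timeout plus ASR and TTS processing time as delay on every utterance.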