I am working on a requirement for real-time, bi-directional speech translation during a live call. The flow should be:
If the caller speaks in English, it should be translated into Urdu and passed to the callee.
If the callee speaks in Urdu, it should be translated into English and passed to the caller.
I have implemented this using an EAGI script with Google Cloud Speech-to-Text for transcription. The EAGI script is invoked after the call is connected, and the dialplan is as follows:
[call_answered_agent]
; ${ARG1} - Spool ID
; ${ARG2} - Unique ID
; ${ARG3} - Exten (dialed number)
; ${ARG4} - Channel Name
exten => s,1,Set(__time_connect=${EPOCH})
same => n,Set(IBDB_ANSWERED(${ARG1})=${ARG2},${time_connect})
same => n,Set(IBDB_ANSWERED2(${ARG1})=${ARG4})
same => n,MixMonitor(${UNIQUEID}.wav)
same => n,EAGI(/usr/ictbroadcast/bin/translate_ict.eagi)
;same => n,GoSub(virtual_queue_log,s,1(${ARG1},${ARG2},${ARG3}))
same => n,Return()
EAGI Script:
The PHP script uses the google/cloud-speech library to process audio from file descriptor 3 (FD 3):
#!/usr/bin/php
<?php

require '/usr/ictbroadcast/vendor/autoload.php';

use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\StreamingRecognitionConfig;
use Google\Cloud\Speech\V1\StreamingRecognizeRequest;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;

putenv('GOOGLE_APPLICATION_CREDENTIALS=/usr/ictbroadcast/etc/translator_google_key.json');

$speechClient = new SpeechClient();

$config = new RecognitionConfig([
    'encoding'          => AudioEncoding::LINEAR16,
    'sample_rate_hertz' => 8000,
    'language_code'     => 'en-US',
]);

$streamingConfig = new StreamingRecognitionConfig([
    'config'          => $config,
    'interim_results' => true,
]);

// The first request carries the streaming config; subsequent ones carry audio.
$requests = [
    new StreamingRecognizeRequest(['streaming_config' => $streamingConfig]),
];

// EAGI delivers the channel's received audio on file descriptor 3.
$stream = fopen('php://fd/3', 'rb');
if (!$stream) {
    fwrite(STDERR, "Failed to open audio stream.\n");
    exit(1);
}

while (!feof($stream)) {
    $chunk = fread($stream, 320); // 20 ms of 8 kHz, 16-bit mono audio
    if (!$chunk) {
        usleep(100000);
        continue;
    }
    $requests[] = new StreamingRecognizeRequest(['audio_content' => $chunk]);
}

$responses = $speechClient->streamingRecognize($requests);

foreach ($responses as $response) {
    foreach ($response->getResults() as $result) {
        if ($result->getIsFinal()) {
            $transcript = trim($result->getAlternatives()[0]->getTranscript());
            file_put_contents('/tmp/file.txt', "[SPOKEN]: $transcript\n", FILE_APPEND);
        }
    }
}

fclose($stream);
$speechClient->close();

Problem:

When the EAGI script runs, both call legs lose audio:

- The agent can't hear the receiver's voice.
- The receiver can't hear the agent's voice.

It seems the audio stream is being consumed by the EAGI script without being passed through to the other side of the call.

Question:

How can I capture audio for real-time processing in EAGI while keeping the audio flowing between both call participants? Is there a recommended approach in Asterisk for real-time speech translation that doesn't break the audio path?