I am trying to set up a real-time text-to-speech (TTS) and speech-to-text (STT) interaction in Asterisk using Python. When an extension (e.g., 1001) calls another extension (e.g., 1002), the system should answer the call, play a preset TTS prompt (“Hi, thanks for calling, what can I do for you?”), and then allow for a bidirectional conversation using TTS and STT.
I’ve added the following dialplan to `/etc/asterisk/extensions_custom.conf`:
```ini
[custom-live-tts-stt]
exten => 1002,1,Answer()
 same => n,AGI(live_tts_stt.py)
 same => n,Hangup() ; End the call after AGI
```
Here’s my Python script (live_tts_stt.py):
```python
#!/usr/bin/env python3
import os
import sys
import speech_recognition as sr
from gtts import gTTS
from pydub import AudioSegment
from pydub.playback import play
import openai

# Set up OpenAI API key
openai.api_key = 'your-openai-api-key'

# Initialize speech recognizer
recognizer = sr.Recognizer()

def record_audio():
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)
    return audio

def speech_to_text(audio):
    try:
        text = recognizer.recognize_google(audio)
        print(f"Recognized: {text}")
        return text
    except sr.UnknownValueError:
        print("Could not understand audio")
        return ""
    except sr.RequestError as e:
        print(f"Could not request results; {e}")
        return ""

def generate_response(prompt):
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=150
    )
    return response.choices[0].text.strip()

def text_to_speech(text):
    tts = gTTS(text=text, lang='en')
    tts.save("/var/lib/asterisk/agi-bin/response.mp3")
    sound = AudioSegment.from_mp3("/var/lib/asterisk/agi-bin/response.mp3")
    sound.export("/var/lib/asterisk/agi-bin/response.wav", format="wav")

def main():
    # Answer the call
    print("ANSWER")
    sys.stdout.flush()
    while True:
        audio = record_audio()
        user_text = speech_to_text(audio)
        if user_text.lower() in ["exit", "quit", "bye"]:
            break
        response_text = generate_response(user_text)
        text_to_speech(response_text)
        print("STREAM FILE /var/lib/asterisk/agi-bin/response \"\"")
        sys.stdout.flush()
    print("HANGUP")
    sys.stdout.flush()

if __name__ == "__main__":
    main()
```
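For reference, an AGI script talks to Asterisk over stdin/stdout: when it starts, Asterisk first writes a block of `agi_*: value` lines ending with a blank line, and it answers every command with a reply line such as `200 result=0`. The script above writes commands but never reads either of these. Below is a minimal stdlib-only sketch of both sides of that exchange; `read_agi_env` and `agi_command` are hypothetical helper names, not part of any library. Waiting for the reply also means a command like `STREAM FILE` has finished playing before the script moves on.

```python
from typing import TextIO


def read_agi_env(stdin: TextIO) -> dict[str, str]:
    """Consume the 'agi_key: value' header Asterisk writes when the script starts.

    The header ends with an empty line; it has to be read before the first
    command is sent, otherwise it would be mistaken for a command reply.
    """
    env: dict[str, str] = {}
    while True:
        line = stdin.readline().strip()
        if not line:  # blank line (or EOF) ends the header
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def agi_command(command: str, stdin: TextIO, stdout: TextIO) -> tuple[int, str]:
    """Send one AGI command and block until its reply, e.g. '200 result=0'.

    Returns (status code, value after 'result=').
    """
    stdout.write(command + "\n")
    stdout.flush()
    reply = stdin.readline().strip()
    code, _, rest = reply.partition(" ")
    if not code.isdigit():
        # Covers EOF after a hangup and multi-line '520-...' usage errors.
        raise RuntimeError(f"unexpected AGI reply: {reply!r}")
    result = ""
    for token in rest.split():
        if token.startswith("result="):
            result = token[len("result="):]
            break
    return int(code), result
```

In the script, `read_agi_env(sys.stdin)` would run once at the top of `main()`, and each bare `print(...)` of a command would become `agi_command(..., sys.stdin, sys.stdout)`. Any other text printed to stdout, such as `print("Listening...")` or `print(f"Recognized: {text}")`, is also read by Asterisk as a command, so diagnostics are better sent to stderr.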
**Issue:** When I call extension 1002 from extension 1001, I don't hear the TTS prompt. Instead, the system seems to pick up and transmit live microphone input (i.e., human voice from the environment), and the generated TTS response is never played back to the caller.

**What I’ve tried:**
- Verified that the AGI script is being executed.
- Checked the audio file paths and confirmed that the TTS audio files are generated and saved as expected.
- Ensured that the Asterisk server has access to the Python script and the required libraries.
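Confirming that the files exist does not confirm that Asterisk can play them. `sound.export(..., format="wav")` keeps whatever sample rate and channel count gTTS produced, while Asterisk's plain `wav` format is understood to expect 8 kHz, mono, 16-bit PCM, so a file in another layout may be rejected at playback. The sketch below assumes the exported file is uncompressed PCM that the stdlib `wave` module can open; `asterisk_wav_problems` and `convert_for_asterisk` are hypothetical names, and the conversion uses pydub's `set_frame_rate`, `set_channels` and `set_sample_width`.

```python
import wave


def asterisk_wav_problems(path: str) -> list[str]:
    """List the ways a WAV file differs from 8 kHz, mono, 16-bit PCM.

    Assumption: the file is meant for Asterisk's plain "wav" format, which is
    taken here to accept only that layout. An empty list means it matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:  # may raise wave.Error for non-PCM files
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1")
        if wav.getframerate() != 8000:
            problems.append(f"{wav.getframerate()} Hz, expected 8000 Hz")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems


def convert_for_asterisk(mp3_path: str, wav_path: str) -> None:
    """Re-encode an MP3 (e.g. gTTS output) as 8 kHz mono 16-bit WAV via pydub."""
    from pydub import AudioSegment  # lazy import: only this helper needs pydub/ffmpeg

    sound = AudioSegment.from_mp3(mp3_path)
    sound = sound.set_frame_rate(8000).set_channels(1).set_sample_width(2)
    sound.export(wav_path, format="wav")
```

Running `asterisk_wav_problems("/var/lib/asterisk/agi-bin/response.wav")` after the script's export step shows whether the generated file already has that layout; if it does not, `convert_for_asterisk` can replace the two pydub lines in `text_to_speech`.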
**Question:** Why is the TTS audio not playing to the caller, and how can I make the system play the TTS response instead of capturing live microphone input?
Any help or guidance would be greatly appreciated!
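One likely factor: `sr.Microphone()` opens the default audio input device of the machine running the script (through PyAudio), so `record_audio()` listens to the room around the server rather than to the caller. In AGI, the caller's audio reaches the script through the channel, for example via the `RECORD FILE` command, which writes what the caller says to a file the script can then transcribe. The sketch below greets the caller and then alternates recording and playback. The file paths, the 15-second limit and the 2-second silence cutoff are arbitrary assumptions, and `send`, `transcribe`, `reply_to` and `synthesize` are injected callables (hypothetical, not from any library) so the AGI plumbing, the STT engine and the LLM can be swapped independently.

```python
from typing import Callable

# Hypothetical paths: any directory the asterisk user can write to should do.
CALLER_AUDIO = "/tmp/caller_turn"  # RECORD FILE adds the ".wav" extension itself
REPLY_AUDIO = "/tmp/reply_turn"    # STREAM FILE expects the name without extension
EXIT_WORDS = {"exit", "quit", "bye"}


def record_command(path: str, timeout_ms: int = 15000, silence_s: int = 2) -> str:
    """RECORD FILE <file> <format> <escape digits> <max ms> s=<silence seconds>."""
    return f'RECORD FILE {path} wav "#" {timeout_ms} s={silence_s}'


def stream_command(path_without_ext: str) -> str:
    """STREAM FILE <file without extension> <escape digits> (none here)."""
    return f'STREAM FILE {path_without_ext} ""'


def conversation(
    send: Callable[[str], str],              # writes one AGI command, returns the reply line
    transcribe: Callable[[str], str],        # WAV path -> recognised text
    reply_to: Callable[[str], str],          # caller text -> response text
    synthesize: Callable[[str, str], None],  # (text, WAV path) -> writes 8 kHz mono WAV
    greeting: str,
    max_turns: int = 10,
) -> None:
    """Greet the caller, then alternate RECORD FILE and STREAM FILE turns."""

    def say(text: str) -> bool:
        synthesize(text, REPLY_AUDIO + ".wav")
        # result=-1 means the channel hung up (or playback failed).
        return "result=-1" not in send(stream_command(REPLY_AUDIO))

    if not say(greeting):
        return
    for _ in range(max_turns):
        if "result=-1" in send(record_command(CALLER_AUDIO)):
            return  # caller hung up while recording
        text = transcribe(CALLER_AUDIO + ".wav")
        if not text.strip():
            continue  # nothing recognised; listen again
        if text.strip().lower() in EXIT_WORDS:
            break
        if not say(reply_to(text)):
            return
    send("HANGUP")
```

Here `send` can be any function that writes one command line to stdout, flushes, and returns the reply line read from stdin, once the initial `agi_*` header has been consumed. For `transcribe`, speech_recognition can read the recorded file with `sr.AudioFile(path)` and `recognizer.record(source)` instead of `sr.Microphone()`, and `synthesize` can wrap the existing gTTS/pydub code as long as the WAV it writes is 8 kHz mono 16-bit. The `Answer()` in the dialplan already answers the call before the script runs.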