It returns an active playback that you can then control[1]. How you store the playback instance information in your ARI application is up to you. Right now it would seem as though you are storing it associated with the channel id.
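As a rough sketch of that pattern with the Node.js ari-client (the map and function names here are just illustrative, not anything the library requires):

```javascript
// Keep a handle on each active playback, keyed by channel id,
// so it can be controlled (e.g. stopped) later.
const playbacks = new Map();

async function playPrompt(client, channelId) {
  const playback = await client.channels.play({ channelId, media: 'sound:intro' });
  playbacks.set(channelId, playback);   // remember the handle for this channel
  return playback;
}

async function stopPrompt(channelId) {
  const playback = playbacks.get(channelId);
  if (playback) {
    await playback.stop();              // maps to DELETE /playbacks/{playbackId}
    playbacks.delete(channelId);
  }
}
```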
If this doesn’t cover what you are asking, you’re going to need to be more explicit.
I have been trying that approach, and I found the following:
Each Asterisk channel has a read and a write direction for media.
When we use ariClient.channels.play({ channelId, media: 'sound:intro' }) to play audio, the channel is occupied playing the recording.
That means it is not listening to the caller’s audio during playback, as Asterisk temporarily suspends frame forwarding to ARI until playback has ended.
Can you confirm these findings, i.e., ARI’s behavior during channels.play?
If that is the case, then I don’t think I would be able to interrupt the audio while using channels.play to play a recording.
Is there any method besides channels.play that I could try for playing a recording in my use case, i.e., one that allows real-time barge-in (interruption) by the caller?
I ask because it seems that while channels.play() is running, the channel stops forwarding inbound RTP frames (the caller’s audio) to ARI.
If you want to have the channel continue to do other things while playing audio to it, you would use a snoop channel[1] to whisper into it and play back a recording on the snoop channel. The original channel would then be free to do as it wishes. If this doesn’t do what you want, then you’ll need to be much more specific and detailed about what you are attempting to achieve instead of bits and pieces.
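Roughly, with the Node.js ari-client (the app name and variables are placeholders):

```javascript
// Create a snoop channel on the caller and play the recording there,
// leaving the original channel free. 'my-stasis-app' is a placeholder.
const snoop = await client.channels.snoopChannel({
  channelId: callerChannel.id,
  app: 'my-stasis-app',   // Stasis application the snoop channel enters
  spy: 'none',            // we only inject audio, we don't listen
  whisper: 'both',        // whisper direction (this choice matters, see below)
});

// Play the recording on the snoop channel instead of the original channel.
await client.channels.play({ channelId: snoop.id, media: 'sound:intro' });
```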
So I tried using a snoop channel, but the problem is that the recording being played is interpreted by OpenAI as user input, so playback gets interrupted immediately and never finishes.
I apologize, I did not read your entire message before. Basically, what I’m trying to develop uses OpenAI’s Realtime API to play prerecorded audio through ARI via OpenAI’s function calling. I’ve implemented real-time speech detection (via WebSocket) to detect when the caller starts speaking, so I can interrupt ongoing audio playback (similar to “barge-in”).
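For reference, the barge-in hook looks roughly like this; openaiWs (the already-connected Realtime WebSocket) and currentPlayback (the ARI playback handle I track per call) come from elsewhere in my application:

```javascript
openaiWs.on('message', (raw) => {
  const event = JSON.parse(raw);

  // Server-side VAD event the Realtime API sends when the caller starts speaking.
  if (event.type === 'input_audio_buffer.speech_started' && currentPlayback) {
    currentPlayback.stop();   // interrupt the ongoing ARI playback (barge-in)
    currentPlayback = null;
  }
});
```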
A Snoop channel whispers audio in the direction you specify when you create it. If the channel you are snooping on is not the OpenAI channel, then you’ve specified the direction incorrectly when creating the snoop channel.

Both = whispered into both the audio going to the channel and the audio coming from the channel.
To = whispered into the audio going to the channel.
From = whispered into the audio coming from the channel.
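In ARI terms that is the whisper parameter on the snoop request. Assuming ‘out’ corresponds to “To” (audio going to the channel) and ‘in’ to “From” (audio coming from the channel), whispering only toward the caller would look something like:

```javascript
// Whisper only toward the caller so the recording never enters the
// stream feeding OpenAI. Assumes 'out' means audio going TO the
// snooped channel; verify against your Asterisk version's ARI docs.
const snoop = await client.channels.snoopChannel({
  channelId: callerChannel.id,
  app: 'my-stasis-app',   // placeholder Stasis app name
  spy: 'none',
  whisper: 'out',         // the caller hears it; OpenAI's input stream does not
});
```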
Are you using AI or an LLM to write the code to implement this?
That makes sense. I have been primarily using AI as a learning tool for this rather than a coding authority because I’m still getting to know Asterisk’s architecture.