Hello, I want to stream both parties' audio separately to a websocket for real-time transcription and diarization (speaker labelling). I am able to record the audio separately using Monitor for both agent and customer, but I want to stream the audio.

@shamnusln, can you please help me out with this, as you have achieved it?

Asterisk is not able to do the job directly, as far as I know. You would most likely need a Stasis application, or simply start a process that takes the dumped audio files and streams them to the transcription service.
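A process like that can be sketched in Python. This is a minimal sketch, assuming the recording is a raw signed-linear (slin) file that is still being written; the filename and the websocket URL in the usage comment are hypothetical, and the transcription-service part is service-specific:

```python
# Sketch: tail a raw slin file produced by Monitor/MixMonitor and yield
# fixed-size frames as they are written, so they can be forwarded live.
import os
import time

FRAME_BYTES = 320  # 20 ms of 8 kHz, 16-bit mono signed-linear audio


def tail_frames(path, poll=0.02, stop_after_polls=None):
    """Yield FRAME_BYTES chunks from a growing raw audio file.

    stop_after_polls lets a caller bail out after N empty reads
    (useful for testing; a live streamer would normally run forever).
    """
    polls = 0
    with open(path, "rb") as f:
        buf = b""
        while True:
            data = f.read(FRAME_BYTES - len(buf))
            if data:
                buf += data
                if len(buf) == FRAME_BYTES:
                    yield buf
                    buf = b""
            else:
                polls += 1
                if stop_after_polls is not None and polls >= stop_after_polls:
                    return
                time.sleep(poll)  # wait for Asterisk to write more audio


# Usage sketch: forward each frame to your transcription websocket, e.g.
#   async with websockets.connect(WS_URL) as ws:   # hypothetical URL
#       for frame in tail_frames("/var/spool/asterisk/monitor/agent.sln"):
#           await ws.send(frame)
```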

As an alternative you can process the files offline, after the call.

But without knowing exactly what you’re doing, it’s not easy to suggest how you can go about getting it done.

I have done this.
To do this you would need an AudioSocket server running (check AudioSocket in Asterisk):

1. Create a channel to the AudioSocket server.
2. Create a bridge.
3. Copy (snoop) the channel on which one party is speaking.
4. Put them in the bridge.

Your AudioSocket server will then start receiving the audio stream; forward that stream to a transcription service. I used a third-party service, Deepgram.
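The server side of those steps can be sketched as follows. This is a minimal reader only, assuming the packet framing described in the Asterisk AudioSocket documentation (a 1-byte kind, a 2-byte big-endian payload length, then the payload, where kind 0x01 carries the call UUID and 0x10 carries signed-linear audio); the port is arbitrary and the transcription forwarding is left as a stub:

```python
# Minimal AudioSocket server sketch: accept a connection from Asterisk,
# parse packets, and hand audio payloads off to a transcription service.
import asyncio
import struct

KIND_TERMINATE, KIND_UUID, KIND_AUDIO = 0x00, 0x01, 0x10


async def read_packet(reader):
    """Read one AudioSocket packet; return (kind, payload)."""
    header = await reader.readexactly(3)
    kind = header[0]
    length = struct.unpack(">H", header[1:3])[0]
    payload = await reader.readexactly(length) if length else b""
    return kind, payload


async def handle_call(reader, writer):
    try:
        while True:
            kind, payload = await read_packet(reader)
            if kind == KIND_TERMINATE:
                break
            elif kind == KIND_UUID:
                print("call id:", payload.hex())
            elif kind == KIND_AUDIO:
                # Forward payload to your transcription service here
                # (e.g. over a Deepgram streaming websocket).
                pass
    finally:
        writer.close()


async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 9092)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```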
Thanks


Hi,
I use Google STT, but the quality is not so good.
I tried Deepgram, but it worked with nova-2-phonecall only in English…
Any clue how to use it in other languages? (I need it in French, but the general enhanced model that is supposed to work does not work for me. I use externalMedia to send the audio buffer to a websocket, and from there I send the audio to Google or Deepgram.)

Thanks for your help

To do this, you would need to use a raw format, as the metadata for, say, .wav doesn't get back-filled until the file is closed.

Thank you so much. But can I do parallel live transcription for both speakers at the same time?

Sure, you can. You can use two snoop channels and put them into the bridge with externalMedia. A snoop channel receives only one party's audio. See here for an example

or here

Thank you so much for the help. Please let me know if this code is correct or whether I need to make some changes, and if possible, please tell me whether changes are required in extensions.conf.

```python
#!/usr/bin/python3

import anyio
import asyncari
import logging
import aioudp
import os
import vosk
import array

# Environment variables for Asterisk ARI configuration
ast_host = os.getenv("AST_HOST", "127.0.0.1")
ast_port = int(os.getenv("AST_ARI_PORT", 8088))
ast_url = os.getenv("AST_URL", "http://%s:%d/" % (ast_host, ast_port))
ast_username = os.getenv("AST_USER", "asterisk")
ast_password = os.getenv("AST_PASS", "asterisk")
ast_app = os.getenv("AST_APP", "hello-world")

# Load the Vosk speech recognition model
model = vosk.Model(lang="en-us")
channels = {}


class SnoopChannel:
    def __init__(self, client, parent_channel, direction):
        self.client = client
        self.parent_channel = parent_channel
        self.direction = direction
        self.rec = vosk.KaldiRecognizer(model, 16000)

    async def rtp_handler(self, connection):
        async for message in connection:
            # Strip the 12-byte RTP header, then byte-swap the
            # network-order samples to little-endian for Vosk
            data = array.array('h', message[12:])
            data.byteswap()
            if self.rec.AcceptWaveform(data.tobytes()):
                res = self.rec.Result()
            else:
                res = self.rec.PartialResult()
            print(f"{self.direction} channel result: {res}")

    async def start(self):
        # Allocate a distinct local UDP port per snoop direction
        self.port = 45000 + len(channels) * 2 + (0 if self.direction == 'in' else 1)
        self.udp = aioudp.serve("127.0.0.1", self.port, self.rtp_handler)
        await self.udp.__aenter__()

        # Snoop one direction of the parent channel's audio
        snoop_channel = await self.client.channels.snoopChannel(
            channelId=self.parent_channel.id,
            app=self.client._app,
            spy=self.direction,
            whisper="none"
        )

        # Send the snooped audio to our UDP port via externalMedia
        media_id = self.client.generate_id()
        await self.client.channels.externalMedia(
            channelId=media_id,
            app=self.client._app,
            external_host='127.0.0.1:' + str(self.port),
            format='slin16'
        )

        bridge = await self.client.bridges.create(type='mixing')
        await bridge.addChannel(channel=[media_id, snoop_channel.id])


async def stasis_handler(objs, ev, client):
    channel = objs['channel']
    await channel.answer()

    # externalMedia channels also enter Stasis; ignore them
    if 'UnicastRTP' in channel.name:
        return

    local_channel_in = SnoopChannel(client, channel, direction='in')
    local_channel_out = SnoopChannel(client, channel, direction='out')

    await local_channel_in.start()
    await local_channel_out.start()

    channels[channel.id] = (local_channel_in, local_channel_out)


async def main():
    async with asyncari.connect(ast_url, ast_app, ast_username, ast_password) as client:
        async with client.on_channel_event('StasisStart') as listener:
            async for objs, event in listener:
                await stasis_handler(objs, event, client)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    anyio.run(main)
```

Thank you so much @abhinax4991. If possible, can you provide some sample code? I have posted code in one of the replies; please check whether it is correct.

If possible, can anybody check this code and let me know if I need to make any changes to it?