Hello, I want to stream both parties' audio separately to a websocket for real-time transcription and diarization (speaker labelling). I am able to record the audio separately using Monitor for both agent and customer, but I want to stream the audio.

@shamnusln, can you please help me out with this, as you have achieved it?

Asterisk is not able to do the job directly, as far as I know. You would most likely need a Stasis application, or simply start a process that takes the dumped audio files and streams them to the transcription service.
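A process like that can be sketched in Python. This is a minimal sketch, assuming the recording is a raw signed-linear (slin) file that is still being written; the filename and the websocket URL in the usage comment are hypothetical, and the transcription-service part is service-specific:

```python
# Sketch: tail a raw slin file produced by Monitor/MixMonitor and yield
# fixed-size frames as they are written, so they can be forwarded live.
import os
import time

FRAME_BYTES = 320  # 20 ms of 8 kHz, 16-bit mono signed-linear audio


def tail_frames(path, poll=0.02, stop_after_polls=None):
    """Yield FRAME_BYTES chunks from a growing raw audio file.

    stop_after_polls lets a caller bail out after N empty reads
    (useful for testing; a live streamer would normally run forever).
    """
    polls = 0
    with open(path, "rb") as f:
        buf = b""
        while True:
            data = f.read(FRAME_BYTES - len(buf))
            if data:
                buf += data
                if len(buf) == FRAME_BYTES:
                    yield buf
                    buf = b""
            else:
                polls += 1
                if stop_after_polls is not None and polls >= stop_after_polls:
                    return
                time.sleep(poll)  # wait for Asterisk to write more audio


# Usage sketch: forward each frame to your transcription websocket, e.g.
#   async with websockets.connect(WS_URL) as ws:   # hypothetical URL
#       for frame in tail_frames("/var/spool/asterisk/monitor/agent.sln"):
#           await ws.send(frame)
```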

As an alternative you can process the files offline, after the call.

But without knowing exactly what you’re doing, it’s not easy to suggest how you can go about getting it done.

I have done this.
To do this you would need an AudioSocket server running (check AudioSocket in Asterisk):

1. Create a channel to the AudioSocket server.
2. Create a bridge.
3. Copy (snoop) the channel on which one party is speaking.
4. Put them in the bridge.

Your AudioSocket server will then start receiving the audio stream; forward that stream to a transcription service. I used a third-party service, Deepgram.
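The server side of those steps can be sketched as follows. This is a minimal reader only, assuming the packet framing described in the Asterisk AudioSocket documentation (a 1-byte kind, a 2-byte big-endian payload length, then the payload, where kind 0x01 carries the call UUID and 0x10 carries signed-linear audio); the port is arbitrary and the transcription forwarding is left as a stub:

```python
# Minimal AudioSocket server sketch: accept a connection from Asterisk,
# parse packets, and hand audio payloads off to a transcription service.
import asyncio
import struct

KIND_TERMINATE, KIND_UUID, KIND_AUDIO = 0x00, 0x01, 0x10


async def read_packet(reader):
    """Read one AudioSocket packet; return (kind, payload)."""
    header = await reader.readexactly(3)
    kind = header[0]
    length = struct.unpack(">H", header[1:3])[0]
    payload = await reader.readexactly(length) if length else b""
    return kind, payload


async def handle_call(reader, writer):
    try:
        while True:
            kind, payload = await read_packet(reader)
            if kind == KIND_TERMINATE:
                break
            elif kind == KIND_UUID:
                print("call id:", payload.hex())
            elif kind == KIND_AUDIO:
                # Forward payload to your transcription service here
                # (e.g. over a Deepgram streaming websocket).
                pass
    finally:
        writer.close()


async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 9092)
    async with server:
        await server.serve_forever()

# asyncio.run(main())
```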
Thanks


Hi,
I use Google STT, but the quality is not so good.
I tried Deepgram, but it worked with nova-2-phonecall only in English…
Any clue how to use it in other languages? (I need it in French, but the general enhanced model that is supposed to work does not work for me. I use externalMedia to send the audio buffer to a websocket, and from there I send the audio to Google or Deepgram.)

Thanks for your help

To do this, you would need to use a raw format, as the metadata for, say, .wav doesn't get back-filled until the file is closed.

Thank you so much. But can I do parallel live transcription for both speakers at the same time?

Sure, you can. You can use two snoop channels and put them into the bridge with externalMedia. A snoop channel receives only one party's audio. See here for an example

or here

Thank you so much for the help. Please let me know if this code is correct or whether I need to make some changes, and if possible, please tell me whether changes are required in extensions.conf.

```python
#!/usr/bin/python3

import anyio
import asyncari
import logging
import aioudp
import os
import vosk
import array

# Environment variables for Asterisk ARI configuration
ast_host = os.getenv("AST_HOST", "127.0.0.1")
ast_port = int(os.getenv("AST_ARI_PORT", 8088))
ast_url = os.getenv("AST_URL", "http://%s:%d/" % (ast_host, ast_port))
ast_username = os.getenv("AST_USER", "asterisk")
ast_password = os.getenv("AST_PASS", "asterisk")
ast_app = os.getenv("AST_APP", "hello-world")

# Load the Vosk speech recognition model
model = vosk.Model(lang="en-us")
channels = {}


class SnoopChannel:
    def __init__(self, client, parent_channel, direction):
        self.client = client
        self.parent_channel = parent_channel
        self.direction = direction
        self.rec = vosk.KaldiRecognizer(model, 16000)

    async def rtp_handler(self, connection):
        async for message in connection:
            # Strip the 12-byte RTP header, then byte-swap the
            # network-order samples to little-endian for Vosk
            data = array.array('h', message[12:])
            data.byteswap()
            if self.rec.AcceptWaveform(data.tobytes()):
                res = self.rec.Result()
            else:
                res = self.rec.PartialResult()
            print(f"{self.direction} channel result: {res}")

    async def start(self):
        # Allocate a distinct local UDP port per snoop direction
        self.port = 45000 + len(channels) * 2 + (0 if self.direction == 'in' else 1)
        self.udp = aioudp.serve("127.0.0.1", self.port, self.rtp_handler)
        await self.udp.__aenter__()

        # Snoop one direction of the parent channel's audio
        snoop_channel = await self.client.channels.snoopChannel(
            channelId=self.parent_channel.id,
            app=self.client._app,
            spy=self.direction,
            whisper="none"
        )

        # Send the snooped audio to our UDP port via externalMedia
        media_id = self.client.generate_id()
        await self.client.channels.externalMedia(
            channelId=media_id,
            app=self.client._app,
            external_host='127.0.0.1:' + str(self.port),
            format='slin16'
        )

        bridge = await self.client.bridges.create(type='mixing')
        await bridge.addChannel(channel=[media_id, snoop_channel.id])


async def stasis_handler(objs, ev, client):
    channel = objs['channel']
    await channel.answer()

    # externalMedia channels also enter Stasis; ignore them
    if 'UnicastRTP' in channel.name:
        return

    local_channel_in = SnoopChannel(client, channel, direction='in')
    local_channel_out = SnoopChannel(client, channel, direction='out')

    await local_channel_in.start()
    await local_channel_out.start()

    channels[channel.id] = (local_channel_in, local_channel_out)


async def main():
    async with asyncari.connect(ast_url, ast_app, ast_username, ast_password) as client:
        async with client.on_channel_event('StasisStart') as listener:
            async for objs, event in listener:
                await stasis_handler(objs, event, client)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    anyio.run(main)
```

Thank you so much @abhinax4991. If possible, can you provide some sample code? I have posted code in one of the replies; please check whether it is correct.

If possible, can anybody check this code and let me know if I need to make any changes to it?