Smoothing Out Streamed Audio from ChatGPT

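For a course, I'm trying to stream audio from a ChatGPT API response. The code below mostly works: the audio sounds fine when I play the saved file afterwards, but when I try to play it in real time it is very choppy: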

# Initialize an empty AudioSegment for concatenation
full_audio = AudioSegment.empty()

stream_completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "pcm16"},
    messages=[
        {
            "role": "user",
            "content": "Can you tell me a funny short story about a pickle?"
        }
    ],
    stream=True
)

# Play the audio as it comes in and concatenate it
for chunk in stream_completion:
    chunk_audio = getattr(chunk.choices[0].delta, 'audio', None)
    if chunk_audio is not None:
        pcm_bytes = base64.b64decode(chunk_audio.get('data', ''))
        if pcm_bytes:
            audio_segment = AudioSegment.from_raw(
                io.BytesIO(pcm_bytes),
                sample_width=2,  # 16-bit PCM
                frame_rate=24000,  # 24kHz sample rate
                channels=1  # Mono audio
            )
            play(audio_segment)
            # Concatenate the audio segment
            full_audio += audio_segment

# Save the concatenated audio to a file
full_audio.export("assets/audio/full_audio.wav", format="wav")

Is there a way to make the audio smoother when playing it as it streams in?


Answer:

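Without the imports you're using, the exact behavior isn't clear, but you probably need to fetch the audio in a separate thread or process. Otherwise the time spent fetching each next block is added as a gap between every pair of played blocks!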

import io
import queue
import threading

# Assumption: AudioSegment and play come from pydub, as the names in the question suggest
from pydub import AudioSegment
from pydub.playback import play


def fetcher(chunks, q, exit_event):
    # chunks is a placeholder for however you get the next block:
    # any iterable that yields raw PCM bytes and stops when the stream ends
    for data in chunks:
        if data:
            q.put(data)
    exit_event.set()  # all blocks retrieved, begin exiting


def playback(q, exit_event, retry_wait_seconds=0.1):
    # keep going until the fetcher has finished AND the queue is drained
    while not (exit_event.is_set() and q.empty()):
        try:  # NOTE a fast network should keep this buffer filled
            data = q.get(timeout=retry_wait_seconds)
        except queue.Empty:
            continue  # next data block or exiting
        # might be better to put this into the fetcher too and just play
        audio_segment = AudioSegment.from_raw(
            io.BytesIO(data),
            sample_width=2,  # 16-bit PCM
            frame_rate=24000,  # 24kHz sample rate
            channels=1  # Mono audio
        )
        play(audio_segment)


def main(chunks):
    q = queue.Queue()
    exit_event = threading.Event()
    threads = [
        threading.Thread(target=fetcher, args=(chunks, q, exit_event)),
        threading.Thread(target=playback, args=(q, exit_event)),
    ]
    for t in threads:
        t.start()  # threads must be started before they can be joined
    for t in threads:
        t.join()
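The fetcher needs some source of raw PCM blocks. One possible way to get it, reusing the decoding logic from the question, is a small generator over the streamed response. This is only a sketch: `iter_pcm_chunks` and `fake_chunk` are names I made up, the fake stream stands in for the real API response so it can run offline, and the check for an empty `choices` list is a defensive assumption on my part.

```python
import base64
from types import SimpleNamespace


def iter_pcm_chunks(stream):
    """Yield raw PCM bytes from each streamed chunk that carries audio data."""
    for chunk in stream:
        if not chunk.choices:
            continue  # assumption: some chunks may carry no choices at all
        audio = getattr(chunk.choices[0].delta, "audio", None)
        if audio is None:
            continue  # text-only delta, nothing to play
        pcm_bytes = base64.b64decode(audio.get("data", ""))
        if pcm_bytes:
            yield pcm_bytes


def fake_chunk(audio):
    """Hypothetical stand-in for one streamed API chunk, for offline testing."""
    delta = SimpleNamespace(audio=audio)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk({"data": base64.b64encode(b"\x01\x00\x02\x00").decode()}),
    fake_chunk(None),                  # delta without audio
    fake_chunk({"transcript": "hi"}),  # audio dict without a data field
    fake_chunk({"data": base64.b64encode(b"\x03\x00").decode()}),
]
blocks = list(iter_pcm_chunks(fake_stream))
```

With the real response you would pass `iter_pcm_chunks(stream_completion)` to the fetcher thread, so decoding happens off the playback thread.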

If this still leaves gaps in playback, the likely cause is your play function itself, or that playback is being re-instantiated on every call.
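If the play function turns out to be the culprit, one thing to try is merging the tiny streamed blocks into larger buffers and writing them to a single long-lived output instead of starting a new player per block. The sketch below shows only the buffering part. `coalesce`, `play_stream` and the 9600-byte threshold (0.2 s of 24 kHz 16-bit mono) are my own assumptions, and an in-memory buffer stands in for the audio device. In a real program the sink would be an already opened output stream from an audio library that has a `write()` method, which I have not tested here.

```python
import io


def coalesce(chunks, min_bytes=9600, sample_width=2):
    """Group small PCM chunks into buffers of at least min_bytes.

    9600 bytes is 0.2 s of 24 kHz 16-bit mono audio; treat it as a value to tune.
    """
    pending = bytearray()
    for data in chunks:
        pending.extend(data)
        if len(pending) >= min_bytes:
            # emit whole samples only; keep a trailing partial sample for later
            cut = len(pending) - (len(pending) % sample_width)
            yield bytes(pending[:cut])
            del pending[:cut]
    if pending:
        yield bytes(pending)  # flush what is left at the end of the stream


def play_stream(chunks, sink):
    """Write every buffer to one long-lived sink instead of reopening a player."""
    for buffer in coalesce(chunks):
        sink.write(buffer)


# Offline demonstration: an in-memory buffer stands in for the audio device.
sink = io.BytesIO()
source = [bytes([i % 256]) * 1001 for i in range(40)]  # odd-sized chunks on purpose
play_stream(source, sink)
```

Larger buffers mean fewer calls into the player, so any fixed start-up cost per call is paid less often. The price is a little extra latency before the first sound.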

Another approach is to wait longer once, collect all of the audio, and then play it in a single call:

play(full_audio)
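For the collect-everything route, a possible variation is to join the raw bytes once at the end rather than building a segment per block and concatenating them. The sketch below does this with only the standard library and writes a WAV file. `pcm_to_wav` is a name I made up, and the format parameters are the ones from the question (16-bit, 24 kHz, mono).

```python
import io
import wave


def pcm_to_wav(chunks, target, frame_rate=24000, sample_width=2, channels=1):
    """Join raw PCM chunks, write one WAV file, and return its duration in seconds."""
    pcm = b"".join(chunks)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(pcm)
    return len(pcm) / (frame_rate * sample_width * channels)


# Offline demonstration: one second of silence split into ten chunks.
silence = [b"\x00\x00" * 2400 for _ in range(10)]
wav_buffer = io.BytesIO()
duration = pcm_to_wav(silence, wav_buffer)
```

`target` can be a file path or a writable binary file object. The joined bytes could equally be handed to `AudioSegment.from_raw` once and played in one call, as above.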
