Smoothing Out Streamed Audio from ChatGPT

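For a class, I'm trying to stream audio from a ChatGPT API response. The code below basically works, and the audio sounds great when I play the saved file afterwards, but if I try to play it in real time the audio is very choppy: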

    # Initialize an empty AudioSegment for concatenation
    full_audio = AudioSegment.empty()

    stream_completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "alloy", "format": "pcm16"},
        messages=[
            {
                "role": "user",
                "content": "Can you tell me a funny short story about a pickle?"
            }
        ],
        stream=True
    )

    # Play the audio as it comes in and concatenate it
    for chunk in stream_completion:
        chunk_audio = getattr(chunk.choices[0].delta, 'audio', None)
        if chunk_audio is not None:
            pcm_bytes = base64.b64decode(chunk_audio.get('data', ''))
            if pcm_bytes:
                audio_segment = AudioSegment.from_raw(
                    io.BytesIO(pcm_bytes),
                    sample_width=2,  # 16-bit PCM
                    frame_rate=24000,  # 24kHz sample rate
                    channels=1  # Mono audio
                )
                play(audio_segment)
                # Concatenate the audio segment
                full_audio += audio_segment

    # Save the concatenated audio to a file
    full_audio.export("assets/audio/full_audio.wav", format="wav")

Is there a way to make the audio smoother when it is streamed and played in real time?

Answer:

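Without the import statements you're probably using, the exact behavior isn't clear, but you likely need to fetch the audio in another thread or process. Otherwise the time it takes to fetch each next chunk is added as a pause between every chunk you play!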

    import io
    import threading
    from queue import Queue, Empty

    # assumed: the question's AudioSegment and play come from pydub
    from pydub import AudioSegment
    from pydub.playback import play

    def fetcher(queue, exit_event):
        while True:
            data = "?.method()"  # TODO get the next block (raw PCM bytes) here
            if data is None:     # TODO suitable exiting case
                break
            queue.put(data)
        exit_event.set()  # all blocks retrieved, begin exiting

    def playback(queue, exit_event, retry_wait_seconds=0.1):
        # keep going until the fetcher has finished AND the queue is drained
        while not (exit_event.is_set() and queue.empty()):
            try:  # NOTE a fast network should keep this buffer filled
                data = queue.get(timeout=retry_wait_seconds)
            except Empty:
                continue  # next data block or exiting
            # might be better to put this into the fetcher too and just play
            audio_segment = AudioSegment.from_raw(
                io.BytesIO(data),
                sample_width=2,  # 16-bit PCM
                frame_rate=24000,  # 24kHz sample rate
                channels=1  # Mono audio
            )
            play(audio_segment)

    def main():
        Q = Queue()
        E = threading.Event()
        threads = []
        threads.append(threading.Thread(target=fetcher, args=(Q, E)))
        threads.append(threading.Thread(target=playback, args=(Q, E)))
        for t in threads:
            t.start()  # threads do nothing until they are started
        for t in threads:
            t.join()
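The fetch-in-one-thread, play-in-another idea can be tried out without any audio hardware. What follows is a minimal sketch, not the answer's exact code: it swaps the exit event for a sentinel object pushed onto the queue, and every name in it (`fetch_chunks`, `play_chunks`, `run_pipeline`, `slow_source`, `max_buffered`) is hypothetical. `slow_source` stands in for the API stream, and `play_fn` is whatever callable actually plays a chunk.

```python
import queue
import threading
import time

_DONE = object()  # sentinel that marks the end of the stream


def fetch_chunks(chunks, q):
    """Producer: push every chunk onto the queue, then the sentinel."""
    try:
        for chunk in chunks:
            q.put(chunk)
    finally:
        q.put(_DONE)  # always signal the consumer, even if the source fails


def play_chunks(q, play_fn):
    """Consumer: block until a chunk arrives, play it, stop at the sentinel."""
    while True:
        chunk = q.get()
        if chunk is _DONE:
            break
        play_fn(chunk)


def run_pipeline(chunks, play_fn, max_buffered=64):
    """Run fetching and playback concurrently and wait for both to finish."""
    q = queue.Queue(maxsize=max_buffered)
    producer = threading.Thread(target=fetch_chunks, args=(chunks, q))
    consumer = threading.Thread(target=play_chunks, args=(q, play_fn))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()


def slow_source(n, delay=0.01):
    """Hypothetical stand-in for the API stream: yields n fake PCM chunks."""
    for i in range(n):
        time.sleep(delay)  # simulated network latency
        yield bytes([i]) * 4
```

Calling `run_pipeline(slow_source(5), played.append)` with `played = []` collects the five chunks in order. With a real player passed as `play_fn`, the next chunk is being fetched while the current one is still playing, which is the point of the threaded version.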

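If this still leaves gaps in playback, the problem is most likely in your playback function itself, or playback is being re-instantiated on every call.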
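To illustrate what "not re-instantiating playback" could look like, here is a rough sketch that opens one output and writes every chunk into it. It assumes the output is any object with `write()` and `close()` methods. `RecordingSink` and `play_through_one_sink` are hypothetical names, and the recording sink only stores bytes so the flow can be checked without a sound card. A real sink might be, for example, an output stream from an audio library opened once for 16-bit mono at 24 kHz.

```python
class RecordingSink:
    """Stand-in for a real audio output; it only records what it is given."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def write(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


def play_through_one_sink(chunks, open_sink):
    """Open the output once, write every non-empty chunk, then close it.

    open_sink is a hypothetical factory returning an object with
    write(bytes) and close(). Returns the number of bytes written.
    """
    sink = open_sink()  # opened exactly once, not once per chunk
    written = 0
    try:
        for pcm_bytes in chunks:
            if pcm_bytes:
                sink.write(pcm_bytes)
                written += len(pcm_bytes)
    finally:
        sink.close()
    return written
```

The design choice is that setup cost (opening a device or spawning a player) is paid once, so consecutive chunks reach the same output back to back instead of each call starting from scratch.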

Another approach is to wait once, for longer, until all of the audio has been fetched, and then play it in one go:

    play(full_audio)
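The collect-first approach can also be sketched with only the standard library, as an alternative to building up an `AudioSegment`. The helpers below are hypothetical names, and the defaults assume the format from the question (16-bit mono PCM at 24 kHz). They join the decoded chunks, report how long the clip is, and wrap the bytes in a WAV container that any player can open.

```python
import io
import wave


def collect_pcm(chunks):
    """Join all decoded PCM chunks into one contiguous byte string."""
    return b"".join(chunks)


def pcm_duration_seconds(pcm_bytes, sample_width=2, frame_rate=24000, channels=1):
    """Length of the clip in seconds for raw PCM with the given layout."""
    return len(pcm_bytes) / (sample_width * frame_rate * channels)


def pcm_to_wav_bytes(pcm_bytes, sample_width=2, frame_rate=24000, channels=1):
    """Wrap raw PCM in a WAV container, returned as bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

The trade-off is latency: nothing plays until the last chunk has arrived, but the result is one continuous clip with no gaps between chunks.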
