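For a course I'm working on, I'm trying to stream audio from a ChatGPT API response. The code below mostly works: when I play the saved file afterwards, the audio sounds fine, but if I try to play it in real time, the audio is very choppy: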
```python
import base64
import io

from openai import OpenAI
from pydub import AudioSegment
from pydub.playback import play

client = OpenAI()

# Initialize an empty AudioSegment for concatenation
full_audio = AudioSegment.empty()

stream_completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "pcm16"},
    messages=[
        {
            "role": "user",
            "content": "Can you tell me a funny short story about a pickle?"
        }
    ],
    stream=True
)

# Play the audio as it comes in and concatenate it
for chunk in stream_completion:
    chunk_audio = getattr(chunk.choices[0].delta, 'audio', None)
    if chunk_audio is not None:
        pcm_bytes = base64.b64decode(chunk_audio.get('data', ''))
        if pcm_bytes:
            audio_segment = AudioSegment.from_raw(
                io.BytesIO(pcm_bytes),
                sample_width=2,  # 16-bit PCM
                frame_rate=24000,  # 24kHz sample rate
                channels=1  # Mono audio
            )
            play(audio_segment)
            # Concatenate the audio segment
            full_audio += audio_segment

# Save the concatenated audio to a file
full_audio.export("assets/audio/full_audio.wav", format="wav")
```
Is there a way to make the audio smoother when playing it in real time as it streams?
Answer:
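Without the import statements you're using, the exact behavior isn't clear, but you probably need to fetch the audio in a separate thread or process; otherwise the time spent fetching each next chunk is added between the playback of consecutive chunks!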
```python
import base64
import io
import threading
from queue import Empty, Queue

from pydub import AudioSegment
from pydub.playback import play


def fetcher(stream, queue, exit_event):
    # read chunks from the API stream and queue the decoded PCM bytes
    for chunk in stream:
        if not chunk.choices:
            continue  # some stream chunks carry no choices
        audio = getattr(chunk.choices[0].delta, "audio", None)
        if audio and audio.get("data"):
            queue.put(base64.b64decode(audio["data"]))
    exit_event.set()  # all blocks retrieved, begin exiting


def playback(queue, exit_event, retry_wait_seconds=0.1):
    # keep going until the fetcher is done AND the buffer has been drained
    while not (exit_event.is_set() and queue.empty()):
        try:
            # NOTE a fast network should keep this buffer filled
            pcm_bytes = queue.get(timeout=retry_wait_seconds)
        except Empty:
            continue  # next data block or exiting
        # might be better to put this into the fetcher too and just play
        audio_segment = AudioSegment.from_raw(
            io.BytesIO(pcm_bytes),
            sample_width=2,  # 16-bit PCM
            frame_rate=24000,  # 24kHz sample rate
            channels=1,  # Mono audio
        )
        play(audio_segment)


def main(stream_completion):
    Q = Queue()
    E = threading.Event()
    threads = [
        threading.Thread(target=fetcher, args=(stream_completion, Q, E)),
        threading.Thread(target=playback, args=(Q, E)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```
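The claim that fetch time piles up between chunks can be illustrated with a small simulation. Here `time.sleep` stands in for both waiting on the next network chunk and a blocking `play()` call; the durations are made-up values, not measurements of the API. Run back to back, every chunk costs at least fetch + play; with a producer thread, fetching the next chunk overlaps with playing the current one.

```python
import threading
import time
from queue import Queue


def sequential(n, fetch_s, play_s):
    """Fetch, then play, one chunk at a time (like the original loop)."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(fetch_s)  # stand-in for waiting on the next chunk
        time.sleep(play_s)   # stand-in for a blocking play() call
    return time.perf_counter() - start


def pipelined(n, fetch_s, play_s):
    """Fetch in a background thread while the main thread 'plays'."""
    q = Queue()

    def producer():
        for i in range(n):
            time.sleep(fetch_s)
            q.put(i)
        q.put(None)  # sentinel: no more chunks

    start = time.perf_counter()
    t = threading.Thread(target=producer)
    t.start()
    while q.get() is not None:
        time.sleep(play_s)
    t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", round(sequential(10, 0.02, 0.02), 3))
    print("pipelined: ", round(pipelined(10, 0.02, 0.02), 3))
```

With equal fetch and play times, the sequential version takes at least 10 × 0.04 s, while the pipelined one takes roughly one fetch plus ten plays.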
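If the threaded version still leaves gaps in playback, the problem is most likely your play function, or the player being re-instantiated on every call.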
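On the re-instantiation point: as far as I know, pydub's `play()` sets up fresh playback on each call (a new PyAudio stream, simpleaudio buffer, or `ffplay` process, depending on what is installed), so calling it per chunk adds setup overhead every time. One alternative, sketched here under that assumption, is to keep a single output stream open and write raw PCM into it. `play_stream` and `PCMSink` are hypothetical names; the sink is anything with a `write(bytes)` method, such as an open PyAudio output stream. Tiny chunks are coalesced so the device isn't fed a long run of very short writes.

```python
from typing import Iterable, Protocol


class PCMSink(Protocol):
    """Hypothetical interface: anything with write(bytes), e.g. a PyAudio stream."""

    def write(self, data: bytes) -> None: ...


def play_stream(chunks: Iterable[bytes], sink: PCMSink, min_bytes: int = 4800) -> None:
    """Write PCM16 chunks to one long-lived sink.

    Chunks are buffered until at least `min_bytes` are available
    (4800 bytes = 0.1 s of 24 kHz mono 16-bit audio), then flushed.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= min_bytes:
            # only flush whole 2-byte samples so no sample is split
            cut = len(buffer) - (len(buffer) % 2)
            sink.write(bytes(buffer[:cut]))
            del buffer[:cut]
    if buffer:
        sink.write(bytes(buffer))  # flush whatever is left at the end
```

This could serve as the body of the playback thread. For example, if the fetcher puts a `None` sentinel after the last chunk, and with PyAudio: `pa = pyaudio.PyAudio()`, `out = pa.open(format=pyaudio.paInt16, channels=1, rate=24000, output=True)`, then `play_stream(iter(Q.get, None), out)`, followed by `out.stop_stream()`, `out.close()`, `pa.terminate()`.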
Another option is to wait once for longer, fetch all of the audio in one go, and then play it:
```python
play(full_audio)
```
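For the "fetch everything first" route, the raw PCM bytes can also be collected and joined once, rather than building an `AudioSegment` per chunk. A minimal sketch using only the standard library `wave` module, assuming the same 24 kHz, mono, 16-bit format as the question (`pcm16_to_wav_bytes` is a hypothetical helper name):

```python
import io
import wave


def pcm16_to_wav_bytes(chunks, frame_rate=24000, channels=1):
    """Join raw 16-bit PCM chunks and wrap them in an in-memory WAV file."""
    pcm = b"".join(chunks)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(frame_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

The returned bytes can be written to a `.wav` file, or the joined PCM can go straight to pydub with `AudioSegment.from_raw(io.BytesIO(b"".join(chunks)), sample_width=2, frame_rate=24000, channels=1)` before calling `play`.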