How do I get word-level timestamps in OpenAI's Whisper ASR?

I use OpenAI's Whisper Python library for speech recognition. How can I get word-level timestamps?

To transcribe with OpenAI's Whisper (tested on Ubuntu 20.04 x64 LTS with an Nvidia GeForce RTX 3090):

    conda create -y --name whisperpy39 python==3.9
    conda activate whisperpy39
    pip install git+https://github.com/openai/whisper.git
    sudo apt update && sudo apt install ffmpeg
    whisper recording.wav
    whisper recording.wav --model large

If you are using an Nvidia GeForce RTX 3090, add the following after conda activate whisperpy39:

    pip install -f https://download.pytorch.org/whl/torch_stable.html
    conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch

Answer:

In openai-whisper version 20231117, you can get word-level timestamps by passing word_timestamps=True when calling transcribe():

pip install openai-whisper

    import whisper

    model = whisper.load_model("large")
    transcript = model.transcribe(
        word_timestamps=True,
        audio="toto.mp3"
    )
    for segment in transcript['segments']:
        print(''.join(f"{word['word']}[{word['start']}/{word['end']}]"
                      for word in segment['words']))

This prints:

多多,[2.98/3.4] 我[3.4/3.82] 有一种[3.82/3.96] 感觉[3.96/4.02] 我们[4.02/4.22] 不再[4.22/4.44] 在[4.44/4.56] 堪萨斯[4.56/4.72] 了。[4.72/5.14]
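If you need the timestamps as data rather than as a printed string, the nested result can be flattened into one list. The following is a minimal sketch, assuming the result has the same shape as in the answer above (a "segments" list whose items carry a "words" list with "word", "start" and "end" keys). The helper name flatten_words and the sample_result dictionary are hypothetical, introduced here only for illustration; the sample values are hand-written, not real Whisper output.

```python
from typing import Any, Dict, List, Tuple


def flatten_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) tuples from a transcribe()-style result.

    Hypothetical helper: assumes the dict layout shown in the answer above.
    """
    words = []
    for segment in result.get("segments", []):
        # Segments without a "words" key (e.g. when word timestamps were
        # not requested) simply contribute nothing.
        for word in segment.get("words", []):
            # Strip the surrounding whitespace that word strings may carry.
            words.append((word["word"].strip(), word["start"], word["end"]))
    return words


# Hand-written stand-in for a real result (illustrative values only).
sample_result = {
    "segments": [
        {"words": [
            {"word": " Hello", "start": 0.0, "end": 0.42},
            {"word": " world.", "start": 0.42, "end": 0.9},
        ]},
        {"words": [{"word": " Bye.", "start": 1.5, "end": 1.8}]},
    ]
}

for text, start, end in flatten_words(sample_result):
    print(f"{start:6.2f} -> {end:6.2f}  {text}")
```

With a real transcription, you would pass the dictionary returned by model.transcribe(..., word_timestamps=True) in place of sample_result.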
