Google Cloud Speech-to-Text API

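I want to calculate how long each speaker talks in a two-person call, along with the speaker tag, the transcript, the speaker timestamps, and the transcription confidence.

For example: I have an mp3 file of a customer service call with two speakers. I want to know how long each speaker talks, together with the speaker tag, the transcript, and the transcription confidence.

I am having trouble with the end time and the confidence of the transcription. The confidence is reported as 0, and the end time does not match the actual end time.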

Audio link: https://drive.google.com/file/d/1OhwQ-xI7Rd-iKNj_dKP2unNxQzMIYlNW/view?usp=sharing

    #!pip install --upgrade google-cloud-speech
    from google.cloud import speech_v1p1beta1 as speech
    import datetime

    tag = 1
    speaker = ""
    transcript = ''

    client = speech.SpeechClient.from_service_account_file('#cloud_credentials')

    audio = speech.types.RecognitionAudio(uri=gs_uri)
    config = speech.types.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US',
        enable_speaker_diarization=True,
        enable_automatic_punctuation=True,
        enable_word_time_offsets=True,
        diarization_speaker_count=2,
        use_enhanced=True,
        model='phone_call',
        profanity_filter=False,
        enable_word_confidence=True)

    print('Waiting for operation to complete…')
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=100000)

    with open('output_file.txt', "w") as text_file:
        for result in response.results:
            alternative = result.alternatives[0]
            confidence = result.alternatives[0].confidence
            current_speaker_tag = -1
            transcript = ""
            time = 0
            for word in alternative.words:
                if word.speaker_tag != current_speaker_tag:
                    if (transcript != ""):
                        print(u"Speaker {} - {} - {} - {}".format(current_speaker_tag, str(datetime.timedelta(seconds=time)), transcript, confidence), file=text_file)
                    transcript = ""
                    current_speaker_tag = word.speaker_tag
                    time = word.start_time.seconds
                transcript = transcript + " " + word.word
        if transcript != "":
            print(u"Speaker {} - {} - {} - {}".format(current_speaker_tag, str(datetime.timedelta(seconds=time)), transcript, confidence), file=text_file)
        print(u"Speech to text operation is completed, output file is created: {}".format('output_file.txt'))

Answer:

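The code in your question and the screenshot differ. However, from the screenshot I understand that you are using the Speech-to-Text speaker diarization feature to separate the speech of each speaker.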

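You cannot compute a separate confidence for each speaker here, because the response carries confidence values for each transcript and for individual words. A single transcript may or may not contain words from more than one speaker, depending on the audio.
Also, according to the documentation, the last result in the response contains all the words together with their speaker_tag. The documentation states: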

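> The transcript within each result is separate and sequential per result. However, the words list within an alternative includes all the words from all the results thus far. Thus, to get all the words with speaker tags, you only have to take the words list from the last result.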
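To make the quoted behaviour concrete, here is a minimal sketch of how the response could be split into two views: the diarized word list from the last result, and the per-transcript confidences from the results that carry a transcript. The helper names and the `fake_response` object are hypothetical stand-ins that only mimic the fields used in this post; the sketch also assumes the last, diarization-only result has an empty transcript, which you should check against your own response.

```python
from types import SimpleNamespace as NS


def diarized_words(response):
    """Return the word list of the last result, or [] if there are no results."""
    if not response.results:
        return []
    return list(response.results[-1].alternatives[0].words)


def transcript_confidences(response):
    """Return (transcript, confidence) pairs, skipping results with no transcript."""
    pairs = []
    for result in response.results:
        alternative = result.alternatives[0]
        if alternative.transcript:  # assumption: the diarization-only result has no transcript
            pairs.append((alternative.transcript, alternative.confidence))
    return pairs


def make_word(text, tag):
    """Build a hypothetical word entry with only the fields used here."""
    return NS(word=text, speaker_tag=tag)


# Hypothetical stand-in for a real API response, for illustration only.
fake_response = NS(results=[
    NS(alternatives=[NS(transcript="hello there", confidence=0.91,
                        words=[make_word("hello", 0), make_word("there", 0)])]),
    NS(alternatives=[NS(transcript="", confidence=0.0,
                        words=[make_word("hello", 1), make_word("there", 2)])]),
])

tags = [word.speaker_tag for word in diarized_words(fake_response)]
confidences = transcript_confidences(fake_response)
```

With a real response you would pass the object returned by `operation.result(...)` instead of `fake_response`.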

For this last result, the confidence is 0. You can write the response to the console or to a file and debug it yourself.

    # Detects speech in the audio file
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=10000)

    # check the whole response
    with open('output_file.txt', "w") as text_file:
        print(response, file=text_file)

Alternatively, you can print each transcript with its confidence to understand the response better. For example:

    # confidence for each transcript
    for result in response.results:
        alternative = result.alternatives[0]
        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}".format(alternative.confidence))

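As for the problem with the duration of each speaker: you are working with the start and end time of each word, not of each speaker. The idea should be:

- Take the start time of a speaker's first word as the start time of the duration.
- Always set the end time of the duration to the end time of the current word, because we do not know whether the next word belongs to a different speaker.
- Watch for speaker changes. If the speaker is the same, just append the word to the modified transcript; otherwise do the same, and also reset the start time for the new speaker.

For example: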

    import datetime

    # The last result carries every word together with its speaker tag.
    words_info = response.results[-1].alternatives[0].words

    tag = 1
    speaker = ""
    transcript = ''
    start_time = ""
    end_time = ""

    for word_info in words_info:
        if start_time == '':
            start_time = word_info.start_time.seconds  # set only for the first word
        if word_info.speaker_tag == tag:
            speaker = speaker + " " + word_info.word
        else:
            # Speaker changed: write out the previous speaker's segment, if any.
            # end_time still holds the end of the previous speaker's last word.
            if speaker:
                transcript += "speaker {}: {}-{} - {}".format(tag, str(datetime.timedelta(seconds=start_time)), str(datetime.timedelta(seconds=end_time)), speaker) + '\n'
            tag = word_info.speaker_tag
            speaker = "" + word_info.word
            start_time = word_info.start_time.seconds  # reset the start time for the new speaker
        end_time = word_info.end_time.seconds  # track the end time of the current segment

    # Write out the final segment.
    if speaker:
        transcript += "speaker {}: {}-{} - {}".format(tag, str(datetime.timedelta(seconds=start_time)), str(datetime.timedelta(seconds=end_time)), speaker) + '\n'

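I removed the confidence part from the modified transcript because it is always 0. Also note that speaker diarization is still in beta, so you may not get exactly the output you want.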
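Since the original goal was the total talk time per speaker, the same segment-tracking idea can be extended to sum durations per speaker tag. The sketch below is only an illustration under stated assumptions: `to_seconds` and `speaking_time` are hypothetical helpers, and the word offsets are assumed to be either `datetime.timedelta` values (as newer versions of the Python client return) or Duration-style objects exposing `.seconds` and `.nanos` (as older versions do). Reading only `.seconds` drops the fractional part, which may be one reason reported end times look off.

```python
import datetime
from collections import defaultdict
from types import SimpleNamespace as NS


def to_seconds(offset):
    """Convert a word offset to float seconds, keeping the sub-second part."""
    if isinstance(offset, datetime.timedelta):
        return offset.total_seconds()
    # Assumption: a Duration-style object with integer .seconds and .nanos fields.
    return offset.seconds + offset.nanos / 1e9


def speaking_time(words):
    """Sum the length of each run of consecutive same-speaker words, per speaker tag."""
    totals = defaultdict(float)
    current_tag, seg_start, seg_end = None, 0.0, 0.0
    for word in words:
        start, end = to_seconds(word.start_time), to_seconds(word.end_time)
        if word.speaker_tag != current_tag:
            if current_tag is not None:
                totals[current_tag] += seg_end - seg_start  # close the previous segment
            current_tag, seg_start = word.speaker_tag, start
        seg_end = end  # the latest word always ends the current segment
    if current_tag is not None:
        totals[current_tag] += seg_end - seg_start  # close the final segment
    return dict(totals)


# Hypothetical sample words, mixing both offset styles for illustration.
td = datetime.timedelta
sample_words = [
    NS(word="hi", speaker_tag=1, start_time=td(seconds=0.0), end_time=td(seconds=0.5)),
    NS(word="there", speaker_tag=1, start_time=td(seconds=0.5), end_time=td(seconds=1.25)),
    NS(word="hello", speaker_tag=2, start_time=td(seconds=1.5), end_time=td(seconds=2.0)),
    NS(word="bye", speaker_tag=1, start_time=NS(seconds=3, nanos=500_000_000),
       end_time=NS(seconds=4, nanos=0)),
]

totals = speaking_time(sample_words)
```

Each segment is measured from the start of its first word to the end of its last word, so pauses between words of the same speaker count toward that speaker's time. With a real response, you would pass the word list of the last result instead of `sample_words`.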
