In this blog, we share some Python code to get transcripts from video files with the Microsoft Speech Service. Our inputs are mp4 video files, so we first convert them to Waveform Audio File Format (WAV), because the Speech Service accepts only audio input.
1. Azure Cognitive Service
Get an Azure Cognitive Services account if you do not have one: https://azure.microsoft.com/en-us/free/cognitive-services/. Take note of the subscription key and the region of the account.
2. Install Dependencies
2.1 moviepy
This library allows us to convert an mp4 file to the wav file format.
pip install moviepy
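After a conversion, it can be useful to check what kind of wav file was actually produced. The Speech SDK documentation describes 16 kHz or 8 kHz, 16-bit, mono PCM as its default WAV input format. The sketch below uses only Python's standard wave module; the helper name describe_wav is hypothetical and not part of moviepy or the Speech SDK, and it assumes the file is plain PCM WAV.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, frame_rate_hz) of a PCM WAV file.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()
```

Calling describe_wav on the converted file returns a tuple such as (channels, bytes per sample, sample rate), which can be compared against the format the service expects.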
2.2 azure-cognitiveservices-speech
pip install azure-cognitiveservices-speech
3. Configuration
Export these environment variables:
export SPX_SUBSCRIPTION_KEY=<the subscription key of the Cognitive Services resource>
export SPX_REGION=<the region where the Cognitive Services resource resides>
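If either variable is missing, os.getenv returns None and the failure only shows up later as a cancelled request. A minimal sketch of an early check follows; the function name load_speech_settings is hypothetical and the variable names are the two exported above.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (subscription_key, region), or raise if either is unset.

    Hypothetical helper: `env` is any mapping, os.environ by default.
    """
    names = ("SPX_SUBSCRIPTION_KEY", "SPX_REGION")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPX_SUBSCRIPTION_KEY"], env["SPX_REGION"]
```

The returned pair can be passed on as the subscription key and region, so a misconfigured shell is reported before any audio is processed.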
4. Python Code
Save the following to a file named spx_recognize.py. The script sets file_path="/Users/joe/video/sample.mp4"; change it to point to an mp4 file on your laptop.
import os
import tempfile

# Requires moviepy 1.x; moviepy 2.x dropped the moviepy.editor module.
import moviepy.editor as mp
import azure.cognitiveservices.speech as speechsdk
import azure.cognitiveservices.speech.languageconfig as spxlangconfig


def convert_mp4_wav(file_path: str) -> str:
    """Convert mp4 to audio wav file in temporary folder.

    Args:
        file_path (str): path to the mp4 file.

    Returns:
        str: path to the newly created wav file.
    """
    # splitext also copes with file names that have no extension.
    base_name, _ = os.path.splitext(os.path.basename(file_path))
    output_path = os.path.join(tempfile.gettempdir(), base_name + ".wav")
    clip = mp.VideoFileClip(file_path)
    try:
        clip.audio.write_audiofile(output_path)
    finally:
        # Release the file handles held by the clip.
        clip.close()
    return output_path


def recognize(
    languages: list,
    subscription_key: str,
    region: str,
    file_path: str,
):
    """Get the transcript and source audio language.

    Recognize a file and get the transcript and source audio
    language.

    Args:
        languages (list): List of possible languages.
        subscription_key (str): subscription key of Azure Speech Service
        region (str): region where the Azure Speech Service is hosted.
        file_path (str): path to the mp4 file.

    Returns:
        tuple: the recognized text and the detected language.

    Raises:
        SystemError: when there are no results.
        SystemError: when request is cancelled because subscription key
            and/or region are incorrect.
    """
    wav_file = convert_mp4_wav(file_path)
    auto_detect_language_config = spxlangconfig.AutoDetectSourceLanguageConfig(
        languages=languages,
    )
    speech_config = speechsdk.SpeechConfig(
        subscription=subscription_key,
        region=region,
    )
    audio_input = speechsdk.AudioConfig(filename=wav_file)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_input,
        auto_detect_source_language_config=auto_detect_language_config,
    )
    # recognize_once_async() returns after the first recognized utterance.
    result = speech_recognizer.recognize_once_async().get()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        detect_lang_result = speechsdk.AutoDetectSourceLanguageResult(result)
        return result.text, detect_lang_result.language
    elif result.reason == speechsdk.ResultReason.NoMatch:
        raise SystemError(
            f"No speech could be recognized: {result.no_match_details}",
        )
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        raise SystemError(
            f"Speech Recognition canceled: {cancellation_details.reason}: "
            f"{cancellation_details.error_details}",
        )
    # Any other reason would otherwise fall through and return None.
    raise SystemError(f"Unexpected result reason: {result.reason}")


if __name__ == "__main__":
    text, lang = recognize(
        languages=["en-US", "es-ES"],
        subscription_key=os.getenv("SPX_SUBSCRIPTION_KEY"),
        region=os.getenv("SPX_REGION"),
        file_path="/Users/joe/video/sample.mp4",
    )
    print(text)
    print(lang)
Run python spx_recognize.py, and the transcript and the detected source language are printed to the terminal.
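Note that recognize_once_async() recognizes a single utterance, so for a longer video the output covers only the first stretch of speech. One possible way to collect the whole transcript is the SDK's continuous recognition mode. The sketch below is an assumption-laden illustration, not part of the script above: the function name transcribe_all is hypothetical, and it expects an object that behaves like the SpeechRecognizer built in recognize(), using its recognized, session_stopped and canceled events together with start_continuous_recognition() and stop_continuous_recognition().

```python
import threading


def transcribe_all(recognizer, timeout_seconds=None):
    """Collect the text of every utterance from a continuous recognition run.

    Hypothetical helper: `recognizer` is assumed to expose the Speech SDK's
    event signals and start/stop methods for continuous recognition.
    """
    done = threading.Event()
    pieces = []

    def on_recognized(evt):
        # Events without text (for example, no match) are skipped.
        if evt.result.text:
            pieces.append(evt.result.text)

    def on_finished(evt):
        # Fired when the session stops or the request is cancelled,
        # which includes reaching the end of the audio file.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_finished)
    recognizer.canceled.connect(on_finished)

    recognizer.start_continuous_recognition()
    done.wait(timeout_seconds)
    recognizer.stop_continuous_recognition()
    return " ".join(pieces)
```

With this approach the language would need to be read per utterance from each event's result rather than once for the whole file.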