OpenAI Whisper is a general-purpose speech recognition model that can perform multilingual speech recognition, speech translation, and language identification.
This guide covers:
- Supported languages
- Cluster
- Loading data to Lehmus computing environment
- Preparing the transcribe job template
- Preparing the job
- Download results
Supported languages
Supported languages are: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese and Welsh.
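The job script later in this guide takes the language as a single word (LANGUAGE=Finnish). As a minimal sketch, the check below tests a value against the list above before you submit a job. The function name is_supported_language is hypothetical, and matching the names exactly as written here (capitalized) is an assumption about what the template expects.

```shell
#!/bin/bash
# Language names copied from the list above (all are single words).
SUPPORTED_LANGUAGES="Afrikaans Arabic Armenian Azerbaijani Belarusian Bosnian
Bulgarian Catalan Chinese Croatian Czech Danish Dutch English Estonian Finnish
French Galician German Greek Hebrew Hindi Hungarian Icelandic Indonesian
Italian Japanese Kannada Kazakh Korean Latvian Lithuanian Macedonian Malay
Marathi Maori Nepali Norwegian Persian Polish Portuguese Romanian Russian
Serbian Slovak Slovenian Spanish Swahili Swedish Tagalog Tamil Thai Turkish
Ukrainian Urdu Vietnamese Welsh"

# Hypothetical helper: succeeds only if $1 matches a listed name exactly.
is_supported_language() {
    # Unquoted on purpose: split the list into words on spaces and newlines.
    for _lang in $SUPPORTED_LANGUAGES; do
        [ "$_lang" = "$1" ] && return 0
    done
    return 1
}
```

For example, is_supported_language Finnish succeeds, while is_supported_language finnish fails because the comparison is case-sensitive.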
Cluster
On the Lehmus computing cluster, you can run Whisper easily with a premade job template. Transcribing on the cluster gives you access to faster GPUs that have enough memory to run the large model. On a personal laptop you can run the large model on the CPU, but it may be slow. The method described here also makes better use of our limited resources than interactive work, which reserves a full GPU without fully utilizing it.
Loading data to Lehmus computing environment
Non-technical users can load data to Lehmus through Lehmus OnDemand. In the portal, open the Files application and select “Home Directory”. This is your Linux home directory, and you can use it to store interviews and other sensitive material.
In “Home Directory”, create a new folder called “whisper” with the “New Folder” item (note that file names are case-sensitive). If the “whisper” folder already exists, you don’t need to create it again. Then open the folder and use the “Upload” button to transfer audio or video files to the server. If you run into problems with disk quota, you can request more from ict@oulu.fi.
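If you prefer a terminal to the Files application, the folder step can be sketched as below. This assumes you have a shell session on Lehmus, which this guide does not otherwise cover. The function name prepare_workdir and its optional base-directory argument are hypothetical and exist only so the sketch can be tried in a scratch location.

```shell
#!/bin/bash
# Hypothetical helper: create the "whisper" folder if it is missing
# and print its full path. With no argument it works under $HOME.
prepare_workdir() {
    base="${1:-$HOME}"
    # -p makes this safe to repeat: an existing folder is not an error.
    mkdir -p "$base/whisper" || return 1
    printf '%s\n' "$base/whisper"
}
```

Calling prepare_workdir with no argument creates $HOME/whisper, the same folder that the OnDemand steps above create. Uploading the audio files themselves is still done as described above.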
Preparing the transcribe job template
Now that your dataset is on the server, you can create a job that runs the Whisper model on it. We have prepared a simple job template that runs the model on your audio files.
First, open Lehmus OnDemand, open the “Jobs” menu at the top of the page, and select “Job Composer”. There you can create a new job from a template. A few templates are available, but in this case select the Whisper transcribe template. Once you have selected it, press the “Create New Job” button.
Preparing the job
Now that we have a new job, we need to adjust it to work with our files. First select the created job, then on the right-hand side go to the “Submit Script” section and select “Open Editor”. This opens the job.sh file in the editor.
In the editor, locate the lines that contain WORKDIR and AUDIOFILE. Edit these so that they contain the name of the working directory (in this example, whisper) and the name of the audio file that you uploaded. Below is an example of an edited job.sh that uses the file haastattelu.m4a.
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --gres=gpu:a30:1
#SBATCH --mem=12G
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00
# Load whisper
echo "Load Whisper"
module load whisper
# Path to file that we will transcribe #
# Edit these to locate your file #
########################################
WORKDIR=$HOME/whisper/
AUDIOFILE=$WORKDIR/haastattelu.m4a
# Whisper parameters #
######################
# Select model
# - tiny
# - base
# - small
# - medium
# - large
MODEL=large
# Language
# Supported languages:
# Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian,
# Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian,
# Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian,
# Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean,
# Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali,
# Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian,
# Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog,
# Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese and Welsh.
LANGUAGE=Finnish
# Task
# - translate
# - transcribe
TASK=transcribe
# Run the task
echo "Start Whisper"
cd "$WORKDIR"
time whisper "$AUDIOFILE" --model "$MODEL" --task "$TASK" --language "$LANGUAGE"
echo "Done"
After this, save the file and close the editor tab. Now you can run the job by pressing the Submit button. This launches the job on the Lehmus cluster, and it will take some time to run. When the job is done, its status changes to Completed, and if you go back to Files > Home Directory > whisper, you can see the transcript files that were created. If you don’t see any files there, check the slurm-id.out file in Job Composer; it may tell you what went wrong. Most likely the AUDIOFILE name was typed incorrectly. Note that file names are case-sensitive.
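Because a mistyped AUDIOFILE name is the most likely failure, a small pre-flight check can catch it before the job is submitted. The sketch below assumes a shell session on Lehmus. The function name check_audiofile is hypothetical, and the -iname and -maxdepth options assume GNU find, which is standard on Linux.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that the audio file exists and,
# if it does not, look for a name that differs only in letter case.
check_audiofile() {
    workdir="$1"
    filename="$2"
    if [ -f "$workdir/$filename" ]; then
        echo "OK: $workdir/$filename"
        return 0
    fi
    # -iname compares names case-insensitively; -maxdepth 1 stays in the folder.
    match=$(find "$workdir" -maxdepth 1 -type f -iname "$filename" | head -n 1)
    if [ -n "$match" ]; then
        echo "Not found: $filename (did you mean $(basename "$match")?)"
    else
        echo "Not found: $filename"
    fi
    return 1
}
```

For example, check_audiofile "$HOME/whisper" haastattelu.m4a reports a near match if the uploaded file is actually named Haastattelu.m4a.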
Download results
You can download the results from the Lehmus OnDemand portal under Files > Home Directory > whisper. There, select the files that you want to download and use the Download button to start the download.
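To see which result files belong to a given recording before downloading, the listing can be sketched in a shell session as below. The function name list_results is hypothetical. The sketch assumes that the result files start with the audio file's base name; the exact naming may vary between Whisper versions.

```shell
#!/bin/bash
# Hypothetical helper: print the files in a folder that share the audio
# file's base name, skipping the audio file itself.
list_results() {
    workdir="$1"
    audiofile="$2"
    stem="${audiofile%.*}"    # haastattelu.m4a -> haastattelu
    for f in "$workdir/$stem".*; do
        [ -e "$f" ] || continue                            # nothing matched
        [ "$(basename "$f")" = "$audiofile" ] && continue  # skip the audio
        basename "$f"
    done
}
```

For example, list_results "$HOME/whisper" haastattelu.m4a prints one file name per line, and prints nothing if the job has not produced any output yet.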