In this guide we'll walk you through what to look for in transcription software, and give you a framework to help you choose the best option for your needs. If you're wondering "what is the best transcription software?", the answer, like all good things in life, is "it depends". So this guide is here to help you make an informed decision yourself, not to give you a subjective opinion.
But first, here's some quick context:
You have a recording of an interview, phone call or speech and you now want to get an editable transcript of the content. How do you convert your audio or video to text or subtitles, quickly? Is there a program that will transcribe audio to text?
The short answer is yes: use transcription software, which is specially designed to make the process of transcribing audio and video files painless. It might sound like you could just use a media player and a word processor and alternate between windows - if we had a dollar for every time someone told us that! You'll be surprised how much time you can save by using transcription software instead - time you can spend doing something else.
Here's a quick summary of the framework we'll be covering:
Support for automatic transcription might sound like a strange thing to look for, but audio transcription software has been around for decades and some of it still only supports manual transcription: you load the audio in its player and type out what you hear in its editor. The software provides keyboard shortcuts to control playback of the file, which helps you save time. This workflow is still important and useful, as we describe below. However, speech-to-text technology has improved a lot in recent years, and you can get up to 90% accuracy for well-recorded audio / video files. So you want to check whether the audio or video transcription software you're looking at can create automatic machine transcripts using AI.
Like we mentioned above, even for well-recorded audio, the best automatic transcription software will still only give you about 90% accuracy. This is because automatic speech recognition is still an active field of academic research and development, and it has a long way to go before machines can produce 100% accurate results. That said, the machine learning models that a transcription software uses do determine the quality of the transcript. The best transcription software gets close to 90% accuracy for well-recorded audio.
Now, the quality of the transcript also depends heavily on the quality of your recording. Some transcription software is more easily affected by background noise than others. So you want to test each software by uploading different types of audio and video files and seeing which one gives you the best accuracy for each type of recording quality.
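One way to make that comparison concrete is to hand-correct a short passage yourself and then measure how far each tool's output is from it. The sketch below computes a word error rate (WER) in Python. It assumes plain-text transcripts, the function name and sample sentences are made up for illustration, and "accuracy" here is simply 1 minus WER, which may not match how a particular vendor measures it.

```python
import re

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = re.findall(r"[a-z0-9']+", reference.lower())
    hyp = re.findall(r"[a-z0-9']+", hypothesis.lower())
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the machine transcript
                            curr[j - 1] + 1,      # extra word in the machine transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical sample: your hand-corrected text vs. a machine transcript
reference = "the quick brown fox jumps over the lazy dog today"
machine = "the quick brown fox jumped over the lazy dog today"
wer = word_error_rate(reference, machine)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 10%, accuracy: 90%
```

Running the same reference passage against the output of each software you are trialling gives you a like-for-like number for your own recordings.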
After you get the machine transcript, expect to do some amount of editing yourself to improve its accuracy. If you absolutely need 100% accuracy, then you want to look for human-powered transcription software, where a professional transcriptionist actually does the transcribing for you. Of course, this will cost you more than machine-powered transcription software.
Transcription software can typically produce a machine transcript in about 30 minutes for an hour-long audio / video file. If a transcription software advertises a 24-hour turn-around time, then it's most likely human-powered and will be more expensive.
Given that machine transcription won't give you 100% accuracy, you also want to account for the time it will take you to edit the first-draft machine transcript after you get it from the transcription software.
Look specifically at the software's privacy policy to see how your uploaded audio / video files are handled. For example, does a human ever touch your data? How long is your data kept on their servers? Do they use your data to train their machine learning models? Do they allow you to delete your data securely? Do they minimize the amount of data they keep on their servers?
The best and most secure transcription software will try to minimize the amount of data stored on its servers, for example by deleting your audio / video files right after processing them. The good ones also give you complete control of your data, allowing you to delete all copies of the transcript from their systems yourself, without having to jump through hoops with support.
You want to check that the transcription software supports all the languages you intend to transcribe. Even in the best transcription software, accuracy can vary across languages. So test a sample file in each language you need and check that the results are good enough for that language.
You also want to keep an eye out for regional accent variations and how well the transcription software handles them. The best transcription software will allow you to choose between different regional accents.
Timestamps like [00:32:12] make your transcript easier to navigate, especially if you need to follow along with the transcript as you're playing back the audio.
The best transcription apps will allow you to specify how often you want these timestamps to be added to the transcript, for example every 10 seconds or whenever the speaker changes. When you need to edit the transcript, they then allow you to click on these timestamps to quickly jump to that portion of the audio / video, listen to it and correct the transcript.
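To make the idea concrete, here is a minimal Python sketch of how periodic timestamps could be inserted into a transcript. It assumes the speech engine gives you a start time in seconds for each word, which is a made-up input shape for illustration, and it is not how any particular product implements the feature.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as [HH:MM:SS]."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"

def add_timestamps(words, interval=10.0):
    """words: list of (start_seconds, text) pairs (hypothetical format).
    Puts a marker before the first word at or after each interval boundary."""
    parts = []
    next_mark = 0.0
    for start, text in words:
        if start >= next_mark:
            parts.append(format_timestamp(start))
            # The next marker is due at the following interval boundary
            next_mark = (start // interval + 1) * interval
        parts.append(text)
    return " ".join(parts)

words = [(0.4, "Thanks"), (0.9, "for"), (1.3, "joining."),
         (10.2, "Let's"), (10.6, "begin."),
         (1932.0, "Any"), (1932.5, "questions?")]
print(add_timestamps(words))
# [00:00:00] Thanks for joining. [00:00:10] Let's begin. [00:32:12] Any questions?
```

Changing `interval` to 30 or 60 gives sparser markers. Adding a marker on speaker changes would need speaker labels in the input as well.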
Speaker identification is quite a hard problem for machines to crack, but the best transcription software will allow you to enable it, which inserts speaker tags like "Speaker 1", "Speaker 2", etc. in your transcript. Of course, the machine won't know the names of your speakers, so once you get the transcript you'd have to search for tags like "Speaker 1" and replace them with the person's name.
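If you have many transcripts to fix up, this search-and-replace step is easy to script. Here is a small Python sketch, assuming the tags look exactly like "Speaker 1" and "Speaker 2"; the function name and sample names are made up for illustration.

```python
import re

def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace generic speaker tags with real names."""
    if not names:
        return transcript
    # Longest tags first plus word boundaries, so "Speaker 1" never
    # matches inside "Speaker 10"
    tags = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in tags) + r")\b")
    return pattern.sub(lambda match: names[match.group(1)], transcript)

transcript = (
    "Speaker 1: How are you?\n"
    "Speaker 2: Fine, thanks.\n"
    "Speaker 10: Hello."
)
print(rename_speakers(transcript, {"Speaker 1": "Alice", "Speaker 2": "Bob"}))
```

A plain find-and-replace in a word processor works too, as long as you watch out for "Speaker 1" also matching the start of "Speaker 10".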
Machine learning speech-to-text models are usually trained on a wide range of vocabulary. However, your particular domain might use very specific jargon that the engine is not able to identify (for example medical terms or proper nouns). This is where custom dictionaries come in. The best transcription software allows you to add words to a custom dictionary for the transcription engine to keep an eye out for in the files you upload.
You also want to make sure that the transcription software saves these words you enter to your account, so you don't have to re-enter them every time you upload a new file.
Depending on your workflow, you may keep your recordings in a cloud file storage service. If so, you want to check whether the transcription software supports uploading files directly from that service. This way you can avoid having to download each file to your computer and then upload it again for transcription.
The best transcription software provides one or more options to upload from Google Drive, Dropbox and OneDrive, in addition to uploading files from your computer. Some transcription software also allows you to specify a public link to a media file available online - the software will download from that link and transcribe the file for you.
Like we mentioned above, automatic machine transcription is still not advanced enough to generate 100% accurate transcripts. Even the best transcription software only provides about 90% accuracy for well-recorded audio / video files. So you can almost always expect some work on your side to edit the transcript by hand.
The best transcription software will provide you with an easy-to-use interface to edit the machine transcript once it's done. If a transcription software does not provide this, you can expect a world of pain trying to correct mistakes manually. It can be quite time-consuming! So this is a key thing to look for.
Within the transcript editor, you want to look for keyboard shortcuts. These are key to saving you time when you need to edit your transcripts. Using keyboard shortcuts, you can control playback of the audio / video while you edit the transcript.
The best transcription software also allows you to jump directly to the portion of the audio / video file that needs editing by clicking or double-clicking on the timestamps in the machine transcript. This is another time-saving feature.
Finally, a few transcription software products integrate with a foot pedal. This allows you to control the audio / video playback using your foot, freeing up your hands to do the typing. This is a huge time-saver and is something professional transcriptionists use every day to speed up their work.
Practically all transcription software will allow you to download your transcripts as txt or doc files. You can then open them in Microsoft Word, Google Docs and similar tools to edit them further or email them.
Some transcription software also allows you to download your transcripts as caption or subtitle files in SRT / VTT formats. This lets you, for example, generate subtitle files to use with your YouTube, Vimeo, Instagram or Facebook videos. Subtitles help make your video accessible, which might sometimes be required by law. So if you need to generate subtitle files, you want to specifically look for this in the transcription software.
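If you're curious what a subtitle file actually contains, the SRT format is simple enough to generate yourself: each caption is a sequence number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text and a blank line. Below is a minimal Python sketch, assuming you already have timed segments; the `(start, end, text)` tuples are a made-up input shape for illustration.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments) -> str:
    """segments: list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # Caption blocks are separated by a blank line
    return "\n".join(blocks)

segments = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 6.0, "Today we talk about transcription."),
]
print(to_srt(segments))
```

WebVTT files are very similar, but they begin with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.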
Dictation is a useful feature that very few transcription software products provide. Select a language, click on Dictate and speak into the microphone, and watch your words appear on the screen in real time. This is especially useful if you don't already have a recording of your speech. For example, if you're writing a book or an article and just want to dictate your thoughts, you can use the dictation feature to have your words converted to text as you think out loud.
Another nifty way to use the dictation feature is when machine transcription is not able to transcribe your audio / video file accurately due to its recording quality. In cases like these, you can load the file in the transcription software's editor, slow it down, play it back in your headphones and dictate what you hear into the microphone. What you're essentially doing is converting the speech from the low-quality recording into high-quality, single-speaker speech that current dictation technology can pick up quite well.
Another interesting use for dictation is transcribing song lyrics. You can follow the same approach as in the preceding paragraph and effectively use the dictation feature as music transcription software.
Voice recognition is a computationally intensive process and takes a lot of server resources. So you'll typically find that the best transcription software is only available online and can't be downloaded. Because of those server costs, transcription software usually either charges by the hour or caps the number of hours you can transcribe.
Be wary of software that promises an unlimited amount of transcription hours for a fixed price. Truly unlimited transcription would bankrupt the provider, so in reality the hours are limited. Keep an eye out for the fine print, especially terms like a "fair-use" policy.
The best transcription software are upfront and transparent about their pricing. You get the best deal when they charge by the hour and have a low license fee.
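A quick way to compare plans is to work out what each one costs per hour of audio you actually expect to transcribe. The Python sketch below does that arithmetic. All the prices and plan details in it are invented for illustration and are not taken from any real product.

```python
def pay_per_hour_cost(license_fee: float, hourly_rate: float, hours: float) -> float:
    """Total cost of a plan with a license fee plus a per-hour charge."""
    return license_fee + hourly_rate * hours

def effective_rate(total_cost: float, hours: float) -> float:
    """What you really pay per transcribed hour."""
    return total_cost / hours

hours_needed = 5  # hypothetical: hours of audio you expect to transcribe

# Hypothetical plan A: $20 license fee plus $6 per transcribed hour
plan_a = pay_per_hour_cost(license_fee=20, hourly_rate=6, hours=hours_needed)
# Hypothetical plan B: $120 flat price, capped at 15 hours
plan_b = 120

print(f"Plan A: ${plan_a:.2f} total, ${effective_rate(plan_a, hours_needed):.2f} per hour")
print(f"Plan B: ${plan_b:.2f} total, ${effective_rate(plan_b, hours_needed):.2f} per hour")
# Plan A: $50.00 total, $10.00 per hour
# Plan B: $120.00 total, $24.00 per hour
```

With these made-up numbers the flat plan only approaches the pay-per-hour plan if you use nearly all 15 capped hours, which is why it's worth estimating your own usage before you compare.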
Before shelling out your money, you want to be sure that the transcription software works for your particular set of files. The best transcription software will allow you to sign up online for a free trial and upload your audio / video file to get a preview transcript. Like we said above, since machine transcription is expensive by nature, free trials are usually limited to around 30 minutes in total. Some services will also place a cap of a few minutes per file during the trial.
We hope you found this guide useful. Remember that accuracy depends on the quality of your recordings, and plan to spend time editing your transcripts. If you need 100% accurate transcripts, you want to use human-powered transcription services, which are an order of magnitude more expensive than transcription software.
Full Disclosure: we build our own transcription software, Transcribe, but we've tried to keep this guide unbiased and objective to help you make an informed decision. Please reach out to us if anything in this guide comes across as subjective, or if we can clarify anything. Our goal is to make this a useful guide for everyone, whether you decide to use our software or not.
This is an attempt to build the definitive guide on various topics related to dictation, transcription & recording.