What is the difference between Automatic Transcription and Self Transcription?

Self Transcription

When we originally launched Transcribe in 2011, speech recognition technology was not yet mature enough for us to automatically convert speech from uploaded audio/video into text. So we launched with a set of features that made Do-It-Yourself (DIY) transcription as painless as possible. We offered a tightly integrated editor and media player that could be controlled with several types of time-saving shortcuts. We also added support for the single-speaker dictation technology available at the time, so you could voice type transcripts and make the process even easier. We charge a flat annual license fee for this DIY part of the app, now rebranded as "Self Transcription".

Automatic Transcription

Fast forward to 2018: years of machine learning research finally allowed us to offer a commercial, cost-effective automatic transcription service with up to 90% accuracy. Unlike Self Transcription, Automatic Transcription works like this: you upload an audio/video file, we process it using machine learning models, and you get a first-draft transcript that you can then tweak as needed to produce the final transcript. When we launched this feature, we called it Automatic Transcription and rebranded our earlier offering as Self Transcription. Because Automatic Transcription needs a significant amount of computing resources, we charge an hourly rate to process transcripts, whereas Self Transcription has a flat annual license fee.

So which one should you choose?

If your audio/video recording:

  • is in a language (and accent) we support in Automatic Transcription
  • is recorded in high quality
  • has all speakers speaking clearly and audibly, without speaking over each other
  • has no ambient background noise
  • can be uploaded to our servers

then we recommend trying Automatic Transcription and seeing how it works on your audio/video files. Accuracy of the machine transcript should be around 90% for well-recorded audio. Unfortunately, it is still not possible to achieve 100% accuracy with machine transcription, given the current limitations of speech recognition technology. However, this is an active area of academic computer science research, so we expect accuracy to improve in the coming years. The general workflow is to get a first-draft machine transcript using Automatic Transcription and then edit the transcript to 100% accuracy using Self Transcription.

If your audio/video file doesn't work well with Automatic Transcription (say, accuracy is below 90%), your next best bet is to transcribe using single-speaker voice recognition (aka dictation). Using dictation to transcribe this way is called Voice Typing. Play the audio/video in Self Transcription through headphones, turn on our Dictation feature in Self Transcription, and dictate what you hear into your microphone. By doing this, you're essentially "converting" the source file into single-speaker dictation (your voice) with no background noise, which is exactly the kind of audio dictation technology converts to text well. Here is an article that talks about this in more detail.

If that also doesn't work, or if you want to edit the transcript to 100% accuracy, you can use Self Transcription without dictation. Play the audio/video in Self Transcription, slow it down in the player, and use our nifty keyboard shortcuts, auto-loop feature, text expander and foot pedal integration to control playback efficiently while you type out the text. This is still faster than playing the audio/video in a media player and typing into a separate word processor. Many of our users are pleasantly surprised by how much time they save with Self Transcription, so we highly recommend giving it a shot.

Strict Compliance Requirements

One more thing to consider: with Automatic Transcription and Dictation, your file or voice samples are uploaded to our servers, and in Automatic Transcription the transcript is also stored on our servers. In Self Transcription, by contrast, the audio/video file is played directly from your browser and the text you type is stored only in your browser, so your files and transcript never leave your computer. This distinction might be important for meeting some strict compliance requirements.

We take data security and privacy extremely seriously at Transcribe and handle the data you upload with the utmost care, as described in our Privacy Policy and Data Retention Policy. Industry-leading security measures and practices are baked into our fabric. Even with all these measures, some of our users cannot let their highly sensitive files leave their computers because of their own compliance requirements. In these situations, Self Transcription (minus dictation) is the ideal choice, since your data never leaves your computer.

Cost vs Effort

The last thing to consider is cost vs effort. Automatic Transcription is charged by the hour (due to the significant computing resources machine transcription takes on our side), and in exchange you get a first-draft machine transcript quickly, so the total time you spend is much less.

However, if cost is a concern, Self Transcription has a low flat fee for the entire year. The trade-off is that with Voice Typing or manual typing in Self Transcription, you'll spend somewhat more time on each transcript than you would with Automatic Transcription. All that said, Self Transcription will still be faster than manually typing into a word processor. You'll be pleasantly surprised!

If you have any additional questions, please feel free to send us a message.
