Almost every industry you can think of is asking what its future will look like as artificial intelligence (AI) and its related technologies mature. Transcription services are a prime example, an area where technology aims to match or better human ability at translation and other complex tasks.
Yet, even the best technology we have today is nowhere near advanced enough to understand human speech to the degree necessary to accurately capture the same meaning in another language.
What about transcription, though? All a machine needs to do is recognise and replicate human speech (in the same language), even if it doesn’t fully understand it. Microsoft already claims its transcription technology is better than humans – so could artificial intelligence be ready to replace traditional transcription services?
The end for traditional transcription services?
Late last year, Microsoft made the bold claim that its transcription AI now outperforms human professionals. It certainly sounds like a breakthrough announcement, but we’ve become used to big promises like this from Microsoft, Google and their tech rivals – most of which, it’s fair to say, fall short of expectations.
To test the claim, Microsoft hired a third-party service to transcribe audio recordings. The two-person human process produced error rates of 5.9% and 11.3%, while Microsoft’s system matched the first score at 5.9% and marginally beat the second at 11.1%.
So, clearly, Microsoft’s transcription AI has something going for it. We don’t know much about the team Microsoft hired for this test – or the recordings it used – but its transcription technology appears to be on par with humans in certain environments.
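The percentages above are word error rates (WER), the standard metric in speech recognition: the number of word substitutions, deletions and insertions needed to turn the machine’s transcript into the reference transcript, divided by the length of the reference. As an illustrative sketch (not Microsoft’s evaluation code), WER can be computed with a simple word-level edit distance:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein edit distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One missing word out of six: WER of roughly 16.7%.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER treats every word equally, so a 5.9% score says nothing about *which* words were wrong – a dropped “not” counts the same as a dropped “uh”.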
The challenges for transcription services technology
As with any kind of language technology, there are unique challenges to consider. While humans are naturally attuned to separating speech from background noise, machines don’t enjoy the same privilege – and this remains a major challenge for the technology.
There are other problems the technology hasn’t learned to overcome (at least not yet):
- Identifying individual speakers
- Multiple speakers talking at the same time
- Understanding context (sarcasm, irony, etc.)
- Understanding accents
- Understanding slang, colloquialisms and style
- Understanding non-native speakers
- Translating into other languages
- Conversational nuances (e.g. “uh”, pauses, false starts)
So there’s a big difference between transcribing a clean recording of one person speaking in a quiet room and transcribing a heated debate between speakers of multiple nationalities. Transcription technology is starting to prove its worth, but it isn’t yet direct competition for traditional transcription services.
It’s not about machines vs humans
Most discussions of artificial intelligence revolve around the notion of technology vs humanity. The future of AI transcription, however, is far less apocalyptic. The technology isn’t there to replace human beings, because it has different strengths and weaknesses from professional transcribers.
While transcription technology struggles with poor audio quality, complex dialogue and the varied nature of human conversation, it never gets tired, constantly learns and can make fewer mistakes than a busy team of human transcribers.
Even if an AI transcription algorithm only hits 50% accuracy (due to any of the reasons listed above), it’s 50% that humans don’t have to worry about. Better yet, if the same algorithm can match or beat human accuracy, the professional transcriber can focus on editing the few remaining mistakes and improving the overall accuracy of the final transcription.
Speech science is certainly developing quickly, driven by machine learning and the rise of artificial intelligence. Yet few people in the language services industry can imagine the technology replacing human professionals any time soon. What we do expect, however, is for AI platforms to help us do more, faster, without compromising on quality.