The Remington typewriter. VHS cassettes. Phone booths.
These are all things that have been replaced by modern technology. And if history tells us anything, it's that more tools, devices and even some jobs will soon become a thing of the past, in large part because of advances in artificial intelligence (AI).
Long-form transcription isn't likely to be one of them.
While technology has certainly improved the process of converting spoken words into written words, it still isn't smart enough to do the job with everyday precision.
Sure, phones are smart enough to decipher simple instructions and answer a couple of questions. And there is some software available that does an OK job with voice dictation for documents.
But the task of accurately transcribing long stretches of speech from real human conversations is still left in the capable hands (and ears, minds and eyes) of living, breathing human beings.
And contrary to what you might believe, technological advances take time.
Experts say that the error rate of human transcription of conversational speech is about 4 percent. It's by no means perfect, but it's not too bad, either. By comparison, experts estimate that even the best AI systems (Google, IBM and Microsoft combined) would have an error rate of about 8 percent when transcribing conversational speech.
And the best commercially available systems will deliver an error rate closer to 12 percent.
These error rates are better than they were five years ago, but even at their best they are still at least double that of humans. It will be a while before AI is ready to tackle transcription services with absolute precision.
A professional, highly trained human being is always going to be better than a computer at understanding the context in which something is being said: the difference between whey and way, deer and dear, or through and threw, for example.
At least for now, people are better at focusing on what really matters. And, when they're not sure what was actually being said, they can figure it out by considering other factors, including context.
That's something AI simply isn't able to do ... yet.
Many people talk too fast for computers to keep up with their every word. This means AI can't reliably deliver clear, clean and consistent transcriptions of fast-moving conversations.
And the problem gets even worse when more than one speaker is involved in the conversation.
Some people can't pronounce words properly because they have speech impediments. Some people choose not to pronounce words correctly because they are making a cultural statement, their pronunciations are influenced by family or friends, or they are taking artistic license.
Why people mispronounce words matters less than the fact that they do it, and AI is not yet capable of recognizing the pattern or identifying the intended word.
For computers, even relatively simple questions can be quite vexing, because conversations can be complicated.
Who is talking, what they are really trying to say and how excited they are: these are things computers, in their current state, can't quite keep up with. But humans can. Human transcribers understand lively discourse. They can determine which of many participants in a conversation is speaking.
And they can capture the proper emotion of the moment.
Computers can't do this ... yet.
Real intelligence is still the best way to protect yourself against easily avoidable mistakes. Right? Right!