Subtitles are improving but automated services have far to go, says Red Bee’s Hewson Maxwell
Spare a thought for the humble subtitler. Just a few years ago, they had to type text manually, separate it out into tidy subtitles, then colour it according to who was speaking and, finally, carefully time it to the video. The process was extremely time-consuming, labour-intensive and expensive.
More and more TV output is legally required to be subtitled every year; for many major broadcasters, it’s almost 100%. Pressure is inevitably put on costs. As workflows are probably as efficient as they will ever be, the main future improvements are likely to come from technology.
The single most time-consuming part of subtitling is converting dialogue into accurate text. Accurate speaker-independent voice recognition can help with this. Speaker-independent systems are not trained to any one voice and are able to operate over the original soundtrack, despite the presence of background noise and music.
But currently, outside of highly specialised language areas with limited background noise, like the weather, these systems are only 60-65% accurate and produce text with limited punctuation, or none at all.
The shift to speaker-independent recognition is a few years away, but anyone who has seen the massive improvements to voice control systems over the last few years can be in no doubt that its time will come.
In the meantime, the current system of “respeaking”, whereby the subtitler repeats everything they hear in a clear voice and with spoken punctuation, allows for extremely quick text input and helps hit the 98% target.
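To make percentage figures like these concrete, one common way to score a transcript is word accuracy: count the word-level insertions, deletions and substitutions needed to turn the recognised text into a reference transcript, then divide by the length of the reference. The sketch below assumes that metric; the article does not say how its figures are measured, and the function names are illustrative only.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Return 1 - (word errors / reference length), floored at 0."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    spoken = "the rain will clear by noon"
    recognised = "the train will clear noon"
    print(f"{word_accuracy(spoken, recognised):.0%}")
```

This simple version ignores punctuation and treats every error as equally serious, which a real quality measure for subtitles may not.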
Using diarisation, or speaker identification, for automatic colouring of subtitles is increasingly a possibility, as the speaker recognition algorithms now available are very strong. However, they have yet to be included in any subtitling packages, probably because while the subtitlers themselves are still generating the text, it is not a lot of extra work to include colour changes.
This speaker information can, however, be hugely useful to speaker-independent recognition systems, as it allows them to split the files into single-speaker segments. It could also be a great work-saver once these systems are implemented, by avoiding the need to re-colour the automatically generated text. As such, these two developments are likely to arrive hand-in-hand.
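As a rough illustration of how speaker information could drive colouring, the sketch below gives each subtitle the colour of whichever speaker segment overlaps it most in time. Everything here is an assumption made for the example: the (start, end, speaker) segment format, the four-colour palette and the function names are hypothetical and do not come from any particular diarisation tool or subtitling package.

```python
# Assumed palette; colours are handed out in order of first appearance.
PALETTE = ["white", "yellow", "cyan", "green"]


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the time shared by two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def colour_subtitles(subtitles, speaker_segments, palette=PALETTE):
    """Attach a colour to each (start, end, text) subtitle.

    speaker_segments is a list of (start, end, speaker_label) tuples,
    a hypothetical output format for a diarisation system.
    """
    colour_of = {}
    coloured = []
    for start, end, text in subtitles:
        best_speaker, best_overlap = None, 0.0
        for seg_start, seg_end, speaker in speaker_segments:
            shared = overlap(start, end, seg_start, seg_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        if best_speaker is None:
            # No speaker found for this subtitle: fall back to the default colour.
            colour = palette[0]
        else:
            if best_speaker not in colour_of:
                colour_of[best_speaker] = palette[len(colour_of) % len(palette)]
            colour = colour_of[best_speaker]
        coloured.append((start, end, text, colour))
    return coloured


if __name__ == "__main__":
    segments = [(0.0, 4.0, "A"), (4.0, 9.0, "B")]
    subs = [(0.5, 3.5, "Good evening."), (3.8, 7.0, "Thanks, and welcome.")]
    for line in colour_subtitles(subs, segments):
        print(line)
```

With more speakers than colours the palette simply wraps around, which is one of the judgement calls a human subtitler would still need to review.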
Currently, a lot of productivity is lost in delivering large video files to subtitlers. Even at modern bandwidths, with modern file delivery mechanisms, this process can lead to the loss of minutes per file in regional offices and tens of minutes for homeworkers. Over the last couple of years many budding online platforms and streaming software clients have emerged that offer the possibility of completely eliminating these download times.
They also offer greater security for client media, as the video is not stored on the subtitler’s machine. The current batch offer fairly basic functionality, but we can expect to see them improving over the next few years until they take over completely from offline clients.
Increasingly, broadcasters are looking to use subtitles as metadata, allowing them to search their own output accurately for research and compliance purposes. Subtitles are very useful for this, but offer only full-text search. To extract better data, named entity recognition algorithms are likely to be deployed, allowing for databases that store all occurrences of proper nouns such as names and places. This will offer quicker and more focused searching than whole-text search of subtitle output, and also allow for meta-analysis of the popularity and connectedness of topics and personalities.
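The idea of a proper-noun database can be sketched as a small index that maps each name to the programmes and timestamps where it appears. The example below is only a toy: it stands in for a trained named entity recognition model with a crude capitalisation heuristic, and the subtitle format of (programme ID, start time in seconds, text) is assumed for illustration.

```python
import re
from collections import defaultdict

# Runs of capitalised words such as "Red Bee Media". This is a crude
# stand-in for a real named entity recognition model, not a substitute.
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_entities(text):
    """Return capitalised word runs that look like proper nouns."""
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for match in CAPITALISED_RUN.finditer(sentence):
            # A lone capitalised word opening a sentence is usually not a name.
            if match.start() == 0 and len(match.group().split()) == 1:
                continue
            entities.append(match.group())
    return entities


def build_entity_index(subtitles):
    """Map each candidate entity to a list of (programme_id, start_seconds)."""
    index = defaultdict(list)
    for programme_id, start_seconds, text in subtitles:
        for entity in candidate_entities(text):
            index[entity].append((programme_id, start_seconds))
    return dict(index)


if __name__ == "__main__":
    subs = [
        ("news-01", 12.0, "Today the minister visited Madrid."),
        ("news-01", 15.5, "She met staff from Red Bee Media in Madrid."),
        ("news-02", 3.0, "Rain is expected across Spain."),
    ]
    for entity, hits in build_entity_index(subs).items():
        print(entity, hits)
```

Looking up a name in such an index is a single dictionary access rather than a scan of every subtitle file, and counting how often two names share a programme is one simple route to the kind of connectedness analysis described above. The heuristic will miss names that open a sentence and will pick up some capitalised words that are not names.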
Although these automated processes will improve vastly over the next few years, and will become accurate enough to be worth deploying, they will always make mistakes that a human wouldn’t make, struggling with unclear or overlapping dialogue, with new slang and with specialist subjects and terms.
While the role of subtitlers in generating the text for subtitles will shrink over the coming years, their roles in quality control and as the arbiters of human taste and judgement are set to become more vital than ever.
Hewson Maxwell is IT manager, Spain for access services at Red Bee Media