Google Adds Automatic Captions, Improving YouTube Videos

November 20, 2009

Mountain View, California — Search engine titan Google on Thursday unveiled automatic captions and timing features for YouTube videos, a significant move toward making millions of videos in YouTube’s massive inventory more accessible to deaf and hearing-impaired people. The announcement follows the company’s recent rollout of real-time translation and text-to-speech services.

Google said on Thursday that it has combined its machine-generated speech-recognition technology with the existing YouTube caption system. The feature will initially be available only in English and on videos from 13 YouTube “partner channels,” but the company expects eventually to extend it to all videos uploaded to the site.

The technology, dubbed “auto-caps” for short, employs the same voice-recognition algorithms used in Google Voice to automatically generate captions for YouTube videos. Combined with Google Translate, the captions can be read in 51 languages. Video uploaders can also supply captions themselves: a second feature, auto timing, lets them add manually created captions to a YouTube video by uploading a plain text file.

Ken Harrenstien, a Google software engineer who is deaf, said that the numerous captioned videos on YouTube and Google Video indicate that more people are becoming aware of how useful captions can be.

As Google has noted in the past, captions not only help the deaf and hearing-impaired, but with machine translation they also enable people around the globe to access video content in any of 51 languages, Harrenstien said. “Captions can also enrich search and even enable users to jump to the exact parts of the videos they are looking for.”

Automatic captions are available on some education channels and most Google channels at this point, and availability will soon be expanded. The technology will also expose YouTube videos to a wider foreign market and make them more searchable, which will make it easier for Google to profit from them.

“Google considers that the world’s information should be accessible to everyone,” said Vint Cerf, a Google vice president who has been described as the “Father of the Internet.”

Harrenstien said a large number of clips on YouTube did not have captions and the new Google technology would generate them automatically. YouTube is initially applying the captioning technology to only a few selected channels, most of them specializing in educational content. They include channels from universities such as Stanford, Yale, Duke, Columbia and the Massachusetts Institute of Technology, as well as PBS, National Geographic and Google itself, whose corporate videos will be captioned. The company plans to gradually expand the number of channels that work with the automatic captioning technology.

Because the current tools are not perfect, Google wants to gather feedback from video owners and viewers before rolling the feature out to the whole world, Harrenstien said. “Sometimes the auto-captions are good. Sometimes they are not great, but they are better than nothing if you are hearing-impaired or do not know the language.”

“One of the biggest challenges of the video medium is whether it can be made accessible to everyone,” said Cerf, who also holds the title of “Chief Internet Evangelist” at Google.

Addressing a gathering at Google’s Washington office, Cerf noted that he has a “great personal interest” in the closed-caption capability. Cerf, 66, is hearing-impaired and has been wearing hearing aids since the age of 13.

Pointing out that over 20 hours of video are uploaded to YouTube every minute, Harrenstien said that “the majority of user-generated video content online is still inaccessible to people like me.”

Google audio engineers said background noise and strong accents make it difficult to create accurate captions from the spoken word, but Harrenstien said the technology “will continue to improve with time.”

“Today I’m more hopeful than ever that we will achieve our long-term goal of making videos universally accessible,” he said in a blog post. “Even with its flaws, I see the addition of automatic captioning as a huge step forward.”

Here is how auto timing works: prepare a text file with all the words spoken in the video, then upload it. No time codes are required. Google’s automatic speech-recognition technology works out when each word is uttered and creates timed captions for the video. As Harrenstien sees it, the process should significantly lower the barriers for video owners who want to add captions but do not have the time or resources to create professional caption tracks.
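To make the auto-timing idea concrete, here is a minimal Python sketch. It is not Google’s implementation: it assumes a speech recognizer has already matched each transcript word to a start and end time (the `timings` list is made-up data), and it shows only the last step, grouping the timed words into caption cues in the widely used SubRip (.srt) text layout. The names `TimedWord`, `format_timestamp` and `build_captions` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_captions(words: list[TimedWord], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1
        # A cue runs from its first word's start to its last word's end.
        span = f"{format_timestamp(chunk[0].start)} --> {format_timestamp(chunk[-1].end)}"
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{span}\n{text}\n")
    return "\n".join(cues)


# Assumption: the uploader supplies only the transcript; the per-word
# timings below stand in for what a speech recognizer would produce.
transcript = "Captions make video accessible to everyone"
timings = [(0.0, 0.5), (0.5, 0.8), (0.8, 1.2), (1.2, 1.9), (1.9, 2.0), (2.0, 2.6)]
words = [TimedWord(t, s, e) for t, (s, e) in zip(transcript.split(), timings)]

srt = build_captions(words, max_words=3)
print(srt)
```

The hard part, matching audio to words, is exactly what the uploader no longer has to do by hand; once timings exist, producing a caption file is simple bookkeeping like the above.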

Thursday’s program was attended by representatives of Gallaudet University, the largest US university for the deaf, as well as the National Association of the Deaf, the American Association of People with Disabilities and other groups.

Watch the accompanying video for more detailed, step-by-step information.