There are many reasons to caption online videos. Accessibility compliance is probably one of the most common, especially for schools and organizations subject to the Americans with Disabilities Act or similar state laws. Broadcasters are subject to Federal Communications Commission (FCC) rules, which under the Twenty-First Century Communications and Video Accessibility Act of 2010 now cover the internet delivery of programs that originated on broadcast.
Beyond compliance, captioning is a great way to open up programs to viewers who speak (and read) multiple languages. There are a host of other ways that captions can add value to existing video.
Outsource or DIY?
Less than 10 years ago, it was reasonable for many producers with some captioning needs to get started with the do-it-yourself (DIY) route. Online video platforms were much less mature, so most workflows were strictly tailored for one streaming server and player. Captioning was also more player-dependent. So adding captioning to the mix didn’t seem as onerous, even if it was a lot of work.
Today the DIY route may appear economical at first, but there can be hidden costs when it comes to scale, sustainability, and accuracy.
“We have encountered a handful of universities that have attempted to do this themselves [and] some have stuck with it,” says Kevin Erler, president of online captioning company Automatic Sync Technologies. “Most have discovered that it’s harder than they anticipated.”
Josh Miller, VP of sales and development at captioning vendor 3Play Media, says some enterprise clients that have attempted their own captioning “thought they only had a few videos, (but then) it’s a bigger part of their job.” Those companies discover that “quality and turnaround become serious issues.”
For other organizations, the DIY route just isn’t practical at all. “I was quite sure I didn’t have the technical know-how, nor the time,” says Jeff Brownson, communications coordinator for the State of Oregon’s Deaf and Hard of Hearing Services program. “We have some extremely competent vendors in the area. So for me it was a matter of farming it out to people who really know what they’re doing, and doing it in a professional way.”
The growth in companies providing captioning services for online video makes outsourcing a practical choice. Additionally, most major online video platforms and content delivery networks (CDNs) support captions out-of-the-box, often including APIs that captioning vendors can leverage to further simplify workflows.
Therefore we see an expanding number of organizations across enterprise, government, and education outsourcing their captioning needs. In this article I will outline the things you should know about captioning that will help when selecting and working with a vendor.
The Uses of Captions
There are a host of benefits to captioning videos that aren’t always obvious if you are focused on compliance. For instance, captioning videos opens them up to a broader audience of persons who may not consider themselves deaf, but who otherwise have hearing difficulties.
“I’ve seen estimates that 20 percent of web users have hearing loss,” Brownson says. “As more websites use video as a means for dissemination, then the need for captioning really grows.”
Captions also make videos easier for other viewers to follow. Erler explains that English-as-a-second-language viewers “will generally be able to comprehend the written word before they pick up the nuances of speech.”
Brownson says, “There are advantages for hearing users, who can re-read [captions or transcripts], to help make sure they comprehend” the material.
SEO should not be overlooked, either. According to Miller, “Having text associated with the video goes with the model of the internet being text based.” Especially on platforms such as YouTube, captions serve as a transcript that opens much more of a video’s content to search.
Miller says the question to consider is this: “How can we add value by adding captions? It’s a tool and enabler for content consumption, for the hearing impaired and everyone.”
Carol Studenmund, founder of LNS Captioning in Oregon, has seen her clients experience that value-add. “We’ll start working with a city government and it’s all about compliance. Then they start getting our files,” she explains. “They love that they can open up the archives and find every time that ‘river crossing’ was mentioned in city council. Captions are used to review what happened.”
Know Your Needs
A captioning vendor should walk you through the information it will need to provide you with the most appropriate service for your application. But as with selecting any service, the more you have a grasp on your needs, the better off you’ll be.
“A client should know why they’re captioning,” says Studenmund. “It’s good to know who the audience is.” For instance, Miller says he would like to know if a client has an employee who needs accommodation or if it’s a matter of compliance.
According to Erler, other common needs include providing in-video search or creating the transcript for language translation.
It’s in the Transcript
Good captions start with a good transcript, which can also be used for translations and multilanguage support. So you’ll need to decide if you are providing your own transcripts or having the vendor transcribe your videos.
A caption transcript starts with a verbatim text of the speech in the video. Then that must be time-synchronized so that the player displays the captions at the proper moment, consistent with the soundtrack.
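For readers curious about what a time-synchronized transcript looks like in practice, here is a minimal Python sketch that writes timed text out in the widely used SubRip (SRT) layout. The `Cue` class and the function names are hypothetical, chosen only for illustration; they are not any vendor's tooling, and real caption files often carry more detail than this.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: when it appears, when it disappears, and what it says."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT text: a counter, a timing line, then the caption."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    # A blank line separates one caption block from the next.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        Cue(0.0, 2.5, "Good evening, and welcome."),
        Cue(2.5, 5.0, "Tonight's first item is the budget."),
    ]))
```

The verbatim text is the same as in a plain transcript; the timing lines are what allow a player to show each caption at the right moment.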
If transcription is already part of your workflow, most captioning vendors will be able to work with a client-supplied text-only transcript and sync it with the video. Otherwise, most producers will likely need the vendor to create the transcripts. Given that this is the most labor-intensive part of the captioning process, most producers are best off leaving it to the specialists.
There is a common misconception that voice recognition software can be used to create a caption transcript. The prevalence of this belief is due in part to the fact that YouTube offers automated captioning of some videos.
However, anyone who has viewed confusing machine-generated captions or received an unintentionally hilarious Google Voice transcript or inappropriate Siri answer should understand why voice recognition alone isn’t sufficient to create transcripts.
“Does quality matter? Is it because you want to enable search and navigation with a really sophisticated audience? Is content your brand?” Miller asks. “If so, you wouldn’t want too many mistakes” in your captions.
Computers, Humans, and Turnaround
It’s important to know how quickly videos need to be captioned. Some producers — especially those dealing with compliance — can’t or won’t release videos until they’re captioned, while others have more flexibility. Professional transcriptionists can generate accurate transcriptions faster than your average undergraduate, intern, or editor.
A producer should decide if same-day turnaround is needed or if a wait of 1 or more working days is acceptable. Also, not all vendors offer live captioning, so that’s an important upfront question.
Although speech-to-text recognition is not yet adequate by itself, some companies are using this technology, aided and checked by humans, to speed the captioning of on-demand videos. Automatic Sync takes this approach. “We’ve tried to focus on picking a balance between the two, using human skill when necessary, using automation everywhere else,” says Erler.
3Play uses a similar strategy. According to Miller, “We use speech technology, [putting] it through speech recognition first. We have our own editing platform where somebody corrects the draft.” At the same time, “We take the stance that you will never be able to remove the human.”
Companies with roots in live and broadcast captioning, such as LNS Captioning, use only human transcriptionists. For clients needing a fast, same-day turnaround, the company offers “as live” transcriptions, where a trained captioner treats a video like a live broadcast and then adjusts the timing of the transcription to make sure it synchronizes correctly. If an event is also available for live viewing or listening, a transcriptionist may be able to tune in to the stream to further expedite turnaround.
Formats and Platforms
According to Miller, “With most web video, 9 times out of 10 when you add captions you’re adding a separate caption file that gets associated with the video file.” At its most basic, a client sends video files to the vendor, which sends back a caption file. However, a vendor can also take over most or all of the publishing workflow too.
To make this process work, Erler says it’s important to know as much as possible about your video files. “What do you have right now? What format is your content in?” The next questions are, “Where is it going? Where do you intend to publish to?”
“We want to understand what (your) publishing process is,” says Miller. It’s important for a vendor to know if you’re using an online video platform, CDN, or a site like YouTube, and which one it is. This will allow the company to pick the right integration strategy and workflow.
A third-party video platform isn’t required; vendors should be able to work with self-hosted videos too. “Even if it’s just a QuickTime file,” Miller notes, “there are ways to add a caption track to those videos.” Both 3Play and Automatic Sync offer their own APIs for custom integration with a client’s own systems.
However, there can be advantages to using a video platform. “A lot of platforms out there offer APIs that we can integrate with,” says Erler. “Platforms like Kaltura or Brightcove allow you to go into your platform and check off the jobs you want captioned.”
Miller says 3Play’s web-based system also offers integration with video platforms. “It enables the ability to send files back and forth (for captioning) with just a couple of clicks.”
Having a sense of the quantity and frequency of video to be captioned will help the vendor tailor services. Quite simply, “We can turn it around really fast if we know you’re coming,” Studenmund advises.
“Is this a one-off or an ongoing need?” Erler says. “If it’s the latter, we can automate the workflow and take away some overhead.”
Names, Jargon, and Acronyms
It’s also good to tell a captioning vendor if your videos cover specialized or technical topics. “You can get specially trained transcribers in some fields,” Erler notes. “The medical field generates a lot of specifically trained transcribers, and we have a lot of [them] on staff. But there’s no training course for transcribers in Physics.”
“It’s good to have lists of names, technical jargon, acronyms,” Studenmund says. “If we can prepopulate a dictionary, accuracy goes way up.”
Miller says that 3Play “even enable(s) people to upload a PDF of vocabulary with their video files.”
Accurate and clear English captions make it much easier to obtain translations into other languages. Some captioning companies may have experienced translators on staff, while many others outsource that service but then ensure the results can be integrated as captions or subtitles.
Clear Voices for Clear Captions
It should be obvious: To get accurate captions, audio quality matters. “Audio comes to us in all kinds of conditions,” Erler says. It pays off in speed and accuracy to pay attention to microphone placement and take steps to reduce ambient noise.
Of course, that will only improve the overall quality of your video too. “If our transcription experts can’t make it out, your listeners won’t either,” he observes.
Also, if your videos have music or sound effects added in post, Erler suggests making a “pre-mix soundtrack” containing just the primary voice tracks to be captioned.
In the end, Miller says, “The added cost (of captioning) is tiny, compared to the cost of production.” He says the big question is, “How do you make it as easy as possible while maintaining quality?” You want to select a company that can meet you where you are and make your captioning workflow as simple as possible with the accuracy you need.
Thinking about the future, Erler observes, “Chances are you’re going to end up switching platforms or finding new needs. You want a vendor to be able to move with you.” His advice is, “Make sure your vendor has a wide support for output types, [because] there are dozens if not hundreds of (caption) file formats.”
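To give a sense of why output formats matter, here is a rough Python sketch that converts one common caption format, SubRip (SRT), into another, WebVTT. The function name is hypothetical, and the sketch assumes simple, well-formed SRT input with no styling or positioning; a vendor's converter would have to handle many more formats and edge cases.

```python
import re

# SRT writes milliseconds after a comma; WebVTT expects a period instead.
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert simple SRT caption text to WebVTT (illustrative only)."""
    lines = ["WEBVTT", ""]  # WebVTT files begin with this header and a blank line
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; caption text keeps its commas.
            line = SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nGood evening, and welcome.\n"
    print(srt_to_webvtt(sample))
```

The SRT counter lines are left in place because WebVTT allows an optional identifier line before each cue's timing.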
Sidebar: The Terms of Captioning
Online vs. Offline: “Online” captions are produced live, in real time, by specially trained transcriptionists. If you’ve ever watched the captions for a live newscast or basketball game on television, you’ve seen “online” captions in action. Speed is the priority, so they tend to be less accurate than “offline” captions.
“Offline” captions are created in post-production, where transcriptionists create and align them in an appropriate caption file for an on-demand video. With this approach accuracy is the priority, and it’s easier to achieve.
Closed, Open, and Burned In: “Closed” captions are well-known because they are common in broadcast television. They’re called “closed” because the viewer can choose to turn them on or off. This is true for online closed captions as well.
“Open” captions are always displayed by default. However, according to Carol Studenmund, president of LNS Captioning, “Open captions don’t work in all devices.” So if open captions are required she instead recommends using “burned in” subtitles, which are superimposed on the video in post-production.
Sidebar: For the Do-It-Yourselfer
While this article focuses on choosing a captioning vendor, there are resources for those who want to take the DIY route. The Described and Captioned Media Program (DCMP) is funded by the U.S. Department of Education in order to ensure equal access to education for students who are blind, visually impaired, deaf, hard of hearing, or deaf-blind. The DCMP has drafted clear and explicit guidelines and instructions for creating good transcripts and captions, all available at its website (dcmp.org).
This article appears in the 2014 Streaming Media Sourcebook as “Choosing a Captioning Service.”