WCAG 1.2.2 — Captions (Prerecorded)
Every prerecorded video with an audio track needs synchronised captions. The spoken dialogue, the relevant sound effects, the speaker identification — all of it has to be available as on-screen text, in sync with the video.
What this requires
Every prerecorded synchronised media file (video with audio) must carry captions. Captions are not just dialogue: they include speaker identification, relevant sound effects ("[door slams]", "[laughter]"), music cues, and any other audio information needed to follow the content. They must be synchronised with the video — appearing at the moment the corresponding sound occurs and disappearing when it ends. Auto-generated machine captions count only when they are reviewed and corrected.
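As a rough illustration of what such a caption file looks like, the sketch below parses a small WebVTT sample containing speaker labels, a sound effect and timed cues, then runs basic timing sanity checks. The sample text and the helper names (`parse_cues`, `timing_problems`) are assumptions made for illustration. The checks only catch structural mistakes; they cannot confirm that a cue actually lines up with the audio.

```python
import re

# Hypothetical caption file: speaker labels, one sound effect, timed cues.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:03.500
<v Sarah>Did you lock the door?

00:00:03.500 --> 00:00:04.200
[door slams]

00:00:04.500 --> 00:00:06.000
<v Mark>I think it locked itself.
"""

# WebVTT timestamps are [hh:]mm:ss.ttt on both sides of "-->".
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text):
    """Return (start, end, text) for each block that contains a timing line."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                groups = match.groups()
                start = to_seconds(*groups[:4])
                end = to_seconds(*groups[4:])
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


def timing_problems(cues):
    """Flag cues that end before they start, run out of order, or are empty."""
    problems = []
    previous_start = 0.0
    for index, (start, end, text) in enumerate(cues):
        if end <= start:
            problems.append(f"cue {index}: ends before it starts")
        if start < previous_start:
            problems.append(f"cue {index}: starts before the previous cue")
        if not text.strip():
            problems.append(f"cue {index}: empty text")
        previous_start = start
    return problems
```

A file that passes these checks can still fail the criterion: whether the text matches what is said, and when, remains a human judgement.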
How AI coding tools fail this
When asked to "embed this video" or "add a hero video", AI assistants
generate <video src="..."> with no <track> element and no caption
file. The video plays; deaf or hard-of-hearing users get nothing. The
assistant treats captions as an editorial concern that lives outside
the codebase.
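A minimal sketch of how this first pattern could be caught in rendered HTML, using only Python's standard `html.parser`. The class and function names and the sample markup are hypothetical, and this is not a description of any particular tool's implementation. It treats a video as captioned only when it contains a <track> with kind="captions" and a non-empty src.

```python
from html.parser import HTMLParser


class CaptionTrackAudit(HTMLParser):
    """Collect <video> elements that contain no <track kind="captions">."""

    def __init__(self):
        super().__init__()
        self.missing = []   # (line, src) for each video without a captions track
        self._open = None   # state for the <video> currently being parsed

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "video":
            line, _ = self.getpos()
            self._open = {"line": line, "src": attributes.get("src"), "captioned": False}
        elif tag == "track" and self._open is not None:
            kind = (attributes.get("kind") or "").lower()
            if kind == "captions" and attributes.get("src"):
                self._open["captioned"] = True

    def handle_endtag(self, tag):
        if tag == "video" and self._open is not None:
            if not self._open["captioned"]:
                self.missing.append((self._open["line"], self._open["src"]))
            self._open = None


def videos_without_captions(html):
    audit = CaptionTrackAudit()
    audit.feed(html)
    audit.close()
    return audit.missing


# Hypothetical markup: the first video is the failure described above.
SAMPLE_HTML = """
<video controls src="/media/hero.mp4"></video>
<video controls src="/media/intro.mp4">
  <track kind="captions" src="/media/intro.en.vtt" srclang="en" label="English">
</video>
"""

for line, src in videos_without_captions(SAMPLE_HTML):
    print(f"line {line}: <video src={src!r}> has no captions track")
```

A check like this only proves that the markup references a caption file. It cannot tell whether a video has an audio track at all, so silent videos would need to be excluded by hand.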
The second pattern: when a <track> element is generated, it points at
a caption file (/captions.vtt) that doesn't exist and is never
produced. The markup looks compliant; the file 404s.
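One way to catch the dangling reference is to resolve each <track> src against the directory the site is served from and confirm the file is there. The sketch below assumes a static build where every local path maps onto a single hypothetical `site_root` folder (relative paths are treated the same way as root-relative ones, which is a simplification); remote URLs are skipped because they would need an HTTP request instead.

```python
from html.parser import HTMLParser
from pathlib import Path


class TrackSources(HTMLParser):
    """Collect the src attribute of every <track> element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "track":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_caption_files(html, site_root):
    """Return the <track> src values that do not exist under site_root."""
    collector = TrackSources()
    collector.feed(html)
    collector.close()

    root = Path(site_root)
    missing = []
    for src in collector.sources:
        if src.startswith(("http://", "https://", "//")):
            continue  # remote caption files are out of scope for this sketch
        candidate = root / src.lstrip("/")
        if not candidate.is_file():
            missing.append(src)
    return missing
```

Called as `missing_caption_files(page_html, "public")`, where `public` stands in for whatever the build output directory is, it returns the caption paths that would 404.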
The third: using YouTube's auto-generated captions and never reviewing them. Auto-captions miss speaker changes, mistranscribe technical vocabulary, and routinely produce sentences that are grammatical nonsense. The criterion requires equivalent information, and raw auto-captions usually don't deliver it.
Edge cases
- Open captions (burned into the video) satisfy the criterion but can't be turned off, which can degrade the experience for hearing users who don't need them. Closed captions via <track> are preferred.
- Music videos and ambient content still need captions if the audio carries meaning. A lyrical music video without lyric captions fails. A purely instrumental ambient loop with no informational audio doesn't need captions but probably needs an alternative (1.2.1).
- Speaker identification matters when more than one person speaks. "Yes." with no speaker is useless; "SARAH: Yes." or "MARK: Yes." is what's needed.
- Live captions are a separate criterion (1.2.4).
- Multiple language tracks require one <track> per language with appropriate srcLang values.
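The multi-language case in the list above lends itself to a simple structural check: every captions track should declare a language, and no language should appear twice within one video. The following is a sketch under those assumptions, with hypothetical names, run against rendered HTML. Python's `html.parser` lowercases attribute names, so both srcLang and srclang are read as `srclang`.

```python
from html.parser import HTMLParser


class CaptionLanguages(HTMLParser):
    """Gather the srclang of each captions track, grouped per <video>."""

    def __init__(self):
        super().__init__()
        self.videos = []       # one list of srclang values (or None) per <video>
        self._current = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "video":
            self._current = []
            self.videos.append(self._current)
        elif tag == "track" and self._current is not None:
            if (attributes.get("kind") or "").lower() == "captions":
                self._current.append(attributes.get("srclang"))

    def handle_endtag(self, tag):
        if tag == "video":
            self._current = None


def language_problems(html):
    """Report captions tracks with a missing or duplicated srclang."""
    parser = CaptionLanguages()
    parser.feed(html)
    parser.close()

    problems = []
    for index, languages in enumerate(parser.videos):
        seen = set()
        for language in languages:
            if not language:
                problems.append(f"video {index}: captions track without srclang")
                continue
            code = language.lower()
            if code in seen:
                problems.append(f"video {index}: duplicate srclang {code!r}")
            seen.add(code)
    return problems
```

The check says nothing about whether the srclang value matches the language the caption file is actually written in; that still needs someone to open the file.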
How Jeikin handles this
The CLI scanner flags <video> elements that lack any <track kind="captions"> and maps each finding to WCAG 1.2.2. Whether the
caption file is accurate is a matter for manual review: the dashboard
records the reviewer's confirmation that the captions reflect the
audio, not a machine claim that they do.