WCAG 1.2.2 — Captions (Prerecorded)

What this requires

Every prerecorded synchronised media file (video with audio) must carry captions. The one exception is media that is itself an alternative for text and is clearly labelled as such. Captions are not just dialogue: they include speaker identification, relevant sound effects ("[door slams]", "[laughter]"), music cues, and any other audio information needed to follow the content. They must be synchronised with the video, appearing at the moment the corresponding sound occurs and disappearing when it ends. Auto-generated machine captions count only once they have been reviewed and corrected.

How AI coding tools fail this

When asked to "embed this video" or "add a hero video", AI assistants generate <video src="..."> with no <track> element and no caption file. The video plays; deaf or hard-of-hearing users get nothing. The assistant treats captions as an editorial concern that lives outside the codebase.

The second pattern: the assistant does generate a <track> element, but it points at a path (/captions.vtt) where no file exists, and the file is never produced. The markup looks compliant; the request for the caption file returns a 404.

The third: using YouTube's auto-generated captions and never reviewing them. Auto-captions miss speaker changes, mistranscribe technical vocabulary, and routinely produce sentences that are grammatical nonsense. The criterion requires equivalent information, and raw auto-captions usually don't deliver it.

Failing examples

Video with no caption track at all:

<video controls>
  <source src="/launch-video.mp4" type="video/mp4" />
</video>

A <track> declared but pointing at a missing or empty file:

<video controls>
  <source src="/launch-video.mp4" type="video/mp4" />
  <track kind="captions" src="/launch-video.vtt" srcLang="en" />
</video>
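
One way to catch the missing-or-empty case before shipping is a build-time sanity check on the contents of each declared caption file. The sketch below is illustrative only: checkCaptionFile and its status labels are hypothetical names, and the caller is assumed to pass undefined when nothing exists at the declared path.

```typescript
// Hypothetical status labels for a caption-file sanity check.
type CaptionFileStatus = "ok" | "missing" | "empty" | "not-webvtt" | "no-cues";

// `content` is the text of the file a <track> points at, or undefined
// when nothing exists at that path (an assumed calling convention).
function checkCaptionFile(content: string | undefined): CaptionFileStatus {
  if (content === undefined) return "missing";

  // Drop an optional byte order mark, then normalise line endings.
  const text = content.replace(/^\uFEFF/, "").replace(/\r\n?/g, "\n");
  if (text.trim() === "") return "empty";

  // A WebVTT file starts with the string "WEBVTT".
  if (!/^WEBVTT(?:[ \t\n]|$)/.test(text)) return "not-webvtt";

  // Every cue has a timing line containing the "-->" separator.
  if (!text.includes("-->")) return "no-cues";

  return "ok";
}
```

A check like this only shows that a file exists and is shaped like WebVTT. Whether the captions match the audio still needs a human reviewer.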

Passing examples

A native <video> with a reviewed caption track marked as the default:

<video controls>
  <source src="/launch-video.mp4" type="video/mp4" />
  <track
    kind="captions"
    src="/launch-video.en.vtt"
    srcLang="en"
    label="English"
    default
  />
</video>

The .vtt file lives at the declared path and includes speaker identification and non-speech audio:

WEBVTT

00:00:02.000 --> 00:00:05.500
[upbeat music]

00:00:06.000 --> 00:00:09.000
SARAH: We started Jeikin because accessibility...
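
To make the structure of that file concrete, here is a minimal TypeScript sketch that reads cues out of WebVTT text and applies two simple heuristics: is each cue's timing valid, and does the cue follow the "[sound]" or "NAME:" conventions used above? All function and type names are hypothetical, and the parser deliberately ignores cue settings, NOTE blocks and STYLE blocks.

```typescript
interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Converts "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds.
function parseTimestamp(raw: string): number {
  const m = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(raw.trim());
  if (m === null) throw new Error(`Invalid WebVTT timestamp: ${raw}`);
  const hours = m[1] === undefined ? 0 : Number(m[1]);
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Minimal cue extraction: blocks are separated by blank lines, and a
// block counts as a cue only if it has a timing line containing "-->".
function parseCues(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header or comment block
    const [startRaw = "", rest = ""] = (lines[timingIndex] ?? "").split("-->");
    const endRaw = rest.trim().split(/\s+/)[0] ?? "";
    cues.push({
      start: parseTimestamp(startRaw),
      end: parseTimestamp(endRaw),
      text: lines.slice(timingIndex + 1).join("\n").trim(),
    });
  }
  return cues;
}

type CueKind = "non-speech" | "speaker-labelled" | "unlabelled";

// Heuristic only: "[...]" marks non-speech audio, "NAME:" marks a speaker.
function classifyCue(cue: Cue): CueKind {
  if (/^\[[^\]]+\]$/.test(cue.text)) return "non-speech";
  if (/^[A-Z][A-Z .'-]*:\s/.test(cue.text)) return "speaker-labelled";
  return "unlabelled";
}

// A cue whose end time is not after its start time can never be shown.
function hasValidTiming(cue: Cue): boolean {
  return cue.end > cue.start;
}
```

An "unlabelled" cue is not automatically a failure, since a single-speaker video needs no speaker labels. The classification is a prompt for the reviewer, not a verdict.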

Edge cases

Open captions (burned into the video) satisfy the criterion but can't be turned off, which can degrade the experience for hearing users who don't need them. Closed captions via <track> are preferred.
Music videos and ambient content still need captions if the audio carries meaning. A lyrical music video without lyric captions fails. A purely instrumental ambient loop with no informational audio doesn't need captions, but its visual content probably still needs an alternative under 1.2.1 (Audio-only and Video-only, Prerecorded).
Speaker identification matters when more than one person speaks. "Yes." with no speaker is useless; "SARAH: Yes." or "MARK: Yes." is what's needed.
Live captions are a separate criterion (1.2.4).
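Multiple language tracks require one <track> per language, each with its own srcLang value (a language tag such as "en" or "fr") and a label that distinguishes it in the caption menu. At most one caption track should be marked default.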
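
The multi-language case can be sketched as a small TypeScript helper that checks a set of track descriptors and emits one <track> per entry. Everything here is an assumption made for illustration: the CaptionTrack shape, the function names and any file paths passed in are hypothetical. The output is plain HTML, where the attribute is spelled srclang rather than the camel-cased srcLang used in the JSX examples above.

```typescript
// Hypothetical descriptor for one caption track.
interface CaptionTrack {
  srcLang: string; // language tag, e.g. "en" or "pt-BR"
  label: string;   // name shown in the player's caption menu
  src: string;     // path to the reviewed .vtt file
  isDefault?: boolean;
}

// Returns a list of problems; an empty list means the set looks consistent.
function validateCaptionTracks(tracks: CaptionTrack[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const track of tracks) {
    if (track.srcLang.trim() === "") {
      problems.push(`Track ${track.src} has no srcLang`);
    }
    // Two caption tracks may share a language only if their labels differ.
    const key = `${track.srcLang.toLowerCase()}|${track.label}`;
    if (seen.has(key)) {
      problems.push(`Two tracks share language "${track.srcLang}" and label "${track.label}"`);
    }
    seen.add(key);
  }
  const defaults = tracks.filter((t) => t.isDefault === true).length;
  if (defaults > 1) problems.push("More than one caption track is marked default");
  return problems;
}

function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Emits one <track> element per descriptor, as plain HTML.
function renderCaptionTracks(tracks: CaptionTrack[]): string {
  return tracks
    .map((t) =>
      `<track kind="captions" src="${escapeAttribute(t.src)}" ` +
      `srclang="${escapeAttribute(t.srcLang)}" label="${escapeAttribute(t.label)}"` +
      `${t.isDefault ? " default" : ""} />`
    )
    .join("\n");
}
```

Running the validation first keeps duplicate or ambiguous tracks out of the rendered markup, where a second default or a repeated label would be easy to miss.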

How Jeikin handles this

The CLI scanner flags <video> elements that lack any <track kind="captions"> and maps each finding to WCAG 1.2.2. Whether the caption file is accurate is a manual review — the dashboard records the reviewer's confirmation that the captions reflect the audio, not a machine claim that they do.
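
For illustration, the kind of static check described above can be approximated in a few lines of TypeScript. This is not Jeikin's implementation: the function name, the CaptionFinding shape and the regex-based approach are all assumptions made for the sketch. A production scanner would more likely work from a parsed syntax tree, since regular expressions miss cases such as JSX expressions that contain ">" or a kind attribute written as kind={"captions"}.

```typescript
// Hypothetical finding shape; not Jeikin's real output format.
interface CaptionFinding {
  criterion: "WCAG 1.2.2";
  offset: number;  // character offset of the <video> tag in the source
  snippet: string; // the opening tag, for display
}

function findVideosWithoutCaptionTrack(markup: string): CaptionFinding[] {
  const findings: CaptionFinding[] = [];
  // Matches either a self-closing <video ... /> or <video ...>children</video>.
  const videoPattern = /<video\b[^>]*?(?:\/>|>([\s\S]*?)<\/video>)/gi;
  // Matches a <track> whose kind attribute is the literal "captions".
  const captionTrack = /<track\b[^>]*\bkind\s*=\s*["']captions["']/i;

  let match: RegExpExecArray | null;
  while ((match = videoPattern.exec(markup)) !== null) {
    const children = match[1] ?? ""; // empty for self-closing elements
    if (!captionTrack.test(children)) {
      const openingTagEnd = match[0].indexOf(">") + 1;
      findings.push({
        criterion: "WCAG 1.2.2",
        offset: match.index,
        snippet: match[0].slice(0, openingTagEnd),
      });
    }
  }
  return findings;
}
```

A check of this kind can only report that a caption track is absent. It says nothing about whether a present track points at a real file or whether that file is accurate, which is why the accuracy step stays with a human reviewer.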