WCAG 1.2.1 — Audio-only and Video-only (Prerecorded)
Prerecorded audio-only and video-only media must carry an alternative that conveys the same information. A podcast without a transcript is invisible to deaf users; a silent product demo without a description is invisible to blind users.
What this requires
For prerecorded audio-only content (a podcast, an interview, a voice memo), a transcript or equivalent text alternative must be available. For prerecorded video-only content with no audio track (a screen recording, a silent product demo, a GIF that conveys information), either a text alternative or an audio track that describes the visual content must be available. The alternative has to convey the same information — a one-line summary of a 40-minute podcast is not an alternative.
How AI coding tools fail this
When asked to "embed this podcast" or "add a video player", AI assistants generate the player markup and stop. The transcript is treated as out-of-scope content rather than as a required equivalent. The resulting page renders correctly for hearing or sighted users and contains nothing at all for the rest.
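One way to avoid this failure is to make the transcript a required input of whatever generates the player. The sketch below is illustrative only: `AudioEmbed`, `escapeHtml`, and `renderAudioWithTranscript` are hypothetical names, and the `<figure>`/`<details>` structure is one reasonable layout among several, not a pattern mandated by WCAG.

```typescript
// Hypothetical input shape; all field names are illustrative.
interface AudioEmbed {
  src: string;            // URL of the audio file
  title: string;          // human-readable episode title
  transcriptHtml: string; // full, reviewed transcript as trusted HTML
}

// Minimal HTML escaping for text nodes and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns player markup with the transcript placed directly after the player.
// Throws rather than emitting a player that has no alternative.
function renderAudioWithTranscript(embed: AudioEmbed): string {
  if (embed.transcriptHtml.trim() === "") {
    throw new Error(`Missing transcript for "${embed.title}"`);
  }
  const title = escapeHtml(embed.title);
  return [
    `<figure>`,
    `  <figcaption>${title}</figcaption>`,
    `  <audio controls src="${escapeHtml(embed.src)}"></audio>`,
    `  <details>`,
    `    <summary>Transcript: ${title}</summary>`,
    `    ${embed.transcriptHtml}`,
    `  </details>`,
    `</figure>`,
  ].join("\n");
}
```

Failing loudly when the transcript is empty keeps the omission visible at build time instead of leaving it for an audit to find. Whether the transcript text is actually equivalent to the audio still needs a human check.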
The second pattern: silent screen recordings or product demo GIFs embedded as <video autoplay muted loop> with no caption, no description, and no surrounding text explaining what happens on screen. The visual demonstrates the product; the page tells the screen-reader user nothing.
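A silent demo can carry its text alternative in the same markup. The following is a minimal sketch under stated assumptions: `SilentDemo`, `esc`, and `renderSilentDemo` are hypothetical names, and linking a visible description through `aria-describedby` is one possible approach, not the only acceptable one. The `controls` attribute is included so that a looping clip can be paused.

```typescript
// Hypothetical shape for a silent demo clip; all names are illustrative.
interface SilentDemo {
  src: string;         // URL of the video file (no audio track)
  id: string;          // unique id used to link the video to its description
  label: string;       // short name identifying the clip
  description: string; // plain-text account of what happens on screen
}

// Minimal HTML escaping for text nodes and attribute values.
function esc(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns the video plus a visible description that assistive technology
// can reach through aria-describedby. Throws if the description is empty.
function renderSilentDemo(demo: SilentDemo): string {
  if (demo.description.trim() === "") {
    throw new Error(`Missing description for "${demo.label}"`);
  }
  const descId = `${esc(demo.id)}-desc`;
  return [
    `<figure>`,
    `  <video autoplay muted loop playsinline controls`,
    `         src="${esc(demo.src)}"`,
    `         aria-label="${esc(demo.label)}"`,
    `         aria-describedby="${descId}"></video>`,
    `  <figcaption id="${descId}">${esc(demo.description)}</figcaption>`,
    `</figure>`,
  ].join("\n");
}
```

The description should narrate the steps shown on screen (which control is clicked, what appears as a result) rather than just naming the clip.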
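The third: generated <audio> and <video> elements with controls but no transcript or description anywhere on the page or linked next to the player. Adding a <track> element does not close this gap: <track> carries timed text such as captions or subtitles, not a transcript, and a transcript normally sits on the page itself or behind an adjacent link. The player is accessible enough to operate; the content is not.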
Edge cases
- A transcript is not the same as captions. Captions belong on synchronised media and are covered by 1.2.2. A transcript is a separate document for audio-only or video-only material.
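- Decorative audio or video (a background ambient loop with no informational content) doesn't need an alternative. If it plays automatically, it still has to be controllable: audio that autoplays for more than 3 seconds needs a way to pause, stop, or adjust its volume (see 1.4.2), and moving content that autoplays for more than 5 seconds needs a way to pause, stop, or hide it (see 2.2.2).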
- Auto-generated transcripts from a speech-to-text service satisfy the criterion only if they are reviewed and corrected. Raw machine transcripts of multi-speaker conversation are usually wrong enough to fail the equivalent-information requirement.
- Animated GIFs that convey information (a "how it works" animation) are video-only content and need a text equivalent. A decorative loop does not.
- Sign-language video without an accompanying audio or text track is video-only for non-signers and needs an alternative.
How Jeikin handles this
This criterion is manual: no automated tool can verify that a transcript matches the audio. Jeikin tracks it as a guided review item: the dashboard lists every <audio> and <video> element the scanner finds and asks the reviewer to confirm an equivalent alternative is in place. The evidence is a timestamped record that the check happened, not an automated claim of compliance.
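The shape of such a guided review can be sketched as follows. This is not Jeikin's implementation or data model: `MediaReviewItem`, `listMediaForReview`, and `confirmReviewed` are hypothetical names, and the regex scan is a stand-in for a real DOM parse (it will miss tags whose attribute values contain `>`).

```typescript
// Hypothetical review record; field names are illustrative.
interface MediaReviewItem {
  tag: "audio" | "video";
  markup: string;      // the opening tag as found in the source
  confirmed: boolean;  // set by a human reviewer, never by the scan
  reviewedAt?: string; // ISO timestamp recorded at confirmation
}

// Finds opening <audio> and <video> tags in an HTML string.
// The scan only locates elements; it makes no claim about compliance.
function listMediaForReview(html: string): MediaReviewItem[] {
  const items: MediaReviewItem[] = [];
  const pattern = /<(audio|video)\b[^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    items.push({
      tag: match[1].toLowerCase() as "audio" | "video",
      markup: match[0],
      confirmed: false,
    });
  }
  return items;
}

// Records the reviewer's confirmation together with a timestamp.
function confirmReviewed(
  item: MediaReviewItem,
  now: Date = new Date(),
): MediaReviewItem {
  return { ...item, confirmed: true, reviewedAt: now.toISOString() };
}
```

Keeping `confirmed` false until a person acts on the item mirrors the point above: the tool produces a list and a record, and the judgement about equivalence stays with the reviewer.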