02 · Audio detector

Audio

Listens to spoken language, sound effects, and waveform features in audio and video tracks. Catches what visual classifiers miss when an audio track tells a different story than what is on screen.

Channel

Audio waveforms and embedded speech

Position

02 of 09

What it processes

Spoken language, sound effects, and waveform features in audio and video tracks. The detector catches what visual classifiers miss when the audio track tells a different story from what is on screen.

Adversarial patterns it is tuned to catch

Voice content unsafe for the listener
Synthesized voice clones used to impersonate trusted speakers
Audio steganography (information hidden inside benign-sounding audio)
Cross-modal mismatch where audio and video disagree

Contribution to the ensemble verdict

Speech-to-text output and waveform features are both passed forward to the ensemble layer; the cross-modal detector also reads both.
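As a rough illustration of a single record carrying both channels to a downstream reader, here is a minimal sketch. The record shape, the field names, the `speech_ratio` feature, and the word-overlap heuristic are all assumptions made for this example; the page does not describe the actual schema or how the cross-modal detector scores a mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioDetectorOutput:
    """Hypothetical record; field names are illustrative, not the real schema."""
    transcript: str                      # speech-to-text result
    waveform_features: dict[str, float]  # e.g. {"speech_ratio": 0.9} (assumed feature)


def cross_modal_mismatch(audio: AudioDetectorOutput, visual_labels: set[str]) -> float:
    """Toy heuristic: share of on-screen labels never mentioned in the speech.

    Returns 0.0 when there are no visual labels or when the assumed
    'speech_ratio' waveform feature says there is no speech to compare.
    """
    if not visual_labels or audio.waveform_features.get("speech_ratio", 0.0) == 0.0:
        return 0.0
    spoken = set(audio.transcript.lower().split())
    unmentioned = {label for label in visual_labels if label.lower() not in spoken}
    return len(unmentioned) / len(visual_labels)
```

The function reads the transcript and a waveform feature from the same record, which is the only point the sketch is meant to make. A real cross-modal detector would presumably use something much richer than word overlap.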

Per-detector outputs are not a final verdict. The ensemble layer reads all nine and decides; the per-detector contribution is preserved in the evidence trace so the verdict remains auditable.
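The separation between per-detector outputs and the final verdict could be sketched as follows. All names here (`DetectorOutput`, `Verdict`, `ensemble_verdict`), the 0 to 1 score range, and the "flag if any score reaches the threshold" rule are placeholders chosen for illustration; the page does not say how the ensemble layer actually combines the nine outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorOutput:
    """One detector's contribution (hypothetical shape, not the real schema)."""
    detector: str                  # e.g. "audio"
    score: float                   # assumed risk score in [0.0, 1.0]
    signals: tuple[str, ...] = ()  # human-readable reasons behind the score


@dataclass(frozen=True)
class Verdict:
    flagged: bool
    evidence_trace: tuple[DetectorOutput, ...]  # every input, kept unchanged


def ensemble_verdict(outputs, threshold=0.5):
    """Combine per-detector outputs into one verdict and keep them all for audit.

    Placeholder rule (an assumption): flag when any score reaches the threshold.
    """
    trace = tuple(outputs)
    flagged = any(o.score >= threshold for o in trace)
    return Verdict(flagged=flagged, evidence_trace=trace)
```

Because the verdict stores every input untouched, a reviewer can later see which detector contributed what, regardless of the combination rule in use.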
