02 · Audio detector
Audio
Listens to spoken language, sound effects, and waveform features in audio and video tracks. Catches what visual classifiers miss when an audio track tells a different story than what is on screen.
What it processes
Audio tracks, including the soundtrack of video: spoken language (transcribed with speech-to-text), sound effects, and raw waveform features.
Adversarial patterns it is tuned to catch
- Spoken content that is unsafe for the listener
- Synthesized voice clones used to impersonate trusted speakers
- Audio steganography (information hidden inside benign-sounding audio)
- Cross-modal mismatch where audio and video disagree
Contribution to the ensemble verdict
Both the speech-to-text transcript and the waveform features are passed forward, and the cross-modal detector reads both.
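To make the hand-off concrete, here is a minimal sketch of what the audio detector might pass forward and how a cross-modal check could read both parts. Everything here is hypothetical: the `AudioFeatures` record, the `speech_ratio` waveform feature, and the keyword-overlap scoring are illustrative stand-ins, not the real cross-modal detector.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFeatures:
    """Hypothetical record the audio detector passes forward."""
    transcript: str  # speech-to-text output
    # Hypothetical waveform features, e.g. {"speech_ratio": 0.8}
    waveform: dict[str, float] = field(default_factory=dict)


def cross_modal_mismatch(audio: AudioFeatures, visual_labels: set[str]) -> float:
    """Toy mismatch score in [0, 1]; 1.0 means the transcript mentions
    none of the (lowercase) labels a visual classifier produced.

    Assumption: this keyword overlap only illustrates that the check
    reads both the transcript and the waveform features.
    """
    # If the waveform says there is almost no speech, the transcript is
    # not meaningful evidence, so report no mismatch.
    if audio.waveform.get("speech_ratio", 0.0) < 0.1:
        return 0.0
    words = {w.strip(".,!?").lower() for w in audio.transcript.split()}
    words.discard("")
    if not words or not visual_labels:
        return 0.0
    # Fraction of visual labels that the spoken audio also mentions.
    overlap = len(words & visual_labels) / len(visual_labels)
    return 1.0 - min(overlap, 1.0)


audio = AudioFeatures("Send the wire transfer now", {"speech_ratio": 0.9})
print(cross_modal_mismatch(audio, {"puppy", "park"}))  # → 1.0
```

Gating on a waveform feature before trusting the transcript is one way to show why both inputs matter: the transcript alone cannot tell you whether speech is actually present.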
Per-detector outputs are not a final verdict. The ensemble layer reads the outputs of all nine detectors and decides; each detector's contribution is preserved in the evidence trace so the verdict remains auditable.
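The auditability idea can be sketched as an ensemble that returns its per-detector contributions alongside the verdict. This assumes a weighted mean of risk scores and a fixed threshold; the actual combination rule is not described here, and `DetectorOutput`, `Verdict`, `weight`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorOutput:
    detector: str        # e.g. "audio"
    score: float         # hypothetical risk score in [0, 1]
    weight: float = 1.0  # hypothetical per-detector weight


@dataclass(frozen=True)
class Verdict:
    unsafe: bool
    score: float
    # Evidence trace: (detector name, weighted contribution to score)
    evidence: tuple[tuple[str, float], ...]


def ensemble(outputs: list[DetectorOutput], threshold: float = 0.5) -> Verdict:
    """Weighted-mean stand-in for the ensemble layer."""
    total_weight = sum(o.weight for o in outputs)
    if total_weight <= 0:
        raise ValueError("need at least one detector with positive weight")
    # Record each detector's share so the final score can be audited.
    contributions = tuple(
        (o.detector, o.weight * o.score / total_weight) for o in outputs
    )
    score = sum(c for _, c in contributions)
    return Verdict(unsafe=score >= threshold, score=score, evidence=contributions)


# Nine detectors: the audio detector flags strongly, the others do not.
outputs = [DetectorOutput("audio", 0.9)] + [
    DetectorOutput(f"detector_{i}", 0.1) for i in range(8)
]
verdict = ensemble(outputs)
print(verdict.unsafe)       # → False
print(verdict.evidence[0])  # the audio detector's share of the score
```

Because the contributions sum to the final score, a reviewer can see exactly how much the audio detector moved the verdict, even when it was outvoted.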