02 · Audio detector

Audio

Channel: Audio waveforms and embedded speech
Position: 02 of 09

What it processes

Listens to spoken language, sound effects, and waveform features in audio and video tracks. Catches what visual classifiers miss when an audio track tells a different story than what is on screen.
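
As a rough illustration of what "waveform features" can mean, the sketch below computes two common ones, RMS energy and zero-crossing rate, from a list of mono samples. The detector's actual feature set is not specified here; the function name `waveform_features` and the choice of these two features are assumptions made for the example.

```python
import math

def waveform_features(samples):
    """Return RMS energy and zero-crossing rate for mono samples in [-1.0, 1.0].

    Illustrative only: the real detector's features are not documented here.
    """
    if not samples:
        return {"rms": 0.0, "zero_crossing_rate": 0.0}

    # Root-mean-square energy: a simple loudness proxy.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev < 0) != (cur < 0)
    )
    zcr = crossings / (len(samples) - 1) if len(samples) > 1 else 0.0

    return {"rms": rms, "zero_crossing_rate": zcr}

# Toy signal that flips sign on every sample.
features = waveform_features([0.5, -0.5, 0.5, -0.5])
```

For the toy signal, every sample has magnitude 0.5 and every consecutive pair changes sign, so the RMS is 0.5 and the zero-crossing rate is 1.0.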

Adversarial patterns it is tuned to catch

  • Voice content unsafe for the listener
  • Synthesized voice clones used to impersonate trusted speakers
  • Audio steganography (information hidden inside benign-sounding audio)
  • Cross-modal mismatch where audio and video disagree

Contribution to the ensemble verdict

The speech-to-text transcript and the waveform features are both passed forward to the ensemble layer, and the cross-modal detector reads both as well.
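
One way to picture the two signals travelling together is a small record that a cross-modal check can read. This is a minimal sketch, not the detector's implementation: `AudioSignals`, `cross_modal_mismatch`, the `loud_rms` threshold, and the idea of comparing the transcript with visual labels are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioSignals:
    """Hypothetical record holding both audio-side signals."""
    transcript: str            # speech-to-text output
    waveform_features: dict    # e.g. {"rms": 0.4}

def cross_modal_mismatch(signals, visual_labels, loud_rms=0.3):
    """Toy cross-modal check that reads both the transcript and the features.

    Assumption: the visual side supplies plain-text labels for the scene.
    """
    spoken = set(signals.transcript.lower().split())
    labels = {label.lower() for label in visual_labels}

    # Transcript signal: nothing that is said matches anything that is seen.
    no_shared_terms = bool(spoken) and spoken.isdisjoint(labels)

    # Waveform signal: loud audio over a scene labelled as quiet.
    loud = signals.waveform_features.get("rms", 0.0) >= loud_rms
    loud_audio_on_quiet_scene = loud and "quiet" in labels

    return {
        "no_shared_terms": no_shared_terms,
        "loud_audio_on_quiet_scene": loud_audio_on_quiet_scene,
    }

signals = AudioSignals(
    transcript="transfer the money now",
    waveform_features={"rms": 0.4},
)
result = cross_modal_mismatch(signals, ["cat", "sofa", "quiet"])
```

Here both flags come back true: the spoken words share nothing with the scene labels, and the audio is loud over a scene labelled quiet. A real cross-modal detector would use far richer comparisons; the point is only that it consumes both signals.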

Per-detector outputs are not a final verdict. The ensemble layer reads all nine and decides; the per-detector contribution is preserved in the evidence trace so the verdict remains auditable.
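
The separation between per-detector output and final verdict can be sketched as follows. How the ensemble layer actually decides is not described here; the score averaging, the `threshold` parameter, and the names `DetectorOutput` and `ensemble_verdict` are assumptions chosen to show the evidence trace being preserved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectorOutput:
    """Hypothetical per-detector result; not a verdict on its own."""
    detector: str   # e.g. "02-audio"
    score: float    # assumed risk score in [0, 1]
    reason: str     # short human-readable explanation

def ensemble_verdict(outputs, threshold=0.5):
    """Decide from all detector outputs and keep each contribution.

    Assumption: a plain mean of scores stands in for the real decision rule.
    """
    if not outputs:
        raise ValueError("at least one detector output is required")

    mean_score = sum(o.score for o in outputs) / len(outputs)

    # Every detector's contribution is copied into the trace for auditing.
    trace = [
        {"detector": o.detector, "score": o.score, "reason": o.reason}
        for o in outputs
    ]
    return {
        "flagged": mean_score >= threshold,
        "score": mean_score,
        "evidence_trace": trace,
    }

outputs = [
    DetectorOutput("other-detector", 0.2, "nothing unusual"),
    DetectorOutput("02-audio", 0.9, "audio and video disagree"),
]
verdict = ensemble_verdict(outputs)
```

The verdict carries one trace entry per detector, so an auditor can see that the audio detector drove the decision even though it did not make the decision itself.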