New AI audio technology that solves the cocktail party problem has been used as evidence in a court case for the first time, leading to two convictions.
Deciphering speech in audio recordings can be vitally important when those recordings are used as evidence in court cases and criminal proceedings.
After ten years of working on the so-called cocktail party problem, Wave Sciences has finally come up with an AI-powered solution.
Their patented technology, known as Spatial Release from Masking (SRM), uses the physics of sound propagation to isolate a specific speaker's voice from a noisy environment.
In this groundbreaking first real-world forensic use, SRM technology has been used in a US case. The evidence it provided proved central to the convictions.
The FBI used the technology to convict two hitmen arrested for . The Bureau sought to prove that the hitmen were hired by a family in a child custody dispute. The government body arranged to trick the family into believing that they were being blackmailed for their involvement - and then observed their reactions.
Texts and phone calls were easy for the FBI to access; recordings of in-person meetings, however, were more difficult. The court's choice to authorize the use of the Wave Sciences SRM technology meant that the recorded audio went from being inadmissible to vital evidence.
What is the Cocktail Party Problem?
The cocktail party problem in machine learning is the challenge of isolating a specific speaker's voice from a noisy environment.
Humans excel at this task, effortlessly tuning into a desired conversation amidst the cacophony. How do our auditory systems compare to AI algorithms in tackling this complex problem?
Many factors can contribute to the cocktail party problem. Multiple speakers can be talking simultaneously, making it difficult to distinguish individual voices. Each speaker has unique vocal characteristics, which can make it challenging to generalize a model to new speakers. Environmental sounds like music, laughter, or other ambient noise also interfere with detecting the specific voices.
Human ears are typically able to zone in on their conversation of choice amid the chaos, whereas digital recording, AI and machine learning models have typically struggled. Humans excel at understanding context: they use prior knowledge to interpret speech cues and fill in missing information, and they can adapt to new listening environments quickly. AI models, by contrast, often rely on statistical patterns and require retraining or fine-tuning.
How does AI Solve the Cocktail Party Problem?
Wave Sciences' AI solution to the cocktail party problem uses a microphone array: a set of multiple microphones operating simultaneously, similar to the human ear, to capture sound from multiple angles. Their tool first analyzes how sound bounces around a room before reaching the microphone or ear. It then figures out where each sound comes from and is able to suppress any other sound that doesn't come from the same source.
Using a physics-based model, Wave Sciences applies a spatial filter that suppresses sounds from other directions, effectively isolating the desired speaker's voice.
The complex system is able to adapt to changing listening conditions, such as a speaker moving or the introduction of new noise sources.
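To make the idea of a spatial filter concrete, here is a minimal sketch of delay-and-sum beamforming, a classic textbook technique for microphone arrays. It is not Wave Sciences' patented SRM method, whose details are not given here. The function names, the integer sample delays, the eight-microphone layout and the white-noise interferer are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_array(source, delays):
    """Toy model: each microphone hears the source shifted by an integer
    number of samples (circular shift, no reverberation or attenuation)."""
    return np.stack([np.roll(source, int(d)) for d in delays])


def delay_and_sum(channels, delays):
    """Undo the target's per-microphone delay, then average the channels.
    Sound arriving with these delays adds up coherently; sound arriving
    with a different delay pattern is misaligned and partly averages out."""
    channels = np.asarray(channels, dtype=float)
    aligned = [np.roll(ch, -int(d)) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)


def power_db(x):
    """Mean signal power in decibels."""
    return 10.0 * np.log10(np.mean(np.square(x)))


def demo(n_mics=8, n_samples=16000, seed=0):
    """Compare signal-to-noise ratio at one microphone with the
    signal-to-noise ratio after steering the array at the target."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / 16000.0
    target = np.sin(2 * np.pi * 440.0 * t)       # hypothetical desired speaker
    interferer = rng.standard_normal(n_samples)  # hypothetical competing noise

    # Assumed geometry: the two sources reach the array from different
    # directions, so their delay patterns across microphones differ.
    target_delays = np.arange(n_mics)
    interferer_delays = 3 * np.arange(n_mics)

    target_at_mics = simulate_array(target, target_delays)
    interferer_at_mics = simulate_array(interferer, interferer_delays)

    snr_in = power_db(target_at_mics[0]) - power_db(interferer_at_mics[0])
    # The beamformer is linear, so each component can be filtered separately.
    snr_out = (power_db(delay_and_sum(target_at_mics, target_delays))
               - power_db(delay_and_sum(interferer_at_mics, target_delays)))
    return snr_in, snr_out


if __name__ == "__main__":
    snr_in, snr_out = demo()
    print(f"SNR at one microphone: {snr_in:.1f} dB")
    print(f"SNR after delay-and-sum: {snr_out:.1f} dB")
```

In this idealized setup the target passes through unchanged while the interferer's power drops by roughly the number of microphones. A real room adds echoes, moving speakers and unknown delays, which is where the adaptive, physics-based modeling described above comes in.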
There are many potentially impactful applications of the SRM technology:
Hearing Aids: SRM can improve speech intelligibility for people with hearing loss, especially in noisy environments.
Teleconferencing: The technology can enhance audio quality in meetings with multiple participants, reducing background noise and improving clarity.
Voice Assistants: SRM can also improve the accuracy of voice recognition in noisy settings, making voice-controlled devices more reliable.
Surveillance: SRM can be used to isolate specific conversations from ambient noise in surveillance recordings.