Researchers have developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. Either through voice commands or a smartphone app, headphone wearers can select which sounds they want to include from 20 classes, such as sirens, baby cries, speech, vacuum cleaners and bird chirps.
Most anyone who’s used noise-canceling headphones knows that hearing the right noise at the right time can be vital. Someone might want to erase car horns when working indoors, but not when walking along busy streets. Yet people can’t choose what sounds their headphones cancel.
Now, a team led by researchers at the University of Washington has developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. The team is calling the system “semantic hearing.” Headphones stream captured audio to a connected smartphone, which cancels all environmental sounds. Either through voice commands or a smartphone app, headphone wearers can select which sounds they want to include from 20 classes, such as sirens, baby cries, speech, vacuum cleaners and bird chirps. Only the selected sounds will be played through the headphones.
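The selection step can be pictured as follows. This is a minimal sketch, assuming a separation model has already split the incoming audio into one track per sound class; the class names and the `select_sounds` helper are illustrative inventions, not the team's actual API.

```python
def select_sounds(separated, chosen_classes):
    """Mix only the user-selected sound classes back into one signal.

    separated: dict mapping a class name (e.g. "siren") to a list of
    audio samples produced by a hypothetical separation model.
    """
    n = len(next(iter(separated.values())))
    output = [0.0] * n
    for name, track in separated.items():
        if name in chosen_classes:
            # Sum the selected tracks sample by sample.
            output = [a + b for a, b in zip(output, track)]
    return output

# Toy example: three 4-sample "tracks" standing in for separated audio.
separated = {
    "siren":  [0.5, 0.5, 0.5, 0.5],
    "speech": [0.1, 0.2, 0.1, 0.2],
    "vacuum": [0.9, 0.9, 0.9, 0.9],
}
# The wearer keeps sirens and speech; the vacuum cleaner is dropped.
kept = select_sounds(separated, {"siren", "speech"})
```

Everything not on the user's list simply never reaches the mix, which is why only the chosen sounds are played back.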
The team presented its findings Nov. 1 at UIST ’23 in San Francisco. In the future, the researchers plan to release a commercial version of the system.
“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise canceling headphones haven’t achieved,” said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.”
Because of this time crunch, the semantic hearing system must process sounds on a device such as a connected smartphone, instead of on more robust cloud servers. Additionally, because sounds from different directions arrive in people’s ears at different times, the system must preserve these delays and other spatial cues so people can still meaningfully perceive sounds in their environment.
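The timing constraint above can be sketched in a few lines. This is a rough illustration only: the chunk size, sample rate, and the identity "filter" standing in for the neural network are all assumptions, not details from the paper.

```python
import time

SAMPLE_RATE = 44100
CHUNK = 256          # roughly 5.8 ms of audio per chunk at 44.1 kHz
BUDGET_S = 0.010     # the "hundredth of a second" budget quoted above

def process_chunk(left, right):
    """Placeholder for the on-device neural filter.

    Both ear channels are processed together so interaural time
    differences (the spatial cues mentioned above) survive; filtering
    each ear independently could destroy them.
    """
    return left, right  # identity stand-in for the real model

left = [0.0] * CHUNK
right = [0.0] * CHUNK

start = time.perf_counter()
out_l, out_r = process_chunk(left, right)
elapsed = time.perf_counter() - start

# The whole point of running on-device: a round trip to a cloud server
# alone would typically blow this budget.
within_budget = elapsed < BUDGET_S
```

A real implementation would spend its budget inside `process_chunk`; the sketch only shows where the deadline sits in a streaming loop.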
Tested in environments such as offices, streets and parks, the system was able to extract sirens, bird chirps, alarms and other target sounds, while removing all other real-world noise. When 22 participants rated the system's audio output for the target sound, they judged its quality, on average, to be higher than that of the original recording.
In some cases, the system struggled to distinguish between sounds that share many properties, such as vocal music and human speech. The researchers note that training the models on more real-world data might improve these outcomes.
Additional co-authors on the paper were Bandhav Veluri and Malek Itani, both UW doctoral students in the Allen School; Justin Chan, who completed this research as a doctoral student in the Allen School and is now at Carnegie Mellon University; and Takuya Yoshioka, director of research at AssemblyAI.