Why Voice and Audio Have More to Say Than the Keyboard

This article was contributed by Greg Doll, VP & GM of Mobile Consumer Electronics, Knowles Corporation.

While remote work was already in place at the beginning of 2020, the effects of the past year have created the beginnings of a “new normal” for workplace interactions.

Whether a company has opted to keep teams working remotely, in-office, or a combination of both, the ability to communicate and collaborate effectively is essential to ensuring workforce productivity and business continuity.

The IT decision-makers for enterprise systems have the task of assessing and improving upon the capabilities of today’s collaboration technologies for enabling the future productivity of teams.

We have all had experiences when team members were ready to roll up their sleeves on a call to solve that critical problem, or even discuss the next great idea, but the meeting faltered from the start due to issues with the conference audio setup.

Comments like, ‘We cannot hear you’ or, ‘Can you please come closer to the microphone?’ will no longer be tenable in the new workplace. Collaborative enterprise audio solutions of the future must support communication experiences that are flexible and frictionless to maximize the potential of the workforce.

Innovations in Voice and Microphone Technology for the Future of Work

Bulky, immobile, and complex are hallmarks of typical enterprise conference systems. These serve the purpose of providing quality voice pick-up in large rooms filled with 5 to 50 people by integrating high-performance mic arrays with software algorithms like audio beamforming.

The changing profile of the work environment, however, demands rapid and reliable collaboration in a variety of physical settings. Traditional conference rooms are being augmented by a hybrid workforce participating from home, huddled in small groups in breakout rooms, or gathered in green spaces in different parts of the campus.

Through a mix of the latest-generation hardware and algorithms as well as collaboration tools like Microsoft Teams and Zoom, IT leaders are attempting to erase the friction experienced by teams when they need to collaborate with one or more of their colleagues. Productivity tools like instant messaging, direct calling systems, and other conferencing applications are all beginning to converge as well. As a result, the once-humble laptop is now transforming into a command centre that users count on.

Laptops are lightweight, with a long battery life, big screen, loud audio speakers, and robust connectivity – they naturally fit the bill to enable seamless and rapid communication in these new environments. Audio processing engines in these machines are now able to apply advanced techniques like artificial intelligence (AI) based noise suppression to enable improved audibility in many typical settings. These machines now also use multiple high-performance microphones to ensure that the voice pick-up is precise and zoomed in to the speaker while removing the noise from other directions.

Since voice communication is central to collaboration and the microphone is the first component of the audio signal chain, its specification is critical to the overall performance of the system. Microphone performance is measured on specifications like signal-to-noise ratio (SNR), which describes how cleanly a microphone picks up sounds. As a rule of thumb, the higher the SNR of the microphone, the less noisy it is and the farther the person speaking can be from the device while still ensuring quality audio capture. Modern MEMS (Micro-Electro-Mechanical System) microphones that feature sufficiently high SNR (66 dB and above) are excellent choices for improving the enterprise audio and voice experience on any device.
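The link between SNR and speaking distance can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on several assumptions that are not from this article: the SNR figure is quoted against the common 94 dB SPL reference level, speech measures roughly 60 dB SPL at 1 m, sound level falls off with distance as in an idealized free field (about 6 dB per doubling), and the microphone's own noise is the only noise present. Real rooms add reverberation and ambient noise, so treat the numbers as a rough guide rather than a prediction.

```python
import math

# Assumption: mic SNR is specified against a 94 dB SPL (1 Pa) reference tone.
REFERENCE_SPL_DB = 94.0


def equivalent_input_noise_db(mic_snr_db):
    """Self-noise of the microphone expressed as an equivalent sound level."""
    return REFERENCE_SPL_DB - mic_snr_db


def captured_snr_db(mic_snr_db, distance_m, speech_spl_at_1m_db=60.0):
    """Speech-to-self-noise ratio at the mic for a talker distance_m away.

    Assumes free-field spreading (level drops 20*log10(distance) dB) and an
    illustrative speech level of 60 dB SPL at 1 m.
    """
    speech_at_mic_db = speech_spl_at_1m_db - 20.0 * math.log10(distance_m)
    return speech_at_mic_db - equivalent_input_noise_db(mic_snr_db)


def distance_gain(snr_improvement_db):
    """Factor by which the talker can move away for the same captured SNR."""
    return 10.0 ** (snr_improvement_db / 20.0)


if __name__ == "__main__":
    for snr in (60.0, 66.0, 72.0):
        print(snr, round(captured_snr_db(snr, distance_m=2.0), 1))
    print(round(distance_gain(6.0), 2))
```

Under these assumptions, each additional 6 dB of microphone SNR lets the talker sit roughly twice as far away for the same capture quality, which is the rule of thumb described above.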

MEMS microphones are widely used in mobile phones and smart speakers as well as other high-volume consumer electronics applications, often in multi-mic array configurations to help capture far-field voice from across the room. The most recent laptop generations are being equipped with 3 or 4 microphones to raise the audio and voice pick-up experience as well. MEMS microphones are also very small, smaller than a pencil tip, enabling tight integration for edgeless computing displays on laptops and other audio-conferencing platforms in the enterprise. The combination of performance and small size translates into better meeting experiences, including for small team environments ‘dialing in’ from around one laptop.

Because MEMS microphones are highly consistent and stable from part to part, multi-mic array configurations also enable selective sound pick-up, referred to as beamforming. Selective audio capture has become critical to the new workforce for making important presentations, sales calls, and product pitches from home while zoning out the dog barking in the background.
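To give a feel for how an array can favor one direction, here is a minimal sketch of the simplest beamforming approach, delay-and-sum, evaluated for a single tone on an evenly spaced line of microphones. It is not the method used in any particular laptop or product; commercial systems typically rely on more sophisticated adaptive techniques. The function name, the 2 cm spacing, the 4 kHz tone, and the four-mic layout are all illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def array_gain(source_deg, steer_deg, num_mics=4, spacing_m=0.02, freq_hz=4000.0):
    """Relative output (0 to 1) of a delay-and-sum beamformer for one tone.

    Angles are measured from broadside (straight ahead of the array).
    The array is steered toward steer_deg; the sound arrives from source_deg.
    """
    # Mismatch between the delays the array compensates for and the real ones.
    mismatch = math.sin(math.radians(source_deg)) - math.sin(math.radians(steer_deg))
    total = 0j
    for mic_index in range(num_mics):
        residual_delay_s = mic_index * spacing_m * mismatch / SPEED_OF_SOUND_M_S
        # Each mic contributes the tone with a phase shift from its leftover delay.
        total += cmath.exp(-2j * math.pi * freq_hz * residual_delay_s)
    # Average the mic signals; perfectly aligned signals add up to 1.0.
    return abs(total) / num_mics


if __name__ == "__main__":
    # Talker straight ahead versus a noise source off to the side.
    print(array_gain(source_deg=0.0, steer_deg=0.0))
    print(array_gain(source_deg=90.0, steer_deg=0.0))
```

Sound arriving from the steered direction adds up in phase across the microphones, while sound from other directions partially cancels. This only works well if the microphones respond nearly identically, which is why part-to-part consistency matters for arrays.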

Future Innovations

As we look to the future, voice solutions for the workplace must be one step ahead. Intelligent audio that can discern between voices, sift through background noise, and learn from its users to automatically enhance the experience is an important next step. State-of-the-art sound source localization and tracking algorithms are already deployed in enterprise solutions and will continue to get better.

These systems are also becoming ‘context aware’ with the ability to detect trigger words, process gestures, and aggregate multiple sensor inputs to offer user experiences that just work right from the start. Traditionally, enabling such advanced features required hardware that could run computationally intensive audio and machine learning (ML) algorithms. With advancements in technology, these next-generation features will increasingly find their way into consumer devices thanks to efficient audio and ML-capable edge processors.

Conclusion

Enterprise audio is a critical piece of the bigger workplace picture as we look ahead to what is next. Remote and hybrid workspaces are here to stay. The challenge now lies in supporting consumer tech not only at home but also in the enterprise. Collaboration and connection are what many workers say they miss most from in-person interactions. Computing platforms that combine reliable, high-quality audio input capability, seamless connectivity, and context awareness are an important part of the bridge supporting the future of work.