The Evolution and Future of Deep Tech


Deep tech is a ubiquitous buzzword in the technology industry. Also known as ‘hard tech’, the term refers to cutting-edge technologies based on innovative engineering and/or scientific advances. Automatic speech recognition (ASR) is just one example of deep tech. Falling within the realm of Artificial Intelligence (AI), it describes the process of converting human speech to text, and the industry has boomed in popularity due to the COVID-19 drive toward voice-first communications. Cambridge-born deep learning company Speechmatics is a pioneer in this field. The ASR software developer recently launched its speech-to-text service on Microsoft Azure Marketplace. In a bid to learn more about the deep tech AI industry, we sat down with Katy Wigdahl, CEO at Speechmatics.

Katy joined Speechmatics in 2019, originally hired to be the company's Chief Financial Officer due to her extensive 25+ years of senior experience in finance and accounting for the likes of Unilever and Verint Transversal. However, it wasn't long before she skyrocketed up the career ladder and moved into the position she holds today.

Welcome to EM360, Katy! Thank you for joining us today. Can you tell us about how you came to head up Speechmatics and the steps you take to ensure the company's mission to ‘Innovate with Voice’ is consistently met?

Thanks for having me. After seven months in my role as CFO of Speechmatics, I became CEO. I was presented with a new challenge: to lead a talented team with a fantastic product during an unprecedented time. Speechmatics has developed a speech recognition engine which understands over 30 languages regardless of dialects or accents and in complete context. At Speechmatics, we have a proud heritage of research and innovation, including 14 PhDs in machine learning and speech and language technology, to ensure a great product is underpinned by cutting edge technology. My job is to lead the team to build on this success. 

My normal process of immersion into a new role – one tried and tested with posts such as Finance Director of Transversal or Senior Commercial Manager at Unilever – would be time spent with colleagues. Of course, I did already know my colleagues, but our conversations had been through a CFO lens and with different responsibilities. In a fast-growth tech workplace, change, collaboration, new goals, ideas, achievements and different viewpoints can all drive the ability to do good work, and the constant dynamism keeps the energy flowing. So, throughout the pandemic, my job has been to ensure teams across departments stay connected and morale and energy levels remain high, especially as people are working in silos.

It seems to have worked. In the last year, we have signed our biggest ever deals and our future market opportunity looks even more promising. The pace of change is accelerating, and we have optimised the business to be focused on agility to take advantage of it. We have restructured internally to elevate the brilliant, ambitious minds within the business and are doubling down on technology and research to maintain our technical leadership position.

How do you define ‘deep tech’ and where does ASR fit into the conversation?

Deep tech is a set of cutting-edge and disruptive technologies based on scientific and engineering discoveries which are set to shape and define the future. Automatic Speech Recognition (ASR) uses machine learning and deep learning technologies to identify and process speech to text. However, one of the biggest challenges for ASR solutions is being able to understand every voice, regardless of dialect, tone, pitch or speaker. There have been instances where users have had to adjust their speech and language, either by slowing down their speech or over-enunciating to be recognised by ASR engines or voice assistants.  
 
Having said this, speech recognition has advanced hugely in recent years, resulting in increased accuracy that goes beyond word error rate to capture context and meaning. In particular, modern neural network architectures and greater computing power have steered advancements in the field.
 
Until recently, extensive bespoke work was required to support just a single language. But at Speechmatics we've used our extensive expertise and knowledge of machine learning and neural networks to create a breakthrough machine learning framework for training speech recognition language models which is both cost-effective and scalable.

Speechmatics has been at the forefront of language, accent, and dialect gap closure in speech recognition engines. Could you give us some insight into this D&I in deep tech advocacy? Why does inclusivity in voice hold so much importance and value at Speechmatics?

We've all heard stories about people being misunderstood by their personal voice assistants or closed captioning getting something awkwardly wrong. This has been exacerbated by the pandemic's audio boom - seen with the rise of social media apps like TikTok and Clubhouse - which means being able to have your voice understood regardless of accent, tone, pitch or dialect becomes critical.
 
With consumers' online habits shifting toward ‘audio first’ social media such as Clubhouse, and with other sites such as Facebook, Twitter and LinkedIn offering their own audio features, understanding speech and language becomes important from both a content moderation perspective (being able to flag harmful content through keyword analysis, for example) and an inclusivity perspective (offering captions for people who are hard of hearing, for example).
 
In May this year, the draft of the Online Safety Bill opened more questions around the additional content moderation challenges stemming from being able to monitor online conversations. Moderating social media has always been tricky; adding to the mix a ‘new’ form of complex, unstructured data such as speech and trying to monitor for harmful, discriminatory, or negative content becomes a massive undertaking when you have a multitude of accents, dialects, languages and every other speech marker in between to contend with. When you take all that into account, being able to police these large volumes of content at scale is incredibly difficult, and that's why it is so important to have the right technology in place to understand everyone.
 
Not only this, but things like live streaming have skyrocketed far beyond what was originally anticipated - from virtual industry events to watching new films and TV series. Despite this acceleration, accessibility for this type of content hasn't kept pace. There are FCC regulations governing the availability of closed captions, but there are mounting concerns about the accessibility of video content for people with disabilities, such as hearing loss. This makes it critical that ASR engines are able to understand speakers of various speech patterns. In a world shifting to audio-first platforms, we no longer have any room to be misunderstood.

What role do you think ASR has played in the global explosion of voice, speech and language technology following COVID-19 and how have you had to innovate and think on your feet to keep Speechmatics products ‘on trend’ and relevant to the times?

We have seen a massive boom in the likes of Clubhouse, TikTok and Zoom, with other platforms like Twitter launching audio functionality. However, there has also been a big tech market shift, which was highlighted by massive deals like the Nuance acquisition by Microsoft. This signalled how lucrative this space is, especially as the speech and voice recognition market is forecast to be worth $27 billion by 2025.
 
This valuation isn't surprising, given that COVID-19 has accelerated the growth of audio and video data. We just have to look at the number of TikToks, Instagram stories, voice notes and Zoom calls, and at call centre traffic, to see the uptick in audio data. This means businesses are sitting on a treasure trove of insights and need speech and language technology engines to adequately draw these out from the data sets.
 
According to our latest trends report, 65% of businesses say they are considering voice in their five-year strategy, which indicates the growing appetite for this technology. Having said this, there are still concerns around this technology that need to be addressed. According to Speechmatics Trends and Predictions Report 2021, respondents outlined some risks for voice technology in the coming years. The top concerns are linked to data privacy and compliance, poor experience and the technology falling short of accuracy expectations.  
 
The audio data boom means ASR engines need to be able to capture high volumes of audio data for a wide range of applications from content moderation and regulatory compliance to closed captioning and subtitles and customer experience.  
 
Speechmatics is already delivering leading levels of accuracy in speech recognition. In March 2021, we were integrated into Microsoft Azure Marketplace. This is a testament to our standard of innovation compared to the larger players in this space: our engine can accurately understand a multitude of speech and language nuances including pitch, tone, dialect and accent.
 
Over the last 15 years, we have developed bleeding-edge machine learning and deep learning capabilities to deliver the fastest and most accurate transcription against competitors such as Google and Microsoft. We frequently test and benchmark against other providers to ensure we're delivering the best speech and language recognition in the market. By defying the industry convention of developing multiple specialist language packs per language, Speechmatics is on a journey to make all languages truly global.

You've been hailed as one of the most inspirational women leaders in deep tech. What has your experience been like as a CEO in an industry where, as you quote in an interview with WeAreTechWomen, ‘there aren't many women'? Have you got any advice for young women who wish to enter deep tech but are skeptical because of this lack of representation?

Have self-belief, have intellectual curiosity, willingness to learn; don't take things personally, realise the strength of strong partnerships, and that there are different ways of leading that can have strong results. 

Mostly, don't be so hard on yourself! Focus on your own determination to reach your goals, stand up for yourself and take whatever opportunity comes your way. It is also critical to have the ability to influence and impact decisions by understanding and playing to the strengths of the diversity of people you will work with.

Lastly, what key lessons have you learnt over the years in working in deep tech and where do you hope to see the space heading?

Be flexible about the ways different people work. You will meet many different people and everyone works differently. Take, for example, the difference between working with people with an academic bias versus those with a commercial bias, and learn how to ensure they are working towards a common goal while understanding the importance of product delivery and research simultaneously.

When it comes to tackling problems, you first need to understand which is the right problem to tackle at the right time, and then you need to get the right people to actually take on the task of tackling it. This improves business efficiency through cross-collaborative work.

When it comes to myself, I've learnt that you need to constantly reinvent yourself, adapt, move fast and try not to break things while being open-minded. It is important to have an ethical standpoint, and when working in technology, being able to deploy tech that makes a positive impact in the industry. 

This leads me on to my last lesson: deep technology is not something to be frightened of. We are unlocking human potential through deep tech, which makes it additive. At Speechmatics we want to remove the fear factor of deep tech and instead put structure into data which is currently unstructured.
