Confidential Speech Recognition

Artificial Intelligence (AI) speech platforms such as speech-to-text (STT) and text-to-speech (TTS) are emerging at the forefront of technology enablers, making an impact on businesses across sectors.

STT and TTS technologies have made it easy to automate data gathering, extraction, analysis, and analytics quickly and efficiently. They have become pivotal in the healthcare, banking, finance, insurance, automotive manufacturing, travel and tourism, e-learning, retail, and telecommunication industries.

In the e-learning sector specifically, AI course creation tools are revolutionizing how organizations develop training content, making it faster and more accessible to create engaging educational experiences.

By 2026, the speech-to-text market is projected to reach USD 5.4 billion, growing at a CAGR of 19.2%, and the text-to-speech market is projected to reach USD 5.0 billion, growing at a CAGR of 14.6%.

These speech platforms have seen growing adoption in virtual assistants and voice bots, along with some effective implementations in the healthcare industry. One example is speech recognition for clinical documentation, which has made documentation more efficient and considerably reduced the burden on clinicians.

The Protected Health Information (PHI) and Electronic Health Record (EHR) data that these solutions produce are highly sensitive and must be protected and secured under privacy rules and regulations such as HIPAA. Data breaches have both direct and indirect impacts, affecting organizations, clients, businesses, and all other stakeholders.

The cost of a data breach is high: organizations must pay hefty fines for each breach, and damage to brand reputation and share price comes on top of that. Despite this, industries are still experiencing an increase in data breaches year over year. The following data shows the scale of healthcare data breaches:

HIPAA Reported Healthcare Data Breaches.

Year	Number of Data Breaches	Exposed Records in Millions	Cost Per Record
2010	199	5.530	$294
2011	200	13.150	$240
2012	217	2.800	$233
2013	278	6.950	$296
2014	314	17.450	$359
2015	269	113.270	$363
2016	327	16.400	$355
2017	359	5.100	$380
2018	365	33.200	$408
2019	505	41.200	$429
Total	3033	255.18

Source: National Center for Biotechnology Information, U.S
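
As a quick check on the year-over-year trend, the breach counts in the table above can be compared directly. The snippet below is only an illustrative sketch: the numbers are copied from the table, while the function and variable names are our own.

```python
# Breach counts copied from the HIPAA table above (year -> number of breaches).
breaches = {
    2010: 199, 2011: 200, 2012: 217, 2013: 278, 2014: 314,
    2015: 269, 2016: 327, 2017: 359, 2018: 365, 2019: 505,
}


def year_over_year(counts):
    """Return {year: percent change versus the previous year}."""
    years = sorted(counts)
    return {
        year: 100.0 * (counts[year] - counts[prev]) / counts[prev]
        for prev, year in zip(years, years[1:])
    }


changes = year_over_year(breaches)

# Years in which the number of reported breaches fell.
declines = [year for year, pct in changes.items() if pct < 0]

# Change across the whole period covered by the table.
overall = 100.0 * (breaches[2019] - breaches[2010]) / breaches[2010]

for year, pct in changes.items():
    print(f"{year}: {pct:+.1f}%")
print(f"Years with a decline: {declines}")
print(f"2010 to 2019 overall: {overall:+.1f}%")
```

On these figures the count dips only once, in 2015, and the 2019 count is roughly two and a half times the 2010 count.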

Reported Healthcare Data Breaches.

Year	Number of Data Breaches	Individuals Affected in Millions
2010	207	5.400
2011	236	11.410
2012	222	3.270
2013	294	8.170
2014	277	21.340
2015	289	110.700
2016	334	14.570
2017	385	5.740
2018	−	−
2019	−	−
Total	2244	180.60

Source: National Center for Biotechnology Information, U.S

The following graph indicates that the healthcare industry is a preferred target of attackers because of the high commercial value of EHRs.

[Figure: Data breach statistics in the healthcare industry]

Source: National Center for Biotechnology Information, U.S

The story is the same in other industries such as banking, finance, and insurance. It is now up to organizations to protect and secure their data against attacks and theft.

As standard practice, industries follow the measures below to protect their data and, in turn, their business:

Privacy: access control policies
Security: encryption of data stores, encrypted transit, authorization and authentication
Standards and regulations: government rules and regulations

These practices help to some extent. But did you know there is still a blind spot where your data can be breached or stolen? Any guesses?

Encryption takes care of data security while data is at rest and while it is in transit. But what about data in use, that is, when the data is decrypted and sitting in RAM for computation? Is it safe? Have you ever thought about the security of data while it is in use?

Let me answer these questions: data in use is not safe and is vulnerable to theft and attacks. Cross-site injection, memory-scraping malware, and more can easily expose data that is in use.

The most common threat of all is the insider attack: your own trusted administrator can misuse privileges to access data in use.
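
To make the gap concrete, here is a minimal sketch of why data in use is exposed. It assumes a toy one-time-pad XOR as a stand-in for a real cipher such as AES, and the sample transcript and all names are hypothetical. The point is only that any ordinary computation forces the application to hold the decrypted record in process memory.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy one-time-pad XOR, used here only as a stand-in for a real cipher."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Hypothetical transcript produced by a speech-to-text step.
record = b"Patient reports chest pain"
key = secrets.token_bytes(len(record))

# At rest and in transit: only ciphertext is stored or sent.
ciphertext = xor_bytes(record, key)

# In use: to compute anything, the application has to decrypt first.
plaintext_in_ram = xor_bytes(ciphertext, key)
word_count = len(plaintext_in_ram.split())

# The readable record now sits in ordinary process memory, which is where
# memory-scraping malware or a privileged insider could read it.
print(f"Processed {word_count} words from a decrypted record")
```

Nothing in this sketch protects `plaintext_in_ram`; that unprotected window is the blind spot described above.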

In the era of the cloud, this gets even worse, because you must place your trust in cloud vendors, their infrastructure, and their administrators. This leaves the data encryption cycle incomplete.

[Figure: Incomplete data encryption cycle]

So how do you protect your business from this kind of security breach? How do you employ zero-trust solutions? How do you adhere to strict regulations?

Fortanix comes to your rescue with its Confidential AI offering. Fortanix offers an end-to-end data security solution, including runtime data protection, which is the focus of this blog.

Fortanix Confidential Computing technology enables you to secure your data in use by running your application within a secure enclave.

Secure enclaves ensure that your application runs in a trusted execution environment that keeps data encrypted while it is in use. Any data that attackers or malicious administrators manage to access is encrypted, and therefore of no use to them.

Awesome, now you have secured the missing part of your data journey: encryption of data while it is in use.

[Figure: Complete data encryption cycle]

The image below shows an example in which each endpoint in a solution encrypts data in use. The example is a speech recognition application used for natural language understanding within a confidential computing environment.

Each endpoint (STT/NLP/ML/TTS) of the following solution runs inside a secure enclave, where data is encrypted at runtime. Speech from the actor is processed by the STT application within the secure Enclave OS.

The STT application then uses Fortanix DSM (Data Security Manager) to tokenize the PHI generated by medical transcription. The NLP application receives the tokenized PHI as input from the STT application. While running in a confidential environment, the NLP application calls Fortanix DSM to de-tokenize the PHI and generates consumable output for the downstream AI/ML application.

The AI/ML application also performs inference confidentially and pushes encrypted results to the TTS application. The TTS application decrypts the results at runtime in a confidential environment and speaks the result to the end user. This is how you can secure your data and application pipeline by running your solution within a confidential computing environment.
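
The tokenize and de-tokenize hand-off between the STT and NLP stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Fortanix DSM API: `TokenVault` is a hypothetical in-memory stand-in for a tokenization service, the `MRN-` identifier format is invented for the example, and the enclave boundaries and the encrypted hand-off to TTS are not modeled.

```python
import re
import secrets


class TokenVault:
    """Hypothetical in-memory stand-in for a tokenization service."""

    def __init__(self):
        self._values = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # 16 random hex characters
        self._values[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._values[token]


# Assumed PHI format for this example only: a made-up medical record number.
MRN_PATTERN = re.compile(r"\bMRN-\d{6}\b")
TOKEN_PATTERN = re.compile(r"\btok_[0-9a-f]{16}\b")


def stt_stage(transcript: str, vault: TokenVault) -> str:
    """Replace PHI with tokens before the transcript leaves the STT stage."""
    return MRN_PATTERN.sub(lambda m: vault.tokenize(m.group()), transcript)


def nlp_stage(tokenized: str, vault: TokenVault) -> str:
    """Restore PHI inside the NLP stage so it can be processed."""
    return TOKEN_PATTERN.sub(lambda m: vault.detokenize(m.group()), tokenized)


vault = TokenVault()
transcript = "Patient MRN-104233 reports chest pain"

tokenized = stt_stage(transcript, vault)  # what travels between stages
restored = nlp_stage(tokenized, vault)    # what the NLP stage works on

print(tokenized)
print(restored)
```

Only the tokenized transcript crosses the boundary between stages; the mapping back to the real identifier stays with the vault, which in the architecture above would be the role played by Fortanix DSM.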

[Figure: Fortanix DSM architecture]

With the advent of speech recognition technology, STT/TTS SaaS, API, and solution offerings are growing in number. Protecting speech recognition data throughout its journey is the need of the hour.

Running this technology within a confidential computing environment would give the business an edge and help reduce data breaches.

Confidential Speech Recognition is going to be the norm in the future. Why wait when the platform is ready? For more details on adopting Confidential Computing for your offerings, contact Fortanix. We would love to help you make your business confidential.

Reference - https://www.ncbi.nlm.nih.gov/