
Retrieval-Augmented Generation (RAG)

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a technique designed to enhance the effectiveness of large language models (LLMs) by utilizing tailored data. RAG leverages specific data or documents as context for the LLM to improve accuracy, incorporate current information, or provide domain-specific expertise. In simple terms, it allows LLMs to answer questions about data they weren't trained on.
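
Conceptually, a RAG pipeline embeds the user's question, retrieves the most similar documents from an index, and passes them to the LLM as part of the prompt. The minimal Python sketch below illustrates that flow only; the embed() and call_llm() helpers are toy placeholders, where a real system would use an embedding model, a vector database, and an LLM API.

```python
# Minimal RAG sketch: retrieve the most relevant documents for a question,
# then pass them to the LLM as context. embed() and call_llm() are toy
# placeholders for a real embedding model and a real LLM API.

import math

DOCUMENTS = [
    "Fortanix DSM centralizes key management across clouds.",
    "RAG supplies an LLM with retrieved context at query time.",
    "Confidential computing keeps data encrypted while in use.",
]

def embed(text: str) -> dict:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 2) -> list:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    """Augment the prompt with retrieved context, then generate."""
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("How does RAG help an LLM?"))
```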

What is Retrieval-Augmented Generation (RAG) used for?

Retrieval-Augmented Generation (RAG) is used to enhance the output of Large Language Models (LLMs). By default, LLMs are trained on vast and diverse public data and do not necessarily have access to recent information. This can lead to inaccuracies, or hallucinations, on queries about unfamiliar data, which can render the LLM's answers useless.

For organizations that require LLMs to offer precise responses tailored to their domain, the model needs to draw on insights from their own data. Retrieval-Augmented Generation (RAG) has become the industry-standard approach for bringing non-public data into LLM workflows, so users benefit from accurate and relevant responses.

What are the benefits of Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) enhances the response quality of Large Language Models (LLMs) by drawing on current and contextual external data sources. This approach minimizes inaccuracies in the generated answers and delivers tailored, domain-specific information, allowing organizations to gain real advantage from their AI deployments.

Are there security risks with Retrieval-Augmented Generation (RAG)?

1. Data Breach and Exposure 

Retrieval-Augmented Generation (RAG) systems rely on vast amounts of data for both retrieval and generation, and this data is stored in vector databases. Security controls in vector databases are still maturing, so malicious actors could exploit weaknesses to gain access to sensitive data and personally identifiable information (PII). If not properly secured, this data is vulnerable to breaches and unauthorized access, leading to data exposure and violations of data privacy laws and regulations such as GDPR, HIPAA, and CCPA. (A minimal mitigation sketch follows this list of risks.)

2. Model Manipulation and Poisoning 

AI models, including those used in Retrieval-Augmented Generation (RAG) systems, are susceptible to manipulation and poisoning attacks. Bad actors can feed the system with corrupt or misleading data, causing it to generate harmful or misleading responses. This not only undermines the reliability of the AI but also poses significant security risks. 

3. Inaccurate or Misleading Information 

Even with the combination of retrieval and generative models, there is still a risk of producing inaccurate or misleading information. If a Retrieval-Augmented Generation (RAG) system is fed with outdated or incorrect data, the generative model may amplify these errors, leading to the spread of misinformation. 
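
As a hedged illustration of how risks 1 and 2 above can be reduced, the sketch below encrypts document text before it is indexed (so a compromised database exposes only embeddings and ciphertext) and rejects documents from origins outside an allow-list. It assumes the Python 'cryptography' package; the in-memory list stands in for a real vector database, the source allow-list is illustrative, and in practice the key would be held in an external key management service rather than in application code.

```python
# Sketch: store only embeddings plus encrypted payloads in the vector store,
# and refuse to ingest documents from untrusted origins.

import hashlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in production: fetched from a KMS, not generated in code
cipher = Fernet(key)

TRUSTED_SOURCES = {"internal-wiki", "policy-repo"}   # illustrative allow-list
vector_store = []                                     # stand-in for a vector database

def ingest(source: str, text: str, embedding: list[float]) -> bool:
    # Risk 2 (poisoning): reject documents from unknown or untrusted origins.
    if source not in TRUSTED_SOURCES:
        return False
    # Risk 1 (exposure): store the embedding, an integrity hash, and ciphertext only,
    # so the raw text is never held in plaintext by the database.
    vector_store.append({
        "embedding": embedding,
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
        "payload": cipher.encrypt(text.encode()),
    })
    return True

def read_payload(record: dict) -> str:
    # Decryption happens only in the trusted application, after authorization.
    return cipher.decrypt(record["payload"]).decode()

ingest("internal-wiki", "Patient 123: HbA1c 7.2%", embedding=[0.12, -0.48, 0.33])
print(read_payload(vector_store[0]))
```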

How can we address Retrieval-Augmented Generation (RAG) security vulnerabilities?

The data security recommendations and best practices mentioned for Large Language Models (LLMs) are equally applicable to Retrieval-Augmented Generation (RAG) models. 

OWASP Top 10 for Large Language Model Applications

https://owasp.org/www-project-top-10-for-large-language-model-applications/ 

NIST AI Risk Management Framework (AI RMF 1.0) Explained  

https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf 

What is a trusted execution environment (TEE) on GPUs?

A GPU TEE isolates workloads in hardware-protected memory so that data and code stay encrypted even during computation. It ensures that no outside process, including system administrators, can see what’s running inside. 

Which GPUs support NVIDIA Confidential Computing?

NVIDIA’s Hopper and Blackwell architectures include built-in confidential computing capabilities. They pair GPU-level encryption with remote attestation to verify that workloads are running in trusted mode. 

What is the performance overhead of NVIDIA Confidential Computing?

The overhead is minimal thanks to hardware-accelerated encryption and optimized memory access. The small trade-off delivers significant gains in data assurance. 

How much slower is confidential computing mode vs normal mode?

This depends on the workload type, but performance loss is often under 5%. For most AI inference and training tasks, the impact is minimal compared to the added protection of keeping data encrypted in use. 

Why is CPU-GPU data transfer a bottleneck?

CPU-GPU transfers can slow down performance because moving data between memory pools adds latency. Confidential computing mitigates this by encrypting and verifying data while in transit, keeping security intact without creating excessive overhead. 
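
As a rough illustration of that transfer cost, the sketch below times a host-to-device copy separately from a computation that stays in GPU memory. It assumes PyTorch and an available CUDA GPU; the matrix size is arbitrary.

```python
# Sketch: compare host-to-device transfer time with on-GPU compute time.

import time
import torch

assert torch.cuda.is_available()

x_cpu = torch.randn(4096, 4096)            # ~64 MB of float32 data on the host

torch.cuda.synchronize()
t0 = time.perf_counter()
x_gpu = x_cpu.to("cuda")                   # data crosses the CPU-GPU interconnect
torch.cuda.synchronize()
t1 = time.perf_counter()

y = x_gpu @ x_gpu                          # compute that stays in GPU memory
torch.cuda.synchronize()
t2 = time.perf_counter()

print(f"transfer: {(t1 - t0) * 1e3:.1f} ms, matmul: {(t2 - t1) * 1e3:.1f} ms")
```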

Does NVIDIA encrypt GPU memory at runtime?

Yes, in confidential computing mode. NVIDIA GPUs encrypt the data stored in GPU memory, so sensitive information is protected even when it’s being processed. 

How do you verify the GPU is in confidential computing mode?

Verification is done through attestation, which generates cryptographic proof that the GPU is operating in a secure and verified state. Systems like Fortanix Confidential Computing Manager automate this validation before data is ever decrypted. 
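
The control flow looks roughly like the sketch below: collect attestation evidence from the GPU, have it verified against policy, and only then decrypt data. The gather_gpu_evidence() and verify_evidence() helpers are placeholders, not real APIs; in practice they would be backed by tooling such as NVIDIA's attestation SDK and a verification service like Fortanix Confidential Computing Manager.

```python
# Sketch of the verification flow: attest first, decrypt only on success.

def gather_gpu_evidence() -> dict:
    # Placeholder: a real report would come from the GPU driver or an attestation SDK.
    return {"cc_mode": "ON", "measurements": "..."}

def verify_evidence(evidence: dict) -> bool:
    # Placeholder: a real verifier checks signatures and measurements against
    # reference values and your organization's policy.
    return evidence.get("cc_mode") == "ON"

def run_confidential_workload(decrypt_and_process) -> None:
    evidence = gather_gpu_evidence()
    if not verify_evidence(evidence):
        raise RuntimeError("GPU is not in a verified confidential computing state")
    decrypt_and_process()    # data is decrypted only after attestation succeeds

run_confidential_workload(lambda: print("processing decrypted data inside the TEE"))
```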

What is GPU attestation in confidential computing?

GPU attestation confirms that both hardware and software components are running trusted, untampered firmware and drivers. It forms part of the larger "chain of trust" that secures workloads from the chip level upward.

What are some use cases for NVIDIA Confidential Computing?

Common use cases include secure AI training, privacy-preserving analytics, healthcare diagnostics, and regulated data processing. It enables organizations to run sensitive workloads on GPUs without exposing the raw data. 

Can I use confidential computing for AI training?

Absolutely. Confidential computing allows models to be trained on encrypted data inside trusted enclaves, protecting both the training data and the model weights from unauthorized access.

How do you scale confidential computing across multi-cloud GPU infrastructure?

Scaling effectively requires centralized key management, coordinated attestation across regions, and consistent policy enforcement, so workloads can move securely between on-prem and cloud GPU environments.
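
One way to picture "consistent policy enforcement" is a single central key-release policy evaluated the same way for every environment, as in the sketch below. The claim names, driver versions, and regions are illustrative placeholders, not values from any specific product.

```python
# Sketch: one central policy applied identically to on-prem and cloud GPU workloads.

CENTRAL_POLICY = {
    "require_gpu_cc_mode": True,
    "allowed_driver_versions": {"550.54", "550.90"},   # illustrative values
    "allowed_regions": {"us-west", "eu-central", "on-prem-dc1"},
}

def allow_key_release(claims: dict) -> bool:
    """Evaluate attestation claims from any environment against one policy."""
    return (
        claims.get("gpu_cc_mode") is CENTRAL_POLICY["require_gpu_cc_mode"]
        and claims.get("driver_version") in CENTRAL_POLICY["allowed_driver_versions"]
        and claims.get("region") in CENTRAL_POLICY["allowed_regions"]
    )

# The same check applies to a cloud workload and an on-prem workload alike.
print(allow_key_release({"gpu_cc_mode": True, "driver_version": "550.90", "region": "eu-central"}))
print(allow_key_release({"gpu_cc_mode": False, "driver_version": "550.90", "region": "us-west"}))
```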

How do you protect LLM weights from theft in cloud environments?

Encrypt model weights at rest, in transit and during use within a trusted execution environment. Key release should be gated by attestation, so only verified hardware and workloads can access the model. 
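
A minimal sketch of that pattern, assuming Python's 'cryptography' package: the weights are envelope-encrypted, and the data-encryption key is released by a simplified, stand-in key manager only when attestation succeeds. A real deployment would use an external KMS and a genuine attestation result rather than the in-process placeholders shown here.

```python
# Sketch: attestation-gated release of the key that protects model weights.

from cryptography.fernet import Fernet

class KeyManager:
    """Stand-in for an external KMS that enforces an attestation policy."""
    def __init__(self) -> None:
        self._dek = Fernet.generate_key()   # data-encryption key stays inside the KMS

    def release_key(self, attestation_ok: bool) -> bytes:
        if not attestation_ok:
            raise PermissionError("attestation failed: key release denied")
        return self._dek

kms = KeyManager()

# Encrypt the weights once, in a trusted environment (e.g., the training pipeline).
weights = b"\x00\x01\x02\x03"                      # stand-in for a model checkpoint
encrypted_weights = Fernet(kms.release_key(True)).encrypt(weights)

# Inside the GPU TEE: prove trustworthiness first, then decrypt in protected memory.
attestation_ok = True                              # outcome of the attestation flow described above
dek = kms.release_key(attestation_ok)
plaintext_weights = Fernet(dek).decrypt(encrypted_weights)
assert plaintext_weights == weights
```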

What are the emerging attack vectors for confidential GPUs?

Attackers are increasingly targeting side-channel leaks, firmware vulnerabilities and supply-chain tampering. This is why continuous attestation, hardware updates, and runtime monitoring are critical in defending against these threats.
