Most of the AI conversations I’m pulled into today start the same way.
If the last 18–24 months were about experimenting with AI, the next 12 will be about deciding what you’re willing to run in production, and on what architecture.
Most of the prospects I speak with have a similar story. They can point to a long list of proofs of concept:
- Internal co-pilots for employees
- Customer‑facing agents built on third‑party LLM APIs
- Early attempts to fine‑tune models on proprietary data
Then someone on their team asks a harder question: “How do we build the AI factory that will run all of this safely, at scale, across our regions and regulated businesses?”
That’s the real challenge in front of us. You don’t just need AI features; you need an architecture you can trust — one that protects your data and models, satisfies regulators, and can grow with your business.
In this post, I’ll share how we think about those architecture decisions at Fortanix, grounded in the same concepts I walked through in our recent AI factory conversation.
From GPU Clusters to AI Factories
A useful way to frame the problem comes from NVIDIA president and CEO Jensen Huang’s description of AI factories, which he compares to a five‑layer cake:
- Layer No. 1: Application—The user‑facing apps, agents, and workflows
- Layer No. 2: Model—LLMs, speech, vision and domain‑specific models
- Layer No. 3: Chips—GPUs, CPUs and accelerators
- Layer No. 4: Infrastructure—Compute, network, storage and orchestration
- Layer No. 5: Energy—The power that feeds the entire system
What we see with customers is clear: there is no shortage of interesting work happening at the top of this stack. Teams are building great applications, wiring in impressive models, and piloting new experiences.
Where things get much less clear is below that:
- How do you trust the hardware and workloads actually running your sensitive AI?
- How do you provide end‑to‑end security for data and models, not just encryption at the edges?
- How do you maintain sovereignty as you deploy AI factories across different countries and regulated environments?
Those three questions come down to trust, security, and sovereignty. They will define whether your AI investments can move from isolated POCs to something your business and your regulators are comfortable betting on.
At Fortanix, we organize our answer into four pillars of Confidential AI. The architecture decisions that matter over the next 12 months are, in practice, decisions about whether you adopt these pillars now or defer them and pay the cost later.
Pillar 1: Build Verifiable Trust into Your Stack
The first pillar is verifiable trust.
In many AI deployments today, trust is still an assumption: “We’re in our cloud VPC, the hypervisor is locked down, so we’re fine.”
That’s not enough anymore, not when you’re running high‑value models on high‑value data.
Our view is that you should be able to prove the integrity of your AI factory before you run sensitive workloads. That’s exactly what our confidential computing manager is designed to do.
It acts as a control plane that:
- Takes your existing VMs and containers (running on what we treat as untrusted infrastructure)
- Converts them into confidential VMs or confidential containers
- Deploys them onto CPUs and GPUs that support confidential computing
- Enforces attestation policies for vendors like NVIDIA, Intel, and AMD
Before a workload runs, we verify:
- The hardware is genuine and hasn’t been tampered with in the supply chain
- The firmware and BIOS are in a known‑good state
- The workload you’re about to execute is the one your CI/CD pipeline produced — not a modified image
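The three checks above amount to a policy gate in front of every workload. Here is a minimal sketch of that gate in Python; the data structures and names are illustrative assumptions, not the actual confidential computing manager API, and real attestation evidence (Intel, AMD, NVIDIA) arrives as signed hardware reports rather than plain fields:

```python
from dataclasses import dataclass

# Hypothetical, simplified evidence; real attestation reports are
# signed binary documents produced by the CPU/GPU hardware.
@dataclass
class AttestationEvidence:
    hardware_genuine: bool        # supply-chain / vendor certificate check
    firmware_measurement: str     # hash of the firmware and BIOS state
    workload_measurement: str     # hash of the image about to run

@dataclass
class AttestationPolicy:
    allowed_firmware: set         # known-good firmware measurements
    allowed_workloads: set        # digests produced by the CI/CD pipeline

def admit(evidence: AttestationEvidence, policy: AttestationPolicy) -> bool:
    """Admit a workload only if all three checks pass."""
    return (
        evidence.hardware_genuine
        and evidence.firmware_measurement in policy.allowed_firmware
        and evidence.workload_measurement in policy.allowed_workloads
    )

policy = AttestationPolicy(
    allowed_firmware={"fw-sha256-abc"},
    allowed_workloads={"img-sha256-123"},
)
good = AttestationEvidence(True, "fw-sha256-abc", "img-sha256-123")
tampered = AttestationEvidence(True, "fw-sha256-abc", "img-sha256-evil")

print(admit(good, policy))      # True: every check passes
print(admit(tampered, policy))  # False: image was not produced by CI/CD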
If you’re thinking about AI factories, this is the first key decision: Will you continue to take infrastructure on faith, or will you make verifiable trust — via confidential computing and attestation — a foundational requirement?
Twelve months from now, regulators and enterprise customers will increasingly expect the latter.
Pillar 2: Treat Keys for Data and Models as Critical Infrastructure
The second pillar is enterprise‑grade key management.
In our discussions with customers, we often say: your model weights and your data are both crown jewels. In many cases, the model is so central that it is fair to say: the model is the company.
If an attacker or insider can get a copy of your weights, they can replicate your model. If they can access the keys protecting your training data and prompts, they can exfiltrate or misuse your most sensitive information.
Our Data Security Manager is built to address exactly this:
- It’s a next‑generation HSM — a FIPS 140‑2 Level 3‑class appliance — with a built‑in KMS.
- It manages the full lifecycle of keys for data at rest, in motion, and in memory.
- It integrates tightly with the confidential computing manager to ensure keys are only released to attested, trusted workloads.
Here’s what that looks like in practice:
- Your model artifacts and sensitive data are encrypted at rest.
- A confidential VM or container boots and is attested by the confidential computing manager.
- Only after that verification succeeds does the workload authenticate to Data Security Manager.
- Data Security Manager performs a secure key release, delivering keys directly into that trusted workload without any human ever seeing them.
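The four steps above form an attest-then-release sequence. The sketch below models it with an illustrative token exchange; the class names and methods are assumptions for the example, not the Fortanix APIs, and the real flow binds key release to signed hardware attestation rather than an in-memory token:

```python
class AttestationVerifier:
    """Stands in for the confidential computing manager's attestation step."""
    def __init__(self, trusted_measurements):
        self._trusted = set(trusted_measurements)
        self._issued = set()

    def attest(self, workload_measurement):
        # Issue a token only for workloads in a known-good state
        if workload_measurement not in self._trusted:
            return None
        token = f"attest:{workload_measurement}"
        self._issued.add(token)
        return token

    def is_valid(self, token):
        return token in self._issued

class KeyManager:
    """Stands in for Data Security Manager: keys go only to attested workloads."""
    def __init__(self, verifier, keys):
        self._verifier = verifier
        self._keys = keys

    def release_key(self, key_id, token):
        if not self._verifier.is_valid(token):
            raise PermissionError("workload not attested; key release denied")
        # The key is delivered straight into the trusted workload;
        # no human operator ever handles it.
        return self._keys[key_id]

verifier = AttestationVerifier(trusted_measurements={"img-sha256-123"})
dsm = KeyManager(verifier, keys={"model-key": b"\x00" * 32})

token = verifier.attest("img-sha256-123")  # workload boots and is attested
key = dsm.release_key("model-key", token)  # only then is the key released
```

A forged or missing token makes `release_key` raise instead of returning the key, which is the property the whole pillar rests on: keys, workloads, and hardware are tied together.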
For you as a prospect, the architectural question is simple but critical: “A year from now, when my AI factory is running real customer data and proprietary models, will I be able to show exactly how keys, workloads, and hardware are tied together, or will my encryption story stop at ‘we turn on disk encryption in the cloud’?”
The organizations that answer this well will have a much easier time convincing CISOs, auditors, and regulators that their AI is production‑ready.
Pillar 3: Orchestrate AI Safely at the Application Layer
The third pillar is about the application layer — where all this complexity is supposed to disappear behind a clean interface.
In the early stages, it’s tempting to let every team “just call the model” directly:
- One app hard‑codes a call to an external LLM API
- Another talks to a GPU cluster via a custom service
- A third spins up its own small inference stack
That works for experimentation. It quickly becomes unmanageable when you:
- Introduce multiple models (LLMs, speech, vision, domain‑specific)
- Need to support multiple regions and sovereign deployments
- Have to provide clear governance and observability around usage
Our approach is to build this layer on platforms like NVIDIA AI Enterprise and wrap it in a turnkey orchestration capability that assumes confidential computing and strong key management from the start.
For you, this means thinking now about:
- How models are packaged and deployed (as confidential VMs or containers)
- How applications discover those models with the right authentication and policy checks
- How you audit and govern: which app used which model, on which data, in which region, under which guarantees
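One way to picture the orchestration layer is as a single gateway that every application calls through, enforcing policy and writing an audit record per invocation. This is a sketch under assumed names (the policy shape, apps, models, and regions are all illustrative), not a real Fortanix or NVIDIA AI Enterprise interface:

```python
import json
from datetime import datetime, timezone

class ModelGateway:
    """Illustrative orchestration gateway: policy check + audit on every call."""
    def __init__(self, policy):
        # policy: app name -> set of (model, region) pairs it may use
        self._policy = policy
        self.audit_log = []

    def invoke(self, app, model, region, dataset):
        allowed = (model, region) in self._policy.get(app, set())
        # Record which app used which model, on which data, in which region
        self.audit_log.append(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "app": app, "model": model, "region": region,
            "dataset": dataset, "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError(f"{app} may not use {model} in {region}")
        return f"routed {app} -> {model} ({region})"

gateway = ModelGateway({"claims-app": {("llm-eu", "eu-west")}})
result = gateway.invoke("claims-app", "llm-eu", "eu-west", "claims-2024")
```

Because denied calls are logged before the exception is raised, the audit trail captures attempted misuse as well as legitimate traffic — the observability question in the list above.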
The architecture decision here is whether AI will live as a set of isolated experiments, or as a coherent fabric that your teams can safely build on again and again.
Pillar 4: Design for Sovereignty Before It’s Forced on You
The fourth pillar is sovereignty — something our prospects in the public sector, financial services, and healthcare bring up in almost every conversation.
They ask questions like:
- “Can we guarantee our data never leaves this country or region?”
- “Can we control exactly where our models run, and who can touch them?”
- “Can we keep keys and control planes inside our jurisdiction?”
Our Confidential AI stack is built with this in mind:
- All components can be deployed entirely within a sovereign region.
- We can also deliver them as connected SaaS, where control and data planes respect regional boundaries.
- Attestation and key management give you a way to prove, not just assert, that workloads stayed where they were supposed to.
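At its simplest, the sovereignty guarantee is a conjunction: data, keys, compute, and control plane must all sit inside the allowed jurisdiction. A small deploy-time check might look like the sketch below (region labels and component names are assumptions for illustration; real enforcement would pair this with attestation evidence of where workloads actually ran):

```python
def sovereignty_ok(allowed_regions, **component_regions):
    """True only if every component (data, keys, compute, control plane)
    stays inside the allowed jurisdiction."""
    return all(region in allowed_regions
               for region in component_regions.values())

ok = sovereignty_ok(
    {"eu-west", "eu-central"},
    data="eu-west", keys="eu-west",
    compute="eu-central", control_plane="eu-west",
)
bad = sovereignty_ok(
    {"eu-west"},
    data="eu-west", keys="us-east",       # keys left the jurisdiction
    compute="eu-west", control_plane="eu-west",
)
```

A single out-of-region component fails the whole check, which mirrors how regulators tend to see it: sovereignty is violated by the weakest component, not averaged across them.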
If you’re planning the next 12 months, this is not a “nice‑to‑have” topic to park until legal raises it. It’s a design decision: “Will our AI factory architecture allow us to meet current and future sovereignty requirements without re‑architecting everything region by region?”
Those who design for sovereignty now will be able to scale into new markets and regulatory regimes much faster than those who treat it as a late‑stage patch.
From Interesting Demos to Architectures You Can Trust
Almost every prospect we talk to has something interesting running with AI today. That’s not the problem.
The problem is confidence.
- Confidence that the hardware and workloads can be trusted.
- Confidence that keys, data, and model weights are properly protected.
- Confidence that applications are using AI in governed, observable ways.
- Confidence that sovereignty isn’t being violated behind the scenes.
The architecture decisions you make in the next 12 months will determine whether you can move from isolated experiments to AI factories your business and your regulators can trust.
At Fortanix, our Confidential AI approach is designed to help you make those decisions now, not when it’s too late to change course:
- Verifiable trust through confidential computing and attestation
- Enterprise‑grade key management that treats data and models as first‑class assets
- Secure orchestration at the application layer
- Data sovereignty by design, not as an afterthought
If you’re evaluating how to move your AI initiatives from proof of concept to production, these are the pillars I’d encourage you to use when building your architecture.