Over the last couple of years, I’ve watched the same pattern play out again and again.
A team gets excited about large language models and starts experimenting with APIs from providers like OpenAI or Google. Proofs of concept multiply. Demos look impressive. The slides look even better.
Then someone asks the harder questions:
Where is this model actually running? What happens to our data and prompts? And who can see the weights that power all of this?
That’s when the conversation shifts — from “cool AI features” to “real AI factories.” It’s also when the debate about open versus closed weight models stops being philosophical and starts becoming operational.
In this post, I want to unpack what that debate really means for enterprises, especially in regulated industries — and why, regardless of which path you choose, confidential inference becomes unavoidable if you care about protecting both your IP and your customers’ data.
Open Weight Models: Fast Starts, Hard Questions
When I refer to "open weight" in this context, I'm describing models you consume rather than own: models accessed primarily through external APIs, whose weights you don't directly control — even if you can fine-tune or configure them.
At a high level, API-based LLMs are exceptional at one thing: speed.
You can stand up prototypes in days instead of months. You don’t have to manage infrastructure, scaling, or upgrades. Your team can focus on building applications rather than constructing an AI factory from scratch.
For experimentation and early validation, they’re incredibly useful. I rarely tell customers to avoid them entirely.
But the same characteristics that make them fast also create blind spots — especially in public sector, financial services, and healthcare environments.
Eventually, the questions become unavoidable:
- Where is this LLM actually running — in which region, on whose infrastructure?
- When I send sensitive prompts containing PII, PCI, PHI, or internal trade secrets, who can access them?
- Are prompts or outputs being logged, reused, or incorporated into broader model training?
- Does the infrastructure meet my compliance and regulatory requirements?
If those questions don’t have clear answers, you’re operating in a grey zone. You may be enriching someone else’s model with your data. You may be running critical workloads in environments that don’t meet your governance standards. And you probably don’t have a defensible explanation ready for your regulator or your CISO.
Open/API-based models solve the “how do we get started?” problem. They do not, by themselves, solve the “how do we build a trustworthy AI factory?” problem.
Closed Weight Models: Control Comes with Responsibility
On the other end of the spectrum are closed weight models — systems whose weights you own, manage, and deploy within infrastructure you control or explicitly trust.
This path often involves training or fine-tuning proprietary models, deploying them in your own AI factory or sovereign cloud, and managing your own keys, policies, and compliance posture.
It is more work. There’s no avoiding that. You now have to think about hardware selection, orchestration, observability, lifecycle management, and operational risk.
But what you gain is control.
You gain sovereignty over where models run and where data lives. You gain control over how prompts and outputs are logged, retained, and protected. You gain visibility into the security posture of the chips, infrastructure, and data layers supporting your workloads. And you can align your deployment directly with regulatory requirements, rather than relying on someone else’s generic baseline.
For many regulated industries, this isn’t a preference issue. It’s a production requirement.
Closed-weight models give you the building blocks to move from “we’re using AI” to “we’re operating a secure, sovereign AI factory.”
Why Model Weights Are the Real Asset
Underneath the open versus closed debate lies a simpler truth: model weights are the asset.
For many companies, model weights aren’t just a crown jewel — they are the company. They encode years of domain expertise, proprietary data, and competitive differentiation. They are what turn a generic architecture into a unique product.
If an adversary gains access to your weights, they can:
- Replicate your model
- Offer similar capabilities to your customers
- Undercut your pricing and erode your competitive moat
In practical terms, that means your competitive advantage can walk out the door in the form of a memory dump.
With open/API-based models, you’re trusting the provider to protect their weights — and hopefully your data. With closed-weight models, you become the model owner and assume responsibility for preventing IP theft yourself.
In both cases, the central question is the same: what does it actually take to protect model weights in production?
That’s where confidential inference enters the conversation.
The Runtime Gap Most Teams Miss
Most AI security discussions still focus on encryption at rest and encryption in transit. Those controls are necessary, but they miss the most critical moment: inference.
Inference is when the model is actually running. At that moment, two things happen simultaneously:
- Model weights are loaded into CPU or GPU memory and executed.
- User data and prompts flow through the system in real time.
That’s also when the stakes are highest.
A host-level attacker could attempt to dump memory and extract weights. An insider with privileged access could inspect or copy workloads. A misconfigured system could expose sensitive prompts or outputs.
If your protection strategy stops at disk encryption and TLS, you’re leaving the door open at exactly the moment when your most valuable assets are exposed.
What Confidential Inference Actually Means
Confidential inference closes that runtime gap.
In practical terms, it means model artifacts remain encrypted at rest, with keys managed in a hardened HSM or KMS rather than in ad hoc key stores. Models are loaded only inside confidential VMs or containers running on CPUs and GPUs that support Trusted Execution Environments (TEEs).
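The at-rest piece of this can be sketched as envelope encryption: weights are encrypted with a data-encryption key (DEK), and the DEK itself is wrapped by a key that never leaves the KMS or HSM. The `MockKMS` class and the toy XOR keystream below are illustrative assumptions, not a real KMS API — in production you would use a managed KMS and an authenticated cipher such as AES-GCM from a vetted crypto library.

```python
# Illustrative sketch of envelope-encrypting model weights at rest.
# MockKMS and the hash-based keystream are stand-ins for a real KMS/HSM
# and a real cipher (e.g. AES-GCM); do not use this construction as-is.
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key || counter (toy, not AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

class MockKMS:
    """Stands in for a hardened KMS/HSM holding the key-encryption key."""
    def __init__(self):
        self._kek = secrets.token_bytes(32)  # never leaves the KMS

    def wrap(self, dek: bytes) -> bytes:
        return xor(dek, keystream(self._kek, len(dek)))

    def unwrap(self, wrapped: bytes) -> bytes:
        return xor(wrapped, keystream(self._kek, len(wrapped)))

# Encrypt weights with a fresh data-encryption key (DEK), then wrap the DEK.
weights = b"...model weight bytes..."
dek = secrets.token_bytes(32)
encrypted_weights = xor(weights, keystream(dek, len(weights)))
kms = MockKMS()
wrapped_dek = kms.wrap(dek)

# What lands on disk: ciphertext plus a wrapped key. No plaintext key is stored,
# and the weights are unreadable without a round trip through the KMS.
```

The design point is that stealing the stored artifact gets an attacker nothing; recovering the weights requires the KMS to cooperate, which is exactly the control point the attestation step below builds on.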
Before any decryption occurs, the environment is cryptographically attested. Hardware, firmware, and workload integrity are verified. Only after successful attestation does the key management system release decryption keys directly to the approved workload — not to a human operator.
The weights are decrypted only inside the TEE and reside in protected memory. Even if someone with host-level access attempts to inspect memory, what they see is encrypted data, not usable model weights.
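The attestation-gated release described above can be sketched as follows. Real TEEs (AMD SEV-SNP, Intel TDX, NVIDIA confidential GPUs) produce hardware-signed attestation reports; in this simplified sketch an HMAC stands in for that hardware signature, and the class and measurement names are illustrative assumptions rather than any vendor's actual API.

```python
# Sketch: a KMS that releases the model decryption key only after
# verifying (1) the attestation report is authentic and (2) the measured
# workload matches an approved value. HMAC stands in for the hardware
# signature of a real TEE attestation report.
import hashlib
import hmac
import secrets

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-firmware+workload").hexdigest()

class AttestingKMS:
    def __init__(self, dek: bytes, attestation_key: bytes):
        self._dek = dek
        self._att_key = attestation_key  # trust anchor for attestation reports

    def release_key(self, measurement: str, report_mac: str) -> bytes:
        # 1. Verify the report really came from trusted hardware.
        expected_mac = hmac.new(
            self._att_key, measurement.encode(), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected_mac, report_mac):
            raise PermissionError("attestation report not authentic")
        # 2. Verify the workload is the approved one.
        if measurement != EXPECTED_MEASUREMENT:
            raise PermissionError("measurement mismatch: key withheld")
        # 3. Only now release the DEK, directly to the attested workload.
        return self._dek

att_key = secrets.token_bytes(32)
kms = AttestingKMS(dek=secrets.token_bytes(32), attestation_key=att_key)

# A genuine TEE presents its measured identity, signed by the hardware.
good_mac = hmac.new(
    att_key, EXPECTED_MEASUREMENT.encode(), hashlib.sha256
).hexdigest()
released_dek = kms.release_key(EXPECTED_MEASUREMENT, good_mac)

# A tampered workload measures differently and is refused the key.
bad = hashlib.sha256(b"tampered-workload").hexdigest()
bad_mac = hmac.new(att_key, bad.encode(), hashlib.sha256).hexdigest()
refused = False
try:
    kms.release_key(bad, bad_mac)
except PermissionError:
    refused = True  # key withheld, as intended
```

Note what never happens here: no human operator handles the key, and no unverified environment receives it. The policy is enforced cryptographically, which is what makes the assurance provable rather than procedural.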
That is what turns a generic AI deployment into a confidential inference system — one that provides technical, provable assurance rather than relying on policy alone.
How This Changes the Open vs. Closed Debate
Here’s the key point: whether you use open models, closed models, or a hybrid of both, confidential inference is what closes the trust gap wherever sensitive data and IP intersect.
If you operate closed-weight models, confidential inference protects your intellectual property, enables sovereign deployments, and gives you a defensible runtime security story for regulators and customers.
If you consume open/API-based models, confidential inference becomes the bar you should expect providers to meet. It’s the lens through which you evaluate their infrastructure. And for some regulated workloads, it may be the reason those workloads eventually move into environments where you control the confidential computing stack.
In both scenarios, confidential inference shifts the conversation from “we hope this is secure” to “we can prove this is secure.”
A Practical Way Forward
The open versus closed debate isn’t going away. In reality, most enterprises will adopt hybrid strategies.
They will use open, API-based models where speed and experimentation matter most. They will invest in closed-weight models and sovereign AI factories where IP protection, regulatory alignment, and differentiation are critical.
But regardless of strategy, three principles remain constant:
- Treat model weights as core enterprise value.
- Treat prompts and enterprise data as crown jewels, not test inputs.
- Treat confidential inference as a requirement for any AI workload that truly matters.
Do that, and you move from “we’re experimenting with AI” to “we’re operating AI factories we can trust.”
And that shift — from experimentation to trusted operation — is where this industry ultimately needs to go.