When I talk to enterprise leaders about AI factories, there’s a moment where the room goes quiet.
It usually happens right after I say, “Your model weights aren’t just a crown jewel of the company. In many cases, they are the company.”
Then I ask a simple follow‑up: “Are you absolutely sure you’re not exposing them?”
Most teams can’t honestly say yes.
In this post, I want to explain, in plain language, why model weights are so critical, how organizations unintentionally leak their value, and what it really takes to protect them in the way AI at scale demands.
Model Weights Are the New Enterprise IP
Over the last few years, we’ve watched enterprises pour effort and money into building proprietary models: speech‑to‑text tuned for their customers, language models fine‑tuned on years of internal data, decision systems for risk, fraud, underwriting, and beyond.
Those models don’t just sit at the edge of the business anymore. They are business logic. They encode the company’s history, expertise, and differentiation.
That value is concentrated in one place: the model weights.
The weights are what make your model yours. They represent the knowledge distilled from your data and training. If an adversary gets hold of them, they can replicate your model. They don’t need to steal your training code, feature engineering, or documentation. They just need that one artifact.
Once that happens, you’re no longer the exclusive owner of that intelligence. You’re sharing your moat with whoever walked away with a copy.
How Weights Get Exposed in the Real World
The uncomfortable truth is that most companies don’t expose their model weights because they’re careless. They expose them because they treat AI like any other software deployment.
A common pattern looks like this: the team trains a model, checks the artifacts into a repository or object store, and then wires the model into a service that runs on a GPU cluster. It’s the same mental model as deploying a web service or an API. The infrastructure is “trusted,” so people don’t look much further.
But if your model weights are just another file in storage and another blob in memory, they are also accessible to anyone who can reach that storage or memory. That might be a well‑intentioned admin with broad privileges, a compromised host, or a piece of malware that has burrowed its way into the system. The organization thinks, “Our data is encrypted, our perimeter is solid, we’re fine,” while the most valuable artifact in the system is sitting in the clear at runtime.
Another pattern is assuming that running inside a cloud or data center automatically means you have the right kind of protection. You might be on a powerful GPU cluster, but have you actually verified what’s underneath? Do you know whether the firmware and hardware were attested? In many environments, the honest answer is no. You’re operating on faith.
And then there is the human side. In discussions with customers, I’ve emphasized that we see as many insider threats as outsider threats. Outsiders scan for vulnerable VMs, networks, and misconfigurations. Insiders already have context and credentials. If your architecture assumes that everyone with low‑level access will always behave perfectly, you’ve already broken the zero‑trust principle you need for modern AI factories.
In all of these scenarios, the end result is the same: your model weights are more exposed than you think, and you often have no easy way to know if someone copied them.
Why Inference Is the Critical Moment
Most enterprises are comfortable talking about encryption “at rest and in transit.” Those controls are important, but in AI, the most sensitive moment is neither rest nor transit. It’s inference, the moment when AI is actually in use.
During inference, your model is live. It’s being loaded into memory, sitting on CPUs and GPUs, and serving real traffic. Your two most valuable assets are combined at that moment: your model weights and your customers’ data or prompts.
In many current deployments, this is exactly where protection is thinnest. The model may be encrypted on disk, and your clients may communicate with it over Transport Layer Security (TLS). But once the process starts and the model is loaded, the model weights are sitting unprotected in memory. Someone with sufficient host access, or the ability to dump memory, can potentially walk away with them.
That’s why I keep coming back to confidential computing and confidential inference. Traditional controls stop at the edge of the host. Inference at scale, especially in regulated industries, needs something deeper.
What Confidential Inference Really Means
When I talk about confidential inference, I mean a very specific change in how we run and protect models.
First, model artifacts, especially model weights, should be treated as extremely sensitive objects. They should be encrypted wherever they live at rest, and those encryption keys should be managed by a hardened key management system or hardware security module, not scattered across config files or generic key stores. In practical terms, that means using FIPS‑class HSMs and treating model keys with the same seriousness you treat your most sensitive cryptographic material.
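To make the idea concrete, here is a minimal sketch of encrypting a weights artifact under a data key. Everything here is illustrative: the SHA‑256 counter‑mode keystream is a toy stand‑in for a real authenticated cipher such as AES‑GCM from a vetted library, and in production the key itself would be generated and held by an HSM‑backed key management system, never by application code.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode). A real deployment would use
    # AES-GCM from a vetted crypto library, with keys held in a FIPS-class HSM.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_weights(plaintext: bytes, key: bytes) -> tuple[bytes, bytes]:
    # Fresh nonce per artifact so the same key never reuses a keystream.
    nonce = secrets.token_bytes(16)
    cipher = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    return nonce, cipher

def decrypt_weights(nonce: bytes, cipher: bytes, key: bytes) -> bytes:
    # XOR with the same keystream recovers the plaintext.
    return bytes(a ^ b for a, b in zip(cipher, keystream(key, nonce, len(cipher))))
```

The structural point, not the toy cipher, is what matters: the weights file on disk is ciphertext, and nothing in the serving path ever reads a plaintext key from a config file.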
Second, you don’t want to run these models in just any VM or container. You want to run them inside confidential VMs or confidential containers that are tied to trusted execution environments on CPUs and GPUs. The property you’re looking for is that even privileged system software on the host cannot see into the protected workload or its memory.
Third, before you ever decrypt the model weights, you must attest the environment. A confidential computing control plane verifies that the hardware is genuine, that the firmware and Basic Input/Output System (BIOS) are in a known‑good state, and that the workload you are about to run is the one you intended to deploy. Only when that attestation passes does the key management system release the decryption key. And even then, it releases it directly to the attested workload, not to humans.
Finally, the model weights should be decrypted only inside that trusted environment, directly into encrypted memory. At no point do the weights sit in clear text on disk or in memory that the host can trivially dump. If someone with host access tries, what they see is effectively garbage.
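The attestation‑then‑release flow in the last two steps can be sketched with toy stand‑ins. `Quote`, `ToyKMS`, and the HMAC‑based “signature” below are illustrative placeholders for what vendor attestation services and a hardened KMS/HSM actually provide; the shape of the logic is the point.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Quote:
    """Simplified attestation quote: environment measurements signed by hardware."""
    firmware_hash: str
    workload_hash: str
    signature: bytes

class ToyKMS:
    """Stand-in for an HSM-backed KMS that releases a model key only to an
    environment whose attestation quote matches a known-good policy."""

    def __init__(self, policy: dict[str, str]):
        self._policy = policy
        # Models the hardware root of trust that signs attestation quotes.
        self._signing_key = secrets.token_bytes(32)
        self._model_keys: dict[str, bytes] = {}

    def register_model_key(self, model_id: str) -> bytes:
        key = secrets.token_bytes(32)
        self._model_keys[model_id] = key
        return key

    def sign_quote(self, firmware_hash: str, workload_hash: str) -> Quote:
        msg = (firmware_hash + workload_hash).encode()
        sig = hmac.new(self._signing_key, msg, hashlib.sha256).digest()
        return Quote(firmware_hash, workload_hash, sig)

    def release_key(self, model_id: str, quote: Quote) -> bytes:
        # 1. Verify the quote really came from trusted hardware.
        msg = (quote.firmware_hash + quote.workload_hash).encode()
        expected = hmac.new(self._signing_key, msg, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, quote.signature):
            raise PermissionError("attestation quote failed verification")
        # 2. Check measurements against the known-good policy.
        if (quote.firmware_hash != self._policy["firmware"]
                or quote.workload_hash != self._policy["workload"]):
            raise PermissionError("environment not in a known-good state")
        # 3. Only now release the key, directly to the attested workload.
        return self._model_keys[model_id]
```

Note what never happens in this flow: no human handles the model key, and no key is released before the environment proves both its hardware identity and its measured state.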
When you put these pieces together, you get what I call a confidential inference system. The model owner knows their proprietary model weights are protected. The enterprise consuming the model knows its prompts and data are not exposed to operators or adversaries. And you have a verifiable story about how your AI factory protects its most valuable assets.
The Open vs Closed Question and Hidden Exposure
There’s another important dimension to this story: the difference between open and closed (proprietary) models.
Many teams start their AI journey by wiring APIs to external large language models. That’s a great way to experiment and get prototypes moving. But as soon as you are dealing with sensitive data or regulated contexts, you have to ask hard questions.
Where is this model running? In which region? On what infrastructure? Who can see the prompts that I’m sending? What happens to those prompts after the call returns? What guarantees do I have about the model’s weights themselves?
If you can’t answer those questions clearly, you may be enriching someone else’s model with your data or depending on infrastructure whose security and sovereignty you don’t control. That doesn’t automatically mean “never use external models,” but it does mean being very deliberate about which workloads you run where, and when it’s time to bring models and weights into your own AI factory, under your own confidential computing and key management controls.
Silent Value Leakage
The most worrying aspect of all this is that value leakage rarely looks dramatic. There may be no obvious “red alert” event. No ransomware note. No headlines in the press.
Instead, it shows up gradually: a competitor that suddenly seems to have capabilities suspiciously close to yours; pricing pressure because your differentiation erodes; difficult questions from regulators or customers about how you protect models and data in production.
All the while, the root cause may be simple: the organization has never treated model weights as the central asset they are and has never put the right controls around inference to protect them.
A Question for Every AI Leader
If you are responsible for AI in your organization, there is one question I’d encourage you to sit with: If someone inside or outside your organization decided they wanted a copy of your model weights, how hard would it really be?
If the honest answer is “probably hard, but not impossible,” then you are carrying more risk than you think, and you’re likely leaking enterprise value in ways that are hard to see until it’s too late.
In an era when AI factories are becoming the new industrial backbone, protecting your model weights is no longer a nice‑to‑have. It is a core strategy for preserving the value your company is working so hard to create.


