Most of the AI architectures shipping today were designed for a single environment: stable network, abundant compute, asynchronous responses tolerable at the human timescale. Cloud-first reference architectures from the major hyperscalers and the leading frameworks all assume the same thing — that the model can be a hosted endpoint, latency budgets are measured in hundreds of milliseconds, and a five-second outage is invisible to the user.
Then the factory floor calls.
A production line stamping car body panels at 600 strokes per hour. A robotic arm separating defective batteries on a charge-discharge fixture. A predictive maintenance loop on a fleet of EV chargers spread across rural France. A vision system inspecting solder joints on a PCB that moves through the camera's field of view in 80 ms.
In every one of these environments, the cloud-AI assumptions silently break. The model that worked beautifully in the proof-of-concept demo, with three engineers watching it on a laptop, does not hold up in production. Hallucination rates climb. Latency spikes. The control loop misses its deadline. The safety engineer pulls the plug.
The usual diagnosis — "we need a better model" — is almost never the right one. The problem is architectural. Cloud-first AI and physical-system AI live under different physical laws, and you cannot port one onto the other by adjusting hyperparameters.
This is what I see again and again across enterprise pilots in industry, automotive, and energy infrastructure. Below are the five mismatches that cause cloud AI architectures to fail in physical environments, and what an architecture for Physical AI actually has to look like instead.
1. Latency budgets are categorically different
A chatbot that takes 600 ms to respond feels snappy. A voice assistant at 1.2 seconds is acceptable. A summary generation that takes 4 seconds to stream is industry-standard.
A collaborative robot deciding whether to keep its arm extended or retract it has between 5 and 50 milliseconds, depending on the safety classification, before the decision causes harm. A fast-line vision inspection has a hard upper bound set by the conveyor speed: if the part moves through the field of view in 80 ms, the model has to run inference, post-process, and emit a control decision inside that window, with margin left for the actuator to respond.
The cloud round-trip alone — local network → ISP → cloud region → model serving infrastructure → response → reverse path — is typically 80 to 250 ms in optimal conditions. There is no model architecture that can collapse this. The only way to meet a 50 ms budget is to move the compute to the edge.
That means a different stack. ONNX Runtime or TensorRT instead of a hosted endpoint. INT8 or even INT4 quantization to fit the hardware. A model that has been distilled, pruned, and benchmarked on the target silicon, not on an A100 that is assumed to be representative. The cloud AI team's reflex of "we'll just call the API" is the first thing that has to die.
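To make the stack concrete, here is a minimal sketch of a local inference path with the deadline enforced in-process, using ONNX Runtime. The model file, the input format, and the 50 ms figure are placeholders for whatever the line actually dictates.

```python
# Minimal sketch: edge inference with an explicit deadline check.
# "inspector_int8.onnx" and the input layout are placeholders.
import time
import numpy as np
import onnxruntime as ort

DEADLINE_MS = 50.0  # hard budget from the control loop, not a tuning knob

session = ort.InferenceSession("inspector_int8.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def classify_within_budget(frame: np.ndarray):
    """Run inference locally and report whether the deadline was met."""
    start = time.perf_counter()
    logits = session.run(None, {input_name: frame})[0]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > DEADLINE_MS:
        # A late answer is a wrong answer in a control loop: the caller
        # falls back to the safe state instead of using the result.
        return None, elapsed_ms
    return int(np.argmax(logits)), elapsed_ms
```

The runtime itself is interchangeable; the point is that the budget is checked next to the model, with no network hop anywhere in the path.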
2. Network unreliability is the rule, not the exception
In a cloud-AI deployment, network unreliability is treated as an exception path. Retries, exponential backoff, circuit breakers — the architecture is designed to handle the rare failure with grace.
In a physical environment, network unreliability is the normal case. A factory floor has Faraday-cage steel structures that block 4G. An EV charging site in the Massif Central has intermittent fibre that drops three times a day. A connected vehicle moves between cells with millisecond handoffs. A wind farm sits behind a satellite link with 600 ms RTT and 2% packet loss as a baseline.
When the cloud-first architecture tries to operate in this environment, every piece of state that was assumed to live "in the cloud" — the conversation history, the feature store, the feedback loop, the model registry — becomes a single point of failure. The system that worked perfectly in the lab now hangs every 40 minutes when the network blips.
The physical-AI answer is not a faster network. It is a different state architecture. The edge holds enough state to operate autonomously for hours. Cloud is a destination for batched telemetry, not a dependency in the control loop. The model can be updated when connectivity is good, but the inference path has to be local and self-sufficient. This inverts the cloud-AI default: the cloud becomes the optional component, the edge becomes the source of truth.
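A minimal sketch of that inversion, with SQLite standing in for the device's durable store and `try_upload` as a placeholder for the real uplink: telemetry always lands locally first, and the cloud receives drained batches whenever the link happens to be up.

```python
# Sketch of edge-as-source-of-truth: writes always succeed locally; the
# cloud is an opportunistic sink. SQLite and try_upload are placeholders.
import json
import sqlite3
import time

db = sqlite3.connect("edge_state.db")
db.execute("CREATE TABLE IF NOT EXISTS outbox "
           "(id INTEGER PRIMARY KEY, ts REAL, payload TEXT)")

def record(event: dict) -> None:
    """Persist telemetry locally, whatever the network is doing."""
    db.execute("INSERT INTO outbox (ts, payload) VALUES (?, ?)",
               (time.time(), json.dumps(event)))
    db.commit()

def flush(try_upload) -> None:
    """Drain the outbox when connectivity returns; stop on first failure."""
    rows = db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        if not try_upload(payload):     # link dropped again: retry later
            break
        db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    db.commit()
```

Nothing in the inference path waits on `flush`; it runs on a timer and simply makes progress when it can.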
At Auralink — the EV charging platform I built with my team — this was the architectural choice that made every subsequent decision easier. Three layers: device, edge, cloud. Each layer holds its own state, each layer can operate when the layer above is unreachable, and the protocols between layers (OCPP 2.0.1 and ISO 15118) are designed for partial connectivity by construction. The cloud-first version of the same product would have been unbuildable.
3. Safety envelopes are real, and they cannot be retrofitted
A cloud AI model that hallucinates a wrong answer in a customer-support chatbot causes a frustrated user. A model that hallucinates a wrong answer in a torque-control system on a battery assembly line causes a fire.
The cloud-AI culture treats safety as a layer added on top — content filters, output classifiers, refusal models. In physical AI, safety is a constraint baked into the system topology, not a wrapper around the model output.
This means several specific things:
- Every model output that can affect a physical action passes through a deterministic check that can override it.
- The check is implemented in code that has been reviewed by a safety engineer who can read the actual control law, not by an ML engineer who knows the model.
- The hardware has independent kill paths that do not go through the AI compute.
- The system has a defined safe state to fall back to when the model's confidence drops below threshold, when latency exceeds budget, or when sensor inputs go out of bounds.
- Every one of these paths has been simulated, tested, and signed off.
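As a sketch of what the deterministic check and the safe-state fallback can look like together, here is the shape such a gate often takes. Every number, the torque command, and the `ModelOutput` structure are hypothetical; in a real system the limits come from the control law and the safety case, not from the ML team.

```python
# Sketch of a deterministic envelope around a model output. All limits
# are placeholders; a real system takes them from the reviewed safety case.
from dataclasses import dataclass

TORQUE_LIMIT_NM = 12.0   # hard mechanical limit (hypothetical)
MIN_CONFIDENCE = 0.90    # below this, fall back to the safe state
DEADLINE_MS = 50.0       # control-loop budget
SAFE_TORQUE_NM = 0.0     # the defined safe state: stop applying torque

@dataclass
class ModelOutput:
    torque_nm: float
    confidence: float
    latency_ms: float

def gate(output: ModelOutput, sensors_in_bounds: bool) -> float:
    """Deterministic check that can always override the model."""
    if not sensors_in_bounds:                # inputs out of bounds
        return SAFE_TORQUE_NM
    if output.latency_ms > DEADLINE_MS:      # missed the control budget
        return SAFE_TORQUE_NM
    if output.confidence < MIN_CONFIDENCE:   # model is unsure
        return SAFE_TORQUE_NM
    # Even a confident, on-time command is clamped to the mechanical envelope.
    return max(-TORQUE_LIMIT_NM, min(output.torque_nm, TORQUE_LIMIT_NM))
```

The property that matters: a safety engineer can read this function without knowing anything about the model, and every branch ends in either the defined safe state or a clamped, bounded command.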
This is not the same as cloud-AI "guardrails". It is what certification engineers in automotive (ISO 26262), industrial automation (IEC 61508), and energy infrastructure (NERC CIP) have been doing for decades. The Physical AI architecture either inherits this discipline from day one or it does not ship to a real customer. There is no version of "we'll add safety later" that survives an actual safety review.
4. Hardware constraints are not optional anymore
In the cloud, GPUs come from one of three vendors and the choice is mostly an accounting question. In a physical-AI deployment, the silicon is whatever fits the mechanical, thermal, and power envelope of the target system. NVIDIA Jetson on a robot. Qualcomm RB-series on a connected vehicle. A Hailo-8 on a vision board with a 2-watt budget. A custom ASIC on a charging gateway. An ARM Cortex-M with 512 KB of RAM on a sensor node.
This matters because the model has to be designed for the hardware, not adapted to it. A 7B-parameter LLM that runs comfortably on an A100 is not going to fit on a 32 TOPS edge accelerator without aggressive quantization, pruning, and architecture changes. A vision model that achieves 96% accuracy at FP32 may drop to 91% at INT8 and to 87% at INT4. The right move is often to go with a smaller model that was trained with quantization in mind from the start, not to take a bigger one and squeeze it.
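Whichever route you take, the accuracy delta has to come from your own holdout set on the target hardware, not from a benchmark table. A minimal sketch using ONNX Runtime's post-training dynamic quantization; the model paths and the evaluation set are placeholders.

```python
# Sketch: measure what INT8 actually costs before committing to it.
# Model paths, frames, and labels are placeholders.
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Post-training quantization of an existing FP32 export.
quantize_dynamic("vision_fp32.onnx", "vision_int8.onnx",
                 weight_type=QuantType.QInt8)

def accuracy(model_path: str, frames: np.ndarray, labels: np.ndarray) -> float:
    """Top-1 accuracy of an ONNX model over a labelled holdout set."""
    session = ort.InferenceSession(model_path,
                                   providers=["CPUExecutionProvider"])
    name = session.get_inputs()[0].name
    preds = [int(np.argmax(session.run(None, {name: f[None]})[0]))
             for f in frames]
    return float(np.mean(np.array(preds) == labels))

# The decision number is the delta on your data, on your silicon:
# acc_fp32 = accuracy("vision_fp32.onnx", holdout_frames, holdout_labels)
# acc_int8 = accuracy("vision_int8.onnx", holdout_frames, holdout_labels)
```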
The FinOps angle compounds this. A cloud team comparing inference costs against a $0.0002-per-token API call is making the wrong comparison. A physical-AI deployment has a per-unit hardware cost ($150 to $2000 depending on tier), a per-unit power budget (which determines the cooling design and therefore the BOM cost of the enclosure), and a per-unit thermal envelope (which is non-negotiable in a vehicle that has to operate from −40 °C to +85 °C). The right model is the one that hits the accuracy target inside all three constraints — and that model is almost never the same one the cloud team chose.
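A crude worked example shows why the per-token framing misleads; every figure below is invented for illustration. At industrial frame rates the hosted-endpoint line item dwarfs the hardware within weeks, and the constraints that actually bind (power, thermal, enclosure) never appear in the API price at all.

```python
# Back-of-the-envelope unit economics. Every number is illustrative.
api_cost_per_call = 0.0002       # $ per hosted inference
calls_per_second = 10            # vision line, one part every 100 ms
hours_per_day = 16

calls_per_year = calls_per_second * 3600 * hours_per_day * 365
api_cost_per_year = calls_per_year * api_cost_per_call    # ~$42,000

edge_hw_cost = 400.0             # accelerator + carrier board, per unit
edge_power_kw = 0.015            # 15 W module
energy_cost_kwh = 0.25
edge_energy_per_year = edge_power_kw * hours_per_day * 365 * energy_cost_kwh

print(f"hosted API:  ${api_cost_per_year:>9,.0f} per line per year")
print(f"edge unit:   ${edge_hw_cost + edge_energy_per_year:>9,.0f} first year")
```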
5. The feedback loop has different physics
A cloud AI system can collect every user interaction, log it to a data lake, and rebuild the training set continuously. The feedback loop is a question of data engineering and infrastructure cost.
A physical-AI system has feedback loops that are constrained by what the device can capture, what bandwidth allows, what regulation permits, and what the human-in-the-loop can review. A predictive maintenance model running on a wind turbine sees the sensor stream that turbine emits — not the global state. A vision system on a manufacturing line cannot phone home full-resolution images of every defect because the upstream link cannot carry that volume. A medical-grade computer vision model in a hospital cannot retrain on patient images without a regulatory pathway.
This changes how the architecture treats data. Instead of "capture everything, train continuously", the design is "capture summaries, sample exceptions, train on aggregated batches with explicit human review". The feature store moves to the edge. Model updates ship over OTA pipelines that have to handle interrupted downloads, partial flashes, and rollback paths. Drift detection runs locally and emits compressed signals upstream rather than raw data.
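As an illustration of the last point, a drift monitor can reduce a window of observations to a handful of floats. This sketch uses a population stability index over a rolling feature window; the bin count and whatever alert threshold sits upstream are placeholders.

```python
# Sketch: local drift detection that ships a few floats upstream instead
# of raw data. Bin count and any alert threshold are placeholders.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a reference window and now."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # catch out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)   # avoid log(0)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

def drift_signal(reference: np.ndarray, window: np.ndarray) -> dict:
    """A few bytes of telemetry instead of megabytes of raw frames."""
    return {"psi": psi(reference, window),
            "mean": float(window.mean()),
            "std": float(window.std()),
            "n": int(len(window))}
```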
Getting this wrong is usually invisible at pilot. It shows up at month four, when the model's accuracy has drifted and the team realizes they have no representative data to retrain on, because the cloud-AI assumption that "we can just collect more" never actually held.
What a Physical AI architecture looks like
None of this means "don't use cloud AI". It means the architecture has different layers, with different responsibilities, and the cloud is one of three or four tiers — not the centre of gravity.
The rough shape is consistent across the deployments I have shipped:
Device tier — runs the inference path that is in the safety envelope. Owns the real-time control loop. Has a defined safe state. Can operate independently for the full mission duration.
Edge tier — aggregates devices, holds shared state across them, runs models that need a wider view (multi-camera vision, fleet-level coordination, site-level optimization). Can operate when the cloud is unreachable; reconciles state when the link returns.
Cloud tier — receives compressed telemetry, holds the model registry, runs the heavy training, manages OTA distribution, owns the long-term data warehouse. Optional in the control loop. Required for capability evolution.
Compliance tier — cuts across all three. Holds the audit trail, the model cards, the EU AI Act classification, the incident playbooks, the retraining triggers. Designed in from day one, not retrofitted before the auditor visits.
The Physical AI Stack we publish at Hyperion is a more detailed version of this — six layers covering hardware abstraction, real-time runtime, model artifacts, observability, governance, and the operational team — but the high-level idea is the same. The cloud is one tier. It is not the system.
Practical checklist for teams crossing the gap
If your team is moving from a cloud-AI proof of concept toward a real industrial deployment, the following questions will tell you whether you are on the right architecture or about to discover the gap the hard way.
- What is the latency budget, and have you measured the actual cloud round-trip from the deployment site at the worst time of day? If you do not have numbers, you do not have a budget (a measurement sketch follows this list).
- What is the safe state, and what triggers entering it? "The model will be confident" is not an answer.
- How long can the device operate with no network? The right answer is hours, not seconds.
- Where does the inference path run, and on what silicon? If the answer involves "the API", you are still in cloud-AI thinking.
- What is the per-unit BOM cost contribution of the AI compute, and is it within the product margin? If you do not know, the product is not deployable.
- How does a model update reach the device, and what happens if the update is interrupted halfway? Rollback paths are not optional (a second sketch after this list shows the shape).
- What does the audit trail look like for the EU AI Act classification of this system? If the answer is a wiki page, the system is not certifiable.
- Who is the safety engineer who has signed off the deterministic checks around the AI output? If there is no name, there is no system.
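On the first question, a crude measurement loop from the actual site beats any estimate. A sketch of the kind of thing worth leaving running through a full shift (the endpoint URL is a placeholder):

```python
# Sketch: measure the real round trip from the deployment site and keep
# the tail, not the mean. The endpoint is a placeholder.
import statistics
import time
import urllib.request

ENDPOINT = "https://inference.example.com/health"   # placeholder

samples = []
for _ in range(200):
    start = time.perf_counter()
    try:
        urllib.request.urlopen(ENDPOINT, timeout=5).read()
        samples.append((time.perf_counter() - start) * 1000.0)
    except OSError:
        samples.append(float("inf"))    # a failed call is a data point too
    time.sleep(1.0)

finite = [s for s in samples if s != float("inf")]
print(f"p50 {statistics.median(finite):.0f} ms, "
      f"p99 {statistics.quantiles(finite, n=100)[98]:.0f} ms, "
      f"failures {samples.count(float('inf'))}/{len(samples)}")
```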
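And on the update question, the two properties that matter are resumability and an intact rollback target. A sketch under the assumption of an A/B slot layout; the URL, paths, and the slot-switch step are placeholders for the real update agent.

```python
# Sketch: resumable OTA download, integrity check, A/B slot install.
# URL, file paths, and the flash step are placeholders.
import hashlib
import os
import urllib.request

def download_resumable(url: str, dest: str) -> None:
    """Resume an interrupted download via an HTTP Range request."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    with urllib.request.urlopen(req, timeout=30) as resp, open(dest, "ab") as f:
        while chunk := resp.read(64 * 1024):
            f.write(chunk)

def install(dest: str, expected_sha256: str, inactive_slot: str) -> bool:
    """Flash only the inactive slot; the active slot is the rollback path."""
    digest = hashlib.sha256(open(dest, "rb").read()).hexdigest()
    if digest != expected_sha256:
        os.remove(dest)                   # partial or corrupt: start over
        return False
    os.replace(dest, inactive_slot)       # stand-in for the real flash step
    # The boot switch to the new slot happens only after a health check;
    # if the new model fails it, the bootloader falls back to the old slot.
    return True
```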
None of these questions has a clean cloud-AI answer. They each force the architecture into a Physical AI shape. Teams that ask them in week one ship in 90 days. Teams that ask them in month six pivot or stop.
The architectural choice is the strategic choice
What makes Physical AI hard is not that the models are different. It is that the system around the model has to be different. The cloud-first architecture optimizes for flexibility, scale, and rapid iteration. The physical-AI architecture optimizes for real-time guarantees, safety, and graceful degradation. These are not the same thing, and you cannot smoothly transition from one to the other after the fact.
This is why most enterprise AI pilots in industry stall between proof of concept and production. The proof of concept lived on the cloud-AI assumptions. The production environment refused to honour them. The team blames the model and orders another pilot.
The alternative is to design for the physical environment from the first whiteboard session: define the latency budget, the safe state, the offline duration, the silicon target, the audit trail. Build the architecture that those constraints imply. Then choose the model that fits the architecture, not the other way around.
That is the work. It is not glamorous, it does not appear in most cloud-AI tutorials, and it cannot be outsourced to a generic AI consultancy that has only ever shipped chatbots. It is also the only path I have seen that actually puts AI inside a factory, a vehicle, or a piece of energy infrastructure and keeps it there.
If you are running a pilot that is showing the symptoms in this article — climbing latency, intermittent failures correlated with network blips, safety reviewers refusing to sign, retraining stalled because the data never came back — the diagnosis is rarely a model issue. It is an architecture issue, and the longer the gap stays unaddressed, the more expensive the eventual rebuild becomes.
The good news: once you have shipped one Physical AI system end-to-end, the pattern transfers. The architectural primitives — the three-tier topology, the safe state, the deterministic checks, the OTA pipeline, the audit trail — are the same whether the deployment is a charging network, an assembly line, an autonomous mobile robot, or a smart-grid asset. The first one is the hardest. After that, the engineering economics work in your favour.
