Sensor-driven failure prediction turns maintenance from a calendar into a forecast. This guide covers the full programme: the data foundation (vibration, thermal, and motor-current signatures over OPC-UA and time-series storage), the modelling approaches that fit real plant data (anomaly detection, remaining-useful-life estimation, survival models), edge versus cloud inference, integration with your CMMS and SCADA, and how to quantify ROI in terms maintenance leadership already trusts: downtime avoided and MTBF. The guide is framed against ISO 13374 condition monitoring and IEC 62443 OT security.
Last reviewed: June 2026
Predictive maintenance is a condition-based maintenance strategy that uses sensor data and machine-learning models to estimate the actual health of production equipment and predict failures before they occur. Instead of servicing on a fixed calendar (preventive maintenance) or repairing after breakdown (reactive maintenance), it forecasts when a specific asset will need attention — from continuous condition signals such as vibration, temperature, and motor current, fused with operating context. Done well, it converts unplanned downtime into planned intervention and targets maintenance effort at the assets that actually need it.
Every plant runs one of three maintenance strategies, usually a mix. Reactive maintenance fixes assets after they break — cheap to run until the unplanned failure that stops the line. Preventive maintenance services on a fixed schedule — safer, but it over-services healthy assets and can still miss a failure that arrives early. Predictive maintenance is the third option: use the asset's own condition data to decide when it actually needs attention.
The premise is simple and physical. Mechanical failures rarely happen without warning. A bearing degrades through measurable stages; a misaligned shaft radiates a characteristic vibration; an overloaded motor heats. The signatures of impending failure are present in the data long before the failure itself. Predictive maintenance is the discipline of capturing those signatures, learning what normal looks like for each asset, and acting on the departures.
The economic case is equally simple: unplanned downtime is the most expensive thing that happens in a plant. A predictive programme that converts even a fraction of unplanned stoppages into scheduled interventions pays for itself, because the cost of an hour of unplanned downtime dwarfs the cost of the monitoring. The rest of this guide is about how to build that programme honestly — what data it needs, which models fit real plant data, where the inference should run, how it connects to the systems you already operate, and how to prove the return.
Predictive maintenance is a data problem before it is a modelling problem. The quality, coverage, and context of the condition data set the ceiling on everything a model can do. The foundation has two parts: the physical signals you capture, and the pipeline that turns raw signals into trainable, queryable data.
Accelerometers mounted on bearings, gearboxes, and rotating shafts capture vibration spectra. Frequency-domain analysis (FFT, envelope analysis, cepstrum) isolates fault signatures: bearing defect frequencies (BPFO, BPFI, BSF), gear mesh harmonics, imbalance, and misalignment. Vibration is the single richest predictive signal for rotating machinery.
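The defect frequencies named above follow from bearing geometry and shaft speed, so the spectral lines to watch can be computed before any data is collected. The sketch below uses the classical kinematic formulas and assumes a stationary outer race, a rotating inner race, and no slip; the function name, parameter names, and the example geometry are illustrative, not taken from a specific bearing datasheet.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_rollers, roller_d, pitch_d, contact_deg=0.0):
    """Kinematic defect frequencies (Hz) for a rolling-element bearing.

    Assumes the outer race is stationary and the inner race turns at shaft_hz.
    roller_d and pitch_d must use the same length unit.
    """
    ratio = (roller_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        # Ball pass frequency, outer race
        "BPFO": (n_rollers / 2.0) * shaft_hz * (1.0 - ratio),
        # Ball pass frequency, inner race
        "BPFI": (n_rollers / 2.0) * shaft_hz * (1.0 + ratio),
        # Ball spin frequency
        "BSF": (pitch_d / (2.0 * roller_d)) * shaft_hz * (1.0 - ratio ** 2),
    }


# Hypothetical geometry: 9 rollers, 7.94 mm roller diameter, 39.04 mm pitch diameter.
freqs = bearing_defect_frequencies(shaft_hz=30.0, n_rollers=9, roller_d=7.94, pitch_d=39.04)
for name, hz in freqs.items():
    print(f"{name}: {hz:.2f} Hz")
```

In practice these values are approximate (real bearings slip slightly), so a detection band around each frequency is usually searched rather than a single FFT bin.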
Temperature sensors (RTDs, thermocouples) and thermal imaging detect abnormal heat from friction, electrical resistance, lubrication breakdown, and load anomalies. Thermal trends are a slow-moving but high-confidence indicator: a bearing that keeps heating under unchanged load and ambient conditions is very likely a bearing that is failing.
Motor Current Signature Analysis (MCSA) reads the stator current of an electric motor. Sidebands around the line frequency reveal broken rotor bars, eccentricity, and load-coupled mechanical faults, often without any additional sensor, since the current is already measured by the drive. That makes it a low-cost, non-invasive signal.
Pressure, flow, speed, torque, acoustic emission, oil quality, and load data are typically already present in the PLC/SCADA historian. These contextual variables are essential: a vibration spike at full load means something different from the same spike at idle. Fusing condition signals with operating context is what separates a usable model from a false-alarm generator.
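As an illustration of where those sidebands sit, the commonly used broken-rotor-bar relation places them at f_line × (1 ± 2ks), where s is the per-unit slip and k = 1, 2, .... The sketch below assumes an induction motor with a known pole count and measured rotor speed; the function names and the example motor are hypothetical.

```python
def slip(line_hz, poles, rotor_rpm):
    """Per-unit slip of an induction motor from supply frequency and rotor speed."""
    sync_rpm = 120.0 * line_hz / poles
    return (sync_rpm - rotor_rpm) / sync_rpm


def rotor_bar_sidebands(line_hz, s, harmonics=1):
    """(lower, upper) sideband pairs in Hz at line_hz * (1 -/+ 2*k*s) for k = 1..harmonics."""
    return [
        (line_hz * (1.0 - 2.0 * k * s), line_hz * (1.0 + 2.0 * k * s))
        for k in range(1, harmonics + 1)
    ]


# Hypothetical 4-pole motor on a 50 Hz supply running at 1470 rpm.
s = slip(line_hz=50.0, poles=4, rotor_rpm=1470.0)
for lower, upper in rotor_bar_sidebands(50.0, s, harmonics=2):
    print(f"watch {lower:.2f} Hz and {upper:.2f} Hz")
```

Because slip varies with load, the sideband positions move with operating point, which is one reason current signatures need load context alongside them.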
Decide what to measure and at what rate. Vibration analysis needs high-frequency sampling (often kHz-range, anti-aliased); thermal and process variables are slow (sub-Hz to a few Hz). Sensor placement is a discipline of its own — a poorly mounted accelerometer corrupts every downstream model. Where assets are already instrumented, the data may exist in the PLC; where they are not, retrofit IIoT sensors are the first capital decision.
Move data off the machine without violating the plant's OT network boundaries. OPC-UA is the dominant industrial interoperability protocol; MQTT (often via Sparkplug B) is common for telemetry; Modbus and proprietary fieldbuses persist on older equipment. The connectivity layer must respect IEC 62443 zone segmentation — condition data flows out of the control zone through a defined conduit, not by exposing PLCs to the network.
Condition-monitoring data is high-volume time-series: timestamped, append-heavy, and queried by window. A purpose-built time-series database (or a historian) handles ingestion rate, downsampling, and retention policies far better than a general-purpose relational store. This is the substrate every model trains and runs against.
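To make downsampling concrete, here is a minimal sketch of what a retention policy does conceptually: raw samples are grouped into fixed time buckets and reduced to summary values. A real time-series database or historian does this natively and far more efficiently; the function name and the (timestamp, value) tuple layout are assumptions for illustration only.

```python
from collections import defaultdict


def downsample(samples, bucket_s):
    """Reduce (timestamp_s, value) samples to (bucket_start_s, mean, max) per bucket.

    Keeping the max next to the mean preserves short spikes that averaging hides.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = int(ts // bucket_s) * bucket_s
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values), max(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical bearing temperature readings every 30 s, reduced to 60 s buckets.
raw = [(0, 41.0), (30, 43.0), (60, 45.0), (90, 47.0)]
print(downsample(raw, bucket_s=60))
```

Which aggregates to retain (mean, max, RMS, percentiles) is a design choice that depends on the features the models will need later.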
Raw signals become model inputs through domain-aware feature extraction: spectral features (band energies, defect-frequency amplitudes), statistical features (RMS, kurtosis, crest factor), and trend features. Labelling is the hard part — true run-to-failure data is scarce because well-run plants do not let assets fail. Health labels often come from maintenance work orders, inspection reports, and known failure events recovered from the CMMS.
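The statistical features mentioned above are simple to compute from one window of a vibration waveform. The sketch below is a plain-Python illustration; the function name and the returned keys are assumptions, and a production pipeline would normally use a vectorised numerical library on properly windowed data.

```python
import math


def time_domain_features(signal):
    """RMS, kurtosis, and crest factor for one window of samples.

    Kurtosis here is the non-excess (Pearson) form, so a Gaussian signal gives about 3.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2 if variance > 0 else float("nan")
    crest_factor = peak / rms if rms > 0 else float("nan")
    return {"rms": rms, "kurtosis": kurtosis, "crest_factor": crest_factor}


# A clean sine wave as a stand-in for a healthy, smooth vibration signal.
sine = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
print(time_domain_features(sine))
```

Impulsive faults such as early bearing spalls tend to raise kurtosis and crest factor before RMS moves much, which is why these features are tracked together rather than individually.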
There is no single predictive-maintenance algorithm — there is a progression of approaches matched to the data you have. The defining constraint of real plants is that failures are rare by design: a well-run operation does not let assets run to failure, so labelled failure data is scarce. That constraint dictates where to start.
Practical sequencing: most programmes start with unsupervised anomaly detection (needs only healthy data), then graduate to remaining-useful-life and survival models as degradation trajectories accumulate, and finally to supervised fault classification once a curated, fault-labelled history exists.
When you have abundant healthy-operation data and few labelled failures — the common case — anomaly detection is the pragmatic starting point. The model learns the normal operating envelope (autoencoders, isolation forests, one-class SVM, Gaussian mixture baselines) and flags statistically significant departures. It answers 'is this asset behaving abnormally?' without requiring labelled failure examples.
Best Applicability
First deployment on assets with no failure history; broad fleet screening; early-warning layer feeding human review.
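As a minimal sketch of the idea, the example below learns a per-feature mean and standard deviation from healthy data only and flags rows that sit far outside that envelope. This is a deliberately simple stand-in for the autoencoders, isolation forests, and mixture models named above, not an implementation of them; the class name, the feature layout, and the threshold of 4 standard deviations are assumptions.

```python
import statistics


class HealthyBaseline:
    """Per-feature z-score baseline fitted on healthy-operation rows only."""

    def fit(self, healthy_rows):
        columns = list(zip(*healthy_rows))
        self.means = [statistics.fmean(col) for col in columns]
        # Replace a zero spread with a tiny value so scoring never divides by zero.
        self.stds = [statistics.stdev(col) or 1e-12 for col in columns]
        return self

    def score(self, row):
        """Largest absolute z-score across features; higher means less normal."""
        return max(abs(x - m) / s for x, m, s in zip(row, self.means, self.stds))

    def is_anomalous(self, row, threshold=4.0):
        return self.score(row) > threshold


# Hypothetical healthy history: [vibration RMS in mm/s, bearing temperature in degC].
healthy = [[1.0 + 0.01 * i, 40.0 + 0.1 * i] for i in range(-10, 11)]
baseline = HealthyBaseline().fit(healthy)
print(baseline.is_anomalous([1.02, 40.3]), baseline.is_anomalous([2.0, 40.0]))
```

A per-feature baseline ignores correlations between signals and operating regimes, so it is best read as a first screening layer that feeds human review.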
RUL models predict how much operational life an asset has left before functional failure. Approaches range from physics-informed degradation models to data-driven regression (gradient-boosted trees, LSTM/temporal CNN on degradation trajectories). RUL turns a binary alarm into a planning horizon — the difference between 'something is wrong' and 'you have roughly N operating hours to act'.
Best Applicability
Assets with observable progressive degradation (bearings, tooling, filters) and enough run-to-failure trajectories to learn a degradation curve.
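The simplest possible instance of RUL estimation is trend extrapolation: fit a line to a health indicator and project when it crosses a failure threshold. The sketch below assumes roughly linear degradation and a known threshold, both of which are strong assumptions; the function names and example values are hypothetical, and the gradient-boosted or recurrent models mentioned above replace the straight line with a learned degradation curve.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_rul(hours, indicator, failure_threshold):
    """Operating hours left until the fitted trend reaches failure_threshold.

    Returns None when the indicator is not rising, since no crossing can be projected.
    """
    slope, intercept = fit_line(hours, indicator)
    if slope <= 0:
        return None
    crossing_hour = (failure_threshold - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical vibration RMS trend (mm/s) sampled every 100 operating hours.
print(estimate_rul([0, 100, 200, 300], [1.0, 1.5, 2.0, 2.5], failure_threshold=4.0))
```

Real degradation is often slow at first and then accelerates, so a linear projection tends to be optimistic late in life; reporting an interval rather than a single number is usually more honest.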
Survival analysis (Cox proportional hazards, Weibull accelerated-failure-time, random survival forests) models the probability of failure over time as a function of covariates — load, age, duty cycle, condition signals. Borrowed from reliability engineering and actuarial science, these models handle censored data (assets that have not yet failed) natively, which is exactly the data shape a real plant produces.
Best Applicability
Fleet-level reliability planning; maintenance interval optimisation; quantifying failure risk under different operating regimes.
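For a sense of what a survival model produces, the sketch below evaluates a two-parameter Weibull survival function, R(t) = exp(-(t/scale)^shape), and the conditional probability that an asset which has survived to a given age fails within a planning horizon. The shape and scale values are hypothetical; in a real programme they would be fitted from fleet history, including censored units, with a dedicated survival-analysis library.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that an asset survives beyond operating time t."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_probability(age, horizon, shape, scale):
    """P(failure within the next `horizon` hours | survived to `age`)."""
    return 1.0 - weibull_survival(age + horizon, shape, scale) / weibull_survival(age, shape, scale)


# Hypothetical wear-out behaviour: shape > 1 means the hazard rises with age.
for age in (1000.0, 4000.0):
    p = conditional_failure_probability(age, horizon=500.0, shape=2.5, scale=5000.0)
    print(f"age {age:.0f} h: {p:.1%} chance of failure in the next 500 h")
```

With shape equal to 1 the model reduces to a constant failure rate, so age carries no information; shape above 1 is what justifies tightening inspection intervals as assets age.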
Where labelled fault data exists — historical failures tagged by mode (bearing outer-race defect, gear wear, imbalance) — supervised classifiers map the current signature to a specific fault type. This is the most actionable output for maintenance crews because it names the likely problem, not just its presence. It requires the richest labelled dataset and is usually a later-stage capability built on accumulated, well-curated history.
Best Applicability
Mature programmes with curated failure-mode labels; root-cause acceleration; routing the right specialist to the right asset.
Not sure whether your assets are instrumented enough to start, or which modelling approach fits the data you actually have? Hyperion runs a focused discovery sprint that audits your condition-data foundation, identifies the highest-value assets to monitor first, and produces a pragmatic roadmap from anomaly screening to remaining-useful-life.
Where a predictive model executes is an architecture decision driven by latency, bandwidth, and data governance — not by fashion. For industrial equipment, the answer is frequently "at the edge," for reasons that have as much to do with OT security and data residency as with performance.
Runs the model close to the asset — on an IIoT gateway, industrial PC, or compact edge module. Essential when latency matters (near-real-time vibration analysis), when bandwidth is constrained (raw waveform streams are large), or when OT-network and data-residency rules forbid sending plant data off-site. Edge inference keeps condition data inside the IEC 62443 zone boundary and survives WAN outages.
Aggregates many assets or sites into one model and dashboard. Best for fleet-wide pattern learning, heavyweight training, long-horizon trend storage, and cross-site benchmarking. The tradeoff is bandwidth, latency, and the data-governance question of whether OT telemetry may leave the plant at all — a question that, for sovereign and regulated environments, often pushes the answer back toward on-prem.
Lightweight anomaly screening and feature extraction at the edge; aggregated features and curated events sent centrally for fleet learning, model retraining, and dashboards. Models are trained centrally where compute is cheap, then compiled and pushed to the edge for inference. This pattern respects bandwidth and OT boundaries while still capturing fleet-scale learning.
A prediction that no one acts on has no value. The hardest and most underestimated part of a predictive-maintenance programme is integration — wiring model outputs into the systems and workflows the maintenance organisation already runs, and closing the loop so outcomes improve the model.
The CMMS is where predictive insight becomes action. A model prediction is worthless until it triggers a work order, schedules a technician, and reserves the spare part. Integration means: auto-creating or enriching work orders from model alerts, writing predicted-failure context onto the asset record, and — critically — closing the loop by feeding work-order outcomes back as labels for the next model iteration.
SCADA and the process historian are the source of truth for operating context and often the source of the condition signals themselves. The predictive layer subscribes to historian tags (via OPC-UA) for live context and can surface health indices back into the SCADA HMI so operators see asset health alongside process state — without ever placing the AI in the control path.
Predictions must reach humans through the channels they already use — a notification, a dashboard tile, a prioritised review queue. The design goal is signal, not noise: a predictive programme that floods technicians with low-confidence alerts trains them to ignore it. Alert thresholds, confidence reporting, and a human-review step are what make the system trusted on the shop floor.
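One common way to favour signal over noise is to require persistence: an alert is raised only when the anomaly score stays above threshold for several consecutive windows. The sketch below illustrates that rule; the function name, the threshold, and the window count are assumptions to be tuned per asset, not recommended values.

```python
def persistent_alerts(scores, threshold, min_consecutive):
    """Indices at which an alert fires.

    An alert fires once, at the window where the score has exceeded `threshold`
    for `min_consecutive` windows in a row; it can fire again only after the
    score has dropped back to or below the threshold.
    """
    alerts = []
    run_length = 0
    for index, score in enumerate(scores):
        run_length = run_length + 1 if score > threshold else 0
        if run_length == min_consecutive:
            alerts.append(index)
    return alerts


# A single spike at index 1 is ignored; the sustained excursion fires once.
scores = [0.1, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9, 0.1]
print(persistent_alerts(scores, threshold=0.8, min_consecutive=3))
```

The trade-off is detection delay: each extra required window suppresses more transients but postpones the alert by one window length.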
A predictive-maintenance business case stands or falls on one figure the plant must supply: the cost of an hour of unplanned downtime on the target line. Everything else builds from there. The levers below are the standard, auditable measures maintenance leadership already tracks — which is exactly why they make a defensible case.
The one number that matters most: establish the fully-loaded cost per hour of unplanned downtime for the specific line before modelling anything. Without it, every ROI claim is a guess; with it, the primary return is a simple product: downtime hours avoided per year × cost per hour.
The headline value. Each prevented unplanned stoppage avoids lost production hours, expedited-repair premiums, and cascading line effects. The ROI calculation is concrete: (downtime hours avoided per year) × (cost per hour of downtime for that line). The cost-per-hour figure is plant-specific and is the single most important number to establish before modelling anything.
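The primary-return arithmetic can be written down directly. In the sketch below every figure is a placeholder chosen only to show the calculation; the cost per hour must come from the plant, and the annual programme cost is an added assumption used to show a net figure.

```python
def downtime_return(hours_avoided_per_year, cost_per_hour):
    """Primary return: downtime hours avoided per year x cost per hour."""
    return hours_avoided_per_year * cost_per_hour


def net_annual_return(hours_avoided_per_year, cost_per_hour, annual_programme_cost):
    """Primary return minus what the programme costs to run each year."""
    return downtime_return(hours_avoided_per_year, cost_per_hour) - annual_programme_cost


# Placeholder inputs, not benchmarks: 40 h avoided, 10,000 per hour, 150,000 per year to run.
gross = downtime_return(40, 10_000)
net = net_annual_return(40, 10_000, 150_000)
print(f"gross: {gross:,}  net: {net:,}")
```

Running the same calculation with a pessimistic and an optimistic estimate of hours avoided gives leadership a range rather than a single point claim.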
Mean Time Between Failures (MTBF) rises as failures are caught and corrected before they cascade; Mean Time To Repair (MTTR) falls when crews arrive knowing the likely fault and carrying the right part. Tracking MTBF and MTTR before and after deployment gives a defensible, auditable measure of programme impact that maintenance leadership already understands.
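A before-and-after comparison needs both metrics computed the same way in both periods. The sketch below uses one common convention, MTBF as operating time divided by number of failures and MTTR as total repair time divided by number of failures; the function name and the example periods are hypothetical, and a plant should keep whatever definition its CMMS already reports.

```python
def mtbf_mttr(period_hours, repair_durations_hours):
    """Return (MTBF, MTTR) in hours for one observation period.

    period_hours is the scheduled time in the period; each entry of
    repair_durations_hours is the downtime of one unplanned failure.
    Returns (None, None) when there were no failures to average over.
    """
    failures = len(repair_durations_hours)
    if failures == 0:
        return None, None
    downtime = sum(repair_durations_hours)
    mtbf = (period_hours - downtime) / failures
    mttr = downtime / failures
    return mtbf, mttr


# Hypothetical 1,000-hour periods before and after deployment.
print("before:", mtbf_mttr(1000.0, [4.0, 6.0, 10.0]))
print("after: ", mtbf_mttr(1000.0, [3.0, 5.0]))
```

Short periods with few failures make both numbers noisy, so comparisons are more defensible over matched periods of similar production load.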
Condition-based maintenance replaces calendar-based over-servicing. Parts are changed when condition data warrants it, not on a fixed schedule — reducing both premature part replacement and catastrophic run-to-failure. The saving is the gap between time-based maintenance cost and condition-based maintenance cost across the fleet.
RUL and survival estimates let procurement order parts on a predicted horizon rather than holding large safety stocks 'just in case'. Lower carrying cost, fewer emergency-freight premiums, and better cash-flow — a secondary but real line in the business case.
Predictive maintenance is not a greenfield discipline — it has established standards that give it structure, defensibility, and a shared vocabulary with reliability engineers. Building an AI programme along these frameworks makes it legible and auditable rather than a black box.
Condition Monitoring and Diagnostics of Machines — Data Processing, Communication and Presentation
ISO 13374 defines a reference architecture for condition-monitoring systems, structured as a processing chain: data acquisition (DA), data manipulation (DM), state detection (SD), health assessment (HA), prognostic assessment (PA), and advisory generation (AG). It is the conceptual backbone of any serious predictive-maintenance programme — anomaly detection maps to state detection and health assessment; RUL maps to prognostic assessment.
What It Means for an AI Programme
Structuring an AI predictive-maintenance system along the ISO 13374 processing blocks makes it legible to reliability engineers and interoperable with established condition-monitoring practice. The companion ISO 13379 (diagnostics) and ISO 13381 (prognostics) extend the framework.
Security for Industrial Automation and Control Systems (OT Cybersecurity)
IEC 62443 defines the zone-and-conduit model for OT cybersecurity. Any predictive-maintenance system that taps PLC/SCADA data sits inside this model: the data collector and inference server must be placed in the correct security zone, and all communication with the control zone must pass through a conduit with defined controls (authentication, encryption, integrity).
What It Means for an AI Programme
Pulling condition data for AI must not weaken OT security. The collector belongs in a supervisory zone, not bolted onto the control network; sending raw OT telemetry to a cloud crosses a zone boundary that, for many regulated and sovereign environments, is the deciding factor for on-prem/edge inference.
Condition Monitoring — General Guidelines & Mechanical Vibration Evaluation
ISO 17359 gives the general procedure for setting up condition monitoring; the ISO 10816 / ISO 20816 series defines vibration severity zones (A/B/C/D) for evaluating machine condition by measured vibration. These provide established, defensible thresholds that an AI model's outputs can be cross-checked against.
What It Means for an AI Programme
AI does not replace these standards — it operationalises and extends them. A model can learn asset-specific baselines finer than a generic ISO severity zone, while the ISO zones remain a sanity check and a common vocabulary with the reliability team.
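A cross-check of this kind can be as simple as mapping measured broadband velocity to a severity zone and flagging cases where the zone and the model disagree. In the sketch below the zone boundaries are placeholders, not values taken from ISO 10816 / ISO 20816; the real boundaries depend on the applicable part of the standard, the machine group, and the foundation type. The function names and the treatment of a reading exactly on a boundary are also assumptions.

```python
from bisect import bisect_right

# Placeholder A/B, B/C, C/D boundaries in mm/s RMS. Replace with the values
# from the applicable standard part for the machine being evaluated.
PLACEHOLDER_BOUNDARIES_MM_S = (2.0, 4.0, 8.0)


def severity_zone(velocity_rms_mm_s, boundaries=PLACEHOLDER_BOUNDARIES_MM_S):
    """Map broadband vibration velocity to zone 'A', 'B', 'C', or 'D'.

    A reading exactly on a boundary is assigned to the more severe zone.
    """
    return "ABCD"[bisect_right(boundaries, velocity_rms_mm_s)]


def needs_review(zone, model_flags_anomaly):
    """True when the severity zone and the model disagree about the asset."""
    zone_says_unhealthy = zone in ("C", "D")
    return zone_says_unhealthy != model_flags_anomaly


zone = severity_zone(5.0)
print(zone, needs_review(zone, model_flags_anomaly=False))
```

Disagreements in either direction are informative: a model alert inside zone A may be an early, asset-specific warning, while a quiet model in zone D suggests the baseline was learned on an already degraded machine.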
Reading about anomaly screening is one thing; watching it read your data is another. Hyperion runs a CSV-maintenance demo live on this site — upload a CSV of equipment readings and an AI layer previews how it would surface anomalies and triage maintenance attention.
Honesty boundary: the live demo is an illustrative preview, not a calibrated condition-monitoring deployment. It shows the shape of the capability on a small uploaded sample — it is not a substitute for a properly instrumented, validated predictive-maintenance programme built on your real sensor data and failure history. Verify any output against your own data and a qualified engineer before acting.
A factual account of the background behind this work — verified facts, not marketing claims.
Hyperion runs a CSV-maintenance demo live on this site: a visitor uploads a CSV of equipment readings and an AI layer previews how anomaly screening and maintenance triage would read that data. It is demonstrated live, with an honest caveat that it is an illustrative preview, not a calibrated condition-monitoring deployment. It exists to show the shape of the capability, not to substitute for a real, instrumented programme.
Founder Mohammed Cherifi spent 17+ years in automotive and embedded systems engineering, including work at Renault-Nissan-Mitsubishi Alliance, Cisco, and ABB. Predictive maintenance lives at the intersection of sensors, embedded acquisition, OT networks, and production constraints — the exact territory of that background.
Hyperion has built Auralink — an edge-deployed agentic platform with 400+ microservices and approximately 20 AI agents, including a ROS 2 bridge for physical-infrastructure control (architecture described in the arXiv preprint 2603.08736; a preprint, not a peer-reviewed publication). The edge-inference, time-series, and OT-integration patterns that programme exercises are the same ones a predictive-maintenance deployment needs.
Hyperion is an AI and edge-architecture consultancy. The engagement is data-foundation design, model selection, edge-inference deployment, and CMMS/SCADA integration — working alongside your reliability engineers, OT team, and equipment OEMs. Hyperion does not manufacture sensors, does not certify safety systems, and does not replace the domain knowledge of your maintenance organisation. It builds the intelligence layer on top of it.
Preventive (or preventative) maintenance is calendar- or usage-based: service every N hours or N cycles regardless of actual condition. Predictive maintenance is condition-based: it uses sensor data and models to estimate the actual health of each asset and acts only when the data warrants it. Preventive maintenance over-services healthy assets and can still miss a failure that arrives early; predictive maintenance targets intervention to the assets that actually need it, reducing both unnecessary servicing and unplanned failures.
How much failure data is needed depends on the modelling approach. Anomaly detection, which learns the normal operating envelope and flags departures, can start with only healthy-operation data and no labelled failures, which is why it is usually the first capability deployed. Remaining-useful-life and supervised fault-classification models need run-to-failure examples or labelled fault events, which are scarce in well-run plants and are often recovered from CMMS work orders and inspection history. A practical programme starts with anomaly detection and graduates to RUL and fault classification as labelled history accumulates.
No, plant data does not have to leave the site. Edge inference runs the model on-premise, on an IIoT gateway or industrial PC near the asset, keeping condition data inside the plant's OT network and IEC 62443 zone boundary. This is the right pattern when latency matters, bandwidth is constrained, or data-residency and OT-security rules forbid sending plant telemetry off-site. A hybrid pattern (edge inference, central feature aggregation for fleet learning) is common, but raw OT telemetry leaving the plant should be a deliberate, governed decision, not a default.
For rotating machinery — motors, pumps, fans, gearboxes, compressors — vibration is the richest single signal, because frequency-domain analysis isolates specific fault signatures (bearing defect frequencies, gear mesh harmonics, imbalance, misalignment). Thermal trends provide a slower, high-confidence confirmation. Motor Current Signature Analysis (MCSA) is valuable because it often needs no extra sensor — the drive already measures stator current. The best results come from fusing these condition signals with process context (load, speed, pressure) from the existing historian.
The predictive layer subscribes to condition and context signals — typically from PLC/SCADA via OPC-UA or from a historian — runs its models, and pushes results back into the systems your teams already use. In the CMMS, that means auto-creating or enriching work orders from model alerts and writing predicted-failure context onto the asset record. In SCADA, health indices can surface in the HMI alongside process state. The loop closes when work-order outcomes flow back as labels to improve the next model. The AI never sits in the control path.
Start with the cost of an hour of unplanned downtime for the target line — the single most important figure, and one only the plant can supply. The primary return is (downtime hours avoided per year) × (cost per hour). Secondary levers include MTBF improvement, MTTR reduction (crews arrive with the right diagnosis and part), the gap between calendar-based and condition-based maintenance cost, and reduced spare-parts carrying cost from horizon-based procurement. A defensible business case tracks MTBF and MTTR before and after deployment so the impact is auditable, not anecdotal.
No, AI does not replace reliability engineers. It operationalises and scales the practice they already perform: it learns asset-specific baselines, watches every asset continuously, and surfaces prioritised candidates for review. The standards-based framework (ISO 13374 processing chain, ISO 10816 / 20816 vibration severity zones) remains the shared vocabulary and the sanity check on model outputs. The right outcome is a reliability team that spends less time on manual data review and more on the judgement calls that need human expertise.
No, Hyperion does not supply sensor hardware or certify safety systems. Its scope is the intelligence layer: data-foundation design, connectivity and time-series architecture, model selection, edge-inference deployment, and CMMS/SCADA integration. Sensor hardware, mechanical installation, and any safety certification are handled by the appropriate suppliers and accredited assessors. Hyperion works alongside your reliability engineers, OT team, and equipment OEMs rather than replacing them.
ISO (2015). "ISO 13374: Condition Monitoring and Diagnostics of Machines — Data Processing, Communication and Presentation."
Context: Defines the reference processing architecture for condition-monitoring systems (DA → DM → SD → HA → PA → AG). The conceptual backbone for structuring a predictive-maintenance pipeline.
ISO (2012). "ISO 13379 / ISO 13381: Condition Monitoring — Diagnostics & Prognostics."
Context: Companion standards to ISO 13374. ISO 13379 covers data interpretation and diagnostics; ISO 13381 covers prognostics — the standards basis for remaining-useful-life estimation.
ISO (2018). "ISO 17359: Condition Monitoring and Diagnostics of Machines — General Guidelines."
Context: General procedure for establishing a condition-monitoring programme, from setting measurement parameters through to diagnosis and prognosis.
ISO (2016). "ISO 20816 (supersedes ISO 10816): Mechanical Vibration — Measurement and Evaluation of Machine Vibration."
Context: Defines vibration severity zones (A/B/C/D) for evaluating the mechanical condition of machines from measured broadband vibration. Provides defensible thresholds to cross-check model outputs.
IEC (2018). "IEC 62443 Series: Security for Industrial Automation and Control Systems."
Context: Multi-part OT cybersecurity standard. The zone/conduit model governs where a predictive-maintenance data collector and inference server may sit relative to the control network.
OPC Foundation (2024). "OPC Unified Architecture (OPC-UA) Specification."
Context: The dominant platform-independent industrial interoperability standard for moving machine and historian data into a predictive-maintenance pipeline.
Lei, Y. et al. (2018). "Machinery Health Prognostics: A Systematic Review from Data Acquisition to RUL Prediction."
Context: Mechanical Systems and Signal Processing. A widely cited survey of the predictive-maintenance pipeline, from data acquisition through health indicators to remaining-useful-life prediction.
Hyperion Consulting (2026). "arXiv preprint 2603.08736: Autonomous Edge-Deployed AI Agents for Physical Infrastructure."
Context: Hyperion founder's preprint (not peer-reviewed) covering edge-deployed agent architecture and a ROS 2 bridge. The edge-inference and OT-integration patterns are directly applicable to predictive-maintenance deployments.
Whether you are instrumenting your first critical assets or trying to get a stalled condition-monitoring programme to production, the early architecture decisions — what to measure, how to store it, where to run inference, how it reaches the CMMS — shape everything. Hyperion brings 17+ years of automotive and embedded-systems experience alongside production work in edge-deployed AI. Start with a conversation.
Founder & AI Strategy Lead
Mohammed Cherifi is the founder of Hyperion Consulting, with 17+ years in automotive and embedded systems engineering. He specialises in physical AI deployment — bringing operational experience from Renault-Nissan-Mitsubishi Alliance, Cisco, and ABB to condition monitoring, edge inference, and industrial AI architecture.