Deployment

Model Serving

Definition

The process of deploying trained ML models to production environments where they can receive inputs and return predictions at scale. Model serving infrastructure must address throughput, latency, versioning, and cost while meeting SLAs.
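The concerns named above (versioning and latency in particular) can be illustrated with a toy in-process server. This is a minimal sketch, not a production design: `ModelServer`, `model_v1`, and `model_v2` are hypothetical names, and the "models" are plain functions standing in for trained models.

```python
import time
from typing import Callable, Dict, List

# Hypothetical stand-ins for trained models: each maps features to a prediction.
def model_v1(features: List[float]) -> float:
    return sum(features)

def model_v2(features: List[float]) -> float:
    return sum(features) / len(features)

class ModelServer:
    """Toy server: routes a request to a named model version and records latency."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[List[float]], float]] = {}
        self.latencies_ms: List[float] = []

    def register(self, version: str, model: Callable[[List[float]], float]) -> None:
        # Several versions can be registered side by side (e.g. for gradual rollout).
        self._models[version] = model

    def predict(self, version: str, features: List[float]) -> float:
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        start = time.perf_counter()
        result = self._models[version](features)
        # Record per-request latency so it can be compared against an SLA target.
        self.latencies_ms.append((time.perf_counter() - start) * 1000.0)
        return result

server = ModelServer()
server.register("v1", model_v1)
server.register("v2", model_v2)
```

Calling `server.predict("v2", [1.0, 2.0, 3.0])` routes the request to the second version and appends one latency sample. A real serving stack would add network transport, batching, and autoscaling on top of this routing idea.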

Related Terms

MLOps

The practice of combining machine learning, DevOps, and data engineering to streamline the deployment, monitoring, and maintenance of ML models in production. MLOps aims to keep ML systems reliable, scalable, and reproducible across their entire lifecycle.

Inference

The process of running a trained model on new data to produce predictions or generated outputs. Inference cost and latency are the dominant operational concerns in production AI, particularly for large generative models, where a single request can cost cents and the total compounds quickly at high request volumes.
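To make the per-request cost concrete, here is a rough estimate for a token-priced generative model. The prices and token counts are illustrative assumptions, not figures for any real provider, and `cost_per_request` is a hypothetical helper.

```python
def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Estimated cost in USD for one request, priced per 1,000 tokens."""
    input_cost = input_tokens / 1000 * price_in_per_1k
    output_cost = output_tokens / 1000 * price_out_per_1k
    return input_cost + output_cost

# Assumed (hypothetical) prices: 0.005 USD per 1k input tokens,
# 0.015 USD per 1k output tokens.
single = cost_per_request(2000, 500, price_in_per_1k=0.005, price_out_per_1k=0.015)

# Assumed traffic of 100,000 requests per day.
daily = single * 100_000
```

Under these assumptions a single request costs roughly 1.75 cents, and the daily bill lands near 1,750 USD, which is why per-request cost is tracked as closely as latency.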

Triton Inference Server

NVIDIA's open-source inference serving software that supports multiple framework backends (TensorRT, ONNX Runtime, PyTorch, TensorFlow) on GPU and CPU infrastructure. Triton is widely used in enterprise deployments that need maximum throughput from GPU hardware.
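Triton's HTTP endpoint follows the KServe v2 inference protocol, so a request is a JSON body posted to a model-specific path. The sketch below only builds the path and body without sending anything. The model name `my_model`, the input tensor name `INPUT__0`, and the `[1, N]` shape are assumptions for illustration; the actual values come from the deployed model's configuration.

```python
import json
from typing import List, Optional, Tuple

def build_infer_request(
    model_name: str,
    input_name: str,
    data: List[float],
    version: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (URL path, JSON body) for a v2-protocol inference request."""
    path = f"/v2/models/{model_name}"
    if version is not None:
        # Pinning a version is optional; without it the server picks one by policy.
        path += f"/versions/{version}"
    path += "/infer"

    body = {
        "inputs": [
            {
                "name": input_name,       # assumed tensor name
                "shape": [1, len(data)],  # assumed batch of one row
                "datatype": "FP32",
                "data": data,             # flattened, row-major values
            }
        ]
    }
    return path, json.dumps(body)

path, body = build_infer_request("my_model", "INPUT__0", [0.1, 0.2], version="1")
```

The resulting `path` and `body` could then be sent with any HTTP client to the server's HTTP port.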
