Technical Methods

Mixture of Experts (MoE)

Definition

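A model architecture in which different sub-networks ("experts") specialise in different types of input, and a gating network routes each token to the most relevant experts. Because only a few experts run for each token, MoE provides very large total model capacity at a lower per-token inference cost than a dense model of the same size. Mixtral is a publicly documented MoE model, and GPT-4 is widely reported, though not officially confirmed, to use this approach.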
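The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular model's implementation: the names `route_token` and `moe_layer`, the toy scalar "experts", and the gate scores are all hypothetical, and the choice to apply softmax over only the selected top-k scores is one common convention among several.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_scores, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Assumption: weights are renormalised over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, gate_scores, experts, k=2):
    """Weighted sum of the outputs of the selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_token(gate_scores, k))

# Four toy "experts"; each is just a scalar function here.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]

# Hypothetical gate scores for one token. In a real model a learned
# gating network computes these from the token's representation.
gate_scores = [0.1, 2.0, 2.0, -1.0]

routing = route_token(gate_scores, k=2)
output = moe_layer(3.0, gate_scores, experts, k=2)
```

With k=2, only two of the four experts are evaluated for this token, which is where the compute saving comes from.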

Related Terms

Transformer

The dominant neural network architecture for language, vision, and multimodal AI, introduced in the 2017 paper "Attention Is All You Need". Transformers use self-attention to relate every token in a sequence to every other token and to process all tokens in parallel, enabling training on internet-scale data and powering nearly every major LLM in use today.
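As a rough illustration of self-attention, the sketch below computes scaled dot-product attention in plain Python. It makes a simplifying assumption that real Transformers do not: queries, keys, and values are the raw token vectors themselves, with no learned projection matrices and a single head. The function name `self_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row depends only on the fixed input sequence, not on other output rows, so all rows can be computed at the same time; the Python loop here is sequential only for readability.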

Sparse Model

A model architecture that activates only a subset of its parameters for any given input, rather than the full network. Sparse models, most commonly built with Mixture of Experts designs, achieve larger total capacity while keeping per-inference compute manageable.
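The gap between total and active parameters can be made concrete with simple arithmetic. The sketch below assumes a simplified model made of shared parameters plus equally sized experts; the function name `moe_param_counts` and all sizes are hypothetical and do not describe any real model.

```python
def moe_param_counts(shared_params, params_per_expert, num_experts, active_experts):
    """Return (total, active) parameter counts for a simplified sparse model."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + active_experts * params_per_expert
    return total, active

# Hypothetical sizes: 2B shared parameters, 8 experts of 5B each,
# 2 experts active per token.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=5_000_000_000,
    num_experts=8,
    active_experts=2,
)
fraction_active = active / total
```

Under these assumed sizes the model stores 42B parameters but uses 12B per token, under 30% of the total. All parameters still have to be held in memory, so the saving is in compute rather than storage.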

Inference

The process of running a trained model on new data to produce predictions or generated outputs. Inference cost and latency are the dominant operational concerns in production AI, particularly for large generative models, where a single request can cost cents and the total grows with traffic.
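A back-of-the-envelope estimate shows how per-request cost adds up. The sketch assumes token-based pricing, which is common but not universal; the function name `request_cost_usd`, the token counts, and the prices are hypothetical placeholders, not quotes from any provider.

```python
def request_cost_usd(input_tokens, output_tokens,
                     usd_per_million_input, usd_per_million_output):
    """Estimate the cost of one request under per-token pricing."""
    return (input_tokens * usd_per_million_input
            + output_tokens * usd_per_million_output) / 1_000_000

# Hypothetical request: 2,000 input tokens, 500 output tokens,
# at assumed prices of $3 and $15 per million tokens.
cost = request_cost_usd(2_000, 500,
                        usd_per_million_input=3.0,
                        usd_per_million_output=15.0)

# The same request served one million times in a month.
monthly = cost * 1_000_000
```

With these assumed numbers a single request costs a little over one cent, and a million such requests cost on the order of thousands of dollars.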
