A model architecture in which different sub-networks ("experts") specialise in different types of inputs, and a gating network routes each token to the most relevant experts. Because only a small subset of experts is activated for any given token, MoE offers very large model capacity at a much lower inference cost than a dense model of the same total size. Mixtral uses this approach, and GPT-4 is widely believed to as well.
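To make the routing concrete, here is a minimal sketch of a top-k MoE layer, assuming PyTorch; the expert count, layer sizes, and `top_k` value are illustrative choices, not taken from any particular model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts)
        # Experts: independent feed-forward sub-networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> one row per token.
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.gate(tokens)                          # (n_tokens, n_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)             # renormalise over chosen experts
        out = torch.zeros_like(tokens)
        # Only the selected experts run for each token -- this is why
        # total capacity can grow without growing per-token compute.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot, None] * expert(tokens[token_ids])
        return out.reshape(x.shape)

layer = MoELayer(d_model=64, d_hidden=256)
y = layer(torch.randn(2, 10, 64))  # same shape in and out: (2, 10, 64)
```

Note that only 2 of the 8 experts execute per token here, so the forward pass costs roughly a quarter of what running all experts would, while the layer still holds all eight experts' parameters.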