Architecture

Multi-Head Attention

Definition

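An extension of the attention mechanism that runs several attention functions (heads) in parallel, each on its own learned projection of the input, so the model can attend to information from different representation subspaces at once. The head outputs are concatenated and projected back to the model dimension. Multi-head attention is a core component of the Transformer architecture; some models use variants of it, such as multi-query or grouped-query attention.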
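To make the definition concrete, here is a minimal sketch of multi-head self-attention in plain Python. It assumes square `d_model x d_model` projection matrices passed in by the caller; the names `multi_head_attention`, `w_q`, `w_k`, `w_v`, and `w_o` are illustrative, and masking, batching, and biases are left out. In a trained model the projections are learned parameters.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    scale = math.sqrt(len(k[0]))
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        # Weighted average of the value rows, computed column by column.
        out.append([sum(w * val for w, val in zip(weights, col)) for col in zip(*v)])
    return out

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x (seq_len x d_model) using num_heads heads.

    w_q, w_k, w_v, w_o are d_model x d_model matrices (illustrative inputs).
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        # Each head attends within its own slice of the projected features.
        head_outputs.append(
            attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v])
        )
    # Concatenate the heads per position, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)
```

Slicing one large projection into `num_heads` column blocks is equivalent to giving each head its own smaller projection matrix, which is how many implementations organize the computation.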

Related Terms

Attention Mechanism

A neural network component that lets a model dynamically weight the most relevant parts of its input when producing each output element. Self-attention, in which a sequence attends to itself, is the central mechanism of the Transformer architecture and a key reason LLMs can relate distant parts of long, complex contexts.
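As an illustration, the sketch below implements scaled dot-product attention, the form of attention used in the Transformer, in plain Python. The function names and the list-of-lists representation are assumptions made for this example; real implementations use tensor libraries and add masking and batching.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query.

    queries and keys are lists of vectors with the same length d_k;
    values has one vector per key.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        outputs.append(
            [sum(w * val for w, val in zip(weights, col)) for col in zip(*values)]
        )
    return outputs
```

When every key matches a query equally well, the weights are uniform and the output is the plain average of the values; a key that matches much better than the others pulls the output toward its own value.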

Transformer

The dominant neural network architecture for language, vision, and multimodal AI, introduced in the 2017 paper "Attention Is All You Need". Transformers use self-attention to process all tokens of a sequence in parallel during training, which makes training on internet-scale data practical; nearly every major LLM in use today is built on this architecture.

Positional Encoding

A technique that injects information about each token's position in the sequence into the transformer, because self-attention on its own has no notion of token order. The original Transformer used fixed sinusoidal encodings; many modern models use learned position embeddings or rotary position embeddings (RoPE), the latter being common in models that support long context windows.
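For a concrete picture, here is a small sketch of the fixed sinusoidal scheme from the 2017 Transformer paper, chosen because it needs no trained parameters; it is not the learned or rotary variant. The function names are hypothetical, and adding the encoding to the token embeddings is shown as one common way to apply it.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sinusoidal encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the positional encoding element-wise to token embeddings."""
    encoding = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb_row, pos_row)]
            for emb_row, pos_row in zip(embeddings, encoding)]
```

Two identical token embeddings at different positions come out of `add_positions` as different vectors, which is what lets the attention layers distinguish word order.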
