A technique that injects information about each token's position in a sequence into the transformer's input, compensating for the architecture's lack of inherent sequence awareness: self-attention is permutation-invariant, so without positional signals the model cannot distinguish "dog bites man" from "man bites dog". Modern models typically use learned embeddings or rotary positional encodings (RoPE) to support long context windows.
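To make the idea concrete, here is a minimal sketch of the classic fixed sinusoidal encoding from the original transformer paper ("Attention Is All You Need"); it is simpler than the learned or rotary (RoPE) variants mentioned above but illustrates the same principle: each position in the sequence gets a unique vector that is added to the token embedding. The function name and dimensions are illustrative, not from any specific library.

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed positional encodings, shape (seq_len, d_model).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dims get sine
    pe[:, 1::2] = np.cos(angles)               # odd dims get cosine
    return pe

# Added to token embeddings before the first attention layer:
# x = token_embeddings + sinusoidal_positions(seq_len, d_model)
```

RoPE takes a different approach: instead of adding a position vector, it rotates each query/key pair by a position-dependent angle, which is one reason it extrapolates better to long contexts.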