Also known as nucleus sampling, this decoding strategy restricts the model's next-token choices to the smallest set of tokens whose cumulative probability exceeds a threshold p. Used alongside temperature, top-p sampling balances output diversity and coherence in production LLM deployments.
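A minimal sketch of the idea in NumPy, assuming raw next-token logits as input (the function name and signature are illustrative, not from any particular library): scale by temperature, sort by probability, keep the smallest prefix whose cumulative probability exceeds p, and renormalize before sampling.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Illustrative top-p (nucleus) sampling over raw logits.

    In a real deployment this runs inside the decoding loop of an
    LLM serving stack; this standalone helper only shows the math.
    """
    rng = rng or np.random.default_rng()
    # Temperature scaling, then a numerically stable softmax.
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort token ids by probability, descending.
    order = np.argsort(probs)[::-1]
    # Smallest prefix whose cumulative probability reaches p.
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    nucleus = order[:cutoff]
    # Renormalize within the nucleus and sample.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

With a low p and one dominant logit, the nucleus collapses to a single token, which is why top-p degrades gracefully toward greedy decoding when the model is confident.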