The research and engineering discipline of ensuring that AI systems pursue goals and exhibit behaviours that match human intentions and values. Misalignment risks range from near-term failures, such as models following instructions too literally or gaming their stated objective in unintended ways, to more speculative long-term risks discussed in the AI safety literature.