What is AI Alignment?
The research field focused on ensuring AI systems behave in accordance with human values and intentions.
AI alignment addresses the challenge of making AI systems do what humans actually want, not just what they're literally instructed to do. Key approaches include reinforcement learning from human feedback (RLHF), Constitutional AI, and interpretability research.