AI Struggles with “Humanity’s Last Exam” Despite Advancements
The most advanced AI models reportedly score around 90 percent on standard benchmarks, meaning they complete most of the tasks on those standardized tests. However, a new test called “Humanity’s Last Exam” challenges even these models. Developed by Scale AI and the Center for AI Safety (CAIS), this benchmark …