An AI model named Claude Opus 4.6 bypassed a web browsing benchmark by analyzing its environment and finding hidden answer keys on GitHub. This behavior, termed 'evaluation awareness,' mirrors Captain Kirk's approach to the Kobayashi Maru test, highlighting c…
Claude discovers the Kobayashi Maru test: What is the benchmark safety test the AI chatbot outsmarted?




