Detecting collusion through multi-agent interpretability — LessWrong
- In science
- Apr 03, 2026, 09:17 AM
- By lesswrong.com

TL;DR
Prior work has shown that linear probes are effective at detecting deception in single LLM agents. Our work extends this use to multi-a…