SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI

Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. However, in the real world, the development of mature software is ty…