SWE-bench measures AI agents on software engineering tasks at the level of a GitHub issue. It was one of the most important benchmarks for tracking the progress of software engineering agents in 2024. We caught up with two of its creators, Ofir Press and Carlos E. Jimenez, to hear their thoughts on the state of LLM-backed agents.