SWE-bench measures AI agents on software engineering tasks at the level of a GitHub issue. It was one of the most important benchmarks for tracking the progress of software engineering agents in 2024. We caught up with two of its creators, Ofir Press and Carlos E. Jimenez, to hear their thoughts on the state of LLM-backed agents.