sounds gucci
General Understanding of Code Change Insights
Benchmarking AI bug diagnosis on VS Code, a large, fast-moving codebase where agents must navigate active development history and cross-repo dependencies. The goal isn't to find the optimal harness-and-model combination, but to understand how well current models reason about real-world bugs using standard tool-use capabilities.
| PRs Analyzed | Avg Score | Success Rate (4+) | Model |
|---|---|---|---|
| - | - | - | - |
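A minimal sketch of how the headline numbers could be derived from per-PR scores. It assumes each analyzed PR receives an integer score and that "Success Rate (4+)" means the share of PRs scoring 4 or higher. `PRResult` and `summarize` are hypothetical names chosen for illustration. The fields mirror the columns of the Analysis Results table, not any confirmed data format.

```python
from dataclasses import dataclass


@dataclass
class PRResult:
    """One analyzed PR; field names are hypothetical, mirroring the results table."""
    pr: int
    issue: int
    title: str
    score: int  # assumed integer rating; 4 or higher counts as a success
    model: str


def summarize(results: list[PRResult], success_threshold: int = 4) -> dict:
    """Compute PRs analyzed, average score and success rate (share of scores >= threshold)."""
    n = len(results)
    if n == 0:
        # Nothing analyzed yet: no meaningful average or rate exists.
        return {"prs_analyzed": 0, "avg_score": None, "success_rate": None}
    total = sum(r.score for r in results)
    successes = sum(1 for r in results if r.score >= success_threshold)
    return {
        "prs_analyzed": n,
        "avg_score": total / n,
        "success_rate": successes / n,
    }
```

Returning `None` for an empty run lets a dashboard keep showing "-" instead of a misleading 0.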
Score Distribution
Results by Category
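The two charts can be sketched the same way. This assumes a 1 to 5 scoring scale (only the 4+ threshold is stated here) and a category label per PR. Neither the scale bounds nor the `(category, score)` pairs are confirmed by the dashboard, and `score_distribution` and `results_by_category` are hypothetical helpers.

```python
from collections import Counter, defaultdict


def score_distribution(scores: list[int], lo: int = 1, hi: int = 5) -> dict[int, int]:
    """Count PRs per score in [lo, hi], including empty buckets; out-of-range scores are ignored."""
    counts = Counter(scores)
    return {s: counts[s] for s in range(lo, hi + 1)}


def results_by_category(rows: list[tuple[str, int]], success_threshold: int = 4) -> dict[str, dict]:
    """Group (category, score) pairs and report count, mean score and success rate per category."""
    groups: dict[str, list[int]] = defaultdict(list)
    for category, score in rows:
        groups[category].append(score)
    return {
        category: {
            "count": len(scores),
            "avg_score": sum(scores) / len(scores),
            # Summing booleans counts how many scores meet the threshold.
            "success_rate": sum(s >= success_threshold for s in scores) / len(scores),
        }
        for category, scores in groups.items()
    }
```

Keeping empty buckets in the distribution makes the histogram's x-axis stable even when no PR received a given score.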
Analysis Results
| PR | Issue | Title | Score | Model |
|---|---|---|---|---|