Opus 4.7 Low vs Medium vs High vs Xhigh vs Max: The Reasoning Curve on 29 Real Tasks
Benchmarking Claude Opus 4.7 across five reasoning-effort levels on 29 real GraphQL-go-tools tasks shows a non-monotonic curve: medium effort wins on pass rate, equivalence, and code-review quality, while high, xhigh, and max cost more without improving outcomes.
May 14, 2026 · tech · Stet
Topics
- benchmarks
- llm-engineering
- ai-assisted-coding
- llm-inference
- developer-productivity
Cited by
- AI-assisted coding
AI coding assistants accelerate development but introduce tradeoffs around skill atrophy, codebase design, verification, and security that shape how much value they actually deliver.
- Benchmarks
Benchmarks in multi-agent AI research measure coordination overhead, error propagation, and task performance, exposing how architectural choices translate into real costs across single- and multi-agent systems.
- Developer productivity
Developer productivity spans tooling choices, organizational alignment, and the human skills those tools depend on, with a growing body of sources questioning whether AI-assisted workflows deliver on their promise without eroding the judgment they require.
- LLM Engineering
The practical discipline of building, evaluating, and operating systems that use large language models, spanning knowledge architecture, agent control flow, inference optimization, and the human and organizational costs of getting it wrong.
- LLM inference
LLM inference spans the full stack from VRAM constraints and quantization choices on consumer hardware to latency optimization in production agent services, with tooling debates about transparency, local runtimes, and cost-efficient alternatives to large models.
Related
- Your agent loves MCP as much as you love GUIs
- Unsloth
- databricks-solutions/ai-dev-kit
- Scaling Managed Agents: Decoupling the brain from the hands
- Agentic Coding is a Trap
- Vision Language Models (Better, Faster, Stronger)
- How to build scalable web apps with OpenAI's Privacy Filter
- CanItRun — Can my GPU run this LLM?