2026-05-14t190300-opus-47-low-vs-medium-vs-high-vs-xhigh-vs-max-the-reasoning

Opus 4.7 Low vs Medium vs High vs Xhigh vs Max: the Reasoning Curve on 29 Real Tasks

A hands-on benchmark of Claude Opus 4.7 at five reasoning-effort levels on 29 real GraphQL-go-tools tasks finds a non-monotonic curve. Medium effort leads on pass rate, equivalence, code review, and cost efficiency, while high, xhigh, and max spend more without improving quality.

May 14, 2026 · tech · stet.sh


Topics

benchmarks
llm-engineering
ai-assisted-coding
llm-inference
developer-tooling

Cited by

AI-assisted coding
Using LLMs as coding collaborators spans a spectrum from inline suggestion to fully autonomous multi-agent pipelines, with active debate about reliability, skill atrophy, security exposure, and what human oversight must remain.
Benchmarks
Benchmarks measure model or system capability, but their results are only as meaningful as their design — a recurring problem across LLM, multi-agent, and vision tasks, where tests built for one context are routinely applied to contexts they cannot capture.
Developer tooling
Developer tooling spans the full surface area of software construction — version control, testing, shell ergonomics, AI coding assistants, and platform infrastructure — with a consistent theme: reducing friction without sacrificing correctness or security.
LLM engineering
LLM engineering spans the full stack of building with large language models: training, inference optimization, agent architecture, harness design, and the operational tradeoffs that determine whether model capability translates into reliable software.
LLM inference
LLM inference covers how language models generate tokens from a prompt — spanning hardware constraints, serving architecture, caching strategies, quantization, routing, and cost — and has become its own engineering discipline as scale and cost pressures intensify.
