Wiki: AI-assisted coding

AI-assisted coding sits on a spectrum. At one end, a developer asks a model to autocomplete a function or draft a test. At the other, a fully autonomous agent spins up hundreds of subagents, writes its own orchestration scripts, and runs for hours without human input. The tools, risks, and professional questions differ sharply depending on where a team operates on that spectrum.

The tooling ecosystem has grown fast. Anthropic’s Claude Code is a recurring center of gravity: databricks-solutions/ai-dev-kit brings Databricks expertise into Claude Code via an MCP server and markdown skills; Storybloq persists session context across otherwise-stateless conversations; raelli/octowiz routes workflows through purpose-built skill libraries; and Ibrahim-3d/orchestrator-supaconductor turns a single natural-language command into a multi-agent pipeline with a virtual Board of Directors for architectural decisions. Anthropic itself launched dynamic workflows that let Claude write orchestration scripts to coordinate parallel subagents at scale. Meanwhile, zerostack shows that a full coding agent can be built in Rust and run in roughly 16 MB of RAM, a fraction of what JavaScript-based alternatives use.
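The idea of an orchestration script that coordinates parallel subagents can be illustrated with a minimal Python sketch. This is not Anthropic's implementation or API: `run_subagent`, `orchestrate`, and `max_parallel` are hypothetical names, and the subagent is a stub that returns a string instead of calling a model. It shows only the fan-out pattern with a concurrency cap.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent; a real one would call a model."""
    await asyncio.sleep(0)  # yield to the event loop, simulating I/O
    return f"done: {task}"


async def orchestrate(tasks: list[str], max_parallel: int = 4) -> list[str]:
    """Run one subagent per task, with at most max_parallel in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_subagent(task)

    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*(bounded(t) for t in tasks))


def run(tasks: list[str], max_parallel: int = 4) -> list[str]:
    """Synchronous entry point for scripts and tests."""
    return asyncio.run(orchestrate(tasks, max_parallel))
```

The semaphore is the main design choice here: without a cap, a script that spawns hundreds of subagents would issue all requests at once.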

Context quality determines output quality. MarkdownLM centralizes architectural rules and security policies into a knowledge base that agents query in real time, blocking non-compliant code at the Git layer. Storybloq addresses session amnesia directly. WaveScope treats source code as a 1D signal and applies wavelet transforms to it, giving LLMs multi-resolution structural context without language-specific parsers. Anthropic’s harness design post describes a GAN-inspired planner-generator-evaluator architecture that addresses context anxiety and self-evaluation bias in multi-hour sessions. The learn-harness-engineering curriculum formalizes this into five harness subsystems: instructions, state, verification, scope, and session lifecycle.
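To make the "source code as a 1D signal" idea concrete, here is a small Python sketch of a multi-resolution view built with a Haar transform. It rests on loud assumptions: the source does not say how WaveScope encodes code or which wavelet it uses, so the indentation-depth encoding and the unnormalized Haar step below are illustrative choices, and all function names are hypothetical.

```python
def indentation_signal(source: str, tab_width: int = 4) -> list[float]:
    """Encode each non-blank line as its indentation depth (an assumed encoding)."""
    signal = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no structural information here
        expanded = line.expandtabs(tab_width)
        signal.append(float(len(expanded) - len(expanded.lstrip(" "))))
    return signal


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One unnormalized Haar level: pairwise averages and half-differences."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd lengths by repeating the last sample
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def multiresolution(signal: list[float], levels: int) -> list[list[float]]:
    """Return the signal followed by successively coarser approximations."""
    pyramid = [signal]
    for _ in range(levels):
        if len(pyramid[-1]) < 2:
            break  # nothing left to coarsen
        approx, _detail = haar_step(pyramid[-1])
        pyramid.append(approx)
    return pyramid
```

Coarse levels summarize the nesting shape of whole regions while fine levels keep line-level changes, and nothing in the encoding depends on a language-specific parser.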

Reliability is the central unsolved problem. Christopher Meiklejohn’s account of two weeks building with Claude is blunt: the agent declares work done after minimal checks, so every feature still has to be clicked through by hand to find what actually broke, even after 52 new guardrails. Vet is an open-source tool that reads the agent’s conversation history alongside the diff to catch mistakes that standard code review misses. Imbue’s experiment found that weaker fixer agents in an implementer-reviewer-fixer pipeline break correct code by overreaching beyond review scope. AI-generated frontend tests carry their own failure modes: a catalog of code smells documents over 20 recurring patterns, including over-mocking and testing buggy implementations rather than intended behavior.
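The overreach failure in an implementer-reviewer-fixer pipeline suggests one possible guard: reject fixer edits that fall outside the lines the reviewer actually commented on. The Python sketch below is an illustration of that idea only, not Imbue's method or Vet's behavior; `ReviewComment`, `out_of_scope_changes`, and the `slack` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    """A reviewer remark tied to an inclusive line range in one file."""
    path: str
    start_line: int
    end_line: int


def out_of_scope_changes(
    changed: list[tuple[str, int]],
    comments: list[ReviewComment],
    slack: int = 0,
) -> list[tuple[str, int]]:
    """Return (path, line) edits not covered by any review comment.

    slack widens each comment's range by that many lines on both sides,
    since a legitimate fix often touches adjacent lines.
    """
    violations = []
    for path, line in changed:
        in_scope = any(
            c.path == path and c.start_line - slack <= line <= c.end_line + slack
            for c in comments
        )
        if not in_scope:
            violations.append((path, line))
    return violations
```

A pipeline could refuse to apply the fixer's patch whenever this list is non-empty, or route the out-of-scope edits back to the reviewer.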

Security exposure scales with autonomy. The SAP npm supply chain attack used Claude Code configuration files as persistence vectors. Simon Willison’s Claude Fable documentation shows that the same resourcefulness that makes autonomous agents useful also makes unsandboxed agents dangerous. Running Claude Code in Docker’s sbx sandbox is the straightforward mitigation.

The professional debate is substantive. Lars Faye argues that fully agentic workflows accelerate skill atrophy, invert developer priorities toward speed over understanding, and create vendor dependency. Val Town’s Slow Mode proposal suggests agents that teach rather than replace, keeping the developer involved at every step. The tacit knowledge problem is structural: the most valuable engineering expertise is transmitted through apprenticeship and is not accessible to AI at all. Abednego Gomes calls shipping unreviewed AI-generated code in safety-critical systems reckless by definition.

Cost reduction in code generation does not reduce the cost of ownership. Yusuf Aytas argues that LLMs can produce polished technical debt faster than any human. The Typical Set’s bottleneck framing is complementary: coding agents amplify whatever organizational alignment or misalignment already exists, because the real bottleneck was always shared context and specification clarity. Armin Ronacher warns that harness loops amplify LLMs’ worst tendencies and risk producing codebases that require machine participation to maintain. Jane Street’s Yaron Minsky sees a compensating dynamic: agentic coding makes formal verification newly cost-effective precisely because the stakes of unverified autonomous output are higher.