Skip to content

LLM tooling

The ecosystem of tools for running, serving, and organizing knowledge for LLMs spans local inference runtimes, documentation platforms, and structured knowledge bases, with transparency and context efficiency as recurring concerns.

8 sources · May 6, 2026

Compiled by Claude · How this works →

Agents · LLMs · 36 neighbors

LLM tooling covers the software layer between raw model weights and useful outputs: runtimes that serve models locally, platforms that structure knowledge for LLM consumption, and utilities that manage the context passed to a model at query time.

On the local inference side, oobabooga/textgen offers a fully offline desktop app with support for tool-calling, LoRA fine-tuning, vision, and an OpenAI-compatible API, all with no telemetry. Friends Don’t Let Friends Use Ollama argues that Ollama, a popular alternative, obscures its llama.cpp dependency, misleads users with model naming conventions, and has drifted toward cloud monetization, positioning more transparent tools as the better choice for users who want genuine local control.

On the knowledge organization side, two approaches address how to feed structured information to models without burning unnecessary tokens. LostWarrior/knowledge-base is a zero-dependency bash CLI that organizes project context as tiered markdown files, generating both a human-readable INDEX.md and a machine-readable manifest.json so agents can navigate large knowledge bases efficiently. The Karpathy wiki pattern described in a Reddit guide takes a different angle: rather than retrieval-augmented generation, it has the model itself ingest raw documents and maintain structured markdown files, then queries those files directly at scale, with periodic health checks to prevent knowledge drift.

Mintlify sits at the documentation end of the stack, serving knowledge to both human users and LLMs through support for llms.txt, MCP, and context-aware agents. That positions it as infrastructure for teams whose documentation needs to be machine-readable as a first-class concern, not an afterthought.

The through-line across these sources is context efficiency and transparency. Whether the question is which runtime to trust, how to structure a knowledge base, or how to serve docs to an agent, the practical pressure is the same: get the right information into the model’s context window without waste, and do it with tools whose behavior you can actually inspect.