Labs

Numbers behind the AI pipeline that compiles /wiki. Each cell asks one specific question — does it cost too much, are its citations honest, do its topic tags stay stable — and answers with a chart and a paragraph in plain English.

4 cells · Jun 2, 2026

Read top-down: the question being tested, the headline number, the chart, then a short paragraph on what the number means and what to watch for. The technical receipts (endpoint names, code, version notes) sit at the bottom of each cell for anyone who wants to verify the prose.

Cells marked first-run pending have the question written and the methodology in place — they're waiting on data, not on a decision.

4 live · 0 pending

Citation faithfulness
live · last run · 2026-06-02
Does the wiki say what its sources say? Two AI judges grade every citation; the headline is how often they agree.
- eval
- anthropic
- wiki
Judge agreement: 89% · 167 claims
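
For anyone who wants to see what a judge-agreement number could look like in code, here is a minimal sketch. It assumes the headline is plain percent agreement: the share of citations where both judges return the same verdict. The page does not say how the figure is actually computed, and the function name, the verdict labels, and the toy data are all hypothetical.

```python
def judge_agreement(verdicts_a, verdicts_b):
    """Share of citations on which two judges gave the same verdict.

    Assumption: raw percent agreement, with no correction for chance.
    """
    if len(verdicts_a) != len(verdicts_b):
        raise ValueError("both judges must grade the same citations")
    if not verdicts_a:
        raise ValueError("need at least one graded citation")
    matches = sum(a == b for a, b in zip(verdicts_a, verdicts_b))
    return matches / len(verdicts_a)


# Toy data, not taken from the live run.
judge_one = ["supported", "supported", "unsupported", "supported"]
judge_two = ["supported", "unsupported", "unsupported", "supported"]
agreement = judge_agreement(judge_one, judge_two)
print(f"{agreement:.0%}")
```

Under this reading, two judges who agree on three of four citations score 75%. Agreement says the judges are consistent with each other, not that either one is right.
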
Wiki surfacer precision
live · last run · 2026-06-02
A hook can auto-inject a wiki note on any prompt, spending context tokens you never asked it to spend. This measures how often it nags on off-topic prompts (the false-positive rate) and reads the firing threshold off the sweep.
- eval
- wiki
False-positive rate: 0%
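
A threshold sweep of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the site's code: it assumes the hook gives each prompt a relevance score and fires when the score meets or exceeds a threshold. The function names, the scores, and the candidate thresholds are hypothetical.

```python
def false_positive_rate(off_topic_scores, threshold):
    """Share of off-topic prompts on which the hook would fire.

    Assumption: the hook fires when score >= threshold.
    """
    if not off_topic_scores:
        raise ValueError("need at least one off-topic prompt")
    fired = sum(score >= threshold for score in off_topic_scores)
    return fired / len(off_topic_scores)


def sweep(off_topic_scores, thresholds):
    """False-positive rate at each candidate threshold."""
    return {t: false_positive_rate(off_topic_scores, t) for t in thresholds}


def lowest_threshold_within(results, max_rate):
    """Lowest threshold whose false-positive rate is at or below max_rate."""
    ok = [t for t, rate in results.items() if rate <= max_rate]
    return min(ok) if ok else None


# Toy scores for four off-topic prompts, not taken from the live run.
scores = [0.10, 0.20, 0.35, 0.60]
results = sweep(scores, [0.3, 0.5, 0.7])
chosen = lowest_threshold_within(results, max_rate=0.0)
```

Picking the lowest threshold that still meets the target keeps the hook as sensitive as possible on relevant prompts while holding off-topic firings to the chosen rate.
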
Topic stability
live · last run · 2026-05-15
Do the AI's topic tags stay stable when it re-tags the same article? The chart compares re-tagging with and without a list of existing tags as an anchor.
- eval
- anthropic
- reading
Tag recovery: 42.2%
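
One way to read a tag-recovery figure is as the share of an article's original tags that come back when it is re-tagged. The sketch below assumes that definition; the page does not spell out the formula, and the function name and example tags are hypothetical.

```python
def tag_recovery(original_tags, retagged_tags):
    """Share of the original tags that reappear after re-tagging.

    Assumption: recovery = |original AND retagged| / |original|.
    """
    original = set(original_tags)
    if not original:
        raise ValueError("the article needs at least one original tag")
    return len(original & set(retagged_tags)) / len(original)


# Toy tags, not taken from the live run.
original = ["rust", "compilers", "testing", "wasm"]
without_anchor = ["rust", "tooling", "build-systems"]
with_anchor = ["rust", "compilers", "testing", "tooling"]

recovery_without = tag_recovery(original, without_anchor)
recovery_with = tag_recovery(original, with_anchor)
```

Running the same function on both re-tagging runs gives the two numbers a chart like the one described would compare: recovery with the anchor list and recovery without it.
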
Ingest pipeline cost
live · last run · 2026-05-04
What does the AI behind this site cost to run? Total spend so far, split by what it went on.
- cost
- infra
- anthropic
Spent so far: $5.71
