Labs

Numbers behind the AI pipeline that compiles /wiki. Each cell asks one specific question (does it cost too much, are its citations honest, do its topic tags stay stable) and answers with a chart and a paragraph in plain English.

3 cells · May 15, 2026

Read top-down: the question being tested, the headline number, the chart, then a short paragraph on what the number means and what to watch for. The technical receipts (endpoint names, code, version notes) sit at the bottom of each cell for anyone who wants to verify the prose.

Cells marked first-run pending have the question written and the methodology in place — they're waiting on data, not on a decision.

3 live · 0 pending

  • Topic stability
    live last run · 2026-05-15

    When the AI tags a saved article with topics, do those tags stay stable over time? The chart compares tagging the same article with and without a list of existing tags as an anchor.

    • eval
    • anthropic
    • reading
    Tag recovery 42.2%
  • Citation faithfulness
    live last run · 2026-05-07

    Does the wiki say what its sources actually say? Two AI judges weigh in on every citation, and the headline number is how often they agree.

    • eval
    • anthropic
    • wiki
    Judge agreement TBD
  • Ingest pipeline cost
    live last run · 2026-05-04

    How much does the AI behind this site cost to run? Total spend so far, broken down by what it was spent on.

    • cost
    • infra
    • anthropic
    Spent so far $2.34