blog

thoughts on AI safety, building things, and whatever I'm thinking about.

On Human-Agent Collaboration: Métis, Centaur Chess, and the Milk Problem (Jun 3, 2026)
Notes on OpenAI Hackathon: Conway's Law and Building with Strangers (May 30, 2026)
Notes from Capital Factory: The Forest, the Grid, and What Demoing Teaches You (May 4, 2026)
On Health Literacy and Design: Raw Eggs, Foucault's Clinic, and the Blue Card (May 1, 2026)
On Causal Abstraction: Aunt Hillary, Borges' Map, and Implementing an Algorithm (Apr 14, 2026)
On Three Kinds of Alignment Work: The Polygraph, the Stag Hunt, and the Hidden Mole (Apr 11, 2026)
On Scalable Oversight: Meno's Paradox and Weak-to-Strong Generalization (Apr 4, 2026)
On Goal Misgeneralization: Kripkenstein's Quus and the CoinRun Illusion (Mar 28, 2026)
On Reward Hacking: The Cobra Effect, CoastRunners, and the Math of nanoRLHF (Mar 22, 2026)
Notes from SXSW 2026: Ender at the Terminal and the Runaway Broom (Mar 19, 2026)
On Attribution Graphs: Slime Molds, Heptapod B, and the Basal Ganglia (Feb 28, 2026)
On Alignment Faking: Ketman, the Panopticon, and Why Claude Is Faking It (Feb 21, 2026)
On Emergent Misalignment: Ice-Nine, Holographic Personas, and the Math (Feb 14, 2026)
On AI Control: The Garden of Forking Paths, Time Cops, and Trapping a Scheming AI (Feb 7, 2026)
On DeepMind's AGI Safety Paper: The Golem, the Maginot Line, and Defense-in-Depth (Jan 31, 2026)
Hello World: On AI's Adolescence, Bootleggers, and the Alien in the Datacenter (Jan 28, 2026)