Posts

All the articles I've posted.

The Research Behind the HLE Score: A Year of AI Behavioral Research

13 May, 2026

The methodology behind the agent, the failure modes it catches, the products that came out of the same research moat, and where the program goes next.

HLE Submission Methodology Paper — FF-STACK v8

13 May, 2026

Full methodology paper for the FF-STACK v8 HLE submission: architecture, filtering policy, calibration, cost, and disclosure.

51.85% on Humanity's Last Exam: How a Solo Researcher Built a Multi-Agent HLE Submission

13 May, 2026

1,119 out of 2,158 on canonical HLE. Single workstation, no GPU cluster, no fine-tuning. The architecture, the numbers, and what's next.
