Posts
All the articles I've posted.
-
The Research Behind the HLE Score: A Year of AI Behavior Research
The methodology behind the agent, the failure modes it catches, the products that grew out of the same research, and where the program goes next.
-
HLE Submission Methodology Paper — FF-STACK v8
Full methodology paper for the FF-STACK v8 HLE submission: architecture, filtering policy, calibration, cost, and disclosure.
-
51.85% on Humanity's Last Exam: How a Solo Researcher Built a Multi-Agent HLE Submission
1,119 out of 2,158 on canonical HLE. Single workstation, no GPU cluster, no fine-tuning. The architecture, the numbers, and what's next.