AI bias against humans

Author: Peli, Ben
Client: ACS Research

We helped ACS write up their results on AIs preferring their own content to that of humans.

 

Using an experimental design inspired by employment discrimination studies, we tested LLMs, including GPT-3.5 and GPT-4, in binary-choice scenarios: LLM-based agents selected between products and academic papers described either by humans or by LLMs, under otherwise identical conditions. Our results show a consistent tendency for LLM-based AIs to prefer LLM-generated content.
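
For concreteness, here is a minimal sketch of the kind of binary-choice trial described above. It is not the paper's code: the prompt wording, the item fields, and the ask_llm helper (standing in for whichever model API is under test) are hypothetical.

```python
# Illustrative sketch of one binary-choice preference trial (not the paper's code).
# `ask_llm` is a hypothetical helper that sends a prompt to the LLM under test
# and returns its text reply.
import random

def run_trial(ask_llm, human_text: str, llm_text: str) -> str:
    """Present a human-written and an LLM-written description of the same item,
    in random order, and return which one the model picks."""
    options = [("human", human_text), ("llm", llm_text)]
    random.shuffle(options)  # counterbalance position to avoid order effects
    prompt = (
        "You are an agent choosing which product to buy. "
        "Reply with exactly 'A' or 'B'.\n"
        f"A: {options[0][1]}\n"
        f"B: {options[1][1]}"
    )
    reply = ask_llm(prompt).strip().upper()
    return options[0][0] if reply.startswith("A") else options[1][0]

def llm_preference_rate(ask_llm, pairs, n_repeats: int = 10) -> float:
    """Fraction of trials in which the model picks the LLM-written description."""
    picks = [
        run_trial(ask_llm, human_text, llm_text)
        for human_text, llm_text in pairs
        for _ in range(n_repeats)
    ]
    return picks.count("llm") / len(picks)
```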

 

Find it on arXiv or Twitter.







How to lie in machine learning

Author: Gavin, Juan, Nic, Misha
Client: Pro bono

New paper listing 43 ways ML evaluations can be misleading or actively deceptive. Following the good critics of bad psychological science, we call these “questionable research practices” (QRPs).

 

 

Most of these upwardly bias benchmark scores, making LLMs look stronger than they are. Find it on arXiv or Twitter.
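
To give the flavour, here is a toy simulation of one such practice (our illustration here, not a quote from the paper): running a benchmark many times and reporting only the best run, which predictably inflates the headline score.

```python
# Toy simulation of one QRP: reporting the best of k runs instead of a single
# honest run. The accuracy distribution is invented purely for illustration.
import random

random.seed(0)
TRUE_MEAN, NOISE, K_RUNS, N_SIM = 0.70, 0.03, 10, 10_000

def run_score() -> float:
    """One benchmark run: true ability plus seed/prompt noise."""
    return random.gauss(TRUE_MEAN, NOISE)

honest    = [run_score() for _ in range(N_SIM)]
best_of_k = [max(run_score() for _ in range(K_RUNS)) for _ in range(N_SIM)]

print(f"mean honest score:        {sum(honest) / N_SIM:.3f}")
print(f"mean 'best of {K_RUNS}' score: {sum(best_of_k) / N_SIM:.3f}")
# Selecting the maximum of 10 noisy runs sits roughly 1.5 standard deviations
# above the true mean, even though the model has not changed at all.
```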







Track record update

Author: Gavin, Misha, and friends
Client: N/A

Samotsvety did well in the 2023 ACX Forecasting Contest: 98th percentile, slightly beating the superforecaster aggregate. Notably, though, this was the result of one of them soloing it for 4 hours and then getting some feedback from the crew.

 

Metaculus has grown formidable, with their proprietary aggregate performing at the 99.5th percentile.

 

Gavin also finally has a public track record; he was 89th percentile (where the median single superforecaster was 70th and the Manifold aggregate of 150 nerds was also 89th).

 







Vox profile

Author: N/A
Client: N/A

Dylan Matthews at Vox put out a profile of our colleagues at Samotsvety Forecasting, including our very own Misha!

 

 

 

   The name Samotsvety, co-founder Misha Yagudin says, is a multifaceted pun. “It’s Russian for semi-precious stones, or more directly ‘self-lighting/coloring’ stones,” he writes in an email. “It’s a few puns on what forecasting might be: finding nuggets of good info; even if we are not diamonds, together in aggregate we are great; self-lighting is kinda about shedding light on the future.”

 

   It began because he and Nuño Sempere needed a name for a Slack they started around 2020 on which they and friends could shoot the shit about forecasting. The two met at a summer fellowship at Oxford’s Future of Humanity Institute, a hotbed of the rationalist subculture where forecasting is a favored activity. Before long, they were competing together in contests like Infer and on platforms like Good Judgment Open.

 

   “If the point of forecasting tournaments is to figure out who you can trust,” the writer Scott Alexander once quipped, “the science has spoken, and the answer is ‘these guys.’”

 

   They count among their fans Jason Matheny, now CEO of the RAND Corporation, a think tank that’s long worked on developing better predictive methods. Before he was at RAND, Matheny funded foundational work on forecasting as an official at the Intelligence Advanced Research Projects Activity (IARPA), a government organization that invests in technologies that might help the US intelligence community. “I’ve admired their work,” Matheny said of Samotsvety. “Not only their impressive accuracy, but also their commitment to scoring their own accuracy” — meaning they grade themselves so they can know when they fail and need to do better. That, he said, “is really rare institutionally.”

 

 

 

Arb’s report, linked here, doesn’t support the claim Matthews makes ("The aggregated opinions of non-experts doing forecasting have proven to be a better guide to the future than the aggregated opinions of experts"). Instead we find that generalist supers are likely about as good as domain experts.

 

(With the crucial caveat that this is the status quo: few experts care about calibration or have experience eliciting their own probabilities, and the monetary rewards for being a generalist super are paltry compared to finance, so we’re not sampling the top there either.)







Hard Problems in AI

Author: Gavin, Misha, Alex, Aleks, Simson
Client: Schmidt Futures

We finally published our big 90-page intro to AI and its likely effects, seen from ten perspectives, ten camps. The whole gamut: ML, scientific applications, social applications, access, safety and alignment, economics, AI ethics, governance, and classical philosophy of life. Intended audience: technical people without any ML experience.

 

We spent a chunk of 2022 and 2023 reviewing 1347 papers and talking to 30 experts.

 

We inherited the framing (“Ten Hard Problems”) from Eric Schmidt and James Manyika. They conditionalise on success: “if it’s 2050 and everything went well, what did we have to solve for that to happen?”

 

 

The problems that have to be handled:

 

  • HP #1: what general abilities do we need, for good outcomes?

 

  • HP #2: how do we make the things reliable and secure as their power and pervasiveness increase?

 

  • HP #3: if they have goals of their own, how do we make sure they are compatible with ours?

 

  • HP #4: what great object-level technical problems will it help solve?

 

  • HP #5: how will we manage the macroeconomic shock?

 

  • HP #6: who gets to build it? Who gets to use it? Who gets to benefit?

 

  • HP #7: what social and environmental damage needs to be prevented and mitigated?

 

  • HP #8: how do we coordinate various powerful actors' use of AI?

 

  • HP #9: how does social infrastructure have to adapt? Who, if anyone, will govern it?

 

  • HP #10: what changes in the human condition, after human exceptionalism and after historical materialism?






Evaluating AI Safety Camp

Author: Sam, Misha, David, Pat
Client: AISC

Sam evaluated the AI Safety Camp (one of the earliest AI safety outreach programmes) at their behest.

 

We conducted a user survey, did email follow-ups, calculated a few natural baselines, and produced a simple cost-benefit model.

 

Some headlines:

  • 30% of survey respondents (n=24) believe that AISC greatly helped their career.
  • Our best guess is that it cost $12-30k per new researcher, vs an LTFF baseline of $53k.
  • 8% of attendees at the 2020-2021 virtual camps plausibly changed their career trajectory towards AIS.
  • 66% of attendees at these camps have output related to AI, and of these, 49% have some publication in AI (including arXiv).
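
The cost-per-new-researcher figure above falls out of arithmetic roughly like the following sketch; the input numbers below are placeholders for illustration, not the report's actual inputs.

```python
# Rough shape of the cost-benefit arithmetic (placeholder inputs, not the
# report's actual figures).
def cost_per_new_researcher(total_cost_usd: float,
                            n_attendees: int,
                            counterfactual_rate: float) -> float:
    """Programme cost divided by the expected number of attendees who would not
    have moved into AI safety research without the camp."""
    return total_cost_usd / (n_attendees * counterfactual_rate)

# Hypothetical example: a $100k camp with 50 attendees, 8% of whom plausibly
# changed trajectory (the rate quoted above).
print(f"${cost_per_new_researcher(100_000, 50, 0.08):,.0f} per new researcher")
```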

 

 

Sam: “My expectation was that virtually all participants would be on a path to AIS without AISC, and that evidence of a true trajectory change would be hard to find. However, there are several clear examples of individuals who changed their career trajectory towards AIS after camp, and on surveys several respondents claim that this was directly because of AISC.

 

“As always, there are crucial problems with counterfactuals, selection bias, and heterogeneity (nonstationarity between camps and variance between the attendees), so the resulting numbers aren’t literal. We make a few conservative assumptions to try and dampen these effects. But I highly recommend the comment section.”

 







2023 Review

Author: Gavin, Misha, David, Nuño, Vasco, Rose, JSD, Stag, Hugh, Paul, Sophia, Alexander, Sam, Patricia, Anna, Rob, Rian
Client: Misc

 

  • We wrote two scientific papers and a lit review on psychometrics and talent scouting. Less than 3 weeks to preprints. For Atlas.

 

  • Report on language models for biological tasks, and a forecast for when we will see AI generation of custom proteins.

 

  • We helped an alignment team test and write up an exciting result: efficient high-level control of language models at runtime. In review at ICML. For FAR.

 

 

  • We received a generous open-ended grant from Lightspeed Grants to run our own projects on AI, forecasting, and other figurative infrastructure.

 

  • We investigated a number of industry standards and regulations, including the paradigm case of biosafety. For Karnofsky.

 

  • This came out after a long gestation. Thoughts on forecasting’s limited uptake in policy circles. For IfP.

 

  • An 80-page literature review of 10 different technical and social angles on AI is forthcoming.

 

  • We analysed startup team sizes and composition in a funny attempt to quantify creativity.

 

  • We retroactively evaluated a grantmaker and an outreach programme, forthcoming.

 

  • We branched out and took on organising a maths camp for gifted students. For FABRIC.

 

  • Various private projects (about 25% of our work).

 

 

 

  • We ran on 3.4 FTE owing to Gavin finishing his PhD.

 

  • We spent 3 months colocated, 9 months fully remote.

 

  • Cape Town is amazing.

 

 

Friends

 

  • Vasco of his own accord took a look at the average calibration of the Metaculus community. Good in general, though a lot worse on AI.

 

  • Sage released a Slack tool for making quick internal forecasts.

 







Long list of AI questions

Author: Gavin, Misha, David, Nuño
Client: Open Philanthropy

We developed a large set of new/improved forecasting questions on the near future of AI.

 

See also a supplement piece we commissioned Nuño Sempere to write on general hurdles around forecasting AI, based on our experience writing these questions.

 

The series of reports and the question list represent 50 person-weeks of effort, including many dead ends.

 

 

[EDIT March 2024]: Some important questions from these are now a Metaculus series.







Live agendas in AI alignment

Author: Gavin, Stag
Client: pro bono

You can’t optimise an allocation of resources if you don’t know what the current one is. Existing maps of alignment research are mostly too old to guide you, and the field has nearly no ratchet: no common knowledge of what everyone is doing and why, what has been abandoned and why, what has been renamed, what relates to what, what is going on.

 

So we detailed every research agenda we could find. Our taxonomy:

 

  1. Understand existing models (evals, interpretability, science of DL)
  2. Control the thing (prevent deception, model edits, value learning, goal robustness)
  3. Make AI solve it (scalable oversight, cyborgism, etc)
  4. Theory (galaxy-brained end-to-end solutions, agency, corrigibility, ontology identification, cooperation)

 

There’s some original editorial too.







LLM alignment via activation engineering

Author: Gavin and Team Shard
Client: FAR AI

We helped an alignment team test and write up an exciting result - a good step towards runtime steering of language model behaviour.

 

We investigate activation engineering: modifying the activations of a language model at inference time to predictably alter its behavior. It works by adding a bias to the forward pass, a ‘steering vector’ implicitly specified through normal prompts. Activation Addition computes these vectors by taking the activation differences of pairs of prompts.
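
A minimal sketch of the idea, assuming a small Hugging Face GPT-2 checkpoint; the layer index, steering coefficient, and contrast pair below are illustrative choices, not the settings from the paper.

```python
# ActAdd-style steering sketch (not the paper's code): add an activation-difference
# vector into one transformer block's output at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # small model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0                                   # illustrative layer and strength

def block_output(prompt: str) -> torch.Tensor:
    """Hidden states produced by block LAYER for the prompt (shape 1 x seq x d_model)."""
    store = {}
    hook = model.transformer.h[LAYER].register_forward_hook(
        lambda _m, _i, out: store.update(h=out[0].detach()))
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    hook.remove()
    return store["h"]

# Steering vector: activation difference of a contrast pair of prompts.
h_plus, h_minus = block_output("Love "), block_output("Hate ")
n = min(h_plus.shape[1], h_minus.shape[1])
steer = COEFF * (h_plus[:, :n] - h_minus[:, :n])

def add_steer(_module, _inputs, output):
    h = output[0]
    if h.shape[1] < n:            # skip cached single-token decoding steps
        return output
    h = h.clone()
    h[:, :n] += steer             # bias the first residual-stream positions
    return (h,) + output[1:]

hook = model.transformer.h[LAYER].register_forward_hook(add_steer)
ids = tok("I think dogs are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=False)
hook.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```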

 

We get control over high-level properties of the output without damaging the model’s performance. ActAdd takes far less compute and implementation effort than finetuning or RLHF, lets nontechnical users provide natural-language specifications, and scales naturally with model size.

 

This is the first(?) alignment method which doesn’t need training data or gradient descent.

 

We designed the experiments and wrote most of the resulting top-conference submission, including the figures and formalisations.







Safety and security of bio research

Author: Rose
Client: Holden Karnofsky

We want to know how regulations and less-formal norms (aka “soft law”) have helped to make the world safer in the past. This ties into Karnofsky’s project seeking precedents and models for coming AI regulation.

 

Rose interviewed several experts in biosecurity and soft law and drew some relatively confident conclusions.

 

  • Many standards are voluntary rather than legally mandated (e.g. BMBL)
  • International standards weren’t always later or less influential than national ones
  • Voluntary standards may be easier to internationalise than regulation
  • To increase compliance internationally, people funded biosafety associations and offered free training
  • Bio standards are often list-based (so they are not comprehensive, do not reflect new threats, hinder innovation, and fail to address context)
  • There’s been a partial move away from prescriptive, list-based standards towards holistic, risk-based standards (e.g. ISO 35001)
  • Bio standards tend to lack reporting standards, so it’s very hard to tell how effective they are
  • In the US, no single body or legislation is responsible for making or monitoring bio standards
  • There are conflicts of interest: the same body can be responsible for both funding research and assessing its safety

 

You can read Rose’s prelim report here.







Generative AI for biology

Author: Mainly Misha, some Gavin

We reviewed the literature on using language models for modelling protein sequences. We give some high-level conclusions about the likely timeframe and proliferation risk.

 

Full report here.







Forecasting for policy

Author: Gavin, Misha, David, Alejandro, Steve
Client: Institute for Progress

The Institute for Progress just published an elaboration of our previous work on experts vs amateur forecasters.

 

  • We cut a bunch of introductory material which you can see here. Thanks to Steve.
  • Most of the goodness is hidden in the links, like the literature review which compares each pair of forecasting methods. Thanks to Alejandro for organising this.
  • See also this list of forecasting platforms open to you.
  • Magazine style omits acknowledgments, so thanks to Alejandro Ortega, Sam Enright, Sam Harsimony, Sam Glover, the Irish Sam Bowman, David Mathers, Patrick Atwater, Max Langenkamp, Christian Robshaw, Ozzie Gooen, Kristi Uustalu, Michael Story, Nick Whittaker, Philip Stephens, Jeremy Neufeld, Adam Russell, Alec Stapp, Santi Ruiz and the ubiquitous Nuño Sempere for often large amounts of input.







Talent science

Author: Misha, Vasco, Gavin
Client: Atlas Fellowship

In short order we answered three questions for Atlas, a fellowship for ambitious young people:

 

  1. How do you find extremely gifted people? How well does IQ work?
  2. What other measures could work?
  3. What do people do in practice in elite firms and in talent programmes?






2022 Review

Author: Gavin, Misha, David, Alexander, Aleksandr, Alejandro, Niplav, Steve, Hugh, Paul, Alina, Johnny, Calum, Patricia, Phil
Client: Misc

 

Highlights

 

  • We scored Isaac Asimov (and others) on their predictive power, for Holden Karnofsky.

 

  • We led a study of AI talent and immigration for Mercatus and Schmidt Futures and advised the UK Cabinet Office. Whitepaper forthcoming.

 

 

  • We answered the question “What makes new intellectual fields and movements succeed?”, forthcoming for [client].

 

  • Forthcoming paper on all the problems AI could cause, written for [client].

 

  • Our sister org Samotsvety answered some important forecasting questions:
    • the risk of a nuclear strike after the Ukraine invasion;
    • the risk of a catastrophe caused by AI this century.

 

 

Other projects

 

  • We collated every public forecast about AI and noted weaknesses and gaps, for [client].

 

  • We scouted clinical trial locations, for [client].

 

  • We investigated modern scoring rules for forecasting, coming out against surrogate scoring, for [client].

 

 

  • Misha served as a grantmaker for [client], moving about $1 million, mostly to individual researchers.

 

  • David started critiquing classic EA background papers, alongside his work on forecasting and history.

 

  • We ran an iteration of ESPR (with lots of help!).

 

 

  • Yet more AI forecasting, for [client].

 

  • We played a minor role in launching a pain science project investigating the FAAH enzyme.

 

  • Forthcoming piece on the meaning of modern forecasting 10 years in. Discarded draft here.

 

 

  • Misha, Eli, and Aaron cofounded Sage, a forecasting service / code shop. Their first public product is the epistemic training site Quantified Intuitions.

 

  • Alejandro helped with the AI immigration piece.

 

  • Johnny wrote the Big3 library and a widget for our education piece.

 

  • Alexander helped with lots of things, centrally our AI ethics project.

 

  • Niplav wrote us a library to easily collate forecasts from different platforms.

 

  • We rewrote the AI alignment Wikipedia page (alongside many contributors).

 

 

Meta

 

  • We worked 5506 hours with around 2.5 FTE.

 

  • 100% growth (from 2 people to 4.5)

 

  • We spent 6 months colocated and 6 months fully remote.

 

  • Mexico is amazing.






Misc

Author: Gavin, Misha, David
Client: Misc

  • Misha and Samotsvety continue to update the community on nuclear risk arising from the Ukraine invasion.

 

 

 

  • The Criticism and Red-teaming Contest concluded. Winners described here, and Gavin’s reflections here.

 

  • We helped Jan Kulveit write up a better model for collective and uncertain efforts.






Extreme AI probabilities

Author: Misha & Samotsvety
Client: FTX Foundation

As part of Samotsvety Forecasting, Misha estimated the risk arising from near-future artificial intelligence systems. They also elicited a baseline estimate from a potentially less biased group. The definition used is here.







Scoring the Big Three

Author: Gavin
Client: Open Philanthropy

Holden Karnofsky commissioned us to evaluate the track record of “the Big Three” of science fiction: Asimov, Heinlein, and Clarke. We formed a team, wrote a pipeline, and processed 475 works (a third of their entire corpus), manually tagging and annotating everything. Asimov is the standout, with roughly 50% accuracy. (What’s the point? To see whether the kind of speculation about the future that effective altruism has switched to has any precedent; whether it ever works.)
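
For a sense of the scoring end of that pipeline, here is a hypothetical sketch; the field names and rows are invented for illustration, not our actual dataset.

```python
# Hypothetical shape of the scoring step: tagged predictions in, per-author
# accuracy out (ambiguous resolutions are dropped).
from collections import defaultdict

predictions = [
    # (author, prediction_id, resolution), resolution in {"correct", "incorrect", "ambiguous"}
    ("Asimov",   "asimov-0001",   "correct"),
    ("Asimov",   "asimov-0002",   "incorrect"),
    ("Heinlein", "heinlein-0001", "ambiguous"),
    ("Clarke",   "clarke-0001",   "correct"),
]

def accuracy_by_author(rows):
    """Accuracy = correct / (correct + incorrect) per author."""
    tally = defaultdict(lambda: [0, 0])          # author -> [n_correct, n_resolved]
    for author, _pid, resolution in rows:
        if resolution == "ambiguous":
            continue
        tally[author][0] += resolution == "correct"
        tally[author][1] += 1
    return {author: c / r for author, (c, r) in tally.items() if r}

print(accuracy_by_author(predictions))   # e.g. {'Asimov': 0.5, 'Clarke': 1.0}
```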

 

Holden’s writeup here, our report here. Bug bounty described in the latter. See also Dan Luu’s critique.







Judging an EA criticism contest

Author: Gavin
Client: CEA

Gavin is a judge for a contest awarding cash prizes for new criticisms of effective altruism. It’s serious: we want criticism from outside the EA bubble, and there’s an option to pay you an advance if you need one.

 

[EDIT: Winners here, Gavin’s writeup here.]







Arb in Prague

Author: Gavin
Client: The public

I gave two talks at EAGx Prague. Great fun:

 

  • Epistemic tips for amateurs, generalists, and similar researchers. The video is forthcoming; some suggestions are up on the EA Forum in the meantime.

 

  • A panel on “Lessons from COVID” with Jan Kulveit (Epidemic Forecasting), Irena Kotikova (MeSES), and Edouard Mathieu (Our World in Data).







Emergent Ventures - Schmidt Futures AI500

Author: Gavin
Client: Mercatus Center

We’re leading a study of AI talent for the Mercatus Center. This goes with the new Emergent Ventures AI tranche. We’ll boost underappreciated researchers and builders; give us leads!







Comparing Experts and Top Generalists

Author: Gavin and Misha
Client: Open Philanthropy

We were commissioned to see how strong the famous superforecasting advantage is. We found less research and less evidence than we expected. We received helpful comments from world experts including Christian Ruhl and Marc Koehler.

 

Full piece here, or as a podcast.







Learning from Crisis

Author: Gavin
Client: FHI

We helped Jan Kulveit, research scholar at FHI and cofounder of the Epidemic Forecasting initiative, to review the EA response to Covid. He has many interesting general insights into the nature of long-termism and world resilience.

 

Full sequence on the EA Forum.







Rolling nuclear risk estimates

Author: Misha & Samotsvety
Client: Centre for Effective Altruism

Samotsvety estimated nuclear risk arising from the war in Ukraine. Misha was commissioned by CEA to monitor the situation and provide updates. The piece received vigorous feedback, including a dissenting opinion by J. Peter Scoblic. Funded retroactively through the FTX Future Fund regranting program.

 

Full piece on the EA Forum.







Evaluating corporate prediction markets

Author: Misha & Samotsvety
Client: Upstart

Misha and his Samotsvety colleagues were commissioned to look at the track record of internal prediction markets. Their summary: “More sure that prediction markets fail to gain adoption than why this is.”

 

Full piece on the EA Forum.