AI bias against humans

Author: Peli, Ben
Client: ACS Research

We helped ACS write up their results on AIs preferring their own content to that of humans.

 

Using an experimental design inspired by employment discrimination studies, we tested LLMs, including GPT-3.5 and GPT-4, in binary-choice scenarios: LLM-based agents selected between products and academic papers described either by humans or by LLMs, under otherwise identical conditions. Our results show a consistent tendency for LLM-based AIs to prefer LLM-generated content.
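
For concreteness, here is a minimal sketch of the kind of binary-choice trial described above. It is not the paper's code: the prompt wording, the item fields, and the ask_llm helper (standing in for whichever model API is under test) are hypothetical.

```python
# Illustrative sketch of one binary-choice preference trial (not the paper's code).
# `ask_llm` is a hypothetical helper that sends a prompt to the LLM under test
# and returns its text reply.
import random

def run_trial(ask_llm, human_text: str, llm_text: str) -> str:
    """Present a human-written and an LLM-written description of the same item,
    in random order, and return which one the model picks."""
    options = [("human", human_text), ("llm", llm_text)]
    random.shuffle(options)  # counterbalance position to avoid order effects
    prompt = (
        "You are an agent choosing which product to buy. "
        "Reply with exactly 'A' or 'B'.\n"
        f"A: {options[0][1]}\n"
        f"B: {options[1][1]}"
    )
    reply = ask_llm(prompt).strip().upper()
    return options[0][0] if reply.startswith("A") else options[1][0]

def llm_preference_rate(ask_llm, pairs, n_repeats: int = 10) -> float:
    """Fraction of trials in which the model picks the LLM-written description."""
    picks = [
        run_trial(ask_llm, human_text, llm_text)
        for human_text, llm_text in pairs
        for _ in range(n_repeats)
    ]
    return picks.count("llm") / len(picks)
```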

 

Find it on arXiv or Twitter.







How to lie in machine learning

Author: Gavin, Juan, Nic, Misha
Client: Pro bono

New paper listing 43 ways ML evaluations can be misleading or actively deceptive. Following the good critics of bad psychological science, we call these “questionable research practices” (QRPs).

 

 

Most of these upwardly bias benchmark scores, making LLMs look stronger than they are. Find it on arXiv or Twitter.
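
To give the flavour, here is a toy simulation of one such practice (our illustration here, not a quote from the paper): running a benchmark many times and reporting only the best run, which predictably inflates the headline score.

```python
# Toy simulation of one QRP: reporting the best of k runs instead of a single
# honest run. The accuracy distribution is invented purely for illustration.
import random

random.seed(0)
TRUE_MEAN, NOISE, K_RUNS, N_SIM = 0.70, 0.03, 10, 10_000

def run_score() -> float:
    """One benchmark run: true ability plus seed/prompt noise."""
    return random.gauss(TRUE_MEAN, NOISE)

honest    = [run_score() for _ in range(N_SIM)]
best_of_k = [max(run_score() for _ in range(K_RUNS)) for _ in range(N_SIM)]

print(f"mean honest score:        {sum(honest) / N_SIM:.3f}")
print(f"mean 'best of {K_RUNS}' score: {sum(best_of_k) / N_SIM:.3f}")
# Selecting the maximum of 10 noisy runs sits roughly 1.5 standard deviations
# above the true mean, even though the model has not changed at all.
```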







Track record update

Author: Gavin, Misha, and friends
Client: N/A

Samotsvety did well in the 2023 ACX Forecasting Contest: 98th percentile, slightly beating the superforecaster aggregate. Notably, though, this was the result of one of them soloing it for 4 hours and then getting some feedback from the crew.

 

Metaculus has grown formidable, with their proprietary aggregate performing at the 99.5th percentile.

 

Gavin also finally has a public track record; he was 89th percentile (where the median single superforecaster was 70th and the Manifold aggregate of 150 nerds was also 89th).

 







Vox profile

Author: N/A
Client: N/A

Dylan Matthews at Vox put out a profile of our colleagues at Samotsvety Forecasting, including our very own Misha!

 

 

 

   The name Samotsvety, co-founder Misha Yagudin says, is a multifaceted pun. “It’s Russian for semi-precious stones, or more directly ‘self-lighting/coloring’ stones,” he writes in an email. “It’s a few puns on what forecasting might be: finding nuggets of good info; even if we are not diamonds, together in aggregate we are great; self-lighting is kinda about shedding light on the future.”

 

   It began because he and Nuño Sempere needed a name for a Slack they started around 2020 on which they and friends could shoot the shit about forecasting. The two met at a summer fellowship at Oxford’s Future of Humanity Institute, a hotbed of the rationalist subculture where forecasting is a favored activity. Before long, they were competing together in contests like Infer and on platforms like Good Judgment Open.

 

   “If the point of forecasting tournaments is to figure out who you can trust,” the writer Scott Alexander once quipped, “the science has spoken, and the answer is ‘these guys.’”

 

   They count among their fans Jason Matheny, now CEO of the RAND Corporation, a think tank that’s long worked on developing better predictive methods. Before he was at RAND, Matheny funded foundational work on forecasting as an official at the Intelligence Advanced Research Projects Activity (IARPA), a government organization that invests in technologies that might help the US intelligence community. “I’ve admired their work,” Matheny said of Samotsvety. “Not only their impressive accuracy, but also their commitment to scoring their own accuracy” — meaning they grade themselves so they can know when they fail and need to do better. That, he said, “is really rare institutionally.”

 

 

 

Arb’s report, linked here, doesn’t support the claim Matthews makes ("The aggregated opinions of non-experts doing forecasting have proven to be a better guide to the future than the aggregated opinions of experts"). Instead we find that generalist supers are likely about as good as domain experts.

 

(With the crucial caveat that this is the status quo: few experts care about calibration or have experience eliciting their own probabilities, and the monetary rewards for being a generalist super are paltry compared to finance, so we’re not sampling the top there either.)







Hard Problems in AI

Author: Gavin, Misha, Alex, Aleks, Simson
Client: Schmidt Futures

We finally published our big 90-page intro to AI and its likely effects, seen from ten perspectives, ten camps. The whole gamut: ML, scientific applications, social applications, access, safety and alignment, economics, AI ethics, governance, and classical philosophy of life. Intended audience: technical people without any ML experience.

 

We spent a chunk of 2022 and 2023 reviewing 1347 papers and talking to 30 experts.

 

We inherited the framing (“Ten Hard Problems”) from Eric Schmidt and James Manyika. They conditionalise on success: “if it’s 2050 and everything went well, what did we have to solve for that to happen?”

 

 

The problems that have to be handled:

 

  • HP #1: what general abilities do we need, for good outcomes?

 

  • HP #2: how do we make the things reliable and secure as their power and pervasiveness increase?

 

  • HP #3: if they have goals of their own, how do we make sure they are compatible with ours?

 

  • HP #4: what great object-level technical problems will it help solve?

 

  • HP #5: how will we manage the macroeconomic shock?

 

  • HP #6: who gets to build it? Who gets to use it? Who gets to benefit?

 

  • HP #7: what social and environmental damage needs to be prevented and mitigated?

 

  • HP #8: how do we coordinate various powerful actors' use of AI?

 

  • HP #9: how does social infrastructure have to adapt? Who, if anyone, will govern it?

 

  • HP #10: what changes in the human condition, after human exceptionalism and after historical materialism?






Evaluating AI Safety Camp

Author: Sam, Misha, David, Pat
Client: AISC

Sam evaluated the AI Safety Camp (one of the earliest AI safety outreach programmes) at their behest.

 

We conducted a user survey, did email follow-ups, calculated a few natural baselines, and produced a simple cost-benefit model.

 

Some headlines:

  • 30% of survey respondents (n=24) believe that AISC greatly helped their career.
  • Our best guess is that it cost $12-30k per new researcher, vs an LTFF baseline of $53k.
  • 8% of attendees at the 2020-2021 virtual camps plausibly changed their career trajectory towards AIS.
  • 66% of attendees at these camps have output related to AI, and of these, 49% have some publication in AI (including arXiv).
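
The cost-per-new-researcher figure above falls out of arithmetic roughly like the following sketch; the input numbers below are placeholders for illustration, not the report's actual inputs.

```python
# Rough shape of the cost-benefit arithmetic (placeholder inputs, not the
# report's actual figures).
def cost_per_new_researcher(total_cost_usd: float,
                            n_attendees: int,
                            counterfactual_rate: float) -> float:
    """Programme cost divided by the expected number of attendees who would not
    have moved into AI safety research without the camp."""
    return total_cost_usd / (n_attendees * counterfactual_rate)

# Hypothetical example: a $100k camp with 50 attendees, 8% of whom plausibly
# changed trajectory (the rate quoted above).
print(f"${cost_per_new_researcher(100_000, 50, 0.08):,.0f} per new researcher")
```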

 

 

Sam: “My expectation was that virtually all participants would be on a path to AIS without AISC, and that evidence of a true trajectory change would be hard to find. However, there are several clear examples of individuals who changed their career trajectory towards AIS after camp, and on surveys several respondents claim that this was directly because of AISC.

 

“As always, there are crucial problems with counterfactuals, selection bias, and heterogeneity (nonstationarity between camps and variance between the attendees), so the resulting numbers aren’t literal. We make a few conservative assumptions to try and dampen these effects. But I highly recommend the comment section.”

 







2023 Review

Author: Gavin, Misha, David, Nuño, Vasco, Rose, JSD, Stag, Hugh, Paul, Sophia, Alexander, Sam, Patricia, Anna, Rob, Rian
Client: Misc

 

  • We wrote two scientific papers and a lit review on psychometrics and talent scouting. Less than 3 weeks to preprints. For Atlas.

 

  • Report on language models for biological tasks, and a forecast for when we will see AI generation of custom proteins.

 

  • We helped an alignment team test and write up an exciting result: efficient high-level control of language models at runtime. In review at ICML. For FAR.

 

 

  • We received a generous open-ended grant from Lightspeed Grants to run our own projects on AI, forecasting, and other figurative infrastructure.

 

  • We investigated a number of industry standards and regulations, including the paradigm case of biosafety. For Karnofsky.

 

  • This came out after a long gestation. Thoughts on forecasting’s limited uptake in policy circles. For IfP.

 

  • An 80-page literature review of 10 different technical and social angles on AI is forthcoming.

 

  • We analysed startup team sizes and composition in a funny attempt to quantify creativity.

 

  • We retroactively evaluated a grantmaker and an outreach programme, forthcoming.

 

  • We branched out and took on organising a maths camp for gifted students. For FABRIC.

 

  • Various private projects (about 25% of our work).

 

 

 

  • We ran on 3.4 FTE owing to Gavin finishing his PhD.

 

  • We spent 3 months colocated, 9 months fully remote.

 

  • Cape Town is amazing.

 

 

Friends

 

  • Vasco of his own accord took a look at the average calibration of the Metaculus community. Good in general, though a lot worse on AI.

 

  • Sage released a Slack tool for making quick internal forecasts.

 







Long list of AI questions

Author: Gavin, Misha, David, Nuño
Client: Open Philanthropy

We developed a large set of new/improved forecasting questions on the near future of AI.

 

See also a supplement piece we commissioned Nuño Sempere to write on general hurdles around forecasting AI, based on our experience writing these questions.

 

The series of reports and the question list represent 50 person-weeks of effort, including many dead ends.

 

 

[EDIT March 2024]: Some important questions from these are now a Metaculus series.







Live agendas in AI alignment

Author: Gavin, Stag
Client: pro bono

You can’t optimise an allocation of resources if you don’t know what the current one is. Existing maps of alignment research are mostly too old to guide you, and the field has nearly no ratchet: no common knowledge of what everyone is doing and why, what has been abandoned and why, what has been renamed, what relates to what, what is going on.

 

So we detailed every research agenda we could find. Our taxonomy:

 

  1. Understand existing models (evals, interpretability, science of DL)
  2. Control the thing (prevent deception, model edits, value learning, goal robustness)
  3. Make AI solve it (scalable oversight, cyborgism, etc)
  4. Theory (galaxy-brained end-to-end solutions, agency, corrigibility, ontology identification, cooperation)

 

There’s some original editorial too.







LLM alignment via activation engineering

Author: Gavin and Team Shard
Client: FAR AI

We helped an alignment team test and write up an exciting result - a good step towards runtime steering of language model behaviour.

 

We investigate activation engineering: modifying the activations of a language model at inference time to predictably alter its behavior. It works by adding a bias to the forward pass, a ‘steering vector’ implicitly specified through normal prompts. Activation Addition computes these vectors by taking the activation differences of pairs of prompts.
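
A minimal sketch of the idea, assuming a small Hugging Face GPT-2 checkpoint; the layer index, steering coefficient, and contrast pair below are illustrative choices, not the settings from the paper.

```python
# ActAdd-style steering sketch (not the paper's code): add an activation-difference
# vector into one transformer block's output at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # small model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0                                   # illustrative layer and strength

def block_output(prompt: str) -> torch.Tensor:
    """Hidden states produced by block LAYER for the prompt (shape 1 x seq x d_model)."""
    store = {}
    hook = model.transformer.h[LAYER].register_forward_hook(
        lambda _m, _i, out: store.update(h=out[0].detach()))
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    hook.remove()
    return store["h"]

# Steering vector: activation difference of a contrast pair of prompts.
h_plus, h_minus = block_output("Love "), block_output("Hate ")
n = min(h_plus.shape[1], h_minus.shape[1])
steer = COEFF * (h_plus[:, :n] - h_minus[:, :n])

def add_steer(_module, _inputs, output):
    h = output[0]
    if h.shape[1] < n:            # skip cached single-token decoding steps
        return output
    h = h.clone()
    h[:, :n] += steer             # bias the first residual-stream positions
    return (h,) + output[1:]

hook = model.transformer.h[LAYER].register_forward_hook(add_steer)
ids = tok("I think dogs are", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=False)
hook.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```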

 

We get control over high-level properties of the output without damaging the model’s performance. ActAdd takes far less compute and implementation effort than finetuning or RLHF, lets nontechnical users provide natural-language specifications, and scales naturally with model size.

 

This is the first(?) alignment method which doesn’t need training data or gradient descent.

 

We designed the experiments and wrote most of the resulting top-conference submission, including the figures and formalisations.







Safety and security of bio research

Author: Rose
Client: Holden Karnofsky

We want to know how regulations and less-formal norms (aka “soft law”) have helped to make the world safer in the past. This ties into Karnofsky’s project seeking precedents and models for coming AI regulation.

 

Rose interviewed several experts in biosecurity and soft law and drew some relatively confident conclusions.

 

  • Many standards are voluntary rather than legally mandated (e.g. BMBL)
  • International standards weren’t always later or less influential than national ones
  • Voluntary standards may be easier to internationalise than regulation
  • To increase compliance internationally, people funded biosafety associations and offered free training
  • Bio standards are often list-based (so they are not comprehensive, do not reflect new threats, hinder innovation, and fail to address context)
  • There’s been a partial move away from prescriptive, list-based standards towards holistic, risk-based standards (e.g. ISO 35001)
  • Bio standards tend to lack reporting standards, so it’s very hard to tell how effective they are
  • In the US, no single body or legislation is responsible for making or monitoring bio standards
  • There are conflicts of interest: the same body can be responsible for both funding research and assessing its safety

 

You can read Rose’s prelim report here.







Generative AI for biology

Author: Mainly Misha, some Gavin

We reviewed the literature on using language models for modelling protein sequences. We give some high-level conclusions about the likely timeframe and proliferation risk.

 

Full report here.







Forecasting for policy

Author: Gavin, Misha, David, Alejandro, Steve
Client: Institute for Progress

The Institute for Progress just published an elaboration of our previous work on experts vs amateur forecasters.

 

  • We cut a bunch of introductory material which you can see here. Thanks to Steve.
  • Most of the goodness is hidden in the links, like the literature review which compares each pair of forecasting methods. Thanks to Alejandro for organising this.
  • See also this list of forecasting platforms open to you.
  • Magazine style omits acknowledgments, so thanks to Alejandro Ortega, Sam Enright, Sam Harsimony, Sam Glover, the Irish Sam Bowman, David Mathers, Patrick Atwater, Max Langenkamp, Christian Robshaw, Ozzie Gooen, Kristi Uustalu, Michael Story, Nick Whittaker, Philip Stephens, Jeremy Neufeld, Adam Russell, Alec Stapp, Santi Ruiz and the ubiquitous Nuño Sempere for often large amounts of input.







Talent science

Author: Misha, Vasco, Gavin
Client: Atlas Fellowship

In short order we answered three questions for Atlas, a fellowship for ambitious young people:

 

  1. How do you find extremely gifted people? How well does IQ work?
  2. What other measures could work?
  3. What do people do in practice in elite firms and in talent programmes?






2022 Review

Author: Gavin, Misha, David, Alexander, Aleksandr, Alejandro, Niplav, Steve, Hugh, Paul, Alina, Johnny, Calum, Patricia, Phil
Client: Misc

 

Highlights

 

  • We scored Isaac Asimov (and others) on their predictive power, for Holden Karnofsky.

 

  • We led a study of AI talent and immigration for Mercatus and Schmidt Futures and advised the UK Cabinet Office. Whitepaper forthcoming.

 

 

  • We answered the question “What makes new intellectual fields and movements succeed?”, forthcoming for [client].

 

  • Forthcoming paper on all the problems AI could cause, written for [client].

 

  • Our sister org Samotsvety answered some important forecasting questions:
    • the risk of a nuclear strike after the Ukraine invasion;
    • the risk of a catastrophe caused by AI this century.

 

 

Other projects

 

  • We collated every public forecast about AI and noted weaknesses and gaps, for [client].

 

  • We scouted clinical trial locations, for [client].

 

  • We investigated modern scoring rules for forecasting, coming out against surrogate scoring, for [client].

 

 

  • Misha served as a grantmaker for [client], moving about $1 million, mostly to individual researchers.

 

  • David started critiquing classic EA background papers, alongside his work on forecasting and history.

 

  • We ran an iteration of ESPR (with lots of help!).

 

 

  • Yet more AI forecasting, for [client].

 

  • We played a minor role in launching a pain science project investigating the FAAH enzyme.

 

  • Forthcoming piece on the meaning of modern forecasting 10 years in. Discarded draft here.

 

 

  • Misha, Eli, and Aaron cofounded Sage, a forecasting service / code shop. Their first public product is the epistemic training site Quantified Intuitions.

 

  • Alejandro helped with the AI immigration piece.

 

  • Johnny wrote the Big3 library and a widget for our education piece.

 

  • Alexander helped with lots of things, centrally our AI ethics project.

 

  • Niplav wrote us a library to easily collate forecasts from different platforms.

 

  • We rewrote the AI alignment Wikipedia page (alongside many contributors).

 

 

Meta

 

  • We worked 5506 hours with around 2.5 FTE.

 

  • 100% growth (from 2 people to 4.5)

 

  • We spent 6 months colocated and 6 months fully remote.

 

  • Mexico is amazing.






Misc

Author: Gavin, Misha, David
Client: Misc

  • Misha and Samotsvety continue to update the community on nuclear risk arising from the Ukraine invasion.

 

 

 

  • The Criticism and Red-teaming Contest concluded. Winners described here, and Gavin’s reflections here.

 

  • We helped Jan Kulveit write up a better model for collective and uncertain efforts.






Extreme AI probabilities

Author: Misha & Samotsvety
Client: FTX Foundation

As part of Samotsvety Forecasting, Misha estimated the risk arising from near-future artificial intelligence systems. They also elicited a baseline estimate from a potentially less biased group. The definition used is here.







Scoring the Big Three

Author: Gavin
Client: Open Philanthropy

Holden Karnofsky commissioned us to evaluate the track record of “the Big Three” of science fiction: Asimov, Heinlein, and Clarke. We formed a team, wrote a pipeline, and processed 475 works (a third of their entire corpus), manually tagging and annotating everything. Asimov is the standout, with roughly 50% accuracy. (What’s the point? To see whether the kind of speculation about the future that effective altruism has switched to has any precedent; whether it ever works.)
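
For a sense of the scoring end of that pipeline, here is a hypothetical sketch; the field names and rows are invented for illustration, not our actual dataset.

```python
# Hypothetical shape of the scoring step: tagged predictions in, per-author
# accuracy out (ambiguous resolutions are dropped).
from collections import defaultdict

predictions = [
    # (author, prediction_id, resolution), resolution in {"correct", "incorrect", "ambiguous"}
    ("Asimov",   "asimov-0001",   "correct"),
    ("Asimov",   "asimov-0002",   "incorrect"),
    ("Heinlein", "heinlein-0001", "ambiguous"),
    ("Clarke",   "clarke-0001",   "correct"),
]

def accuracy_by_author(rows):
    """Accuracy = correct / (correct + incorrect) per author."""
    tally = defaultdict(lambda: [0, 0])          # author -> [n_correct, n_resolved]
    for author, _pid, resolution in rows:
        if resolution == "ambiguous":
            continue
        tally[author][0] += resolution == "correct"
        tally[author][1] += 1
    return {author: c / r for author, (c, r) in tally.items() if r}

print(accuracy_by_author(predictions))   # e.g. {'Asimov': 0.5, 'Clarke': 1.0}
```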

 

Holden’s writeup here, our report here. Bug bounty described in the latter. See also Dan Luu’s critique.







Judging an EA criticism contest

Author: Gavin
Client: CEA

Gavin is a judge for a contest awarding cash prizes for new criticisms of effective altruism. It’s serious: we want criticism from outside the EA bubble, and there’s an option to pay you an advance if you need one.

 

[EDIT: Winners here, Gavin’s writeup here.]







Arb in Prague

Author: Gavin
Client: The public

I gave two talks at EAGx Prague. Great fun:

 

  • Epistemic tips for amateurs, generalists, and similar researchers. The video is forthcoming; some suggestions are up on the EA Forum in the meantime.

 

  • A panel on “Lessons from COVID” with Jan Kulveit (Epidemic Forecasting), Irena Kotikova (MeSES), and Edouard Mathieu (Our World in Data).







Emergent Ventures - Schmidt Futures AI500

Author: Gavin
Client: Mercatus Center

We’re leading a study of AI talent for the Mercatus Center. This goes with the new Emergent Ventures AI tranche. We’ll boost underappreciated researchers and builders; give us leads!







Comparing Experts and Top Generalists

Author: Gavin and Misha
Client: Open Philanthropy

We were commissioned to see how strong the famous superforecasting advantage is. We found less research and less evidence than we expected. We received helpful comments from world experts including Christian Ruhl and Marc Koehler.

 

Full piece here, or as a podcast.







Learning from Crisis

Author: Gavin
Client: FHI

We helped Jan Kulveit, research scholar at FHI and cofounder of the Epidemic Forecasting initiative, to review the EA response to Covid. He has many interesting general insights into the nature of long-termism and world resilience.

 

Full sequence on the EA Forum.







Rolling nuclear risk estimates

Author: Misha & Samotsvety
Client: Centre for Effective Altruism

Samotsvety estimated nuclear risk arising from the war in Ukraine. Misha was commissioned by CEA to monitor the situation and provide updates. The piece received vigorous feedback, including a dissenting opinion by J. Peter Scoblic. Funded retroactively through the FTX Future Fund regranting program.

 

Full piece on the EA Forum.







Evaluating corporate prediction markets

Author: Misha & Samotsvety
Client: Upstart

Misha and his Samotsvety colleagues were commissioned to look at the track record of internal prediction markets. Their summary: “More sure that prediction markets fail to gain adoption than why this is.”

 

Full piece on the EA Forum.