Jacob Steinhardt

I am an Assistant Professor of Statistics and EECS at UC Berkeley, where I’m also part of BAIR and CLIMB. I am also Founder & CEO of Transluce, a non-profit research lab building open, scalable technology for understanding frontier AI systems.

My research focuses on ensuring machine learning systems are understood by and aligned with humans. The basic problem is that ML models are complex systems that often produce unintended consequences. For instance, ML systems tend to exploit errors in the reward function, leading to unintended behavior that often gets worse as models get bigger. The problem compounds once ML systems interact with each other or with humans, which can lead to strategic incentives and other intrasystem goals.

To tackle this problem, one approach is to understand not just the outputs of neural networks but also their latent activations, which represent the computational process used to generate outputs. By understanding this process, we can hopefully modify it to be more aligned with human intent.

Another approach is to enable humans to better understand complex systems. We have built several systems that consume large datasets and summarize their properties in natural language. More generally, ML models could help humans with important but difficult tasks such as understanding the long-term consequences of an action, automatically discovering failures in an ML or computer system, or predicting future world events.

I seek students who are technically strong, broad-minded, and want to improve the world through their research. I particularly value creative thinkers and curious empiricists who are excited to chart new approaches to the field.

As a graduate student, I was very fortunate to be advised by Percy Liang. During my post-doc year, I worked at OpenAI and Open Philanthropy. I like ultimate frisbee, powerlifting, and indoor bouldering.

Current Ph.D. students and post-docs

Ruiqi Zhong (co-advised with Dan Klein)
Meena Jagadeesan (co-advised with Mike Jordan)
Jean-Stanislas Denain
Erik Jones (co-advised with Anca Dragan)
Xinyan Hu (co-advised with Mike Jordan)
Alex Pan
Kayo Yin (co-advised with Dan Klein)
Jiahai Feng (co-advised with Stuart Russell)
Gabriel Mukobi (co-advised with Dawn Song)

I am also fortunate to collaborate with many students whom I do not directly advise, as can be seen from my publications page.

Former Ph.D. students and post-docs

Frances Ding (co-advised with Moritz Hardt → Genentech)
Alex Wei (co-advised with Nika Haghtalab and Mike Jordan → OpenAI)
Collin Burns (co-advised with Dan Klein → OpenAI)
Dan Hendrycks (co-advised with Dawn Song → director of the Center for AI Safety)
Adam Sealfon (post-doc with Mike Jordan → Google Research)

Essays

For more recent writing, see my blog.

AI Alignment Research Overview (October 2019)
Research as a Stochastic Decision Process (December 2018)
Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems (June 2015)
The Power of Noise (June 2014)
A Fervent Defense of Frequentist Statistics (February 2014)
Beyond Bayesians and Frequentists (October 2012)